Towards Best Practices for Open Datasets for LLM Training Paper • 2501.08365 • Published 7 days ago • 44
Embedding Model Datasets Collection A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers • 67 items • Updated Jul 3, 2024 • 96
Article Train 400x faster Static Embedding Models with Sentence Transformers 6 days ago • 113
KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model Paper • 2501.01028 • Published 19 days ago • 12
Article Recipe: Preparing Multilingual Speech Datasets for TTS Training By PHBJT • Nov 4, 2024 • 15
Article Deploying Language Models on Azure Kubernetes: A Complete Beginner's Guide By vpkprasanna • Nov 11, 2024 • 3
Article Unlocking the Power of Reasoning: Introducing CriticalThinker-LLaMA-3.1-8B-GGUF and Its Groundbreaking Dataset By theeseus-ai • 25 days ago • 1
Article Fine-tune a SmolLM on domain-specific synthetic data from a LLM By davidberenstein1957 • 18 days ago • 31
LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching Paper • 2311.11284 • Published Nov 19, 2023 • 17
ModernBERT Collection Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated Dec 19, 2024 • 125
Structured 3D Latents for Scalable and Versatile 3D Generation Paper • 2412.01506 • Published Dec 2, 2024 • 55
Article Welcome FalconMamba: The first strong attention-free 7B model Aug 12, 2024 • 108
Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models Paper • 2407.03181 • Published Jul 3, 2024 • 1
Probably function calling datasets Collection Created using the https://huggingface.co/spaces/librarian-bots/dataset-column-search-api Space. • 39 items • Updated Jul 17, 2024 • 37