Introducing Synthetic Data Workshop: Your Gateway to Easy Synthetic Dataset Creation Jun 20, 2024 • 12
Synthetic dataset generation techniques: generating custom sentence similarity data May 23, 2024 • 16
Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models Mar 20, 2024 • 71
Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model Aug 22, 2023 • 28
Huggy Lingo: Using Machine Learning to Improve Language Metadata on the Hugging Face Hub Aug 2, 2023 • 1
Article FineWeb2-C: Help Build Better Language Models in Your Language By davanstrien • 9 days ago • 10
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published 14 days ago • 113
Granite 3.1 Language Models Collection A series of language models with 128K context length, trained by IBM and licensed under the Apache 2.0 license. • 8 items • Updated 14 days ago • 41
ModernBERT Collection Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated 13 days ago • 107
Hf-native ColVision Models Collection Models that can be used with the native transformers 🤗 implementation instead of colpali-engine. • 2 items • Updated 24 days ago • 2
OpenNER 1.0: Standardized Open-Access Named Entity Recognition Datasets in 50+ Languages Paper • 2412.09587 • Published 20 days ago • 3
PaliGemma 2 Release Collection Vision-Language Models available in 3B, 10B and 28B variants. • 23 items • Updated 19 days ago • 120
Open Image Preferences Collection Contains all artifacts for the Stable Diffusion 3.5L vs Flux Dev image preference community sprint. • 14 items • Updated 13 days ago • 6
ShieldGemma: Generative AI Content Moderation Based on Gemma Paper • 2407.21772 • Published Jul 31, 2024 • 14
On Limitations of LLM as Annotator for Low Resource Languages Paper • 2411.17637 • Published Nov 26, 2024 • 2
Article Use Models from the Hugging Face Hub in LM Studio By yagilb • Nov 28, 2024 • 127
Article Fine-Tuning 1B LLaMA 3.2: A Comprehensive Step-by-Step Guide with Code By ImranzamanML • Oct 2, 2024 • 42
Article Let’s make a generation of amazing image generation models By burtenshaw • Nov 26, 2024 • 33
Article Model2Vec: Distill a Small Fast Model from any Sentence Transformer By Pringled • Oct 14, 2024 • 61