Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions Paper • 2411.14405 • Published Nov 21, 2024 • 58
RedPajama: an Open Dataset for Training Large Language Models Paper • 2411.12372 • Published Nov 19, 2024 • 48
Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering Paper • 2410.15999 • Published Oct 21, 2024 • 19
view article Article MedEmbed: Fine-Tuned Embedding Models for Medical / Clinical IR By abhinand • Oct 20, 2024 • 34
view article Article Model2Vec: Distill a Small Fast Model from any Sentence Transformer By Pringled • Oct 14, 2024 • 61
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct Paper • 2409.05840 • Published Sep 9, 2024 • 46
view article Article Improving Hugging Face Training Efficiency Through Packing with Flash Attention Aug 21, 2024 • 25
Synthesizing Text-to-SQL Data from Weak and Strong LLMs Paper • 2408.03256 • Published Aug 6, 2024 • 11
Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning Paper • 2408.00690 • Published Aug 1, 2024 • 24
ShieldGemma Release Collection A series of safety classifiers, trained on top of Gemma 2, for developers to filter inputs and outputs of their applications. • 3 items • Updated 24 days ago • 11
Compact Language Models via Pruning and Knowledge Distillation Paper • 2407.14679 • Published Jul 19, 2024 • 39
Minitron Collection A family of compressed models obtained via pruning and knowledge distillation • 12 items • Updated about 11 hours ago • 60
ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities Paper • 2407.14482 • Published Jul 19, 2024 • 26
NuminaMath Collection Datasets and models for training SOTA math LLMs. See our GitHub for training & inference code: https://github.com/project-numina/aimo-progress-prize • 6 items • Updated Jul 21, 2024 • 69