ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published 25 days ago • 72
Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models Paper • 1610.02424 • Published Oct 7, 2016 • 1
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published 17 days ago • 116
ModernBERT Collection Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated 15 days ago • 112
Solving math word problems with process- and outcome-based feedback Paper • 2211.14275 • Published Nov 25, 2022 • 7
Self-Consistency Improves Chain of Thought Reasoning in Language Models Paper • 2203.11171 • Published Mar 21, 2022 • 3
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 15 items • Updated 12 days ago • 197
AutoTrain: No-code training for state-of-the-art models Paper • 2410.15735 • Published Oct 21, 2024 • 59
Falcon Mamba: The First Competitive Attention-free 7B Language Model Paper • 2410.05355 • Published Oct 7, 2024 • 31
Critique-out-Loud Reward Models Collection Paper: https://arxiv.org/abs/2408.11791 | Code: https://github.com/zankner/CLoud • 7 items • Updated Sep 5, 2024 • 3
Style over Substance: Failure Modes of LLM Judges in Alignment Benchmarking Paper • 2409.15268 • Published Sep 23, 2024 • 13
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 124
Article Llama-3.1-Storm-8B: Improved SLM with Self-Curation + Model Merging By akjindal53244 • Aug 19, 2024 • 75
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents Paper • 2408.07199 • Published Aug 13, 2024 • 21
Article A failed experiment: Infini-Attention, and why we should keep trying? • Aug 14, 2024 • 56