Top papers ⭐
updated
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper
•
2501.08313
•
Published
•
259
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep
Thinking
Paper
•
2501.04519
•
Published
•
232
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for
Fast, Memory Efficient, and Long Context Finetuning and Inference
Paper
•
2412.13663
•
Published
•
124
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Paper
•
2412.10360
•
Published
•
139
Paper
•
2412.08905
•
Published
•
103
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper
•
2412.09871
•
Published
•
88
Paper
•
2412.15115
•
Published
•
340