PaSa: An LLM Agent for Comprehensive Academic Paper Search • Paper • 2501.10120 • Published 15 days ago • 42
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training • Paper • 2501.11425 • Published 12 days ago • 88
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning • Paper • 2501.12948 • Published 10 days ago • 279
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding • Paper • 2501.13200 • Published 10 days ago • 62
Towards General-Purpose Model-Free Reinforcement Learning • Paper • 2501.16142 • Published 5 days ago • 22
Optimizing Large Language Model Training Using FP4 Quantization • Paper • 2501.17116 • Published 4 days ago • 25
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training • Paper • 2501.17161 • Published 4 days ago • 53
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate • Paper • 2501.17703 • Published 3 days ago • 38
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps • Paper • 2501.09732 • Published 16 days ago • 66
The Lessons of Developing Process Reward Models in Mathematical Reasoning • Paper • 2501.07301 • Published 19 days ago • 89
MiniMax-01: Scaling Foundation Models with Lightning Attention • Paper • 2501.08313 • Published 18 days ago • 271
The GAN is dead; long live the GAN! A Modern GAN Baseline • Paper • 2501.05441 • Published 23 days ago • 87