Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19, 2024 • 136
Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models Paper • 2409.18943 • Published Sep 27, 2024 • 28
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge Paper • 2411.16594 • Published Nov 25, 2024 • 37
Offline Reinforcement Learning for LLM Multi-Step Reasoning Paper • 2412.16145 • Published 16 days ago • 36
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search Paper • 2412.18319 • Published 13 days ago • 34