LLM paper - a MinakamiYuki Collection

updated 8 days ago

Training Language Models to Self-Correct via Reinforcement Learning

Paper • 2409.12917 • Published Sep 19, 2024 • 136

Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models

Paper • 2409.18943 • Published Sep 27, 2024 • 28

From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge

Paper • 2411.16594 • Published Nov 25, 2024 • 37

Offline Reinforcement Learning for LLM Multi-Step Reasoning

Paper • 2412.16145 • Published 16 days ago • 36

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published 13 days ago • 34