Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning Paper • 2410.22304 • Published Oct 29, 2024 • 16
OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization Paper • 2410.19609 • Published Oct 25, 2024 • 17
Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation Paper • 2411.00412 • Published Nov 1, 2024 • 9
Improving Autonomous AI Agents with Reflective Tree Search and Self-Learning Paper • 2410.02052 • Published Oct 2, 2024 • 9
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment Paper • 2410.01679 • Published Oct 2, 2024 • 24
DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search Paper • 2410.03864 • Published Oct 4, 2024 • 11