Collections including paper arxiv:2402.04792

Each dash-bulleted group below is one community collection; typically only the first few items of each collection are shown.

- Phi-4 Technical Report
  Paper • 2412.08905 • Published • 104
- Evaluating and Aligning CodeLLMs on Human Preference
  Paper • 2412.05210 • Published • 47
- Evaluating Language Models as Synthetic Data Generators
  Paper • 2412.03679 • Published • 46
- Yi-Lightning Technical Report
  Paper • 2412.01253 • Published • 25

- Qwen2.5 Technical Report
  Paper • 2412.15115 • Published • 340
- SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
  Paper • 2412.11605 • Published • 17
- Scaling test-time compute
  Space • 477 likes

- Reverse Thinking Makes LLMs Stronger Reasoners
  Paper • 2411.19865 • Published • 20

- Direct Preference Optimization: Your Language Model is Secretly a Reward Model (objective shown below)
  Paper • 2305.18290 • Published • 52
- Towards Efficient and Exact Optimization of Language Model Alignment
  Paper • 2402.00856 • Published
- A General Theoretical Paradigm to Understand Learning from Human Preferences
  Paper • 2310.12036 • Published • 13
- Statistical Rejection Sampling Improves Preference Optimization
  Paper • 2309.06657 • Published • 13

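For quick reference, the shared starting point of this group is the DPO objective from 2305.18290 above: given pairs where $y_w$ is preferred over $y_l$ for prompt $x$, the policy $\pi_\theta$ is trained directly against a frozen reference $\pi_{\mathrm{ref}}$, with no separate reward model:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
= -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
\left[\log \sigma\!\left(
\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
- \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right)\right]
$$

The other entries vary this recipe: 2402.00856 changes the optimization objective, 2310.12036 generalizes the theoretical framing, and 2309.06657 changes how the preference pairs are sampled.
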
- KTO: Model Alignment as Prospect Theoretic Optimization
  Paper • 2402.01306 • Published • 16
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 52
- SimPO: Simple Preference Optimization with a Reference-Free Reward (loss shown below)
  Paper • 2405.14734 • Published • 11
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
  Paper • 2408.06266 • Published • 10

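One contrast worth noting in this group: SimPO (2405.14734) drops the reference model entirely, scoring responses with a length-normalized log-likelihood and a target margin $\gamma$:

$$
\mathcal{L}_{\mathrm{SimPO}}(\pi_\theta)
= -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
\left[\log \sigma\!\left(
\frac{\beta}{|y_w|} \log \pi_\theta(y_w \mid x)
- \frac{\beta}{|y_l|} \log \pi_\theta(y_l \mid x)
- \gamma
\right)\right]
$$

Compared with the DPO loss above, there is no $\pi_{\mathrm{ref}}$ term, and log-likelihoods are normalized by response length $|y|$.
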
- A Critical Evaluation of AI Feedback for Aligning Large Language Models
  Paper • 2402.12366 • Published • 3
- PERL: Parameter Efficient Reinforcement Learning from Human Feedback
  Paper • 2403.10704 • Published • 58
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection (sketched below)
  Paper • 2403.03507 • Published • 184
- Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
  Paper • 2401.08417 • Published • 34

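A minimal sketch of the core mechanism in GaLore (2403.03507 above): gradients of each 2-D weight are projected onto a low-rank subspace obtained from a periodically refreshed SVD, optimizer moments live in that subspace, and the update is projected back. The rank, refresh interval, and the simplified Adam-style moments below are illustrative assumptions, not the paper's exact settings.

```python
import torch

def galore_step(weight, grad, state, rank=4, refresh_every=200, lr=1e-3):
    # One sketched GaLore update for a 2-D weight: moments are kept in an
    # r x n subspace instead of m x n, which is where the memory saving
    # comes from. Hyperparameters here are illustrative, not the paper's.
    if state.get("P") is None or state["step"] % refresh_every == 0:
        # Periodically refresh the projection from the gradient's SVD.
        U, _, _ = torch.linalg.svd(grad, full_matrices=False)
        state["P"] = U[:, :rank]                   # top-r left singular vectors
    P = state["P"]

    g_low = P.T @ grad                             # project (m, n) -> (r, n)
    state["m"] = 0.9 * state.get("m", torch.zeros_like(g_low)) + 0.1 * g_low
    state["v"] = 0.999 * state.get("v", torch.zeros_like(g_low)) + 0.001 * g_low**2
    update_low = state["m"] / (state["v"].sqrt() + 1e-8)

    weight -= lr * (P @ update_low)                # project back to (m, n)
    state["step"] += 1

# Toy usage with a random weight and stand-in gradients.
w, st = torch.randn(64, 32), {"step": 0}
for _ in range(3):
    galore_step(w, torch.randn_like(w), st)
```
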
- Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
  Paper • 2310.04406 • Published • 8
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 105
- ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
  Paper • 2402.09320 • Published • 6
- Self-Discover: Large Language Models Self-Compose Reasoning Structures
  Paper • 2402.03620 • Published • 115

- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 105
- How to Train Data-Efficient LLMs
  Paper • 2402.09668 • Published • 41
- BitDelta: Your Fine-Tune May Only Be Worth One Bit (sketched below)
  Paper • 2402.10193 • Published • 20
- A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
  Paper • 2402.09727 • Published • 37

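A minimal sketch of BitDelta's core idea (2402.10193 above): the delta between fine-tuned and base weights is replaced by its sign matrix times one scale per weight matrix. The mean-absolute-value scale below is the closed-form L2-optimal choice for a scaled sign quantizer; the paper additionally calibrates scales by distillation, which this sketch omits.

```python
import torch

def bitdelta_compress(w_base, w_ft):
    # Compress the fine-tune delta to 1 bit per weight plus one scale
    # per weight matrix; the sign matrix packs into a bitmask in practice.
    delta = w_ft - w_base
    scale = delta.abs().mean()      # L2-optimal scalar for sign(delta)
    return scale, torch.sign(delta)

def bitdelta_decompress(w_base, scale, sign):
    # Reconstruct approximate fine-tuned weights from base + 1-bit delta.
    return w_base + scale * sign

# Toy usage: reconstruction error stays small for a small delta.
w_base = torch.randn(256, 256)
w_ft = w_base + 0.01 * torch.randn(256, 256)
scale, sign = bitdelta_compress(w_base, w_ft)
print((bitdelta_decompress(w_base, scale, sign) - w_ft).abs().mean())
```
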
- Suppressing Pink Elephants with Direct Principle Feedback
  Paper • 2402.07896 • Published • 10
- Policy Improvement using Language Feedback Models
  Paper • 2402.07876 • Published • 6
- Direct Language Model Alignment from Online AI Feedback (loop sketched below)
  Paper • 2402.04792 • Published • 30
- Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
  Paper • 2401.01335 • Published • 64

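Finally, since 2402.04792 (Direct Language Model Alignment from Online AI Feedback) is the paper every collection on this page shares, a toy, self-contained sketch of its loop may help: sample two responses from the current policy, let an AI annotator label the winner, and apply a direct alignment update on that fresh pair. Every component below is a stand-in (a weighted phrase sampler for the policy, a length-preferring judge, a weight bump instead of a DPO step), not the paper's code.

```python
import random

def sample(policy):
    # Stand-in for decoding one response from the current policy.
    phrases = list(policy)
    return random.choices(phrases, weights=[policy[p] for p in phrases])[0]

def ai_annotator(y1, y2):
    # Stand-in LLM judge: prefers the longer (more detailed) response.
    return (y1, y2) if len(y1) >= len(y2) else (y2, y1)

policy = {"ok": 1.0, "sure, here it is": 1.0, "certainly, here is a full answer": 1.0}
for step in range(500):
    y1, y2 = sample(policy), sample(policy)   # two on-policy candidates
    y_w, y_l = ai_annotator(y1, y2)           # online AI preference label
    policy[y_w] *= 1.01                       # stand-in for a DPO-style update
print(max(policy, key=policy.get))            # drifts toward the judge's preference
```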