Collections including paper arxiv:2409.02392

- KTO: Model Alignment as Prospect Theoretic Optimization
  Paper • 2402.01306 • Published • 16
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 50
- SimPO: Simple Preference Optimization with a Reference-Free Reward
  Paper • 2405.14734 • Published • 11
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
  Paper • 2408.06266 • Published • 10

- Attention Heads of Large Language Models: A Survey
  Paper • 2409.03752 • Published • 89
- FuzzCoder: Byte-level Fuzzing Test via Large Language Model
  Paper • 2409.01944 • Published • 45
- Building Math Agents with Multi-Turn Iterative Preference Learning
  Paper • 2409.02392 • Published • 15
- Statically Contextualizing Large Language Models with Typed Holes
  Paper • 2409.00921 • Published • 4

- Automated Design of Agentic Systems
  Paper • 2408.08435 • Published • 39
- Self-Refine: Iterative Refinement with Self-Feedback
  Paper • 2303.17651 • Published • 2
- Automating Thought of Search: A Journey Towards Soundness and Completeness
  Paper • 2408.11326 • Published • 1
- Building Math Agents with Multi-Turn Iterative Preference Learning
  Paper • 2409.02392 • Published • 15

- RLHF Workflow: From Reward Modeling to Online RLHF
  Paper • 2405.07863 • Published • 66
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 126
- Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
  Paper • 2405.15574 • Published • 53
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 87

- Can large language models explore in-context?
  Paper • 2403.15371 • Published • 32
- Advancing LLM Reasoning Generalists with Preference Trees
  Paper • 2404.02078 • Published • 44
- Long-context LLMs Struggle with Long In-context Learning
  Paper • 2404.02060 • Published • 36
- Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
  Paper • 2404.03715 • Published • 60

- Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
  Paper • 2403.05530 • Published • 61
- Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order
  Paper • 2404.00399 • Published • 41
- Rho-1: Not All Tokens Are What You Need
  Paper • 2404.07965 • Published • 88
- Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
  Paper • 2406.08464 • Published • 65