- RLHF Workflow: From Reward Modeling to Online RLHF
  Paper • 2405.07863 • Published • 66
- Understanding and Diagnosing Deep Reinforcement Learning
  Paper • 2406.16979 • Published • 9
- Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
  Paper • 2404.03715 • Published • 60
- Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
  Paper • 2407.00617 • Published • 7

Collections including paper arxiv:2405.08448

- Understanding the performance gap between online and offline alignment algorithms
  Paper • 2405.08448 • Published • 14
- Self-Exploring Language Models: Active Preference Elicitation for Online Alignment
  Paper • 2405.19332 • Published • 15
- Offline Regularised Reinforcement Learning for Large Language Models Alignment
  Paper • 2405.19107 • Published • 14
- Show, Don't Tell: Aligning Language Models with Demonstrated Feedback
  Paper • 2406.00888 • Published • 31

- xLSTM: Extended Long Short-Term Memory
  Paper • 2405.04517 • Published • 12
- You Only Cache Once: Decoder-Decoder Architectures for Language Models
  Paper • 2405.05254 • Published • 10
- Understanding the performance gap between online and offline alignment algorithms
  Paper • 2405.08448 • Published • 14
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 127

- Iterative Reasoning Preference Optimization
  Paper • 2404.19733 • Published • 47
- Better & Faster Large Language Models via Multi-token Prediction
  Paper • 2404.19737 • Published • 73
- ORPO: Monolithic Preference Optimization without Reference Model
  Paper • 2403.07691 • Published • 64
- KAN: Kolmogorov-Arnold Networks
  Paper • 2404.19756 • Published • 108

- OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
  Paper • 2402.14658 • Published • 82
- KAN: Kolmogorov-Arnold Networks
  Paper • 2404.19756 • Published • 108
- Understanding the performance gap between online and offline alignment algorithms
  Paper • 2405.08448 • Published • 14
- NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models
  Paper • 2405.17428 • Published • 17

- ORPO: Monolithic Preference Optimization without Reference Model
  Paper • 2403.07691 • Published • 64
- sDPO: Don't Use Your Data All at Once
  Paper • 2403.19270 • Published • 40
- Teaching Large Language Models to Reason with Reinforcement Learning
  Paper • 2403.04642 • Published • 46
- Best Practices and Lessons Learned on Synthetic Data for Language Models
  Paper • 2404.07503 • Published • 29

- A Critical Evaluation of AI Feedback for Aligning Large Language Models
  Paper • 2402.12366 • Published • 3
- PERL: Parameter Efficient Reinforcement Learning from Human Feedback
  Paper • 2403.10704 • Published • 57
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 183
- Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
  Paper • 2401.08417 • Published • 34

- AtP*: An efficient and scalable method for localizing LLM behaviour to components
  Paper • 2403.00745 • Published • 12
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 605
- MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT
  Paper • 2402.16840 • Published • 23
- LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
  Paper • 2402.13753 • Published • 114

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 25
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 12
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 40
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 20