Collections including paper arxiv:2309.06657

Collection 1
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 52 upvotes
- Towards Efficient and Exact Optimization of Language Model Alignment
  Paper • 2402.00856 • Published
- A General Theoretical Paradigm to Understand Learning from Human Preferences
  Paper • 2310.12036 • Published • 13 upvotes
- Statistical Rejection Sampling Improves Preference Optimization
  Paper • 2309.06657 • Published • 13 upvotes
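
The first and last papers in Collection 1 center on direct preference optimization. As a quick orientation, here is a minimal PyTorch sketch of the DPO objective from 2305.18290; the function and variable names are my own, and the inputs are assumed to be log-probabilities already summed over each response's tokens.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss for a batch of preference pairs.

    Each argument is the summed log-probability of a full response under the
    trainable policy or the frozen reference model, shape (batch,).
    """
    # How much more (or less) likely each response became relative to the reference.
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    # -log(sigmoid(beta * margin)): minimized when the policy widens the gap
    # between the preferred and dispreferred response.
    return -F.logsigmoid(beta * (chosen_logratios - rejected_logratios)).mean()
```

The frozen reference model keeps the policy close to its starting point, and beta controls how sharply the implicit reward separates chosen from rejected responses.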

Collection 2
- Attention Is All You Need
  Paper • 1706.03762 • Published • 50 upvotes
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 12 upvotes
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
  Paper • 2305.13245 • Published • 5 upvotes
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 243 upvotes

Collection 3
- Moral Foundations of Large Language Models
  Paper • 2310.15337 • Published • 1 upvote
- Specific versus General Principles for Constitutional AI
  Paper • 2310.13798 • Published • 2 upvotes
- Contrastive Preference Learning: Learning from Human Feedback without RL
  Paper • 2310.13639 • Published • 24 upvotes
- RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
  Paper • 2309.00267 • Published • 47 upvotes

Collection 4
- Efficient RLHF: Reducing the Memory Usage of PPO
  Paper • 2309.00754 • Published • 13 upvotes
- Statistical Rejection Sampling Improves Preference Optimization
  Paper • 2309.06657 • Published • 13 upvotes
- Aligning Large Multimodal Models with Factually Augmented RLHF
  Paper • 2309.14525 • Published • 30 upvotes
- Stabilizing RLHF through Advantage Model and Selective Rehearsal
  Paper • 2309.10202 • Published • 9 upvotes

Collection 5
- Language Modeling Is Compression
  Paper • 2309.10668 • Published • 83 upvotes
- Baichuan 2: Open Large-scale Language Models
  Paper • 2309.10305 • Published • 19 upvotes
- Chain-of-Verification Reduces Hallucination in Large Language Models
  Paper • 2309.11495 • Published • 37 upvotes
- LMDX: Language Model-based Document Information Extraction and Localization
  Paper • 2309.10952 • Published • 65 upvotes

Collection 6
- Statistical Rejection Sampling Improves Preference Optimization
  Paper • 2309.06657 • Published • 13 upvotes
- In-Context Learning Creates Task Vectors
  Paper • 2310.15916 • Published • 42 upvotes
- Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
  Paper • 2404.08801 • Published • 65 upvotes
- Make Your LLM Fully Utilize the Context
  Paper • 2404.16811 • Published • 53 upvotes

Collection 7
- Efficient RLHF: Reducing the Memory Usage of PPO
  Paper • 2309.00754 • Published • 13 upvotes
- Statistical Rejection Sampling Improves Preference Optimization
  Paper • 2309.06657 • Published • 13 upvotes
- Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?
  Paper • 2309.07462 • Published • 4 upvotes
- Stabilizing RLHF through Advantage Model and Selective Rehearsal
  Paper • 2309.10202 • Published • 9 upvotes

Collection 8
- Statistical Rejection Sampling Improves Preference Optimization
  Paper • 2309.06657 • Published • 13 upvotes
- Efficient Monotonic Multihead Attention
  Paper • 2312.04515 • Published • 7 upvotes
- Layerwise Recurrent Router for Mixture-of-Experts
  Paper • 2408.06793 • Published • 32 upvotes
- Scaling Up Diffusion and Flow-based XGBoost Models
  Paper • 2408.16046 • Published • 10 upvotes
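
Every collection above includes 2309.06657, whose core move is to draw candidate responses from an SFT policy and apply statistical rejection sampling against a reward model, so that accepted responses approximately follow the KL-regularized optimum pi*(y|x) proportional to pi_sft(y|x) * exp(r(x, y) / beta). The sketch below is my own paraphrase under those assumptions; `generate` and `reward` are hypothetical stand-ins, not APIs from the paper.

```python
import math
import random
from typing import Callable, List

def statistical_rejection_sample(prompt: str,
                                 generate: Callable[[str], str],
                                 reward: Callable[[str, str], float],
                                 beta: float = 0.5,
                                 num_candidates: int = 32) -> List[str]:
    """Accept SFT samples with probability exp((r - r_max) / beta)."""
    candidates = [generate(prompt) for _ in range(num_candidates)]
    rewards = [reward(prompt, y) for y in candidates]
    r_max = max(rewards)  # batch estimate of the reward upper bound
    # Higher-reward responses survive more often, tilting the accepted set
    # from pi_sft toward pi_sft * exp(r / beta).
    return [y for y, r in zip(candidates, rewards)
            if random.random() < math.exp((r - r_max) / beta)]
```

In the paper, roughly, the accepted responses are then labeled into preference pairs by the reward model and used to fit a DPO- or SLiC-style preference loss; the acceptance rule above is only the sampling step.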