Yedson54's Collections
Reinforcement Learning (RL / RLHF)
updated
RLHF Workflow: From Reward Modeling to Online RLHF
Paper • 2405.07863 • Published • 66
Understanding and Diagnosing Deep Reinforcement Learning
Paper • 2406.16979 • Published • 9
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Paper • 2404.03715 • Published • 60
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
Paper • 2407.00617 • Published • 7
Offline Regularised Reinforcement Learning for Large Language Models Alignment
Paper • 2405.19107 • Published • 14
DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging
Paper • 2407.01470 • Published • 5
Understanding the performance gap between online and offline alignment algorithms
Paper • 2405.08448 • Published • 14
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Paper • 2405.19320 • Published • 10
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Paper • 2405.11143 • Published • 34
Dataset Reset Policy Optimization for RLHF
Paper • 2404.08495 • Published • 8
WPO: Enhancing RLHF with Weighted Preference Optimization
Paper • 2406.11827 • Published • 14
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
Paper • 2406.20095 • Published • 17
DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning
Paper • 2406.11896 • Published • 18
Measuring memorization in RLHF for code completion
Paper • 2406.11715 • Published • 6
Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning
Paper • 2406.00392 • Published • 12
Gradient Boosting Reinforcement Learning
Paper • 2407.08250 • Published • 10
PERL: Parameter Efficient Reinforcement Learning from Human Feedback
Paper • 2403.10704 • Published • 57
Mixtures of Experts Unlock Parameter Scaling for Deep RL
Paper • 2402.08609 • Published • 34
RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
Paper • 2410.16184 • Published • 24