SAIL-Sailor/sailor2_8B_sft_500step_hf_1102_longxu_dpo_417step_zichen Text Generation • Updated Nov 25, 2024 • 15
Locality Sensitive Sparse Encoding for Learning World Models Online Paper • 2401.13034 • Published Jan 23, 2024
Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning Paper • 2402.03046 • Published Feb 5, 2024 • 6
Bootstrapping Language Models with DPO Implicit Rewards Paper • 2406.09760 • Published Jun 14, 2024 • 38
💡 DICE Collection Self-alignment with DPO Implicit Rewards • 5 items • Updated Jul 28, 2024 • 9
RegMix: Data Mixture as Regression for Language Model Pre-training Paper • 2407.01492 • Published Jul 1, 2024 • 35