RLHF - a huanbin11 Collection

RLHF

updated Oct 10, 2024

MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions

Paper • 2410.02743 • Published Oct 3, 2024

Self-Boosting Large Language Models with Synthetic Preference Data

Paper • 2410.06961 • Published Oct 9, 2024