SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe • Paper • 2410.05248 • Published Oct 7, 2024
Standard-format-preference-dataset • Collection • Open-source preference datasets collected and processed into a standard format. • 14 items • Updated May 8, 2024
WPO: Enhancing RLHF with Weighted Preference Optimization • Paper • 2406.11827 • Published Jun 17, 2024
mDPO: Conditional Preference Optimization for Multimodal Large Language Models • Paper • 2406.11839 • Published Jun 17, 2024