Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving Paper • 2407.00079 • Published Jun 24, 2024 • 5
RRM: Robust Reward Model Training Mitigates Reward Hacking Paper • 2409.13156 • Published Sep 20, 2024 • 4
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19, 2024 • 135
Power-LM Collection Dense & MoE LLMs trained with power learning rate scheduler. • 4 items • Updated Oct 17, 2024 • 15
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents Paper • 2408.07199 • Published Aug 13, 2024 • 21
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery Paper • 2408.06292 • Published Aug 12, 2024 • 117
Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF Paper • 2405.21046 • Published May 31, 2024 • 4
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25, 2024 • 87
GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks Paper • 2406.12925 • Published Jun 14, 2024 • 23
Margin-aware Preference Optimization for Aligning Diffusion Models without Reference Paper • 2406.06424 • Published Jun 10, 2024 • 12
ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12, 2024 • 64
Chronos Models & Datasets Collection Collection of artifacts related to Chronos pretrained models for time series forecasting. • 12 items • Updated Nov 26, 2024 • 37