swami2004's Collections
Papers to Read

- mDPO: Conditional Preference Optimization for Multimodal Large Language Models (arXiv:2406.11839, 37 upvotes)
- Pandora: Towards General World Model with Natural Language Actions and Video States (arXiv:2406.09455, 15 upvotes)
- WPO: Enhancing RLHF with Weighted Preference Optimization (arXiv:2406.11827, 14 upvotes)
- In-Context Editing: Learning Knowledge from Self-Induced Distributions (arXiv:2406.11194, 15 upvotes)
- Breaking the Attention Bottleneck (arXiv:2406.10906, 4 upvotes)
- Deep Bayesian Active Learning for Preference Modeling in Large Language Models (arXiv:2406.10023, 2 upvotes)
- RVT-2: Learning Precise Manipulation from Few Demonstrations (arXiv:2406.08545, 7 upvotes)
- arXiv:2406.09414 (95 upvotes)
- Transformers meet Neural Algorithmic Reasoners (arXiv:2406.09308, 43 upvotes)
- Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling (arXiv:2406.07522, 37 upvotes)
- MotionClone: Training-Free Motion Cloning for Controllable Video Generation (arXiv:2406.05338, 39 upvotes)
- Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning (arXiv:2406.06469, 24 upvotes)
- RePLan: Robotic Replanning with Perception and Language Models (arXiv:2401.04157, 3 upvotes)
- Generative Expressive Robot Behaviors using Large Language Models (arXiv:2401.14673, 5 upvotes)
- Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms (arXiv:2406.02900, 11 upvotes)
- PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs (arXiv:2406.02886, 8 upvotes)
- Self-Improving Robust Preference Optimization (arXiv:2406.01660, 18 upvotes)
- MotionLLM: Understanding Human Behaviors from Human Motions and Videos (arXiv:2405.20340, 19 upvotes)
- Offline Regularised Reinforcement Learning for Large Language Models Alignment (arXiv:2405.19107, 14 upvotes)
- Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF (arXiv:2405.19320, 10 upvotes)
- An Introduction to Vision-Language Modeling (arXiv:2405.17247, 87 upvotes)
- OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework (arXiv:2405.11143, 34 upvotes)
- Octo: An Open-Source Generalist Robot Policy (arXiv:2405.12213, 24 upvotes)
- TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction (arXiv:2405.10315, 10 upvotes)
- RLHF Workflow: From Reward Modeling to Online RLHF (arXiv:2405.07863, 66 upvotes)
- Self-Play Preference Optimization for Language Model Alignment (arXiv:2405.00675, 25 upvotes)
- Iterative Reasoning Preference Optimization (arXiv:2404.19733, 47 upvotes)
- KAN: Kolmogorov-Arnold Networks (arXiv:2404.19756, 108 upvotes)
- A Multimodal Automated Interpretability Agent (arXiv:2404.14394, 20 upvotes)
- Learning H-Infinity Locomotion Control (arXiv:2404.14405, 6 upvotes)
- Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment (arXiv:2404.12318, 14 upvotes)
- Scaling Instructable Agents Across Many Simulated Worlds (arXiv:2404.10179, 27 upvotes)
- Learn Your Reference Model for Real Good Alignment (arXiv:2404.09656, 82 upvotes)
- Dataset Reset Policy Optimization for RLHF (arXiv:2404.08495, 8 upvotes)
- UniFL: Improve Stable Diffusion via Unified Feedback Learning (arXiv:2404.05595, 23 upvotes)
- Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences (arXiv:2404.03715, 60 upvotes)
- Robust Gaussian Splatting (arXiv:2404.04211, 8 upvotes)
- RL for Consistency Models: Faster Reward Guided Text-to-Image Generation (arXiv:2404.03673, 14 upvotes)