DylanASHillier's Collections
Learning from feedback dir
- Suppressing Pink Elephants with Direct Principle Feedback (arXiv:2402.07896, 9 upvotes)
- Policy Improvement using Language Feedback Models (arXiv:2402.07876, 5 upvotes)
- Direct Language Model Alignment from Online AI Feedback (arXiv:2402.04792, 29 upvotes)
- Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models (arXiv:2401.01335, 64 upvotes)
- Learning to Learn Faster from Human Feedback with Language Model Predictive Control (arXiv:2402.11450, 21 upvotes)
- RLVF: Learning from Verbal Feedback without Overgeneralization (arXiv:2402.10893, 10 upvotes)
- Orca-Math: Unlocking the potential of SLMs in Grade School Math (arXiv:2402.14830, 24 upvotes)
- Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level (arXiv:2406.11817, 12 upvotes)
- Bootstrapping Language Models with DPO Implicit Rewards (arXiv:2406.09760, 38 upvotes)
- Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning (arXiv:2406.00392, 12 upvotes)
- Show, Don't Tell: Aligning Language Models with Demonstrated Feedback (arXiv:2406.00888, 30 upvotes)
- Aligning Teacher with Student Preferences for Tailored Training Data Generation (arXiv:2406.19227, 24 upvotes)
- Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs (arXiv:2406.18629, 41 upvotes)
- Can LLMs Learn by Teaching? A Preliminary Study (arXiv:2406.14629, 19 upvotes)
- Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use (arXiv:2410.24218, 5 upvotes)
- RL Zero: Zero-Shot Language to Behaviors without any Supervision (arXiv:2412.05718, 4 upvotes)
- Moto: Latent Motion Token as the Bridging Language for Robot Manipulation (arXiv:2412.04445, 21 upvotes)