arxiv:2409.14674

RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning

Published on Sep 23, 2024
· Submitted by tnlin on Sep 24, 2024
#2 Paper of the day

Abstract

Developing robust and correctable visuomotor policies for robotic manipulation is challenging due to the lack of self-recovery mechanisms from failures and the limitations of simple language instructions in guiding robot actions. To address these issues, we propose a scalable data generation pipeline that automatically augments expert demonstrations with failure recovery trajectories and fine-grained language annotations for training. We then introduce Rich languAge-guided failure reCovERy (RACER), a supervisor-actor framework, which combines failure recovery data with rich language descriptions to enhance robot control. RACER features a vision-language model (VLM) that acts as an online supervisor, providing detailed language guidance for error correction and task execution, and a language-conditioned visuomotor policy as an actor to predict the next actions. Our experimental results show that RACER outperforms the state-of-the-art Robotic View Transformer (RVT) on RLBench across various evaluation settings, including standard long-horizon tasks, dynamic goal-change tasks, and zero-shot unseen tasks, achieving superior performance in both simulated and real-world environments. Videos and code are available at: https://rich-language-failure-recovery.github.io.
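
The supervisor-actor loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 1-D "gripper" world, the `Observation` class, and the `toy_supervisor`, `toy_actor`, and `run_episode` functions are all hypothetical stand-ins, chosen only to show the control flow in which a supervisor emits a rich instruction at every step and an actor conditions its next action on that instruction.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Observation:
    """Toy 1-D world (an assumption for illustration, not the RLBench state)."""
    gripper_x: float
    target_x: float


def toy_supervisor(obs: Observation, task_goal: str) -> str:
    """Stand-in for the VLM supervisor: describes the error and the fix."""
    error = obs.target_x - obs.gripper_x
    if abs(error) < 0.05:
        return "aligned with target: close the gripper"
    direction = "right" if error > 0 else "left"
    return f"gripper is off target: move {direction} by {abs(error):.2f}"


def toy_actor(obs: Observation, instruction: str) -> float:
    """Stand-in for the language-conditioned policy: returns a displacement."""
    remaining = obs.target_x - obs.gripper_x
    if "move right" in instruction:
        return min(0.2, remaining)   # step size capped at 0.2
    if "move left" in instruction:
        return max(-0.2, remaining)
    return 0.0


def run_episode(
    obs: Observation,
    task_goal: str,
    supervisor: Callable[[Observation, str], str],
    actor: Callable[[Observation, str], float],
    max_steps: int = 20,
) -> List[str]:
    """Alternate supervisor guidance and actor actions; return the instruction log."""
    log: List[str] = []
    for _ in range(max_steps):
        instruction = supervisor(obs, task_goal)
        log.append(instruction)
        if "close the gripper" in instruction:
            break
        obs = Observation(obs.gripper_x + actor(obs, instruction), obs.target_x)
    return log


if __name__ == "__main__":
    for line in run_episode(Observation(0.0, 0.5), "reach the target",
                            toy_supervisor, toy_actor):
        print(line)
```

In this sketch the supervisor is queried at every step, so a change of goal or a drift off target would be reflected in the next instruction; how often the real system queries its VLM is not stated in the abstract.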

Community

Paper author

In this paper, we found that visuomotor policies trained with rich language instructions and failure recovery behaviors demonstrate superior robustness and adaptability.

Rich language instructions provide more comprehensive details for failure recovery, such as failure analysis, spatial movements, target object attributes, and the expected outcome, and can guide the policy with more accurate control while serving as a form of regularization to prevent overfitting and improve generalization.

Our proposed model, RACER, not only surpasses previous state-of-the-art baselines on standard RLBench tasks, but also excels in handling dynamic task goal changes, zero-shot transfer to unseen tasks, and real-world scenarios.


just out of curiosity, what made you choose that acronym "RACER" specifically? When i first read the title of the paper i was really struggling to see how you got from "Rich Language-Guided Failure Recovery Policies" to "RACER" lol

personally i'd prefer a more boring acronym that's better connected to what it stands for, instead of forcing a cool sounding one. at least actually using the beginnings of the words. Since it's a VLM that does the supervising, you could call it something like "Vision-Language guided error recovery / correction" and shorten it to "VL-GER" / "VL-GEC", just for example.

Maybe not as cool sounding as racer but at least the readers don't have to do mental acrobatics to go from "racer" to "Rich Language-Guided Failure Recovery Policies" every time they read it.
