Video-Guided Foley Sound Generation with Multimodal Controls Paper • 2411.17698 • Published Nov 26, 2024 • 7
Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use Paper • 2410.24218 • Published Oct 31, 2024 • 5
DANLI: Deliberative Agent for Following Natural Language Instructions Paper • 2210.12485 • Published Oct 22, 2022
What Gives the Answer Away? Question Answering Bias Analysis on Video QA Datasets Paper • 2007.03626 • Published Jul 7, 2020
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination Paper • 2406.05132 • Published Jun 7, 2024 • 27
LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent Paper • 2309.12311 • Published Sep 21, 2023 • 17
Self-Supervised Video Forensics by Audio-Visual Anomaly Detection Paper • 2301.01767 • Published Jan 4, 2023
Binding Touch to Everything: Learning Unified Multimodal Tactile Representations Paper • 2401.18084 • Published Jan 31, 2024
Images that Sound: Composing Images and Sounds on a Single Canvas Paper • 2405.12221 • Published May 20, 2024 • 1
Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation Paper • 2303.11329 • Published Mar 20, 2023 • 1