BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks Paper • 2412.04626 • Published Dec 5, 2024 • 13
WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks? Paper • 2403.07718 • Published Mar 12, 2024 • 1
WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks Paper • 2407.05291 • Published Jul 7, 2024 • 2
Learning Action and Reasoning-Centric Image Editing from Videos and Simulations Paper • 2407.03471 • Published Jul 3, 2024 • 29
Multimodal foundation world models for generalist embodied agents Paper • 2406.18043 • Published Jun 26, 2024 • 1
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content Paper • 2406.11811 • Published Jun 17, 2024 • 17
Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels Paper • 2209.12016 • Published Sep 24, 2022
Efficient Dynamics Modeling in Interactive Environments with Koopman Theory Paper • 2306.11941 • Published Jun 20, 2023
Capture the Flag: Uncovering Data Insights with Large Language Models Paper • 2312.13876 • Published Dec 21, 2023 • 1
Choreographer: Learning and Adapting Skills in Imagination Paper • 2211.13350 • Published Nov 23, 2022
Pix2Shape: Towards Unsupervised Learning of 3D Scenes from Images using a View-based Representation Paper • 2003.14166 • Published Mar 23, 2020
Reducing hallucination in structured outputs via Retrieval-Augmented Generation Paper • 2404.08189 • Published Apr 12, 2024 • 1