AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials Paper • 2412.09605 • Published Dec 12, 2024 • 26
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction Paper • 2412.04454 • Published Dec 5, 2024 • 54
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Paper • 2404.07972 • Published Apr 11, 2024 • 46
Lemur: Harmonizing Natural Language and Code for Language Agents Paper • 2310.06830 • Published Oct 10, 2023 • 31
DiT: Self-supervised Pre-training for Document Image Transformer Paper • 2203.02378 • Published Mar 4, 2022
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding Paper • 2012.14740 • Published Dec 29, 2020 • 1
LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding Paper • 2104.08836 • Published Apr 18, 2021
MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding Paper • 2110.08518 • Published Oct 16, 2021 • 1
LayoutLM: Pre-training of Text and Layout for Document Image Understanding Paper • 1912.13318 • Published Dec 31, 2019 • 2