OS-Copilot: Towards Generalist Computer Agents with Self-Improvement Paper • 2402.07456 • Published Feb 12, 2024
Generative Agents: Interactive Simulacra of Human Behavior Paper • 2304.03442 • Published Apr 7, 2023
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models Paper • 2310.04406 • Published Oct 6, 2023
AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation Paper • 2312.13010 • Published Dec 20, 2023
AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation Paper • 2404.12753 • Published Apr 19, 2024
Scaling Instructable Agents Across Many Simulated Worlds Paper • 2404.10179 • Published Mar 13, 2024
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Paper • 2404.07972 • Published Apr 11, 2024
WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents Paper • 2404.05902 • Published Apr 8, 2024
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs Paper • 2404.05719 • Published Apr 8, 2024
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent Paper • 2404.03648 • Published Apr 4, 2024
Voyager: An Open-Ended Embodied Agent with Large Language Models Paper • 2305.16291 • Published May 25, 2023
LASER: LLM Agent with State-Space Exploration for Web Navigation Paper • 2309.08172 • Published Sep 15, 2023
The Rise and Potential of Large Language Model Based Agents: A Survey Paper • 2309.07864 • Published Sep 14, 2023
Reflexion: Language Agents with Verbal Reinforcement Learning Paper • 2303.11366 • Published Mar 20, 2023
Diffusion for World Modeling: Visual Details Matter in Atari Paper • 2405.12399 • Published May 20, 2024
OpenVLA: An Open-Source Vision-Language-Action Model Paper • 2406.09246 • Published Jun 13, 2024
SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks Paper • 2305.17390 • Published May 27, 2023
MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains Paper • 2407.18961 • Published Jul 18, 2024
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents Paper • 2407.18901 • Published Jul 26, 2024
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling Paper • 2407.21787 • Published Jul 31, 2024
WebArena: A Realistic Web Environment for Building Autonomous Agents Paper • 2307.13854 • Published Jul 25, 2023
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning Paper • 2407.20798 • Published Jul 30, 2024
AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation Paper • 2408.00764 • Published Aug 1, 2024
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents Paper • 2408.07060 • Published Aug 13, 2024
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery Paper • 2408.06292 • Published Aug 12, 2024
SWE-bench-java: A GitHub Issue Resolving Benchmark for Java Paper • 2408.14354 • Published Aug 26, 2024
AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments Paper • 2405.07960 • Published May 13, 2024
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? Paper • 2409.07703 • Published Sep 12, 2024
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale Paper • 2409.16299 • Published Sep 9, 2024
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use Paper • 2411.10323 • Published Nov 15, 2024
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks Paper • 2412.14161 • Published 16 days ago
Large Action Models: From Inception to Implementation Paper • 2412.10047 • Published 22 days ago
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis Paper • 2412.19723 • Published 7 days ago
Training Software Engineering Agents and Verifiers with SWE-Gym Paper • 2412.21139 • Published 4 days ago
PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World Paper • 2412.17589 • Published 12 days ago
Agent-SafetyBench: Evaluating the Safety of LLM Agents Paper • 2412.14470 • Published 16 days ago
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials Paper • 2412.09605 • Published 22 days ago