UGround

osunlp's Collections

updated about 21 hours ago

UGround: Universal GUI Visual Grounding for GUI Agents

osunlp/UGround-V1-2B

Image-Text-to-Text • Updated 1 day ago • 116 • 5

Note Based on Qwen2-VL-2B-Instruct
osunlp/UGround-V1-7B

Image-Text-to-Text • Updated 1 day ago • 70 • 1

Note Based on Qwen2-VL-7B-Instruct
osunlp/UGround-V1-72B-Preview

Image-Text-to-Text • Updated 24 minutes ago • 2

Note Based on Qwen2-VL-72B-Instruct
osunlp/UGround

Image-Text-to-Text • Updated 1 day ago • 16.3k • 19

Note The initial model. Based on the modified LLaVA architecture (CLIP + Vicuna-7B) described in the paper.
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents

Paper • 2410.05243 • Published Oct 7, 2024 • 18

Note A low-cost, scalable, and effective data synthesis pipeline for GUI visual grounding; the SOTA GUI visual grounding model UGround; the purely vision-only (modular) GUI agent framework SeeAct-V; and the first demonstration of SOTA performance by vision-only GUI agents.
Paused

13

📱💻🌐

UGround

Note Paused. A new one will be opened for the Qwen2-VL-based UGround.
Paused

1

📱💻🌐

UGround-V1-2B

Note Paused while we work out how to accelerate inference. A new one will be opened for UGround-V1.1.