Question Aware Vision Transformer for Multimodal Reasoning • Paper 2402.05472 • Published Feb 8, 2024 • 8
ScreenAI: A Vision-Language Model for UI and Infographics Understanding • Paper 2402.04615 • Published Feb 7, 2024 • 40
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue • Paper 2402.05930 • Published Feb 8, 2024 • 38