MLLM - a imjliao Collection

MLLM

updated Feb 11, 2024

Question Aware Vision Transformer for Multimodal Reasoning

Paper • 2402.05472 • Published Feb 8, 2024 • 8

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 40

WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

Paper • 2402.05930 • Published Feb 8, 2024 • 38

More Agents Is All You Need

Paper • 2402.05120 • Published Feb 3, 2024 • 51