Slow Perception: Let's Perceive Geometric Figures Step-by-step Paper • 2412.20631 • Published 9 days ago • 12
Document AI Collection All the papers that can fundamentally help in creating a true open-source processing pipeline. • 1 item • Updated Nov 11, 2024 • 1
Focus Anywhere for Fine-grained Multi-page Document Understanding Paper • 2405.14295 • Published May 23, 2024 • 1
PixMo Collection A set of vision-language datasets built by Ai2 and used to train the Molmo family of models. Read more at https://molmo.allenai.org/blog • 9 items • Updated about 21 hours ago • 53
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3, 2024 • 83
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation Paper • 2406.16855 • Published Jun 24, 2024 • 54
OneChart: Purify the Chart Structural Extraction via One Auxiliary Token Paper • 2404.09987 • Published Apr 15, 2024 • 2
Small Language Model Meets with Reinforced Vision Vocabulary Paper • 2401.12503 • Published Jan 23, 2024 • 32
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models Paper • 2312.06109 • Published Dec 11, 2023 • 20
Merlin: Empowering Multimodal LLMs with Foresight Minds Paper • 2312.00589 • Published Nov 30, 2023 • 24