Multimodal Autoregressive Pre-training of Large Vision Encoders • Paper • arXiv:2411.14402 • Published Nov 21, 2024
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions • Paper • arXiv:2407.06723 • Published Jul 9, 2024
Understanding Visual Feature Reliance through the Lens of Complexity • Paper • arXiv:2407.06076 • Published Jul 8, 2024