SILC: Improving Vision Language Pretraining with Self-Distillation • arXiv:2310.13355 • Published Oct 20, 2023
Woodpecker: Hallucination Correction for Multimodal Large Language Models • arXiv:2310.16045 • Published Oct 24, 2023
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation • arXiv:2201.12086 • Published Jan 28, 2022
ImageNetVC: Zero-Shot Visual Commonsense Evaluation on 1000 ImageNet Categories • arXiv:2305.15028 • Published May 24, 2023
MM-VID: Advancing Video Understanding with GPT-4V(ision) • arXiv:2310.19773 • Published Oct 30, 2023
Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks • arXiv:2310.19909 • Published Oct 30, 2023
Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans? • arXiv:2311.00047 • Published Oct 31, 2023
OpenVLA: An Open-Source Vision-Language-Action Model • arXiv:2406.09246 • Published Jun 13, 2024
TroL: Traversal of Layers for Large Language and Vision Models • arXiv:2406.12246 • Published Jun 18, 2024
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs • arXiv:2406.16860 • Published Jun 24, 2024
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models • arXiv:2406.16338 • Published Jun 24, 2024
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models • arXiv:2408.04840 • Published Aug 9, 2024