- Woodpecker: Hallucination Correction for Multimodal Large Language Models
  Paper • 2310.16045 • Published • 15
- HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
  Paper • 2310.14566 • Published • 25
- SILC: Improving Vision Language Pretraining with Self-Distillation
  Paper • 2310.13355 • Published • 8
- Conditional Diffusion Distillation
  Paper • 2310.01407 • Published • 20
Collections
Collections including paper arxiv:2309.11419
- Scalable Diffusion Models with Transformers
  Paper • 2212.09748 • Published • 17
- Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
  Paper • 2311.15127 • Published • 12
- Learning Transferable Visual Models From Natural Language Supervision
  Paper • 2103.00020 • Published • 11
- U-Net: Convolutional Networks for Biomedical Image Segmentation
  Paper • 1505.04597 • Published • 9

- MEGA: Multilingual Evaluation of Generative AI
  Paper • 2303.12528 • Published
- MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks
  Paper • 2311.07463 • Published • 13
- Kosmos-2.5: A Multimodal Literate Model
  Paper • 2309.11419 • Published • 50
- A Unified View of Masked Image Modeling
  Paper • 2210.10615 • Published

- Kosmos-2: Grounding Multimodal Large Language Models to the World
  Paper • 2306.14824 • Published • 34
- Kosmos-G: Generating Images in Context with Multimodal Large Language Models
  Paper • 2310.02992 • Published • 4
- Kosmos-2.5: A Multimodal Literate Model
  Paper • 2309.11419 • Published • 50
- AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
  Paper • 2309.16058 • Published • 55

- Kosmos-2.5: A Multimodal Literate Model
  Paper • 2309.11419 • Published • 50
- Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities
  Paper • 2311.05698 • Published • 9
- Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
  Paper • 2311.06242 • Published • 87
- PolyMaX: General Dense Prediction with Mask Transformer
  Paper • 2311.05770 • Published • 6