-
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 45 -
Orion-14B: Open-source Multilingual Large Language Models
Paper • 2401.12246 • Published • 12 -
Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model
Paper • 2405.09215 • Published • 20 -
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability
Paper • 2405.14129 • Published • 12
Collections
Discover the best community collections!
Collections including paper arxiv:2405.14129
-
Vript: A Video Is Worth Thousands of Words
Paper • 2406.06040 • Published • 26 -
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Paper • 2406.04325 • Published • 73 -
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
Paper • 2406.01574 • Published • 44 -
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Paper • 2405.21075 • Published • 22
-
Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs
Paper • 2403.12596 • Published • 10 -
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
Paper • 2404.13013 • Published • 31 -
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
Paper • 2404.16994 • Published • 36 -
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability
Paper • 2405.14129 • Published • 12
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 26 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 13 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 41 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 22
-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 146 -
Orion-14B: Open-source Multilingual Large Language Models
Paper • 2401.12246 • Published • 12 -
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 53 -
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 45