Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 26 days ago • 136
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale Paper • 2409.17115 • Published Sep 25, 2024 • 61
Qwen2-VL Collection Vision-language model series based on Qwen2 • 16 items • Updated Dec 6, 2024 • 188
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention Paper • 2404.07143 • Published Apr 10, 2024 • 105
How Far Are We from Intelligent Visual Deductive Reasoning? Paper • 2403.04732 • Published Mar 7, 2024 • 19
Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation Paper • 2402.17245 • Published Feb 27, 2024 • 10
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities Paper • 2308.02490 • Published Aug 4, 2023 • 16
LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition Paper • 2307.13269 • Published Jul 25, 2023 • 31
Extending Context Window of Large Language Models via Positional Interpolation Paper • 2306.15595 • Published Jun 27, 2023 • 53