Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2308.12966

Papers - Multimodal

TinyLLaVA: A Framework of Small-scale Large Multimodal Models

Paper • 2402.14289 • Published Feb 22, 2024 • 19
ImageBind: One Embedding Space To Bind Them All

Paper • 2305.05665 • Published May 9, 2023 • 5
DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 182
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts

Paper • 2206.02770 • Published Jun 6, 2022 • 3

Top Vision-Language Papers 🖼️💬📝

A curated list of papers on vision-language models, with the most influential ones at the top.

Improved Baselines with Visual Instruction Tuning

Paper • 2310.03744 • Published Oct 5, 2023 • 37
DeepSeek-VL: Towards Real-World Vision-Language Understanding

Paper • 2403.05525 • Published Mar 8, 2024 • 43
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities

Paper • 2308.12966 • Published Aug 24, 2023 • 8
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model

Paper • 2404.01331 • Published Mar 29, 2024 • 25

Running on CPU Upgrade

1.66k

🏢

Anychat
Running

258

🐢

Qwen2.5 Coder Artifacts
Running

873

🔍

QwQ-32B-Preview

QwQ-32B-Preview
Running on CPU Upgrade

12.4k

🏆

Open LLM Leaderboard

Track, rank and evaluate open LLMs and chatbots

Multimodal Papers

Woodpecker: Hallucination Correction for Multimodal Large Language Models

Paper • 2310.16045 • Published Oct 24, 2023 • 16
SILC: Improving Vision Language Pretraining with Self-Distillation

Paper • 2310.13355 • Published Oct 20, 2023 • 9
To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning

Paper • 2311.07574 • Published Nov 13, 2023 • 15
MyVLM: Personalizing VLMs for User-Specific Queries

Paper • 2403.14599 • Published Mar 21, 2024 • 16

Vision Language Models Papers 🖼️💬📝

Papers about vision-language models, most important ones are on top of the list.

Improved Baselines with Visual Instruction Tuning

Paper • 2310.03744 • Published Oct 5, 2023 • 37
DeepSeek-VL: Towards Real-World Vision-Language Understanding

Paper • 2403.05525 • Published Mar 8, 2024 • 43
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities

Paper • 2308.12966 • Published Aug 24, 2023 • 8
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model

Paper • 2404.01331 • Published Mar 29, 2024 • 25

Vision-Language Model

Visual Instruction Tuning

Paper • 2304.08485 • Published Apr 17, 2023 • 13
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities

Paper • 2308.12966 • Published Aug 24, 2023 • 8
Improved Baselines with Visual Instruction Tuning

Paper • 2310.03744 • Published Oct 5, 2023 • 37
SILC: Improving Vision Language Pretraining with Self-Distillation

Paper • 2310.13355 • Published Oct 20, 2023 • 9

Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts

Paper • 2309.15915 • Published Sep 27, 2023 • 2
Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants

Paper • 2310.00653 • Published Oct 1, 2023 • 3
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities

Paper • 2308.12966 • Published Aug 24, 2023 • 8
An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models

Paper • 2309.09958 • Published Sep 18, 2023 • 18

Qwen Technical Report

Paper • 2309.16609 • Published Sep 28, 2023 • 35
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities

Paper • 2308.12966 • Published Aug 24, 2023 • 8
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

Paper • 2311.07919 • Published Nov 14, 2023 • 10
Running

197

📷🎨👀

Qwen-VL-Plus

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer

Paper • 2401.10208 • Published Jan 18, 2024 • 1
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

Paper • 2305.11172 • Published May 18, 2023 • 1
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video

Paper • 2302.00402 • Published Feb 1, 2023
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities

Paper • 2308.12966 • Published Aug 24, 2023 • 8

Beyond Language Models: Byte Models are Digital World Simulators

Paper • 2402.19155 • Published Feb 29, 2024 • 50
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Paper • 2402.19427 • Published Feb 29, 2024 • 53
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 45
Resonance RoPE: Improving Context Length Generalization of Large Language Models

Paper • 2403.00071 • Published Feb 29, 2024 • 23

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs