Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2402.12226

AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

Paper • 2402.12226 • Published Feb 19, 2024 • 41
openai/whisper-large-v3

Automatic Speech Recognition • Updated Aug 12, 2024 • 4.09M • • 3.9k

AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

Paper • 2402.12226 • Published Feb 19, 2024 • 41

AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

Paper • 2402.12226 • Published Feb 19, 2024 • 41
DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning

Paper • 2407.04078 • Published Jul 4, 2024 • 17
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

Paper • 2412.04424 • Published Dec 5, 2024 • 59

AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

Paper • 2402.12226 • Published Feb 19, 2024 • 41

Diffusion Model

AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

Paper • 2402.12226 • Published Feb 19, 2024 • 41

Models - Multimodal

AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

Paper • 2402.12226 • Published Feb 19, 2024 • 41
M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition

Paper • 2401.11649 • Published Jan 22, 2024 • 3
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition

Paper • 2402.15504 • Published Feb 23, 2024 • 21
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Paper • 2402.17485 • Published Feb 27, 2024 • 190

about 7 hours ago

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 25
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 12
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 40
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 20

image reconation

liuhaotian/LLaVA-13b-delta-v0

Text Generation • Updated Apr 25, 2023 • 300 • 220
MILVLG/imp-v1-3b

Text Generation • Updated May 26, 2024 • 11k • 203
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

Paper • 2402.12226 • Published Feb 19, 2024 • 41

stabilityai/StableBeluga2

Text Generation • Updated Sep 18, 2023 • 702 • 884
migtissera/Tess-34B-v1.5b

Text Generation • Updated Jan 28, 2024 • 72 • 14
RWKV/v5-Eagle-7B-pth

Updated Feb 24, 2024 • 199
lmsys/lmsys-chat-1m

Viewer • Updated Jul 27, 2024 • 1M • 2.78k • 624

Lumiere: A Space-Time Diffusion Model for Video Generation

Paper • 2401.12945 • Published Jan 23, 2024 • 86
Beyond Language Models: Byte Models are Digital World Simulators

Paper • 2402.19155 • Published Feb 29, 2024 • 49
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

Paper • 2402.12226 • Published Feb 19, 2024 • 41
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows

Paper • 2402.10379 • Published Feb 16, 2024 • 30

Previous
1
2
3
4
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs