Collections
Collections including paper arxiv:2402.12226
-
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Paper • 2402.12226 • Published • 41
DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning
Paper • 2407.04078 • Published • 17
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
Paper • 2412.04424 • Published • 59
-
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Paper • 2402.12226 • Published • 41
M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition
Paper • 2401.11649 • Published • 3
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition
Paper • 2402.15504 • Published • 21
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Paper • 2402.17485 • Published • 190
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 25
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 12
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 40
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 20
-
Lumiere: A Space-Time Diffusion Model for Video Generation
Paper • 2401.12945 • Published • 86
Beyond Language Models: Byte Models are Digital World Simulators
Paper • 2402.19155 • Published • 49
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Paper • 2402.12226 • Published • 41
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
Paper • 2402.10379 • Published • 30