-
World Model on Million-Length Video And Language With RingAttention
Paper • 2402.08268 • Published • 38 -
Improving Text Embeddings with Large Language Models
Paper • 2401.00368 • Published • 80 -
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 105 -
FiT: Flexible Vision Transformer for Diffusion Model
Paper • 2402.12376 • Published • 48
Collections
Discover the best community collections!
Collections including paper arxiv:2405.09798
-
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
Paper • 2312.16862 • Published • 31 -
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Paper • 2312.17172 • Published • 28 -
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers
Paper • 2401.01974 • Published • 6 -
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
Paper • 2401.01885 • Published • 28
-
Ultra-Long Sequence Distributed Transformer
Paper • 2311.02382 • Published • 3 -
Ziya2: Data-centric Learning is All LLMs Need
Paper • 2311.03301 • Published • 17 -
Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
Paper • 2311.02103 • Published • 17 -
Extending Context Window of Large Language Models via Semantic Compression
Paper • 2312.09571 • Published • 13