FastVLM: Efficient Vision Encoding for Vision Language Models Paper • 2412.13303 • Published 19 days ago • 13
ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers Paper • 2412.12571 • Published 20 days ago • 8
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces Paper • 2412.14171 • Published 18 days ago • 23
AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities Paper • 2412.14123 • Published 18 days ago • 11
Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning Paper • 2412.12953 • Published 20 days ago • 11
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN Paper • 2412.13795 • Published 19 days ago • 18
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks Paper • 2412.14161 • Published 18 days ago • 48
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published 19 days ago • 116
No More Adam: Learning Rate Scaling at Initialization is All You Need Paper • 2412.11768 • Published 21 days ago • 41
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published 24 days ago • 81
Causal Diffusion Transformers for Generative Modeling Paper • 2412.12095 • Published 20 days ago • 23
Smaller Language Models Are Better Instruction Evolvers Paper • 2412.11231 • Published 22 days ago • 26
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models Paper • 2412.09645 • Published 26 days ago • 35
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation Paper • 2412.11919 • Published 21 days ago • 33
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance Paper • 2412.06673 • Published 28 days ago • 11
Perception Tokens Enhance Visual Reasoning in Multimodal Language Models Paper • 2412.03548 • Published Dec 4, 2024 • 17
ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer Paper • 2412.07720 • Published 26 days ago • 30
Mogo: RQ Hierarchical Causal Transformer for High-Quality 3D Human Motion Generation Paper • 2412.07797 • Published Dec 5, 2024 • 11