Multimodal Autoregressive Pre-training of Large Vision Encoders Paper • 2411.14402 • Published Nov 21, 2024 • 43
Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models Paper • 2411.07126 • Published Nov 11, 2024 • 28
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published Oct 22, 2024 • 89
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation Paper • 2410.13848 • Published Oct 17, 2024 • 32
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models Paper • 2409.16191 • Published Sep 24, 2024 • 42
MaskBit: Embedding-free Image Generation via Bit Tokens Paper • 2409.16211 • Published Sep 24, 2024 • 16
VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters Paper • 2408.17253 • Published Aug 30, 2024 • 37
MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning Paper • 2408.11001 • Published Aug 20, 2024 • 11
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper • 2408.11039 • Published Aug 20, 2024 • 58
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations Paper • 2408.08459 • Published Aug 15, 2024 • 45
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies Paper • 2407.13623 • Published Jul 18, 2024 • 53
GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression Paper • 2407.12077 • Published Jul 16, 2024 • 54
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? Paper • 2407.04842 • Published Jul 5, 2024 • 53