-
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
Paper • 2412.15213 • Published • 25 -
No More Adam: Learning Rate Scaling at Initialization is All You Need
Paper • 2412.11768 • Published • 41 -
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Paper • 2412.13663 • Published • 116 -
Autoregressive Video Generation without Vector Quantization
Paper • 2412.14169 • Published • 14
Collections including paper arxiv:2405.21048
-
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Paper • 2406.06525 • Published • 65 -
Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling
Paper • 2405.21048 • Published • 13 -
Scalable Autoregressive Image Generation with Mamba
Paper • 2408.12245 • Published • 25 -
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation
Paper • 2410.08159 • Published • 25
-
Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling
Paper • 2405.21048 • Published • 13 -
Block Transformer: Global-to-Local Language Modeling for Fast Inference
Paper • 2406.02657 • Published • 37 -
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Paper • 2406.06525 • Published • 65
-
KAN: Kolmogorov-Arnold Networks
Paper • 2404.19756 • Published • 108 -
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
Paper • 2404.18796 • Published • 68 -
Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling
Paper • 2405.21048 • Published • 13
-
EdgeFusion: On-Device Text-to-Image Generation
Paper • 2404.11925 • Published • 21 -
Dynamic Typography: Bringing Words to Life
Paper • 2404.11614 • Published • 44 -
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
Paper • 2404.07987 • Published • 47 -
Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models
Paper • 2404.07724 • Published • 13
-
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Paper • 2404.04125 • Published • 27 -
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
Paper • 2404.03653 • Published • 33 -
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models
Paper • 2404.02747 • Published • 11 -
3D Congealing: 3D-Aware Image Alignment in the Wild
Paper • 2404.02125 • Published • 7
-
Instruct-Imagen: Image Generation with Multi-modal Instruction
Paper • 2401.01952 • Published • 31 -
ODIN: A Single Model for 2D and 3D Perception
Paper • 2401.02416 • Published • 11 -
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
Paper • 2404.01367 • Published • 21 -
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models
Paper • 2404.02747 • Published • 11