Collections including paper arxiv:2405.21048

Paper - Multimodal

Papers related to multimodal models. Research topics: modular, multimodal, multi-stream, mixture of experts, universal transformer, Matryoshka embedding

Flowing from Words to Pixels: A Framework for Cross-Modality Evolution

Paper • 2412.15213 • Published 15 days ago • 25
No More Adam: Learning Rate Scaling at Initialization is All You Need

Paper • 2412.11768 • Published 19 days ago • 41
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published 17 days ago • 116
Autoregressive Video Generation without Vector Quantization

Paper • 2412.14169 • Published 16 days ago • 14

Image-Gen Autoregressive

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

Paper • 2406.06525 • Published Jun 10, 2024 • 65
Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling

Paper • 2405.21048 • Published May 31, 2024 • 13
Scalable Autoregressive Image Generation with Mamba

Paper • 2408.12245 • Published Aug 22, 2024 • 25
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation

Paper • 2410.08159 • Published Oct 10, 2024 • 25

Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling

Paper • 2405.21048 • Published May 31, 2024 • 13

Research papers

Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling

Paper • 2405.21048 • Published May 31, 2024 • 13
Block Transformer: Global-to-Local Language Modeling for Fast Inference

Paper • 2406.02657 • Published Jun 4, 2024 • 37
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

Paper • 2406.06525 • Published Jun 10, 2024 • 65

Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling

Paper • 2405.21048 • Published May 31, 2024 • 13

KAN: Kolmogorov-Arnold Networks

Paper • 2404.19756 • Published Apr 30, 2024 • 108
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models

Paper • 2404.18796 • Published Apr 29, 2024 • 68
Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling

Paper • 2405.21048 • Published May 31, 2024 • 13

Image Generation

EdgeFusion: On-Device Text-to-Image Generation

Paper • 2404.11925 • Published Apr 18, 2024 • 21
Dynamic Typography: Bringing Words to Life

Paper • 2404.11614 • Published Apr 17, 2024 • 44
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

Paper • 2404.07987 • Published Apr 11, 2024 • 47
Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models

Paper • 2404.07724 • Published Apr 11, 2024 • 13

Vision and language

No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance

Paper • 2404.04125 • Published Apr 4, 2024 • 27
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Paper • 2404.03653 • Published Apr 4, 2024 • 33
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models

Paper • 2404.02747 • Published Apr 3, 2024 • 11
3D Congealing: 3D-Aware Image Alignment in the Wild

Paper • 2404.02125 • Published Apr 2, 2024 • 7

APISR: Anime Production Inspired Real-World Anime Super-Resolution

Paper • 2403.01598 • Published Mar 3, 2024 • 2
AnimateDiff-Lightning: Cross-Model Diffusion Distillation

Paper • 2403.12706 • Published Mar 19, 2024 • 17
DALLE 3 XL v2

Space • Running on Zero • 1.55k
CosXL

Space • Running on Zero • 263

Diffusion Models

Instruct-Imagen: Image Generation with Multi-modal Instruction

Paper • 2401.01952 • Published Jan 3, 2024 • 31
ODIN: A Single Model for 2D and 3D Perception

Paper • 2401.02416 • Published Jan 4, 2024 • 11
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models

Paper • 2404.01367 • Published Apr 1, 2024 • 21
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models

Paper • 2404.02747 • Published Apr 3, 2024 • 11
