arxiv:2401.01808

aMUSEd: An Open MUSE Reproduction

Published on Jan 3, 2024 · Submitted by akhaliq on Jan 4, 2024
#1 Paper of the day

Abstract

We present aMUSEd, an open-source, lightweight masked image model (MIM) for text-to-image generation based on MUSE. With 10 percent of MUSE's parameters, aMUSEd is focused on fast image generation. We believe MIM is under-explored compared to latent diffusion, the prevailing approach for text-to-image generation. Compared to latent diffusion, MIM requires fewer inference steps and is more interpretable. Additionally, MIM can be fine-tuned to learn additional styles with only a single image. We hope to encourage further exploration of MIM by demonstrating its effectiveness on large-scale text-to-image generation and releasing reproducible training code. We also release checkpoints for two models which directly produce images at 256x256 and 512x512 resolutions.
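
The claim that MIM needs fewer inference steps follows from parallel decoding: every step predicts all masked image tokens at once and keeps only the most confident predictions, so the number of steps is set by a masking schedule rather than by the number of tokens. The sketch below illustrates this with a cosine schedule of the kind used in MaskGIT/MUSE-style decoding. The function name `cosine_mask_schedule`, the 256-token grid, and the 12-step count are illustrative assumptions, not values stated on this page, and the code is not taken from the released aMUSEd training code.

```python
import math


def cosine_mask_schedule(num_tokens: int, num_steps: int) -> list[int]:
    """Return how many tokens are still masked after each decoding step.

    Assumes a cosine schedule: after step t of T, the masked fraction is
    cos(pi/2 * t / T), so few tokens are committed early and many late.
    """
    remaining = []
    for step in range(1, num_steps + 1):
        masked_fraction = math.cos(math.pi / 2 * step / num_steps)
        remaining.append(math.floor(masked_fraction * num_tokens))
    return remaining


# Hypothetical setting: a 16x16 grid of image tokens decoded in 12 steps.
schedule = cosine_mask_schedule(num_tokens=256, num_steps=12)
for step, still_masked in enumerate(schedule, start=1):
    print(f"step {step:2d}: {still_masked:3d} tokens still masked")
```

Under these assumptions the whole token grid is filled in after 12 forward passes, because each pass commits many tokens in parallel.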

Community

An old man was standing on an old bridge. He remembered how he had dived when he was young. He wanted to dive, but he didn't dare.

Meet aMUSEd: A Lightweight Revolution in Text-to-Image Generation

Links 🔗:

👉 Subscribe: https://www.youtube.com/@Arxflix
👉 Twitter: https://x.com/arxflix
👉 LMNT (Partner): https://lmnt.com/

By Arxflix

