Abstract
We present 1.58-bit FLUX, the first successful approach to quantizing the state-of-the-art text-to-image generation model, FLUX.1-dev, using 1.58-bit weights (i.e., values in {-1, 0, +1}) while maintaining comparable performance for generating 1024×1024 images. Notably, our quantization method operates without access to image data, relying solely on self-supervision from the FLUX.1-dev model. Additionally, we develop a custom kernel optimized for 1.58-bit operations, achieving a 7.7x reduction in model storage, a 5.1x reduction in inference memory, and improved inference latency. Extensive evaluations on the GenEval and T2I-CompBench benchmarks demonstrate the effectiveness of 1.58-bit FLUX in maintaining generation quality while significantly enhancing computational efficiency.
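The paper does not publish its exact quantization procedure, but the "1.58-bit" name (log2(3) bits per ternary weight) alludes to the absmean scheme popularized by BitNet b1.58. Below is a minimal PyTorch sketch of that scheme, plus 2-bit packing to illustrate where a roughly 8x (reported 7.7x, presumably including unquantized layers) storage reduction over 16-bit weights can come from. All function names are illustrative assumptions, not from the paper's code.

```python
import torch

def quantize_ternary(w: torch.Tensor, eps: float = 1e-8):
    """Map a full-precision weight tensor to codes in {-1, 0, +1}
    plus one per-tensor scale (absmean quantization, as in BitNet b1.58)."""
    scale = w.abs().mean().clamp(min=eps)       # per-tensor scale
    w_q = (w / scale).round().clamp(-1, 1)      # ternary codes
    return w_q.to(torch.int8), scale

def dequantize_ternary(w_q: torch.Tensor, scale: torch.Tensor):
    """Recover an approximate full-precision tensor for matmul."""
    return w_q.to(torch.float32) * scale

def pack_ternary(w_q: torch.Tensor) -> torch.Tensor:
    """Pack four ternary codes (2 bits each) into one uint8.
    Storing 2 bits per weight instead of 16 is the source of the
    storage reduction; a custom kernel would consume this format."""
    flat = (w_q.flatten() + 1).to(torch.uint8)  # {-1,0,1} -> {0,1,2}
    pad = (-flat.numel()) % 4
    flat = torch.cat([flat, flat.new_zeros(pad)]).view(-1, 4)
    return flat[:, 0] | (flat[:, 1] << 2) | (flat[:, 2] << 4) | (flat[:, 3] << 6)

# Example: quantize one transformer linear layer's weight matrix.
w = torch.randn(3072, 3072)
w_q, s = quantize_ternary(w)
packed = pack_ternary(w_q)
print(f"fp16: {w.numel() * 2} bytes -> packed ternary: {packed.numel()} bytes")
```

Note this sketch only covers weight storage; the paper's custom kernel and its data-free, self-supervised calibration are not reconstructed here.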
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- BitNet a4.8: 4-bit Activations for 1-bit LLMs (2024)
- SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models (2024)
- Qua²SeDiMo: Quantifiable Quantization Sensitivity of Diffusion Models (2024)
- DQA: An Efficient Method for Deep Quantization of Deep Neural Network Activations (2024)
- Language-Guided Image Tokenization for Generation (2024)
- LiteVAR: Compressing Visual Autoregressive Modelling with Efficient Attention and Quantization (2024)
- ResQ: Mixed-Precision Quantization of Large Language Models with Low-Rank Residuals (2024)
Am I missing something, or is this paper very light on details, i.e. completely lacking even a hint of what they're actually doing?
Where is the model?
Out of curiosity: how do all these quantized approaches behave with ControlNets? Are base ControlNets supported out of the box, or do CNs need to be retrained too? Will CNs suffer from the quantization simplifications, or are these orthogonal concerns?
Without ControlNets, AI imaging is not that useful in real-life tasks; a huge crowd is still using SDXL (and even SD 1.5) purely because of how effective ControlNets are on UNet architectures.
give us WEIGHTS
or be GONE!
shoooo
shooo