The Superposition of Diffusion Models Using the Itô Density Estimator
Abstract
The Cambrian explosion of easily accessible pre-trained diffusion models suggests a demand for methods that combine multiple different pre-trained diffusion models without incurring the significant computational burden of re-training a larger combined model. In this paper, we cast the problem of combining multiple pre-trained diffusion models at the generation stage under a novel framework termed superposition. Theoretically, we derive superposition from rigorous first principles stemming from the celebrated continuity equation, and we design two novel algorithms tailor-made for combining diffusion models in SuperDiff. SuperDiff leverages a new scalable Itô density estimator for the log-likelihood of the diffusion SDE, which incurs no additional overhead compared to the well-known Hutchinson estimator needed for divergence calculations. We demonstrate that SuperDiff is scalable to large pre-trained diffusion models, as superposition is performed solely through composition during inference, and it enjoys painless implementation, as it combines different pre-trained vector fields through an automated re-weighting scheme. Notably, we show that SuperDiff is efficient at inference time and mimics traditional composition operators such as the logical OR and the logical AND. We empirically demonstrate the utility of SuperDiff for generating more diverse images on CIFAR-10, more faithful prompt-conditioned image editing using Stable Diffusion, and improved unconditional de novo protein structure design. https://github.com/necludov/super-diffusion
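For context on the claimed overhead parity: the divergence term in standard likelihood computation for diffusion models is typically approximated with Hutchinson's trace estimator, which is the cost baseline the Itô estimator is measured against. Below is a minimal PyTorch sketch of that well-known baseline for reference; the function name and the `vector_field` callable are illustrative, not taken from the paper's codebase.

```python
import torch

def hutchinson_divergence(vector_field, x, n_probes=1):
    """Monte Carlo estimate of div u(x) = tr(du/dx) via Hutchinson's
    trick: E_v[v^T (du/dx) v] with random probe vectors v."""
    x = x.detach().requires_grad_(True)
    u = vector_field(x)
    div = torch.zeros(x.shape[0], device=x.device)
    for _ in range(n_probes):
        v = torch.randn_like(x)  # Gaussian probes; Rademacher also works
        # one backward pass per probe: vjp = v^T (du/dx)
        (vjp,) = torch.autograd.grad(u, x, grad_outputs=v, retain_graph=True)
        div = div + (vjp * v).reshape(x.shape[0], -1).sum(dim=1)
    return div / n_probes
```

A single probe already gives an unbiased estimate; additional probes reduce variance at the cost of one extra backward pass each, which is exactly the per-step overhead the Itô estimator is claimed to avoid.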
Community
Have you ever wanted to combine different pre-trained diffusion models but don't have the time or data to retrain a new, bigger model?
Introducing SuperDiff: a principled method for efficiently combining multiple pre-trained diffusion models solely during inference!
We provide a new approach for estimating densities without ever touching the divergence. This gives us the control to easily interpolate concepts (logical AND) or mix densities (logical OR), allowing us to create one-of-a-kind generations!
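To make the AND/OR distinction concrete, here is a schematic PyTorch sketch of the inference-time reweighting idea, not the paper's exact algorithm: `u_a` and `u_b` are the two pre-trained vector fields already evaluated at the current sample, and `log_q_a`, `log_q_b` are running per-model log-density estimates (e.g., from the Itô estimator). All names are illustrative.

```python
import torch

def superposed_vector_field(u_a, u_b, log_q_a, log_q_b, mode="OR", temperature=1.0):
    """Combine two pre-trained vector fields u_a, u_b (already evaluated
    at the current sample x_t) into a single update direction.

    log_q_a, log_q_b: per-model log-density estimates, shape (batch,).
    mode="OR":  softmax over log densities -- weight each model by how
                likely it is to have produced the sample (density mixing).
    mode="AND": equal weights -- steer toward samples to which both
                models assign mass (concept interpolation).
    """
    if mode == "OR":
        w = torch.softmax(torch.stack([log_q_a, log_q_b]) / temperature, dim=0)
        # broadcast per-sample weights over the non-batch dimensions
        w_a = w[0].view(-1, *([1] * (u_a.dim() - 1)))
        w_b = w[1].view(-1, *([1] * (u_b.dim() - 1)))
    else:  # "AND"
        w_a = w_b = 0.5
    return w_a * u_a + w_b * u_b
```

The softmax weighting in the OR branch favors whichever model assigns the current trajectory higher density, so samples effectively come from a mixture of the two models; the equal weighting in the AND branch pushes toward regions that both models cover.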
This is an automated message from the Librarian Bot. The following papers, similar to this one, were recommended by the Semantic Scholar API:
- Accelerating Video Diffusion Models via Distribution Matching (2024)
- Nested Diffusion Models Using Hierarchical Latent Priors (2024)
- Learning on Less: Constraining Pre-trained Model Learning for Generalizable Diffusion-Generated Image Detection (2024)
- Inference-Time Diffusion Model Distillation (2024)
- InvDiff: Invariant Guidance for Bias Mitigation in Diffusion Models (2024)
- Arbitrary-steps Image Super-resolution via Diffusion Inversion (2024)
- Bridging the Gap between Learning and Inference for Diffusion-Based Molecule Generation (2024)