Improving Diffusion Models for Virtual Try-on
Abstract
This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment, given a pair of images depicting the person and the garment, respectively. Previous works adapt existing exemplar-based inpainting diffusion models for virtual try-on to improve the naturalness of the generated visuals compared to other methods (e.g., GAN-based), but they fail to preserve the identity of the garments. To overcome this limitation, we propose a novel diffusion model that improves garment fidelity and generates authentic virtual try-on images. Our method, coined IDM-VTON, uses two different modules to encode the semantics of the garment image; given the base UNet of the diffusion model, 1) the high-level semantics extracted from a visual encoder are fused into the cross-attention layer, and then 2) the low-level features extracted from a parallel UNet are fused into the self-attention layer. In addition, we provide detailed textual prompts for both garment and person images to enhance the authenticity of the generated visuals. Finally, we present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity. Our experimental results show that our method outperforms previous approaches (both diffusion-based and GAN-based) in preserving garment details and generating authentic virtual try-on images, both qualitatively and quantitatively. Furthermore, the proposed customization method demonstrates its effectiveness in a real-world scenario.
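The two fusion paths described in the abstract can be illustrated with a toy, single-head attention block. This is only a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how the features are fused, so the sketch assumes that low-level garment features are concatenated with the person tokens before self-attention (keeping only the person half of the output), and that high-level garment embeddings serve as keys and values in cross-attention. Learned projections, multiple heads, and normalization are omitted, and the names `fused_block`, `person_tokens`, `garment_low`, and `garment_high` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention (single head, no learned projections)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def fused_block(person_tokens, garment_low, garment_high):
    """Toy block combining both fusion paths (all design choices are assumptions).

    person_tokens: (n_person, d) features from the base UNet
    garment_low:   (n_garment, d) low-level features from a parallel UNet
    garment_high:  (n_embed, d) high-level embeddings from a visual encoder
    """
    n_person = person_tokens.shape[0]

    # Self-attention path (assumed): attend over person and garment tokens
    # jointly, then keep only the rows that belong to the person tokens.
    joint = np.concatenate([person_tokens, garment_low], axis=0)
    self_out = attention(joint, joint, joint)[:n_person]
    hidden = person_tokens + self_out  # residual connection

    # Cross-attention path (assumed): queries come from the person features,
    # keys and values come from the high-level garment embeddings.
    cross_out = attention(hidden, garment_high, garment_high)
    return hidden + cross_out  # residual connection


# Usage with random stand-in features (shapes chosen arbitrarily).
rng = np.random.default_rng(0)
person = rng.normal(size=(16, 8))
low = rng.normal(size=(16, 8))
high = rng.normal(size=(4, 8))
out = fused_block(person, low, high)
print(out.shape)  # (16, 8)
```

The output keeps the shape of the person tokens, so a block like this could in principle replace a standard attention block without changing the surrounding layers; whether the actual model is organized this way is not stated in the abstract.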
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- StableGarment: Garment-Centric Generation via Stable Diffusion (2024)
- Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis (2024)
- Multimodal-Conditioned Latent Diffusion Models for Fashion Image Editing (2024)
- Direct Consistency Optimization for Compositional Text-to-Image Personalization (2024)
- PFDM: Parser-Free Virtual Try-on via Diffusion Model (2024)
Hey, I tried using IDM-VTON and it works very well. Is it possible to have it do the same for furniture and appliances in a room?