---
license: apache-2.0
language:
- en
tags:
- instancediffusion
- layout-to-image
library_name: diffusers
---

# Diffusers 🧨 port of [InstanceDiffusion: Instance-level Control for Image Generation (CVPR 2024)](https://arxiv.org/abs/2402.03290)

- Original authors: Xudong Wang, Trevor Darrell, Sai Saketh Rambhatla, Rohit Girdhar, Ishan Misra
- Original GitHub repo by the authors: https://github.com/frank-xwang/InstanceDiffusion
- Converted to Diffusers: Kyeongryeol Go

# Checkpoint

- original checkpoint: https://huggingface.co/xudongw/InstanceDiffusion/resolve/main/instancediffusion_sd15.pth
- original configuration YAML: https://github.com/frank-xwang/InstanceDiffusion/blob/main/configs/test_sd15.yaml

# Install

`StableDiffusionINSTDIFFPipeline` is not yet merged into diffusers. Please install the forked version instead:

```bash
git clone -b instancediffusion https://github.com/gokyeongryeol/diffusers.git
cd diffusers && pip install -e .
```

# Example Usage

```python
import torch
from diffusers import StableDiffusionINSTDIFFPipeline

pipe = StableDiffusionINSTDIFFPipeline.from_pretrained(
    "kyeongry/instancediffusion_sd15",
    # variant="fp16",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "a yellow American robin, brown Maltipoo dog, a gray British Shorthair in a stream, alongside with trees and rocks"
negative_prompt = "longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality"

# normalized (xmin, ymin, xmax, ymax), one box per phrase
boxes = [
    [0.0, 0.099609375, 0.349609375, 0.548828125],
    [0.349609375, 0.19921875, 0.6484375, 0.498046875],
    [0.6484375, 0.19921875, 0.998046875, 0.697265625],
    [0.0, 0.69921875, 1.0, 0.998046875],
]
phrases = [
    "a gray British Shorthair standing on a rock in the woods",
    "a yellow American robin standing on the rock",
    "a brown Maltipoo dog standing on the rock",
    "a close up of a small waterfall in the woods",
]

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    instdiff_phrases=phrases,
    instdiff_boxes=boxes,
    instdiff_scheduled_sampling_alpha=0.8,  # proportion of denoising steps using gated self-attention
    instdiff_scheduled_sampling_beta=0.36,  # proportion of denoising steps using the multi-instance sampler
    guidance_scale=7.5,
    output_type="pil",
    num_inference_steps=50,
).images[0]

image.save("./instancediffusion-sd15-layout2image-generation.jpg")
```

# Sample Output

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/640f071006c3b5ca883ea2d6/G1YVfIhmr0OABbzmPAc91.jpeg)
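
# Converting Pixel Boxes

The `instdiff_boxes` argument in the Example Usage section takes normalized (xmin, ymin, xmax, ymax) coordinates. If your layout is specified in pixels, you can convert it with a minimal sketch like the one below. `normalize_boxes` and `check_layout` are hypothetical helpers, not part of the pipeline. The one-phrase-per-box check is an assumption based on how the example pairs `phrases` with `boxes` by position.

```python
from typing import List, Sequence


def normalize_boxes(
    pixel_boxes: Sequence[Sequence[float]], width: int, height: int
) -> List[List[float]]:
    """Convert pixel (xmin, ymin, xmax, ymax) boxes to normalized [0, 1] coordinates."""
    normalized = []
    for xmin, ymin, xmax, ymax in pixel_boxes:
        # Reject boxes that fall outside the canvas or have zero/negative size.
        if not (0 <= xmin < xmax <= width and 0 <= ymin < ymax <= height):
            raise ValueError(
                f"invalid box {(xmin, ymin, xmax, ymax)} for a {width}x{height} canvas"
            )
        normalized.append([xmin / width, ymin / height, xmax / width, ymax / height])
    return normalized


def check_layout(phrases: Sequence[str], boxes: Sequence[Sequence[float]]) -> None:
    """Hypothetical sanity check: assumes exactly one phrase per box, matched by index."""
    if len(phrases) != len(boxes):
        raise ValueError(f"got {len(phrases)} phrases but {len(boxes)} boxes")


# Pixel boxes on a 512x512 canvas; they normalize to the exact values used in the example.
pixel_boxes = [
    (0, 51, 179, 281),
    (179, 102, 332, 255),
    (332, 102, 511, 357),
    (0, 358, 512, 511),
]
boxes = normalize_boxes(pixel_boxes, width=512, height=512)
phrases = [
    "a gray British Shorthair standing on a rock in the woods",
    "a yellow American robin standing on the rock",
    "a brown Maltipoo dog standing on the rock",
    "a close up of a small waterfall in the woods",
]
check_layout(phrases, boxes)
print(boxes[0])  # [0.0, 0.099609375, 0.349609375, 0.548828125]
```

Because every coordinate here is a multiple of 1/512, the divisions are exact in floating point, so the resulting list matches the `boxes` in the example value for value.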