This unofficial repository hosts a diffusers-compatible float16 checkpoint of the WDXL base UNet.

For convenience (i.e. for use in a StableDiffusionXLPipeline), we include mirrors of other models. Please adhere to their terms of usage.

Usage (diffusers)

StableDiffusionXLPipeline

Diffusers' StableDiffusionXLPipeline loads and wires together the text encoders, UNet, and VAE for you:

from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler
from diffusers.pipelines.stable_diffusion_xl import StableDiffusionXLPipelineOutput
import torch
from torch import Generator
from PIL import Image
from typing import List

# scheduler args documented here:
# https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_dpmsolver_multistep.py#L98
scheduler: DPMSolverMultistepScheduler = DPMSolverMultistepScheduler.from_pretrained(
  'Birchlabs/waifu-diffusion-xl-unofficial',
  subfolder='scheduler',
  algorithm_type='sde-dpmsolver++',
  solver_order=2,
  # solver_type='heun' may give a sharper image. Cheng Lu reckons midpoint is better.
  solver_type='midpoint',
  use_karras_sigmas=True,
)

# pipeline args documented here:
# https://github.com/huggingface/diffusers/blob/95b7de88fd0dffef2533f1cbaf9ffd9d3c6d04c8/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py#L548
pipe: StableDiffusionXLPipeline = StableDiffusionXLPipeline.from_pretrained(
  'Birchlabs/waifu-diffusion-xl-unofficial',
  scheduler=scheduler,
  torch_dtype=torch.float16,
  use_safetensors=True,
  variant='fp16'
)
pipe.to('cuda')

# StableDiffusionXLPipeline is hardcoded to cast the VAE to float32, but Ollin's VAE works fine in float16
pipe.vae.to(torch.float16)

prompt = 'masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, watercolor, night, turtleneck'
negative_prompt = 'lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name'

out: StableDiffusionXLPipelineOutput = pipe(
  prompt=prompt,
  negative_prompt=negative_prompt,
  num_inference_steps=25,
  guidance_scale=12.,
  original_size=(4096, 4096),
  target_size=(1024, 1024),
  height=1024,
  width=1024,
  generator=Generator().manual_seed(48),
)

images: List[Image.Image] = out.images
img, *_ = images

img.save('waifu.png')

You should get a picture like this:

UNet2DConditionModel

If you just want the UNet, you can load it like so:

import torch
from diffusers import UNet2DConditionModel

base_unet: UNet2DConditionModel = UNet2DConditionModel.from_pretrained(
  'Birchlabs/waifu-diffusion-xl-unofficial',
  torch_dtype=torch.float16,
  use_safetensors=True,
  variant='fp16',
  subfolder='unet',
).eval().to(torch.device('cuda'))
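
When you call the UNet directly, you have to supply the extra conditioning that StableDiffusionXLPipeline normally builds for you, including the size conditioning derived from original_size and target_size. The sketch below shows, as far as I understand the pipeline, how those six values are laid out. The helper name build_add_time_ids is hypothetical and is not part of diffusers, and the (0, 0) crop offset is an assumed default.

```python
from typing import List, Tuple


def build_add_time_ids(
    original_size: Tuple[int, int],
    crops_coords_top_left: Tuple[int, int],
    target_size: Tuple[int, int],
) -> List[int]:
    """Concatenate SDXL size conditioning into one 6-value list per sample.

    Hypothetical helper: order is original size, crop offset, target size.
    """
    return [*original_size, *crops_coords_top_left, *target_size]


# Same sizes as the pipeline example above; (0, 0) crop offset is an assumption.
time_ids = build_add_time_ids((4096, 4096), (0, 0), (1024, 1024))
print(time_ids)
```

To use it, you would wrap the list in a tensor of shape (batch, 6) on the UNet's device and dtype, then pass it together with the pooled text embedding as added_cond_kwargs={'text_embeds': ..., 'time_ids': ...} when calling base_unet.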

How it was converted

I used Kohya's converter script to convert the official (hakurei/waifu-diffusion-xl) wdxl-aesthetic-0.9.safetensors. See this commit.

I forked Kohya's converter script to make a version for SDXL.

I invoked it like so:

python scripts/convert_diffusers20_original_sdxl.py \
--fp16 \
--use_safetensors \
--reference_model stabilityai/stable-diffusion-xl-base-0.9 \
in/wdxl-aesthetic-0.9.safetensors \
out/wdxl-diffusers
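
After a conversion like the one above, it can help to check that the output directory looks like a diffusers SDXL checkpoint before loading it. This is a minimal sketch: the function missing_entries is hypothetical, and the list of expected entries is my assumption about the usual SDXL layout, not something the converter script guarantees.

```python
from pathlib import Path
from typing import List

# Assumed layout of an SDXL diffusers checkpoint; adjust if your output differs.
EXPECTED_ENTRIES = (
    'model_index.json',
    'scheduler',
    'text_encoder',
    'text_encoder_2',
    'tokenizer',
    'tokenizer_2',
    'unet',
    'vae',
)


def missing_entries(checkpoint_dir: str) -> List[str]:
    """Return the expected files/subfolders that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


# Prints an empty list when every expected entry is present.
print(missing_entries('out/wdxl-diffusers'))
```

An empty result only means the folder structure is in place; it does not validate the weights themselves.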

NOTE: This is a work in progress! Nothing in this repository is final.

waifu-diffusion-xl - Diffusion for Rich Weebs

waifu-diffusion-xl is a latent text-to-image diffusion model that has been conditioned on high-quality anime images by fine-tuning Stability AI's SDXL 0.9 model, which was provided as a research preview.

masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, watercolor, night, turtleneck

Model Description(s)

  • wdxl-aesthetic-0.9 is a checkpoint that has been fine-tuned against our in-house aesthetic dataset, which was created with the help of 15k aesthetic labels collected by volunteers. This model also used Stability AI's SDXL 0.9 checkpoint as the base model for fine-tuning.

License

This model has been released under the SDXL 0.9 RESEARCH LICENSE AGREEMENT due to the repository containing the SDXL 0.9 weights before an official release. We have been given permission to release this model.

Downstream Uses

This model can be used for entertainment purposes and as a generative art assistant.

Team Members and Acknowledgements

This project would not have been possible without the incredible work by Stability AI and Novel AI.

To reach us, you can join our Discord server.
