|
--- |
|
license: mit |
|
datasets: |
|
- huggan/anime-faces |
|
pipeline_tag: unconditional-image-generation |
|
tags: |
|
- art |
|
--- |
|
# Abstract |
|
|
|
**DDPM** model trained on [huggan/anime-faces](https://huggingface.co/datasets/huggan/anime-faces) dataset. |
|
|
|
## Training Arguments |
|
|
|
| Argument | Value | |
|
| :-------------------------: | :----: | |
|
| image_size | 64 | |
|
| train_batch_size | 16 | |
|
| eval_batch_size | 16 | |
|
| num_epochs | 50 | |
|
| gradient_accumulation_steps | 1 | |
|
| learning_rate | 1e-4 | |
|
| lr_warmup_steps | 500 | |
|
| mixed_precision | "fp16" | |
|
|
|
For training code, please refer to [this link](https://github.com/LittleNyima/code-snippets/blob/master/ddpm-tutorial/ddpm_training.py). |
|
|
|
# Inference |
|
|
|
This project aims to implement DDPM from scratch, so `DDPMScheduler` is not used. Instead, I use only `UNet2DModel` and implement a simple scheduler myself. The inference code is: |
|
|
|
```python |
|
import torch |
|
from tqdm import tqdm |
|
from diffusers import UNet2DModel |
|
|
|
class DDPM: |
|
def __init__( |
|
self, |
|
num_train_timesteps:int = 1000, |
|
beta_start: float = 0.0001, |
|
beta_end: float = 0.02, |
|
): |
|
self.num_train_timesteps = num_train_timesteps |
|
self.betas = torch.linspace(beta_start, beta_end, num_train_timesteps, dtype=torch.float32) |
|
self.alphas = 1.0 - self.betas |
|
self.alphas_cumprod = torch.cumprod(self.alphas, dim=0) |
|
self.timesteps = torch.arange(num_train_timesteps - 1, -1, -1) |
|
|
|
def add_noise( |
|
self, |
|
original_samples: torch.Tensor, |
|
noise: torch.Tensor, |
|
timesteps: torch.Tensor, |
|
): |
|
alphas_cumprod = self.alphas_cumprod.to(device=original_samples.device ,dtype=original_samples.dtype) |
|
noise = noise.to(original_samples.device) |
|
timesteps = timesteps.to(original_samples.device) |
|
|
|
# \sqrt{\bar\alpha_t} |
|
sqrt_alpha_prod = alphas_cumprod[timesteps].flatten() ** 0.5 |
|
while len(sqrt_alpha_prod.shape) < len(original_samples.shape): |
|
sqrt_alpha_prod = sqrt_alpha_prod.unsqueeze(-1) |
|
|
|
# \sqrt{1 - \bar\alpha_t} |
|
sqrt_one_minus_alpha_prod = (1.0 - alphas_cumprod[timesteps]).flatten() ** 0.5 |
|
while len(sqrt_one_minus_alpha_prod.shape) < len(original_samples.shape): |
|
sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.unsqueeze(-1) |
|
|
|
return sqrt_alpha_prod * original_samples + sqrt_one_minus_alpha_prod * noise |
|
|
|
@torch.no_grad() |
|
def sample( |
|
self, |
|
unet: UNet2DModel, |
|
batch_size: int, |
|
in_channels: int, |
|
sample_size: int, |
|
): |
|
betas = self.betas.to(unet.device) |
|
alphas = self.alphas.to(unet.device) |
|
alphas_cumprod = self.alphas_cumprod.to(unet.device) |
|
timesteps = self.timesteps.to(unet.device) |
|
images = torch.randn((batch_size, in_channels, sample_size, sample_size), device=unet.device) |
|
for timestep in tqdm(timesteps, desc='Sampling'): |
|
pred_noise: torch.Tensor = unet(images, timestep).sample |
|
|
|
# mean of q(x_{t-1}|x_t) |
|
alpha_t = alphas[timestep] |
|
alpha_cumprod_t = alphas_cumprod[timestep] |
|
sqrt_alpha_t = alpha_t ** 0.5 |
|
one_minus_alpha_t = 1.0 - alpha_t |
|
sqrt_one_minus_alpha_cumprod_t = (1 - alpha_cumprod_t) ** 0.5 |
|
mean = (images - one_minus_alpha_t / sqrt_one_minus_alpha_cumprod_t * pred_noise) / sqrt_alpha_t |
|
|
|
# variance of q(x_{t-1}|x_t) |
|
if timestep > 1: |
|
beta_t = betas[timestep] |
|
one_minus_alpha_cumprod_t_minus_one = 1.0 - alphas_cumprod[timestep - 1] |
|
one_divided_by_sigma_square = alpha_t / beta_t + 1.0 / one_minus_alpha_cumprod_t_minus_one |
|
variance = (1.0 / one_divided_by_sigma_square) ** 0.5 |
|
else: |
|
variance = torch.zeros_like(timestep) |
|
|
|
epsilon = torch.randn_like(images) |
|
images = mean + variance * epsilon |
|
images = (images / 2.0 + 0.5).clamp(0, 1).cpu().permute(0, 2, 3, 1).numpy() |
|
return images |
|
|
|
model = UNet2DModel.from_pretrained('ddpm-animefaces-64').cuda() |
|
ddpm = DDPM() |
|
images = ddpm.sample(model, 32, 3, 64) |
|
|
|
from diffusers.utils import make_image_grid, numpy_to_pil |
|
image_grid = make_image_grid(numpy_to_pil(images), rows=4, cols=8) |
|
image_grid.save('ddpm-sample-results.png') |
|
``` |
|
|
|
This can also be found in [this link](https://github.com/LittleNyima/code-snippets/blob/master/ddpm-tutorial/ddpm_sampling.py). |