Iceclear committed on
Commit
299f9c7
·
1 Parent(s): a52f4c1

Update README.md

Files changed (1)
  1. README.md +5 -8
README.md CHANGED
@@ -8,7 +8,6 @@ This model card focuses on the models associated with the StableSR, available [h
  ## Model Details
  - **Developed by:** Jianyi Wang
  - **Model type:** Diffusion-based image super-resolution model
- - **Language(s):** English
  - **License:** [S-Lab License 1.0](https://github.com/IceClear/StableSR/blob/main/LICENSE.txt)
  - **Model Description:** This is the model used in [Paper](https://arxiv.org/abs/2305.07015).
  - **Resources for more information:** [GitHub Repository](https://github.com/IceClear/StableSR).
@@ -39,7 +38,7 @@ Such strong conditions make our model less likely to be affected.
  ## Training
 
  **Training Data**
- The model developers used the following dataset for training the model:
+ The model developer used the following dataset for training the model:
 
  - Our diffusion model is finetuned on DF2K (DIV2K and Flickr2K) + OST datasets, available [here](https://github.com/xinntao/Real-ESRGAN/blob/master/docs/Training.md).
  - We further generate 100k synthetic LR-HR pairs on DF2K_OST using the finetuned diffusion model for training the CFW module.
@@ -47,19 +46,17 @@ The model developers used the following dataset for training the model:
  **Training Procedure**
  StableSR is an image super-resolution model finetuned on [Stable Diffusion](https://github.com/Stability-AI/stablediffusion), further equipped with a time-aware encoder and a controllable feature wrapping (CFW) module.
 
- - Following Stable Diffusion, images are encoded through the fixed VQGAN encoder, which turns images into latent representations. The autoencoder uses a relative downsampling factor of 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4
+ - Following Stable Diffusion, images are encoded through the fixed VQGAN encoder, which turns images into latent representations. The autoencoder uses a relative downsampling factor of 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4.
  - The latent representations are fed to the time-aware encoder as guidance.
  - The loss is the same as Stable Diffusion.
  - After finetuning the diffusion model, we further train the CFW module using the data generated by the finetuned diffusion model.
  - The VQGAN model is fixed and only CFW is trainable.
- - The loss is similar to training a VQGAN except that we use a fixed adversarial loss weight of 0.025 rather than a self-adjustable one.
+ - The loss is similar to training a VQGAN, except that we use a fixed adversarial loss weight of 0.025 rather than a self-adjustable one.
 
- We currently provide the following checkpoints, for various versions:
+ We currently provide the following checkpoints:
 
  - `stablesr_000117.ckpt`: Diffusion model finetuned on DF2K_OST dataset for 117 epochs.
  - `vqgan_cfw_00011.ckpt`: CFW module with fixed VQGAN trained on synthetic paired data for 11 epochs.
 
  ## Evaluation Results
- See [Paper](https://arxiv.org/abs/2305.07015) for details.
-
-
+ See [Paper](https://arxiv.org/abs/2305.07015) for details.
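
The two numeric details in the Training Procedure text of this diff (the downsampling factor f = 8 with 4 latent channels, and the fixed adversarial loss weight of 0.025) can be illustrated with a minimal sketch. This is not code from the StableSR repository: the function names `latent_shape` and `cfw_generator_loss` are hypothetical, the divisibility check is an assumption added for clarity, and `reconstruction_loss` stands in for whatever non-adversarial terms the actual training uses.

```python
from typing import Tuple

DOWNSAMPLING_FACTOR = 8   # f, as stated in the model card
LATENT_CHANNELS = 4       # latent channel count, as stated in the model card
ADV_LOSS_WEIGHT = 0.025   # fixed adversarial weight stated for CFW training


def latent_shape(height: int, width: int,
                 f: int = DOWNSAMPLING_FACTOR) -> Tuple[int, int, int]:
    """Map an H x W x 3 image size to the H/f x W/f x 4 latent size.

    Assumption (not from the model card): H and W must be divisible by f.
    """
    if height % f != 0 or width % f != 0:
        raise ValueError(f"height and width must be multiples of {f}")
    return (height // f, width // f, LATENT_CHANNELS)


def cfw_generator_loss(reconstruction_loss: float,
                       adversarial_loss: float,
                       adv_weight: float = ADV_LOSS_WEIGHT) -> float:
    """Combine loss terms with a fixed adversarial weight.

    Hypothetical helper: a VQGAN-style setup would normally rescale the
    adversarial term adaptively; here the weight is simply a constant.
    """
    return reconstruction_loss + adv_weight * adversarial_loss


if __name__ == "__main__":
    print(latent_shape(512, 512))
    print(cfw_generator_loss(1.0, 2.0))
```

Under these assumptions, a 512 x 512 input corresponds to a 64 x 64 x 4 latent, and the adversarial term contributes 2.5% of its raw value to the total loss regardless of gradient magnitudes.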