diff --git a/diffusers/docs/README.md b/diffusers/docs/README.md deleted file mode 100644 index 739f880f65650b5249bdce7539664e53b51d7496..0000000000000000000000000000000000000000 --- a/diffusers/docs/README.md +++ /dev/null @@ -1,271 +0,0 @@ - - -# Generating the documentation - -To generate the documentation, you first have to build it. Several packages are necessary to build the doc, -you can install them with the following command, at the root of the code repository: - -```bash -pip install -e ".[docs]" -``` - -Then you need to install our open source documentation builder tool: - -```bash -pip install git+https://github.com/huggingface/doc-builder -``` - ---- -**NOTE** - -You only need to generate the documentation to inspect it locally (if you're planning changes and want to -check how they look before committing for instance). You don't have to commit the built documentation. - ---- - -## Previewing the documentation - -To preview the docs, first install the `watchdog` module with: - -```bash -pip install watchdog -``` - -Then run the following command: - -```bash -doc-builder preview {package_name} {path_to_docs} -``` - -For example: - -```bash -doc-builder preview diffusers docs/source/en -``` - -The docs will be viewable at [http://localhost:3000](http://localhost:3000). You can also preview the docs once you have opened a PR. You will see a bot add a comment to a link where the documentation with your changes lives. - ---- -**NOTE** - -The `preview` command only works with existing doc files. When you add a completely new file, you need to update `_toctree.yml` & restart `preview` command (`ctrl-c` to stop it & call `doc-builder preview ...` again). - ---- - -## Adding a new element to the navigation bar - -Accepted files are Markdown (.md or .mdx). - -Create a file with its extension and put it in the source directory. 
You can then link it to the toc-tree by putting -the filename without the extension in the [`_toctree.yml`](https://github.com/huggingface/diffusers/blob/main/docs/source/_toctree.yml) file. - -## Renaming section headers and moving sections - -It helps to keep the old links working when renaming the section header and/or moving sections from one document to another. This is because the old links are likely to be used in Issues, Forums, and Social media and it'd make for a much more superior user experience if users reading those months later could still easily navigate to the originally intended information. - -Therefore, we simply keep a little map of moved sections at the end of the document where the original section was. The key is to preserve the original anchor. - -So if you renamed a section from: "Section A" to "Section B", then you can add at the end of the file: - -``` -Sections that were moved: - -[ Section A ] -``` -and of course, if you moved it to another file, then: - -``` -Sections that were moved: - -[ Section A ] -``` - -Use the relative style to link to the new file so that the versioned docs continue to work. - -For an example of a rich moved section set please see the very end of [the transformers Trainer doc](https://github.com/huggingface/transformers/blob/main/docs/source/en/main_classes/trainer.mdx). - - -## Writing Documentation - Specification - -The `huggingface/diffusers` documentation follows the -[Google documentation](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html) style for docstrings, -although we can write them directly in Markdown. - -### Adding a new tutorial - -Adding a new tutorial or section is done in two steps: - -- Add a new file under `docs/source`. This file can either be ReStructuredText (.rst) or Markdown (.md). -- Link that file in `docs/source/_toctree.yml` on the correct toc-tree. - -Make sure to put your new file under the proper section. 
It's unlikely to go in the first section (*Get Started*), so -depending on the intended targets (beginners, more advanced users, or researchers) it should go in sections two, three, or four. - -### Adding a new pipeline/scheduler - -When adding a new pipeline: - -- Create a file `xxx.mdx` under `docs/source/api/pipelines` (don't hesitate to copy an existing file as a template). -- Link that file in the (*Diffusers Summary*) section in `docs/source/api/pipelines/overview.mdx`, along with the link to the paper, and a colab notebook (if available). -- Write a short overview of the diffusion model: - - Overview with paper & authors - - Paper abstract - - Tips and tricks and how to use it best - - Possibly an end-to-end example of how to use it -- Add all the pipeline classes that should be linked in the diffusion model. These classes should be added using our Markdown syntax. By default as follows: - -``` -## XXXPipeline - -[[autodoc]] XXXPipeline - - all - - __call__ -``` - -This will include every public method of the pipeline that is documented, as well as the `__call__` method that is not documented by default. If you just want to add additional methods that are not documented, you can put the list of all methods to add in a list that contains `all`. - -``` -[[autodoc]] XXXPipeline - - all - - __call__ - - enable_attention_slicing - - disable_attention_slicing - - enable_xformers_memory_efficient_attention - - disable_xformers_memory_efficient_attention -``` - -You can follow the same process to create a new scheduler under the `docs/source/api/schedulers` folder. - -### Writing source documentation - -Values that should be put in `code` should be surrounded by backticks: \`like so\`. Note that argument names -and objects like True, None, or any strings should usually be put in `code`.
- -When mentioning a class, function, or method, it is recommended to use our syntax for internal links so that our tool -adds a link to its documentation with this syntax: \[\`XXXClass\`\] or \[\`function\`\]. This requires the class or -function to be in the main package. - -If you want to create a link to some internal class or function, you need to -provide its path. For instance: \[\`pipelines.ImagePipelineOutput\`\]. This will be converted into a link with -`pipelines.ImagePipelineOutput` in the description. To get rid of the path and only keep the name of the object you are -linking to in the description, add a ~: \[\`~pipelines.ImagePipelineOutput\`\] will generate a link with `ImagePipelineOutput` in the description. - -The same works for methods so you can either use \[\`XXXClass.method\`\] or \[~\`XXXClass.method\`\]. - -#### Defining arguments in a method - -Arguments should be defined with the `Args:` (or `Arguments:` or `Parameters:`) prefix, followed by a line return and -an indentation. The argument should be followed by its type, with its shape if it is a tensor, a colon, and its -description: - -``` - Args: - n_layers (`int`): The number of layers of the model. -``` - -If the description is too long to fit in one line, another indentation is necessary before writing the description -after the argument. - -Here's an example showcasing everything so far: - -``` - Args: - input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`): - Indices of input sequence tokens in the vocabulary. - - Indices can be obtained using [`AlbertTokenizer`]. See [`~PreTrainedTokenizer.encode`] and - [`~PreTrainedTokenizer.__call__`] for details. 
- - [What are input IDs?](../glossary#input-ids) -``` - -For optional arguments or arguments with defaults we follow the following syntax: imagine we have a function with the -following signature: - -``` -def my_function(x: str = None, a: float = 1): -``` - -then its documentation should look like this: - -``` - Args: - x (`str`, *optional*): - This argument controls ... - a (`float`, *optional*, defaults to 1): - This argument is used to ... -``` - -Note that we always omit the "defaults to \`None\`" when None is the default for any argument. Also note that even -if the first line describing your argument type and its default gets long, you can't break it on several lines. You can -however write as many lines as you want in the indented description (see the example above with `input_ids`). - -#### Writing a multi-line code block - -Multi-line code blocks can be useful for displaying examples. They are done between two lines of three backticks as usual in Markdown: - - -```` -``` -# first line of code -# second line -# etc -``` -```` - -#### Writing a return block - -The return block should be introduced with the `Returns:` prefix, followed by a line return and an indentation. -The first line should be the type of the return, followed by a line return. No need to indent further for the elements -building the return. - -Here's an example of a single value return: - -``` - Returns: - `List[int]`: A list of integers in the range [0, 1] --- 1 for a special token, 0 for a sequence token. -``` - -Here's an example of a tuple return, comprising several objects: - -``` - Returns: - `tuple(torch.FloatTensor)` comprising various elements depending on the configuration ([`BertConfig`]) and inputs: - - ** loss** (*optional*, returned when `masked_lm_labels` is provided) `torch.FloatTensor` of shape `(1,)` -- - Total loss is the sum of the masked language modeling loss and the next sequence prediction (classification) loss. 
- - **prediction_scores** (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.vocab_size)`) -- - Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). -``` - -#### Adding an image - -Due to the rapidly growing repository, it is important to make sure that no files that would significantly weigh down the repository are added. This includes images, videos, and other non-text files. We prefer to leverage a hf.co hosted `dataset` like -the ones hosted on [`hf-internal-testing`](https://huggingface.co/hf-internal-testing) in which to place these files and reference -them by URL. We recommend putting them in the following dataset: [huggingface/documentation-images](https://huggingface.co/datasets/huggingface/documentation-images). -If an external contribution, feel free to add the images to your PR and ask a Hugging Face member to migrate your images -to this dataset. - -## Styling the docstring - -We have an automatic script running with the `make style` command that will make sure that: -- the docstrings fully take advantage of the line width -- all code examples are formatted using black, like the code of the Transformers library - -This script may have some weird failures if you made a syntax mistake or if you uncover a bug. Therefore, it's -recommended to commit your changes before running `make style`, so you can revert the changes done by that script -easily. - diff --git a/diffusers/docs/TRANSLATING.md b/diffusers/docs/TRANSLATING.md deleted file mode 100644 index 32cd95f2ade9ba90ed6a10b1c54169b26a79d01d..0000000000000000000000000000000000000000 --- a/diffusers/docs/TRANSLATING.md +++ /dev/null @@ -1,57 +0,0 @@ -### Translating the Diffusers documentation into your language - -As part of our mission to democratize machine learning, we'd love to make the Diffusers library available in many more languages! Follow the steps below if you want to help translate the documentation into your language 🙏. 
- -**🗞️ Open an issue** - -To get started, navigate to the [Issues](https://github.com/huggingface/diffusers/issues) page of this repo and check if anyone else has opened an issue for your language. If not, open a new issue by selecting the "Translation template" from the "New issue" button. - -Once an issue exists, post a comment to indicate which chapters you'd like to work on, and we'll add your name to the list. - - -**🍴 Fork the repository** - -First, you'll need to [fork the Diffusers repo](https://docs.github.com/en/get-started/quickstart/fork-a-repo). You can do this by clicking on the **Fork** button on the top-right corner of this repo's page. - -Once you've forked the repo, you'll want to get the files on your local machine for editing. You can do that by cloning the fork with Git as follows: - -```bash -git clone https://github.com/YOUR-USERNAME/diffusers.git -``` - -**📋 Copy-paste the English version with a new language code** - -The documentation files are in one leading directory: - -- [`docs/source`](https://github.com/huggingface/diffusers/tree/main/docs/source): All the documentation materials are organized here by language. - -You'll only need to copy the files in the [`docs/source/en`](https://github.com/huggingface/diffusers/tree/main/docs/source/en) directory, so first navigate to your fork of the repo and run the following: - -```bash -cd ~/path/to/diffusers/docs -cp -r source/en source/LANG-ID -``` - -Here, `LANG-ID` should be one of the ISO 639-1 or ISO 639-2 language codes -- see [here](https://www.loc.gov/standards/iso639-2/php/code_list.php) for a handy table. - -**✍️ Start translating** - -The fun part comes - translating the text! - -The first thing we recommend is translating the part of the `_toctree.yml` file that corresponds to your doc chapter. This file is used to render the table of contents on the website. 
- -> 🙋 If the `_toctree.yml` file doesn't yet exist for your language, you can create one by copy-pasting from the English version and deleting the sections unrelated to your chapter. Just make sure it exists in the `docs/source/LANG-ID/` directory! - -The fields you should add are `local` (with the name of the file containing the translation; e.g. `autoclass_tutorial`), and `title` (with the title of the doc in your language; e.g. `Load pretrained instances with an AutoClass`) -- as a reference, here is the `_toctree.yml` for [English](https://github.com/huggingface/diffusers/blob/main/docs/source/en/_toctree.yml): - -```yaml -- sections: - - local: pipeline_tutorial # Do not change this! Use the same name for your .md file - title: Pipelines for inference # Translate this! - ... - title: Tutorials # Translate this! -``` - -Once you have translated the `_toctree.yml` file, you can start translating the [MDX](https://mdxjs.com/) files associated with your docs chapter. - -> 🙋 If you'd like others to help you with the translation, you should [open an issue](https://github.com/huggingface/diffusers/issues) and tag @patrickvonplaten. diff --git a/diffusers/docs/source/_config.py b/diffusers/docs/source/_config.py deleted file mode 100644 index 9a4818ea8b1e19007c9e6440a3a98383031278cb..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/_config.py +++ /dev/null @@ -1,9 +0,0 @@ -# docstyle-ignore -INSTALL_CONTENT = """ -# Diffusers installation -! pip install diffusers transformers datasets accelerate -# To install from source instead of the last release, comment the command above and uncomment the following one. -# ! 
pip install git+https://github.com/huggingface/diffusers.git -""" - -notebook_first_cells = [{"type": "code", "content": INSTALL_CONTENT}] \ No newline at end of file diff --git a/diffusers/docs/source/en/_toctree.yml b/diffusers/docs/source/en/_toctree.yml deleted file mode 100644 index dc40d9b142baf2e8b0ab0298a90131d353216c04..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/_toctree.yml +++ /dev/null @@ -1,264 +0,0 @@ -- sections: - - local: index - title: 🧨 Diffusers - - local: quicktour - title: Quicktour - - local: stable_diffusion - title: Effective and efficient diffusion - - local: installation - title: Installation - title: Get started -- sections: - - local: tutorials/tutorial_overview - title: Overview - - local: using-diffusers/write_own_pipeline - title: Understanding models and schedulers - - local: tutorials/basic_training - title: Train a diffusion model - title: Tutorials -- sections: - - sections: - - local: using-diffusers/loading_overview - title: Overview - - local: using-diffusers/loading - title: Load pipelines, models, and schedulers - - local: using-diffusers/schedulers - title: Load and compare different schedulers - - local: using-diffusers/custom_pipeline_overview - title: Load and add custom pipelines - - local: using-diffusers/kerascv - title: Load KerasCV Stable Diffusion checkpoints - title: Loading & Hub - - sections: - - local: using-diffusers/pipeline_overview - title: Overview - - local: using-diffusers/unconditional_image_generation - title: Unconditional image generation - - local: using-diffusers/conditional_image_generation - title: Text-to-image generation - - local: using-diffusers/img2img - title: Text-guided image-to-image - - local: using-diffusers/inpaint - title: Text-guided image-inpainting - - local: using-diffusers/depth2img - title: Text-guided depth-to-image - - local: using-diffusers/reusing_seeds - title: Improve image quality with deterministic generation - - local: 
using-diffusers/reproducibility - title: Create reproducible pipelines - - local: using-diffusers/custom_pipeline_examples - title: Community Pipelines - - local: using-diffusers/contribute_pipeline - title: How to contribute a Pipeline - - local: using-diffusers/using_safetensors - title: Using safetensors - - local: using-diffusers/stable_diffusion_jax_how_to - title: Stable Diffusion in JAX/Flax - - local: using-diffusers/weighted_prompts - title: Weighting Prompts - title: Pipelines for Inference - - sections: - - local: training/overview - title: Overview - - local: training/unconditional_training - title: Unconditional image generation - - local: training/text_inversion - title: Textual Inversion - - local: training/dreambooth - title: DreamBooth - - local: training/text2image - title: Text-to-image - - local: training/lora - title: Low-Rank Adaptation of Large Language Models (LoRA) - - local: training/controlnet - title: ControlNet - - local: training/instructpix2pix - title: InstructPix2Pix Training - title: Training - - sections: - - local: using-diffusers/rl - title: Reinforcement Learning - - local: using-diffusers/audio - title: Audio - - local: using-diffusers/other-modalities - title: Other Modalities - title: Taking Diffusers Beyond Images - title: Using Diffusers -- sections: - - local: optimization/opt_overview - title: Overview - - local: optimization/fp16 - title: Memory and Speed - - local: optimization/torch2.0 - title: Torch2.0 support - - local: optimization/xformers - title: xFormers - - local: optimization/onnx - title: ONNX - - local: optimization/open_vino - title: OpenVINO - - local: optimization/mps - title: MPS - - local: optimization/habana - title: Habana Gaudi - title: Optimization/Special Hardware -- sections: - - local: conceptual/philosophy - title: Philosophy - - local: using-diffusers/controlling_generation - title: Controlled generation - - local: conceptual/contribution - title: How to contribute? 
- - local: conceptual/ethical_guidelines - title: Diffusers' Ethical Guidelines - - local: conceptual/evaluation - title: Evaluating Diffusion Models - title: Conceptual Guides -- sections: - - sections: - - local: api/models - title: Models - - local: api/diffusion_pipeline - title: Diffusion Pipeline - - local: api/logging - title: Logging - - local: api/configuration - title: Configuration - - local: api/outputs - title: Outputs - - local: api/loaders - title: Loaders - title: Main Classes - - sections: - - local: api/pipelines/overview - title: Overview - - local: api/pipelines/alt_diffusion - title: AltDiffusion - - local: api/pipelines/audio_diffusion - title: Audio Diffusion - - local: api/pipelines/audioldm - title: AudioLDM - - local: api/pipelines/cycle_diffusion - title: Cycle Diffusion - - local: api/pipelines/dance_diffusion - title: Dance Diffusion - - local: api/pipelines/ddim - title: DDIM - - local: api/pipelines/ddpm - title: DDPM - - local: api/pipelines/dit - title: DiT - - local: api/pipelines/latent_diffusion - title: Latent Diffusion - - local: api/pipelines/paint_by_example - title: PaintByExample - - local: api/pipelines/pndm - title: PNDM - - local: api/pipelines/repaint - title: RePaint - - local: api/pipelines/stable_diffusion_safe - title: Safe Stable Diffusion - - local: api/pipelines/score_sde_ve - title: Score SDE VE - - local: api/pipelines/semantic_stable_diffusion - title: Semantic Guidance - - local: api/pipelines/spectrogram_diffusion - title: "Spectrogram Diffusion" - - sections: - - local: api/pipelines/stable_diffusion/overview - title: Overview - - local: api/pipelines/stable_diffusion/text2img - title: Text-to-Image - - local: api/pipelines/stable_diffusion/img2img - title: Image-to-Image - - local: api/pipelines/stable_diffusion/inpaint - title: Inpaint - - local: api/pipelines/stable_diffusion/depth2img - title: Depth-to-Image - - local: api/pipelines/stable_diffusion/image_variation - title: Image-Variation - - local: 
api/pipelines/stable_diffusion/upscale - title: Super-Resolution - - local: api/pipelines/stable_diffusion/latent_upscale - title: Stable-Diffusion-Latent-Upscaler - - local: api/pipelines/stable_diffusion/pix2pix - title: InstructPix2Pix - - local: api/pipelines/stable_diffusion/attend_and_excite - title: Attend and Excite - - local: api/pipelines/stable_diffusion/pix2pix_zero - title: Pix2Pix Zero - - local: api/pipelines/stable_diffusion/self_attention_guidance - title: Self-Attention Guidance - - local: api/pipelines/stable_diffusion/panorama - title: MultiDiffusion Panorama - - local: api/pipelines/stable_diffusion/controlnet - title: Text-to-Image Generation with ControlNet Conditioning - - local: api/pipelines/stable_diffusion/model_editing - title: Text-to-Image Model Editing - title: Stable Diffusion - - local: api/pipelines/stable_diffusion_2 - title: Stable Diffusion 2 - - local: api/pipelines/stable_unclip - title: Stable unCLIP - - local: api/pipelines/stochastic_karras_ve - title: Stochastic Karras VE - - local: api/pipelines/text_to_video - title: Text-to-Video - - local: api/pipelines/unclip - title: UnCLIP - - local: api/pipelines/latent_diffusion_uncond - title: Unconditional Latent Diffusion - - local: api/pipelines/versatile_diffusion - title: Versatile Diffusion - - local: api/pipelines/vq_diffusion - title: VQ Diffusion - title: Pipelines - - sections: - - local: api/schedulers/overview - title: Overview - - local: api/schedulers/ddim - title: DDIM - - local: api/schedulers/ddim_inverse - title: DDIMInverse - - local: api/schedulers/ddpm - title: DDPM - - local: api/schedulers/deis - title: DEIS - - local: api/schedulers/dpm_discrete - title: DPM Discrete Scheduler - - local: api/schedulers/dpm_discrete_ancestral - title: DPM Discrete Scheduler with ancestral sampling - - local: api/schedulers/euler_ancestral - title: Euler Ancestral Scheduler - - local: api/schedulers/euler - title: Euler scheduler - - local: api/schedulers/heun - title: Heun 
Scheduler - - local: api/schedulers/ipndm - title: IPNDM - - local: api/schedulers/lms_discrete - title: Linear Multistep - - local: api/schedulers/multistep_dpm_solver - title: Multistep DPM-Solver - - local: api/schedulers/pndm - title: PNDM - - local: api/schedulers/repaint - title: RePaint Scheduler - - local: api/schedulers/singlestep_dpm_solver - title: Singlestep DPM-Solver - - local: api/schedulers/stochastic_karras_ve - title: Stochastic Karras VE - - local: api/schedulers/unipc - title: UniPCMultistepScheduler - - local: api/schedulers/score_sde_ve - title: VE-SDE - - local: api/schedulers/score_sde_vp - title: VP-SDE - - local: api/schedulers/vq_diffusion - title: VQDiffusionScheduler - title: Schedulers - - sections: - - local: api/experimental/rl - title: RL Planning - title: Experimental Features - title: API diff --git a/diffusers/docs/source/en/api/configuration.mdx b/diffusers/docs/source/en/api/configuration.mdx deleted file mode 100644 index 2bbb42d9253804170a2312fa01522336a5cd7307..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/configuration.mdx +++ /dev/null @@ -1,25 +0,0 @@ - - -# Configuration - -Schedulers from [`~schedulers.scheduling_utils.SchedulerMixin`] and models from [`ModelMixin`] inherit from [`ConfigMixin`], which takes care of storing all the parameters that are -passed to their respective `__init__` methods in a JSON configuration file.
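To make the idea above concrete, here is a minimal standard-library sketch of a mixin that records `__init__` arguments and round-trips them through JSON. It is an illustration only, not the diffusers implementation: `RecordsInitArgs` and `ToyScheduler` are hypothetical names, and the two methods merely borrow the names `to_json_string` and `from_config` from the list documented for `ConfigMixin`.

```python
import inspect
import json


class RecordsInitArgs:
    """Hypothetical stand-in for the idea behind ConfigMixin (not the diffusers class)."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        original_init = cls.__init__
        signature = inspect.signature(original_init)

        def wrapped_init(self, *args, **init_kwargs):
            bound = signature.bind(self, *args, **init_kwargs)
            bound.apply_defaults()
            # Keep every argument except `self`, including untouched defaults.
            self.config = {k: v for k, v in bound.arguments.items() if k != "self"}
            original_init(self, *args, **init_kwargs)

        cls.__init__ = wrapped_init

    def to_json_string(self) -> str:
        # Serialize the recorded arguments; values must be JSON-compatible.
        return json.dumps(self.config, indent=2, sort_keys=True)

    @classmethod
    def from_config(cls, config: dict):
        # Rebuild an instance from a previously recorded configuration.
        return cls(**config)


class ToyScheduler(RecordsInitArgs):
    """Hypothetical scheduler with two made-up parameters."""

    def __init__(self, num_train_timesteps: int = 1000, beta_start: float = 0.0001):
        self.num_train_timesteps = num_train_timesteps
        self.beta_start = beta_start


scheduler = ToyScheduler(num_train_timesteps=50)
restored = ToyScheduler.from_config(json.loads(scheduler.to_json_string()))
```

The sketch stores defaults as well as explicitly passed values, so the JSON output is enough to rebuild an equivalent object without consulting the class definition.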
- -## ConfigMixin - -[[autodoc]] ConfigMixin - - load_config - - from_config - - save_config - - to_json_file - - to_json_string diff --git a/diffusers/docs/source/en/api/diffusion_pipeline.mdx b/diffusers/docs/source/en/api/diffusion_pipeline.mdx deleted file mode 100644 index 280802d6a89ab8bd9181cb02d008d3f8970e220a..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/diffusion_pipeline.mdx +++ /dev/null @@ -1,47 +0,0 @@ - - -# Pipelines - -The [`DiffusionPipeline`] is the easiest way to load any pretrained diffusion pipeline from the [Hub](https://huggingface.co/models?library=diffusers) and to use it for inference. - - - - One should not use the Diffusion Pipeline class for training or fine-tuning a diffusion model. Individual - components of diffusion pipelines are usually trained individually, so we suggest working directly - with [`UNet2DModel`] and [`UNet2DConditionModel`]. - - - -Any diffusion pipeline that is loaded with [`~DiffusionPipeline.from_pretrained`] will automatically -detect the pipeline type, *e.g.* [`StableDiffusionPipeline`], and consequently load each component of the -pipeline and pass them into the `__init__` function of the pipeline, *e.g.* [`~StableDiffusionPipeline.__init__`]. - -Any pipeline object can be saved locally with [`~DiffusionPipeline.save_pretrained`].
- -## DiffusionPipeline -[[autodoc]] DiffusionPipeline - - all - - __call__ - - device - - to - - components - -## ImagePipelineOutput -By default diffusion pipelines return an object of class - -[[autodoc]] pipelines.ImagePipelineOutput - -## AudioPipelineOutput -By default diffusion pipelines return an object of class - -[[autodoc]] pipelines.AudioPipelineOutput diff --git a/diffusers/docs/source/en/api/experimental/rl.mdx b/diffusers/docs/source/en/api/experimental/rl.mdx deleted file mode 100644 index 66c8db311b4ef8d34089ddfe31c7496b08be3416..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/experimental/rl.mdx +++ /dev/null @@ -1,15 +0,0 @@ - - -# TODO - -Coming soon! \ No newline at end of file diff --git a/diffusers/docs/source/en/api/loaders.mdx b/diffusers/docs/source/en/api/loaders.mdx deleted file mode 100644 index 1d55bd03c0641d1a63e79e2fd26c444727595b23..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/loaders.mdx +++ /dev/null @@ -1,30 +0,0 @@ - - -# Loaders - -There are many ways to train adapter neural networks for diffusion models, such as -- [Textual Inversion](./training/text_inversion.mdx) -- [LoRA](https://github.com/cloneofsimo/lora) -- [Hypernetworks](https://arxiv.org/abs/1609.09106) - -Such adapter neural networks often only consist of a fraction of the number of weights compared -to the pretrained model and as such are very portable. The Diffusers library offers an easy-to-use -API to load such adapter neural networks via the [`loaders.py` module](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders.py). - -**Note**: This module is still highly experimental and prone to future changes. 
- -## LoaderMixins - -### UNet2DConditionLoadersMixin - -[[autodoc]] loaders.UNet2DConditionLoadersMixin diff --git a/diffusers/docs/source/en/api/logging.mdx b/diffusers/docs/source/en/api/logging.mdx deleted file mode 100644 index b52c0434f42d06de3085f3816a9093df14ea0212..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/logging.mdx +++ /dev/null @@ -1,98 +0,0 @@ - - -# Logging - -🧨 Diffusers has a centralized logging system, so that you can set up the verbosity of the library easily. - -Currently the default verbosity of the library is `WARNING`. - -To change the level of verbosity, just use one of the direct setters. For instance, here is how to change the verbosity -to the INFO level. - -```python -import diffusers - -diffusers.logging.set_verbosity_info() -``` - -You can also use the environment variable `DIFFUSERS_VERBOSITY` to override the default verbosity. You can set it -to one of the following: `debug`, `info`, `warning`, `error`, `critical`. For example: - -```bash -DIFFUSERS_VERBOSITY=error ./myprogram.py -``` - -Additionally, some `warnings` can be disabled by setting the environment variable -`DIFFUSERS_NO_ADVISORY_WARNINGS` to a true value, like *1*. This will disable any warning that is logged using -[`logger.warning_advice`]. For example: - -```bash -DIFFUSERS_NO_ADVISORY_WARNINGS=1 ./myprogram.py -``` - -Here is an example of how to use the same logger as the library in your own module or script: - -```python -from diffusers.utils import logging - -logging.set_verbosity_info() -logger = logging.get_logger("diffusers") -logger.info("INFO") -logger.warning("WARN") -``` - - -All the methods of this logging module are documented below. The main ones are -[`logging.get_verbosity`] to get the current level of verbosity in the logger and -[`logging.set_verbosity`] to set the verbosity to the level of your choice.
In order (from the least -verbose to the most verbose), those levels (with their corresponding int values in parentheses) are: - -- `diffusers.logging.CRITICAL` or `diffusers.logging.FATAL` (int value, 50): only reports the most - critical errors. -- `diffusers.logging.ERROR` (int value, 40): only reports errors. -- `diffusers.logging.WARNING` or `diffusers.logging.WARN` (int value, 30): only reports errors and - warnings. This is the default level used by the library. -- `diffusers.logging.INFO` (int value, 20): reports errors, warnings, and basic information. -- `diffusers.logging.DEBUG` (int value, 10): reports all information. - -By default, `tqdm` progress bars will be displayed during model download. [`logging.disable_progress_bar`] and [`logging.enable_progress_bar`] can be used to disable or re-enable this behavior. - -## Base setters - -[[autodoc]] logging.set_verbosity_error - -[[autodoc]] logging.set_verbosity_warning - -[[autodoc]] logging.set_verbosity_info - -[[autodoc]] logging.set_verbosity_debug - -## Other functions - -[[autodoc]] logging.get_verbosity - -[[autodoc]] logging.set_verbosity - -[[autodoc]] logging.get_logger - -[[autodoc]] logging.enable_default_handler - -[[autodoc]] logging.disable_default_handler - -[[autodoc]] logging.enable_explicit_format - -[[autodoc]] logging.reset_format - -[[autodoc]] logging.enable_progress_bar - -[[autodoc]] logging.disable_progress_bar diff --git a/diffusers/docs/source/en/api/models.mdx b/diffusers/docs/source/en/api/models.mdx deleted file mode 100644 index 2361fd4f65972342f105d70d3317af126cd4e14c..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/models.mdx +++ /dev/null @@ -1,107 +0,0 @@ - - -# Models - -Diffusers contains pretrained models for popular algorithms and modules for creating the next set of diffusion models. -The primary function of these models is to denoise an input sample by modeling the distribution $p_\theta(\mathbf{x}_{t-1}|\mathbf{x}_t)$.
-The models are built on the base class [`ModelMixin`], which is a `torch.nn.Module` with basic functionality for saving and loading models both locally and from the Hugging Face Hub. - -## ModelMixin -[[autodoc]] ModelMixin - -## UNet2DOutput -[[autodoc]] models.unet_2d.UNet2DOutput - -## UNet2DModel -[[autodoc]] UNet2DModel - -## UNet1DOutput -[[autodoc]] models.unet_1d.UNet1DOutput - -## UNet1DModel -[[autodoc]] UNet1DModel - -## UNet2DConditionOutput -[[autodoc]] models.unet_2d_condition.UNet2DConditionOutput - -## UNet2DConditionModel -[[autodoc]] UNet2DConditionModel - -## UNet3DConditionOutput -[[autodoc]] models.unet_3d_condition.UNet3DConditionOutput - -## UNet3DConditionModel -[[autodoc]] UNet3DConditionModel - -## DecoderOutput -[[autodoc]] models.vae.DecoderOutput - -## VQEncoderOutput -[[autodoc]] models.vq_model.VQEncoderOutput - -## VQModel -[[autodoc]] VQModel - -## AutoencoderKLOutput -[[autodoc]] models.autoencoder_kl.AutoencoderKLOutput - -## AutoencoderKL -[[autodoc]] AutoencoderKL - -## Transformer2DModel -[[autodoc]] Transformer2DModel - -## Transformer2DModelOutput -[[autodoc]] models.transformer_2d.Transformer2DModelOutput - -## TransformerTemporalModel -[[autodoc]] models.transformer_temporal.TransformerTemporalModel - -## TransformerTemporalModelOutput -[[autodoc]] models.transformer_temporal.TransformerTemporalModelOutput - -## PriorTransformer -[[autodoc]] models.prior_transformer.PriorTransformer - -## PriorTransformerOutput -[[autodoc]] models.prior_transformer.PriorTransformerOutput - -## ControlNetOutput -[[autodoc]] models.controlnet.ControlNetOutput - -## ControlNetModel -[[autodoc]] ControlNetModel - -## FlaxModelMixin -[[autodoc]] FlaxModelMixin - -## FlaxUNet2DConditionOutput -[[autodoc]] models.unet_2d_condition_flax.FlaxUNet2DConditionOutput - -## FlaxUNet2DConditionModel -[[autodoc]] FlaxUNet2DConditionModel - -## FlaxDecoderOutput -[[autodoc]] models.vae_flax.FlaxDecoderOutput - -## FlaxAutoencoderKLOutput -[[autodoc]]
models.vae_flax.FlaxAutoencoderKLOutput - -## FlaxAutoencoderKL -[[autodoc]] FlaxAutoencoderKL - -## FlaxControlNetOutput -[[autodoc]] models.controlnet_flax.FlaxControlNetOutput - -## FlaxControlNetModel -[[autodoc]] FlaxControlNetModel diff --git a/diffusers/docs/source/en/api/outputs.mdx b/diffusers/docs/source/en/api/outputs.mdx deleted file mode 100644 index 9466f354541d55e66b65ef96ae2567f881d63fc0..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/outputs.mdx +++ /dev/null @@ -1,55 +0,0 @@ - - -# BaseOutputs - -All models have outputs that are instances of subclasses of [`~utils.BaseOutput`]. Those are -data structures containing all the information returned by the model, but that can also be used as tuples or -dictionaries. - -Let's see how this looks in an example: - -```python -from diffusers import DDIMPipeline - -pipeline = DDIMPipeline.from_pretrained("google/ddpm-cifar10-32") -outputs = pipeline() -``` - -The `outputs` object is a [`~pipelines.ImagePipelineOutput`], which, as the -documentation of that class shows, means it has an `images` attribute. - -You can access each attribute as you would usually do, and if that attribute has not been returned by the model, you will get `None`: - -```python -outputs.images -``` - -or via keyword lookup: - -```python -outputs["images"] -``` - -When the `outputs` object is treated as a tuple, only the attributes that don't have `None` values are considered. -Here for instance, we could retrieve images via indexing: - -```python -outputs[:1] -``` - -which will return the one-element tuple `(outputs.images,)`.
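
The three access styles described above can be tried without downloading a checkpoint. The sketch below is a simplified stand-in written for illustration only: `SketchOutput` and its `nsfw_flags` field are hypothetical names, and the class is not the `BaseOutput` implementation from the library. It assumes only the behavior stated in the text: attribute access, keyword lookup, and tuple-style indexing that skips `None` fields.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional, Tuple


@dataclass
class SketchOutput:
    """Hypothetical stand-in that mimics the documented access patterns."""

    images: Optional[list] = None
    nsfw_flags: Optional[list] = None  # illustrative second field, left unset below

    def to_tuple(self) -> Tuple[Any, ...]:
        # Keep only the fields that were set (not None), in declaration order.
        return tuple(
            getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        )

    def __getitem__(self, key):
        if isinstance(key, str):
            # Keyword lookup: only fields that are not None are available.
            present = {
                f.name: getattr(self, f.name)
                for f in fields(self)
                if getattr(self, f.name) is not None
            }
            return present[key]
        # Integer or slice: index into the tuple view.
        return self.to_tuple()[key]


out = SketchOutput(images=["img0", "img1"])

by_attribute = out.images   # attribute access
by_key = out["images"]      # keyword lookup
by_slice = out[:1]          # a one-element tuple holding the images list
missing = out.nsfw_flags    # unset attribute reads as None
```

Note that slicing yields a tuple even when it contains a single element, which is why the result of `out[:1]` is written with a trailing comma.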
- -## BaseOutput - -[[autodoc]] utils.BaseOutput - - to_tuple diff --git a/diffusers/docs/source/en/api/pipelines/alt_diffusion.mdx b/diffusers/docs/source/en/api/pipelines/alt_diffusion.mdx deleted file mode 100644 index dbe3b079a201638b0129087b3f0de0de22323551..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/alt_diffusion.mdx +++ /dev/null @@ -1,83 +0,0 @@ - - -# AltDiffusion - -AltDiffusion was proposed in [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Zhongzhi Chen, Guang Liu, Bo-Wen Zhang, Fulong Ye, Qinghong Yang, Ledell Wu. - -The abstract of the paper is the following: - -*In this work, we present a conceptually simple and effective method to train a strong bilingual multimodal representation model. Starting from the pretrained multimodal representation model CLIP released by OpenAI, we switched its text encoder with a pretrained multilingual text encoder XLM-R, and aligned both languages and image representations by a two-stage training schema consisting of teacher learning and contrastive learning. We validate our method through evaluations of a wide range of tasks. We set new state-of-the-art performances on a bunch of tasks including ImageNet-CN, Flicker30k- CN, and COCO-CN. 
Further, we obtain very close performances with CLIP on almost all tasks, suggesting that one can simply alter the text encoder in CLIP for extended capabilities such as multilingual understanding.* - - -*Overview*: - -| Pipeline | Tasks | Colab | Demo -|---|---|:---:|:---:| -| [pipeline_alt_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/alt_diffusion/pipeline_alt_diffusion.py) | *Text-to-Image Generation* | - | - -| [pipeline_alt_diffusion_img2img.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/alt_diffusion/pipeline_alt_diffusion_img2img.py) | *Image-to-Image Text-Guided Generation* | - |- - -## Tips - -- AltDiffusion is conceptually exactly the same as [Stable Diffusion](./api/pipelines/stable_diffusion/overview). - -- *Run AltDiffusion* - -AltDiffusion can be tested very easily with the [`AltDiffusionPipeline`], [`AltDiffusionImg2ImgPipeline`] and the `"BAAI/AltDiffusion-m9"` checkpoint exactly in the same way it is shown in the [Conditional Image Generation Guide](./using-diffusers/conditional_image_generation) and the [Image-to-Image Generation Guide](./using-diffusers/img2img). - -- *How to load and use different schedulers.* - -The alt diffusion pipeline uses [`DDIMScheduler`] scheduler by default. But `diffusers` provides many other schedulers that can be used with the alt diffusion pipeline such as [`PNDMScheduler`], [`LMSDiscreteScheduler`], [`EulerDiscreteScheduler`], [`EulerAncestralDiscreteScheduler`] etc. -To use a different scheduler, you can either change it via the [`ConfigMixin.from_config`] method or pass the `scheduler` argument to the `from_pretrained` method of the pipeline. 
For example, to use the [`EulerDiscreteScheduler`], you can do the following: - -```python ->>> from diffusers import AltDiffusionPipeline, EulerDiscreteScheduler - ->>> pipeline = AltDiffusionPipeline.from_pretrained("BAAI/AltDiffusion-m9") ->>> pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config) - ->>> # or ->>> euler_scheduler = EulerDiscreteScheduler.from_pretrained("BAAI/AltDiffusion-m9", subfolder="scheduler") ->>> pipeline = AltDiffusionPipeline.from_pretrained("BAAI/AltDiffusion-m9", scheduler=euler_scheduler) -``` - - -- *How to convert all use cases with multiple or single pipeline* - -If you want to use all possible use cases in a single `DiffusionPipeline` we recommend using the `components` functionality to instantiate all components in the most memory-efficient way: - -```python ->>> from diffusers import ( -... AltDiffusionPipeline, -... AltDiffusionImg2ImgPipeline, -... ) - ->>> text2img = AltDiffusionPipeline.from_pretrained("BAAI/AltDiffusion-m9") ->>> img2img = AltDiffusionImg2ImgPipeline(**text2img.components) - ->>> # now you can use text2img(...) and img2img(...) just like the call methods of each respective pipeline -``` - -## AltDiffusionPipelineOutput -[[autodoc]] pipelines.alt_diffusion.AltDiffusionPipelineOutput - - all - - __call__ - -## AltDiffusionPipeline -[[autodoc]] AltDiffusionPipeline - - all - - __call__ - -## AltDiffusionImg2ImgPipeline -[[autodoc]] AltDiffusionImg2ImgPipeline - - all - - __call__ diff --git a/diffusers/docs/source/en/api/pipelines/audio_diffusion.mdx b/diffusers/docs/source/en/api/pipelines/audio_diffusion.mdx deleted file mode 100644 index 9c7725367e8fd5eedae1fbd1412f43af76a1cf59..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/audio_diffusion.mdx +++ /dev/null @@ -1,98 +0,0 @@ - - -# Audio Diffusion - -## Overview - -[Audio Diffusion](https://github.com/teticio/audio-diffusion) by Robert Dargavel Smith. 
- -Audio Diffusion leverages the recent advances in image generation using diffusion models by converting audio samples to -and from mel spectrogram images. - -The original codebase of this implementation can be found [here](https://github.com/teticio/audio-diffusion), including -training scripts and example notebooks. - -## Available Pipelines: - -| Pipeline | Tasks | Colab -|---|---|:---:| -| [pipeline_audio_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/audio_diffusion/pipeline_audio_diffusion.py) | *Unconditional Audio Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/audio_diffusion_pipeline.ipynb) | - - -## Examples: - -### Audio Diffusion - -```python -import torch -from IPython.display import Audio, display -from diffusers import DiffusionPipeline - -device = "cuda" if torch.cuda.is_available() else "cpu" -pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-256").to(device) - -output = pipe() -display(output.images[0]) -display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate())) -``` - -### Latent Audio Diffusion - -```python -import torch -from IPython.display import Audio, display -from diffusers import DiffusionPipeline - -device = "cuda" if torch.cuda.is_available() else "cpu" -pipe = DiffusionPipeline.from_pretrained("teticio/latent-audio-diffusion-256").to(device) - -output = pipe() -display(output.images[0]) -display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate())) -``` - -### Audio Diffusion with DDIM (faster) - -```python -import torch -from IPython.display import Audio, display -from diffusers import DiffusionPipeline - -device = "cuda" if torch.cuda.is_available() else "cpu" -pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-ddim-256").to(device) - -output = pipe() -display(output.images[0]) -display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate())) -``` - 
-### Variations, in-painting, out-painting etc. - -```python -output = pipe( - raw_audio=output.audios[0, 0], - start_step=int(pipe.get_default_steps() / 2), - mask_start_secs=1, - mask_end_secs=1, -) -display(output.images[0]) -display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate())) -``` - -## AudioDiffusionPipeline -[[autodoc]] AudioDiffusionPipeline - - all - - __call__ - -## Mel -[[autodoc]] Mel diff --git a/diffusers/docs/source/en/api/pipelines/audioldm.mdx b/diffusers/docs/source/en/api/pipelines/audioldm.mdx deleted file mode 100644 index f3987d2263ac649ee5a0c89a2152d54db5d9a323..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/audioldm.mdx +++ /dev/null @@ -1,82 +0,0 @@ - - -# AudioLDM - -## Overview - -AudioLDM was proposed in [AudioLDM: Text-to-Audio Generation with Latent Diffusion Models](https://arxiv.org/abs/2301.12503) by Haohe Liu et al. - -Inspired by [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview), AudioLDM -is a text-to-audio _latent diffusion model (LDM)_ that learns continuous audio representations from [CLAP](https://huggingface.co/docs/transformers/main/model_doc/clap) -latents. AudioLDM takes a text prompt as input and predicts the corresponding audio. It can generate text-conditional -sound effects, human speech and music. - -This pipeline was contributed by [sanchit-gandhi](https://huggingface.co/sanchit-gandhi). The original codebase can be found [here](https://github.com/haoheliu/AudioLDM). 
- -## Text-to-Audio - -The [`AudioLDMPipeline`] can be used to load pre-trained weights from [cvssp/audioldm](https://huggingface.co/cvssp/audioldm) and generate text-conditional audio outputs: - -```python -from diffusers import AudioLDMPipeline -import torch -import scipy - -repo_id = "cvssp/audioldm" -pipe = AudioLDMPipeline.from_pretrained(repo_id, torch_dtype=torch.float16) -pipe = pipe.to("cuda") - -prompt = "Techno music with a strong, upbeat tempo and high melodic riffs" -audio = pipe(prompt, num_inference_steps=10, audio_length_in_s=5.0).audios[0] - -# save the audio sample as a .wav file -scipy.io.wavfile.write("techno.wav", rate=16000, data=audio) -``` - -### Tips - -Prompts: -* Descriptive prompt inputs work best: you can use adjectives to describe the sound (e.g. "high quality" or "clear") and make the prompt context specific (e.g., "water stream in a forest" instead of "stream"). -* It's best to use general terms like 'cat' or 'dog' instead of specific names or abstract objects that the model may not be familiar with. - -Inference: -* The _quality_ of the predicted audio sample can be controlled by the `num_inference_steps` argument: higher steps give higher quality audio at the expense of slower inference. -* The _length_ of the predicted audio sample can be controlled by varying the `audio_length_in_s` argument. - -### How to load and use different schedulers - -The AudioLDM pipeline uses [`DDIMScheduler`] scheduler by default. But `diffusers` provides many other schedulers -that can be used with the AudioLDM pipeline such as [`PNDMScheduler`], [`LMSDiscreteScheduler`], [`EulerDiscreteScheduler`], -[`EulerAncestralDiscreteScheduler`] etc. We recommend using the [`DPMSolverMultistepScheduler`] as it's currently the fastest -scheduler there is. - -To use a different scheduler, you can either change it via the [`ConfigMixin.from_config`] -method, or pass the `scheduler` argument to the `from_pretrained` method of the pipeline. 
For example, to use the -[`DPMSolverMultistepScheduler`], you can do the following: - -```python ->>> from diffusers import AudioLDMPipeline, DPMSolverMultistepScheduler ->>> import torch - ->>> pipeline = AudioLDMPipeline.from_pretrained("cvssp/audioldm", torch_dtype=torch.float16) ->>> pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config) - ->>> # or ->>> dpm_scheduler = DPMSolverMultistepScheduler.from_pretrained("cvssp/audioldm", subfolder="scheduler") ->>> pipeline = AudioLDMPipeline.from_pretrained("cvssp/audioldm", scheduler=dpm_scheduler, torch_dtype=torch.float16) -``` - -## AudioLDMPipeline -[[autodoc]] AudioLDMPipeline - - all - - __call__ diff --git a/diffusers/docs/source/en/api/pipelines/cycle_diffusion.mdx b/diffusers/docs/source/en/api/pipelines/cycle_diffusion.mdx deleted file mode 100644 index b8fbff5d7157dc08cf15ea051f4d019b74c39ff5..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/cycle_diffusion.mdx +++ /dev/null @@ -1,100 +0,0 @@ - - -# Cycle Diffusion - -## Overview - -Cycle Diffusion is a Text-Guided Image-to-Image Generation model proposed in [Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance](https://arxiv.org/abs/2210.05559) by Chen Henry Wu, Fernando De la Torre. - -The abstract of the paper is the following: - -*Diffusion models have achieved unprecedented performance in generative modeling. The commonly-adopted formulation of the latent code of diffusion models is a sequence of gradually denoised samples, as opposed to the simpler (e.g., Gaussian) latent space of GANs, VAEs, and normalizing flows. This paper provides an alternative, Gaussian formulation of the latent space of various diffusion models, as well as an invertible DPM-Encoder that maps images into the latent space. While our formulation is purely based on the definition of diffusion models, we demonstrate several intriguing consequences. 
(1) Empirically, we observe that a common latent space emerges from two diffusion models trained independently on related domains. In light of this finding, we propose CycleDiffusion, which uses DPM-Encoder for unpaired image-to-image translation. Furthermore, applying CycleDiffusion to text-to-image diffusion models, we show that large-scale text-to-image diffusion models can be used as zero-shot image-to-image editors. (2) One can guide pre-trained diffusion models and GANs by controlling the latent codes in a unified, plug-and-play formulation based on energy-based models. Using the CLIP model and a face recognition model as guidance, we demonstrate that diffusion models have better coverage of low-density sub-populations and individuals than GANs.* - -*Tips*: -- The Cycle Diffusion pipeline is fully compatible with any [Stable Diffusion](./stable_diffusion) checkpoint. -- Currently, Cycle Diffusion only works with the [`DDIMScheduler`]. - -*Example*: - -The following example shows how to use the [`CycleDiffusionPipeline`]: - -```python -import requests -import torch -from PIL import Image -from io import BytesIO - -from diffusers import CycleDiffusionPipeline, DDIMScheduler - -# load the pipeline -# make sure you're logged in with `huggingface-cli login` -model_id_or_path = "CompVis/stable-diffusion-v1-4" -scheduler = DDIMScheduler.from_pretrained(model_id_or_path, subfolder="scheduler") -pipe = CycleDiffusionPipeline.from_pretrained(model_id_or_path, scheduler=scheduler).to("cuda") - -# let's download an initial image -url = "https://raw.githubusercontent.com/ChenWu98/cycle-diffusion/main/data/dalle2/An%20astronaut%20riding%20a%20horse.png" -response = requests.get(url) -init_image = Image.open(BytesIO(response.content)).convert("RGB") -init_image = init_image.resize((512, 512)) -init_image.save("horse.png") - -# let's specify a prompt -source_prompt = "An astronaut riding a horse" -prompt = "An astronaut riding an elephant" - -# call the pipeline -image = 
pipe( - prompt=prompt, - source_prompt=source_prompt, - image=init_image, - num_inference_steps=100, - eta=0.1, - strength=0.8, - guidance_scale=2, - source_guidance_scale=1, -).images[0] - -image.save("horse_to_elephant.png") - -# let's try another example -# See more samples at the original repo: https://github.com/ChenWu98/cycle-diffusion -url = "https://raw.githubusercontent.com/ChenWu98/cycle-diffusion/main/data/dalle2/A%20black%20colored%20car.png" -response = requests.get(url) -init_image = Image.open(BytesIO(response.content)).convert("RGB") -init_image = init_image.resize((512, 512)) -init_image.save("black.png") - -source_prompt = "A black colored car" -prompt = "A blue colored car" - -# call the pipeline -torch.manual_seed(0) -image = pipe( - prompt=prompt, - source_prompt=source_prompt, - image=init_image, - num_inference_steps=100, - eta=0.1, - strength=0.85, - guidance_scale=3, - source_guidance_scale=1, -).images[0] - -image.save("black_to_blue.png") -``` - -## CycleDiffusionPipeline -[[autodoc]] CycleDiffusionPipeline - - all - - __call__ diff --git a/diffusers/docs/source/en/api/pipelines/dance_diffusion.mdx b/diffusers/docs/source/en/api/pipelines/dance_diffusion.mdx deleted file mode 100644 index 92b5b9f877bc8474ad61d5f6815615e3922e23b8..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/dance_diffusion.mdx +++ /dev/null @@ -1,34 +0,0 @@ - - -# Dance Diffusion - -## Overview - -[Dance Diffusion](https://github.com/Harmonai-org/sample-generator) by Zach Evans. - -Dance Diffusion is the first in a suite of generative audio tools for producers and musicians to be released by Harmonai. -For more info or to get involved in the development of these tools, please visit https://harmonai.org and fill out the form on the front page. - -The original codebase of this implementation can be found [here](https://github.com/Harmonai-org/sample-generator). 
- -## Available Pipelines: - -| Pipeline | Tasks | Colab -|---|---|:---:| -| [pipeline_dance_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/dance_diffusion/pipeline_dance_diffusion.py) | *Unconditional Audio Generation* | - | - - -## DanceDiffusionPipeline -[[autodoc]] DanceDiffusionPipeline - - all - - __call__ diff --git a/diffusers/docs/source/en/api/pipelines/ddim.mdx b/diffusers/docs/source/en/api/pipelines/ddim.mdx deleted file mode 100644 index 3adcb375b4481b0047479929c9cd1f89034aae99..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/ddim.mdx +++ /dev/null @@ -1,36 +0,0 @@ - - -# DDIM - -## Overview - -[Denoising Diffusion Implicit Models](https://arxiv.org/abs/2010.02502) (DDIM) by Jiaming Song, Chenlin Meng and Stefano Ermon. - -The abstract of the paper is the following: - -Denoising diffusion probabilistic models (DDPMs) have achieved high quality image generation without adversarial training, yet they require simulating a Markov chain for many steps to produce a sample. To accelerate sampling, we present denoising diffusion implicit models (DDIMs), a more efficient class of iterative implicit probabilistic models with the same training procedure as DDPMs. In DDPMs, the generative process is defined as the reverse of a Markovian diffusion process. We construct a class of non-Markovian diffusion processes that lead to the same training objective, but whose reverse process can be much faster to sample from. We empirically demonstrate that DDIMs can produce high quality samples 10× to 50× faster in terms of wall-clock time compared to DDPMs, allow us to trade off computation for sample quality, and can perform semantically meaningful image interpolation directly in the latent space. - -The original codebase of this paper can be found here: [ermongroup/ddim](https://github.com/ermongroup/ddim). -For questions, feel free to contact the author on [tsong.me](https://tsong.me/). 
- -## Available Pipelines: - -| Pipeline | Tasks | Colab -|---|---|:---:| -| [pipeline_ddim.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/ddim/pipeline_ddim.py) | *Unconditional Image Generation* | - | - - -## DDIMPipeline -[[autodoc]] DDIMPipeline - - all - - __call__ diff --git a/diffusers/docs/source/en/api/pipelines/ddpm.mdx b/diffusers/docs/source/en/api/pipelines/ddpm.mdx deleted file mode 100644 index 1be71964041c7bce5300d7177657594a90cdbf2f..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/ddpm.mdx +++ /dev/null @@ -1,37 +0,0 @@ - - -# DDPM - -## Overview - -[Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239) - (DDPM) by Jonathan Ho, Ajay Jain and Pieter Abbeel proposes the diffusion based model of the same name, but in the context of the 🤗 Diffusers library, DDPM refers to the discrete denoising scheduler from the paper as well as the pipeline. - -The abstract of the paper is the following: - -We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our models naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding. On the unconditional CIFAR10 dataset, we obtain an Inception score of 9.46 and a state-of-the-art FID score of 3.17. On 256x256 LSUN, we obtain sample quality similar to ProgressiveGAN. - -The original codebase of this paper can be found [here](https://github.com/hojonathanho/diffusion). 
- - -## Available Pipelines: - -| Pipeline | Tasks | Colab -|---|---|:---:| -| [pipeline_ddpm.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/ddpm/pipeline_ddpm.py) | *Unconditional Image Generation* | - | - - -## DDPMPipeline -[[autodoc]] DDPMPipeline - - all - - __call__ diff --git a/diffusers/docs/source/en/api/pipelines/dit.mdx b/diffusers/docs/source/en/api/pipelines/dit.mdx deleted file mode 100644 index ce96749a1720ba3ee0da67728cd702292f6b6637..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/dit.mdx +++ /dev/null @@ -1,59 +0,0 @@ - - -# Scalable Diffusion Models with Transformers (DiT) - -## Overview - -[Scalable Diffusion Models with Transformers](https://arxiv.org/abs/2212.09748) (DiT) by William Peebles and Saining Xie. - -The abstract of the paper is the following: - -*We explore a new class of diffusion models based on the transformer architecture. We train latent diffusion models of images, replacing the commonly-used U-Net backbone with a transformer that operates on latent patches. We analyze the scalability of our Diffusion Transformers (DiTs) through the lens of forward pass complexity as measured by Gflops. We find that DiTs with higher Gflops -- through increased transformer depth/width or increased number of input tokens -- consistently have lower FID. In addition to possessing good scalability properties, our largest DiT-XL/2 models outperform all prior diffusion models on the class-conditional ImageNet 512x512 and 256x256 benchmarks, achieving a state-of-the-art FID of 2.27 on the latter.* - -The original codebase of this paper can be found here: [facebookresearch/dit](https://github.com/facebookresearch/dit).
- -## Available Pipelines: - -| Pipeline | Tasks | Colab -|---|---|:---:| -| [pipeline_dit.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/dit/pipeline_dit.py) | *Conditional Image Generation* | - | - - -## Usage example - -```python -from diffusers import DiTPipeline, DPMSolverMultistepScheduler -import torch - -pipe = DiTPipeline.from_pretrained("facebook/DiT-XL-2-256", torch_dtype=torch.float16) -pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config) -pipe = pipe.to("cuda") - -# pick words from Imagenet class labels -pipe.labels # to print all available words - -# pick words that exist in ImageNet -words = ["white shark", "umbrella"] - -class_ids = pipe.get_label_ids(words) - -generator = torch.manual_seed(33) -output = pipe(class_labels=class_ids, num_inference_steps=25, generator=generator) - -image = output.images[0] # label 'white shark' -``` - -## DiTPipeline -[[autodoc]] DiTPipeline - - all - - __call__ diff --git a/diffusers/docs/source/en/api/pipelines/latent_diffusion.mdx b/diffusers/docs/source/en/api/pipelines/latent_diffusion.mdx deleted file mode 100644 index 72c159e90d92ac31ccaeda3869687313ae0593ed..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/latent_diffusion.mdx +++ /dev/null @@ -1,49 +0,0 @@ - - -# Latent Diffusion - -## Overview - -Latent Diffusion was proposed in [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752) by Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer. - -The abstract of the paper is the following: - -*By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. Additionally, their formulation allows for a guiding mechanism to control the image generation process without retraining. 
However, since these models typically operate directly in pixel space, optimization of powerful DMs often consumes hundreds of GPU days and inference is expensive due to sequential evaluations. To enable DM training on limited computational resources while retaining their quality and flexibility, we apply them in the latent space of powerful pretrained autoencoders. In contrast to previous work, training diffusion models on such a representation allows for the first time to reach a near-optimal point between complexity reduction and detail preservation, greatly boosting visual fidelity. By introducing cross-attention layers into the model architecture, we turn diffusion models into powerful and flexible generators for general conditioning inputs such as text or bounding boxes and high-resolution synthesis becomes possible in a convolutional manner. Our latent diffusion models (LDMs) achieve a new state of the art for image inpainting and highly competitive performance on various tasks, including unconditional image generation, semantic scene synthesis, and super-resolution, while significantly reducing computational requirements compared to pixel-based DMs.* - -The original codebase can be found [here](https://github.com/CompVis/latent-diffusion). 
- -## Tips: - -- -- -- - -## Available Pipelines: - -| Pipeline | Tasks | Colab -|---|---|:---:| -| [pipeline_latent_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion/pipeline_latent_diffusion.py) | *Text-to-Image Generation* | - | -| [pipeline_latent_diffusion_superresolution.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion/pipeline_latent_diffusion_superresolution.py) | *Super Resolution* | - | - -## Examples: - - -## LDMTextToImagePipeline -[[autodoc]] LDMTextToImagePipeline - - all - - __call__ - -## LDMSuperResolutionPipeline -[[autodoc]] LDMSuperResolutionPipeline - - all - - __call__ diff --git a/diffusers/docs/source/en/api/pipelines/latent_diffusion_uncond.mdx b/diffusers/docs/source/en/api/pipelines/latent_diffusion_uncond.mdx deleted file mode 100644 index c293ebb9400e7235ac79b510430fbf3662ed2240..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/latent_diffusion_uncond.mdx +++ /dev/null @@ -1,42 +0,0 @@ - - -# Unconditional Latent Diffusion - -## Overview - -Unconditional Latent Diffusion was proposed in [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752) by Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer. - -The abstract of the paper is the following: - -*By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. Additionally, their formulation allows for a guiding mechanism to control the image generation process without retraining. However, since these models typically operate directly in pixel space, optimization of powerful DMs often consumes hundreds of GPU days and inference is expensive due to sequential evaluations. 
To enable DM training on limited computational resources while retaining their quality and flexibility, we apply them in the latent space of powerful pretrained autoencoders. In contrast to previous work, training diffusion models on such a representation allows for the first time to reach a near-optimal point between complexity reduction and detail preservation, greatly boosting visual fidelity. By introducing cross-attention layers into the model architecture, we turn diffusion models into powerful and flexible generators for general conditioning inputs such as text or bounding boxes and high-resolution synthesis becomes possible in a convolutional manner. Our latent diffusion models (LDMs) achieve a new state of the art for image inpainting and highly competitive performance on various tasks, including unconditional image generation, semantic scene synthesis, and super-resolution, while significantly reducing computational requirements compared to pixel-based DMs.* - -The original codebase can be found [here](https://github.com/CompVis/latent-diffusion). - -## Tips: - -- -- -- - -## Available Pipelines: - -| Pipeline | Tasks | Colab -|---|---|:---:| -| [pipeline_latent_diffusion_uncond.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion_uncond/pipeline_latent_diffusion_uncond.py) | *Unconditional Image Generation* | - | - -## Examples: - -## LDMPipeline -[[autodoc]] LDMPipeline - - all - - __call__ diff --git a/diffusers/docs/source/en/api/pipelines/overview.mdx b/diffusers/docs/source/en/api/pipelines/overview.mdx deleted file mode 100644 index 3b0e7c66152f5506418e3cfe9aa1861fa7e7e20b..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/overview.mdx +++ /dev/null @@ -1,213 +0,0 @@ - - -# Pipelines - -Pipelines provide a simple way to run state-of-the-art diffusion models in inference. 
-Most diffusion systems consist of multiple independently-trained models and highly adaptable scheduler -components - all of which are needed to have a functioning end-to-end diffusion system. - -As an example, [Stable Diffusion](https://huggingface.co/blog/stable_diffusion) has three independently trained models: -- [Autoencoder](./api/models#vae) -- [Conditional UNet](./api/models#UNet2DConditionModel) -- [CLIP text encoder](https://huggingface.co/docs/transformers/v4.27.1/en/model_doc/clip#transformers.CLIPTextModel) -- a scheduler component, [scheduler](./api/scheduler#pndm), -- a [CLIPImageProcessor](https://huggingface.co/docs/transformers/v4.27.1/en/model_doc/clip#transformers.CLIPImageProcessor), -- as well as a [safety checker](./stable_diffusion#safety_checker). -All of these components are necessary to run Stable Diffusion in inference even though they were trained -or created independently from each other. - -To that end, we strive to offer all open-sourced, state-of-the-art diffusion systems under a unified API. -More specifically, we strive to provide pipelines that -- 1. can load the officially published weights and yield 1-to-1 the same outputs as the original implementation according to the corresponding paper (*e.g.* [LDMTextToImagePipeline](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/latent_diffusion), which uses the officially released weights of [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752)), -- 2. have a simple user interface to run the model in inference (see the [Pipelines API](#pipelines-api) section), -- 3. are easy to understand with code that is self-explanatory and can be read alongside the official paper (see [Pipelines summary](#pipelines-summary)), -- 4. can easily be contributed by the community (see the [Contribution](#contribution) section). - -**Note** that pipelines do not (and should not) offer any training functionality.
-If you are looking for *official* training examples, please have a look at [examples](https://github.com/huggingface/diffusers/tree/main/examples). - -## 🧨 Diffusers Summary - -The following table summarizes all officially supported pipelines, their corresponding paper, and if -available a colab notebook to directly try them out. - - -| Pipeline | Paper | Tasks | Colab -|---|---|:---:|:---:| -| [alt_diffusion](./alt_diffusion) | [**AltDiffusion**](https://arxiv.org/abs/2211.06679) | Image-to-Image Text-Guided Generation | - -| [audio_diffusion](./audio_diffusion) | [**Audio Diffusion**](https://github.com/teticio/audio_diffusion.git) | Unconditional Audio Generation | -| [controlnet](./api/pipelines/stable_diffusion/controlnet) | [**ControlNet with Stable Diffusion**](https://arxiv.org/abs/2302.05543) | Image-to-Image Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/controlnet.ipynb) -| [cycle_diffusion](./cycle_diffusion) | [**Cycle Diffusion**](https://arxiv.org/abs/2210.05559) | Image-to-Image Text-Guided Generation | -| [dance_diffusion](./dance_diffusion) | [**Dance Diffusion**](https://github.com/williamberman/diffusers.git) | Unconditional Audio Generation | -| [ddpm](./ddpm) | [**Denoising Diffusion Probabilistic Models**](https://arxiv.org/abs/2006.11239) | Unconditional Image Generation | -| [ddim](./ddim) | [**Denoising Diffusion Implicit Models**](https://arxiv.org/abs/2010.02502) | Unconditional Image Generation | -| [latent_diffusion](./latent_diffusion) | [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752)| Text-to-Image Generation | -| [latent_diffusion](./latent_diffusion) | [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752)| Super Resolution Image-to-Image | -| [latent_diffusion_uncond](./latent_diffusion_uncond) | 
[**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752) | Unconditional Image Generation | -| [paint_by_example](./paint_by_example) | [**Paint by Example: Exemplar-based Image Editing with Diffusion Models**](https://arxiv.org/abs/2211.13227) | Image-Guided Image Inpainting | -| [pndm](./pndm) | [**Pseudo Numerical Methods for Diffusion Models on Manifolds**](https://arxiv.org/abs/2202.09778) | Unconditional Image Generation | -| [score_sde_ve](./score_sde_ve) | [**Score-Based Generative Modeling through Stochastic Differential Equations**](https://openreview.net/forum?id=PxTIG12RRHS) | Unconditional Image Generation | -| [score_sde_vp](./score_sde_vp) | [**Score-Based Generative Modeling through Stochastic Differential Equations**](https://openreview.net/forum?id=PxTIG12RRHS) | Unconditional Image Generation | -| [semantic_stable_diffusion](./semantic_stable_diffusion) | [**SEGA: Instructing Diffusion using Semantic Dimensions**](https://arxiv.org/abs/2301.12247) | Text-to-Image Generation | -| [stable_diffusion_text2img](./stable_diffusion/text2img) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Text-to-Image Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) -| [stable_diffusion_img2img](./stable_diffusion/img2img) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Image-to-Image Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb) -| [stable_diffusion_inpaint](./stable_diffusion/inpaint) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Text-Guided Image Inpainting | [![Open In 
Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/in_painting_with_stable_diffusion_using_diffusers.ipynb) -| [stable_diffusion_panorama](./stable_diffusion/panorama) | [**MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation**](https://arxiv.org/abs/2302.08113) | Text-Guided Panorama View Generation | -| [stable_diffusion_pix2pix](./stable_diffusion/pix2pix) | [**InstructPix2Pix: Learning to Follow Image Editing Instructions**](https://arxiv.org/abs/2211.09800) | Text-Based Image Editing | -| [stable_diffusion_pix2pix_zero](./stable_diffusion/pix2pix_zero) | [**Zero-shot Image-to-Image Translation**](https://arxiv.org/abs/2302.03027) | Text-Based Image Editing | -| [stable_diffusion_attend_and_excite](./stable_diffusion/attend_and_excite) | [**Attend and Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models**](https://arxiv.org/abs/2301.13826) | Text-to-Image Generation | -| [stable_diffusion_self_attention_guidance](./stable_diffusion/self_attention_guidance) | [**Self-Attention Guidance**](https://arxiv.org/abs/2210.00939) | Text-to-Image Generation | -| [stable_diffusion_image_variation](./stable_diffusion/image_variation) | [**Stable Diffusion Image Variations**](https://github.com/LambdaLabsML/lambda-diffusers#stable-diffusion-image-variations) | Image-to-Image Generation | -| [stable_diffusion_latent_upscale](./stable_diffusion/latent_upscale) | [**Stable Diffusion Latent Upscaler**](https://twitter.com/StabilityAI/status/1590531958815064065) | Text-Guided Super Resolution Image-to-Image | -| [stable_diffusion_2](./stable_diffusion_2/) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-to-Image Generation | -| [stable_diffusion_2](./stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-Guided Image Inpainting | -| 
[stable_diffusion_2](./stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Depth-to-Image Text-Guided Generation | -| [stable_diffusion_2](./stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-Guided Super Resolution Image-to-Image | -| [stable_diffusion_safe](./stable_diffusion_safe) | [**Safe Stable Diffusion**](https://arxiv.org/abs/2211.05105) | Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ml-research/safe-latent-diffusion/blob/main/examples/Safe%20Latent%20Diffusion.ipynb) -| [stable_unclip](./stable_unclip) | **Stable unCLIP** | Text-to-Image Generation | -| [stable_unclip](./stable_unclip) | **Stable unCLIP** | Image-to-Image Text-Guided Generation | -| [stochastic_karras_ve](./stochastic_karras_ve) | [**Elucidating the Design Space of Diffusion-Based Generative Models**](https://arxiv.org/abs/2206.00364) | Unconditional Image Generation | -| [text_to_video_sd](./api/pipelines/text_to_video) | [Modelscope's Text-to-video-synthesis Model in Open Domain](https://modelscope.cn/models/damo/text-to-video-synthesis/summary) | Text-to-Video Generation | -| [unclip](./unclip) | [Hierarchical Text-Conditional Image Generation with CLIP Latents](https://arxiv.org/abs/2204.06125) | Text-to-Image Generation | -| [versatile_diffusion](./versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Text-to-Image Generation | -| [versatile_diffusion](./versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Image Variations Generation | -| [versatile_diffusion](./versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Dual Image and Text 
Guided Generation | -| [vq_diffusion](./vq_diffusion) | [Vector Quantized Diffusion Model for Text-to-Image Synthesis](https://arxiv.org/abs/2111.14822) | Text-to-Image Generation | - - -**Note**: Pipelines are simple examples of how to play around with the diffusion systems as described in the corresponding papers. - -However, most of them can be adapted to use different scheduler components or even different model components. Some pipeline examples are shown in the [Examples](#examples) below. - -## Pipelines API - -Diffusion models often consist of multiple independently-trained models or other previously existing components. - - -Each model has been trained independently on a different task and the scheduler can easily be swapped out and replaced with a different one. -During inference, we however want to be able to easily load all components and use them in inference - even if one component, *e.g.* CLIP's text encoder, originates from a different library, such as [Transformers](https://github.com/huggingface/transformers). To that end, all pipelines provide the following functionality: - -- [`from_pretrained` method](../diffusion_pipeline) that accepts a Hugging Face Hub repository id, *e.g.* [runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) or a path to a local directory, *e.g.* -"./stable-diffusion". To correctly retrieve which models and components should be loaded, one has to provide a `model_index.json` file, *e.g.* [runwayml/stable-diffusion-v1-5/model_index.json](https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/model_index.json), which defines all components that should be -loaded into the pipelines. More specifically, for each model/component one needs to define the format `: ["", ""]`. `` is the attribute name given to the loaded instance of `` which can be found in the library or pipeline folder called `""`. 
-- [`save_pretrained`](../diffusion_pipeline) that accepts a local path, *e.g.* `./stable-diffusion` under which all models/components of the pipeline will be saved. For each component/model a folder is created inside the local path that is named after the given attribute name, *e.g.* `./stable_diffusion/unet`. -In addition, a `model_index.json` file is created at the root of the local path, *e.g.* `./stable_diffusion/model_index.json` so that the complete pipeline can again be instantiated -from the local path. -- [`to`](../diffusion_pipeline) which accepts a `string` or `torch.device` to move all models that are of type `torch.nn.Module` to the passed device. The behavior is fully analogous to [PyTorch's `to` method](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.to). -- [`__call__`] method to use the pipeline in inference. `__call__` defines inference logic of the pipeline and should ideally encompass all aspects of it, from pre-processing to forwarding tensors to the different models and schedulers, as well as post-processing. The API of the `__call__` method can strongly vary from pipeline to pipeline. *E.g.* a text-to-image pipeline, such as [`StableDiffusionPipeline`](./stable_diffusion) should accept among other things the text prompt to generate the image. A pure image generation pipeline, such as [DDPMPipeline](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/ddpm) on the other hand can be run without providing any inputs. To better understand what inputs can be adapted for -each pipeline, one should look directly into the respective pipeline. - -**Note**: All pipelines have PyTorch's autograd disabled by decorating the `__call__` method with a [`torch.no_grad`](https://pytorch.org/docs/stable/generated/torch.no_grad.html) decorator because pipelines should -not be used for training. 
If you want to store the gradients during the forward pass, we recommend writing your own pipeline; see also our [community-examples](https://github.com/huggingface/diffusers/tree/main/examples/community). - -## Contribution - -We are more than happy about any contribution to the officially supported pipelines 🤗. We aim for -all of our pipelines to be **self-contained**, **easy-to-use**, **easy-to-tweak**, and **one-purpose-only**. - -- **Self-contained**: A pipeline shall be as self-contained as possible. More specifically, this means that all functionality should be either directly defined in the pipeline file itself, should be inherited from (and only from) the [`DiffusionPipeline` class](../diffusion_pipeline) or be directly attached to the model and scheduler components of the pipeline. -- **Easy-to-use**: Pipelines should be extremely easy to use - one should be able to load the pipeline and -use it for its designated task, *e.g.* text-to-image generation, in just a couple of lines of code. Most -logic including pre-processing, an unrolled diffusion loop, and post-processing should all happen inside the `__call__` method. -- **Easy-to-tweak**: Certain pipelines will not be able to handle all use cases and tasks that you might like them to. If you want to use a certain pipeline for a specific use case that is not yet supported, you might have to copy the pipeline file and tweak the code to your needs. We try to make the pipeline code as readable as possible so that each part (from pre-processing to diffusing to post-processing) can easily be adapted. If you would like the community to benefit from your customized pipeline, we would love to see a contribution to our [community-examples](https://github.com/huggingface/diffusers/tree/main/examples/community). If you feel that an important pipeline should be part of the official pipelines but isn't, a contribution to the [official pipelines](./overview) would be even better.
-- **One-purpose-only**: Pipelines should be used for one task and one task only. Even if two tasks are very similar from a modeling point of view, *e.g.* image2image translation and in-painting, pipelines shall be used for one task only to keep them *easy-to-tweak* and *readable*. - -## Examples - -### Text-to-Image generation with Stable Diffusion - -```python -# make sure you're logged in with `huggingface-cli login` -from diffusers import StableDiffusionPipeline - -pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5") -pipe = pipe.to("cuda") - -prompt = "a photo of an astronaut riding a horse on mars" -image = pipe(prompt).images[0] - -image.save("astronaut_rides_horse.png") -``` - -### Image-to-Image text-guided generation with Stable Diffusion - -The `StableDiffusionImg2ImgPipeline` lets you pass a text prompt and an initial image to condition the generation of new images. - -```python -import requests -import torch -from PIL import Image -from io import BytesIO - -from diffusers import StableDiffusionImg2ImgPipeline - -# load the pipeline in half precision and move it to the GPU -device = "cuda" -pipe = StableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to( - device -) - -# let's download an initial image -url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg" - -response = requests.get(url) -init_image = Image.open(BytesIO(response.content)).convert("RGB") -init_image = init_image.resize((768, 512)) - -prompt = "A fantasy landscape, trending on artstation" - -images = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images - -images[0].save("fantasy_landscape.png") -``` -You can also run this example on Colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb) - -### 
Tweak prompts reusing seeds and latents - -You can generate your own latents to reproduce results, or tweak your prompt on a specific result you liked. [This notebook](https://github.com/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb) shows how to do it step by step. You can also run it in Google Colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb) - - -### In-painting using Stable Diffusion - -The `StableDiffusionInpaintPipeline` lets you edit specific parts of an image by providing a mask and text prompt. - -```python -import PIL -import requests -import torch -from io import BytesIO - -from diffusers import StableDiffusionInpaintPipeline - - -def download_image(url): - response = requests.get(url) - return PIL.Image.open(BytesIO(response.content)).convert("RGB") - - -img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png" -mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png" - -init_image = download_image(img_url).resize((512, 512)) -mask_image = download_image(mask_url).resize((512, 512)) - -pipe = StableDiffusionInpaintPipeline.from_pretrained( - "runwayml/stable-diffusion-inpainting", - torch_dtype=torch.float16, -) -pipe = pipe.to("cuda") - -prompt = "Face of a yellow cat, high resolution, sitting on a park bench" -image = pipe(prompt=prompt, image=init_image, mask_image=mask_image).images[0] -``` - -You can also run this example on colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/in_painting_with_stable_diffusion_using_diffusers.ipynb) diff --git 
a/diffusers/docs/source/en/api/pipelines/paint_by_example.mdx b/diffusers/docs/source/en/api/pipelines/paint_by_example.mdx deleted file mode 100644 index 5abb3406db448fdbeab14b2626bd17621214d819..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/paint_by_example.mdx +++ /dev/null @@ -1,74 +0,0 @@ - - -# PaintByExample - -## Overview - -[Paint by Example: Exemplar-based Image Editing with Diffusion Models](https://arxiv.org/abs/2211.13227) by Binxin Yang, Shuyang Gu, Bo Zhang, Ting Zhang, Xuejin Chen, Xiaoyan Sun, Dong Chen, Fang Wen. - -The abstract of the paper is the following: - -*Language-guided image editing has achieved great success recently. In this paper, for the first time, we investigate exemplar-guided image editing for more precise control. We achieve this goal by leveraging self-supervised training to disentangle and re-organize the source image and the exemplar. However, the naive approach will cause obvious fusing artifacts. We carefully analyze it and propose an information bottleneck and strong augmentations to avoid the trivial solution of directly copying and pasting the exemplar image. Meanwhile, to ensure the controllability of the editing process, we design an arbitrary shape mask for the exemplar image and leverage the classifier-free guidance to increase the similarity to the exemplar image. The whole framework involves a single forward of the diffusion model without any iterative optimization. We demonstrate that our method achieves an impressive performance and enables controllable editing on in-the-wild images with high fidelity.* - -The original codebase can be found [here](https://github.com/Fantasy-Studio/Paint-by-Example). 
- -## Available Pipelines: - -| Pipeline | Tasks | Colab -|---|---|:---:| -| [pipeline_paint_by_example.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/paint_by_example/pipeline_paint_by_example.py) | *Image-Guided Image Painting* | - | - -## Tips - -- PaintByExample is supported by the official [Fantasy-Studio/Paint-by-Example](https://huggingface.co/Fantasy-Studio/Paint-by-Example) checkpoint. The checkpoint has been warm-started from the [CompVis/stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4) and with the objective to inpaint partly masked images conditioned on example / reference images -- To quickly demo *PaintByExample*, please have a look at [this demo](https://huggingface.co/spaces/Fantasy-Studio/Paint-by-Example) -- You can run the following code snippet as an example: - - -```python -# !pip install diffusers transformers - -import PIL -import requests -import torch -from io import BytesIO -from diffusers import DiffusionPipeline - - -def download_image(url): - response = requests.get(url) - return PIL.Image.open(BytesIO(response.content)).convert("RGB") - - -img_url = "https://raw.githubusercontent.com/Fantasy-Studio/Paint-by-Example/main/examples/image/example_1.png" -mask_url = "https://raw.githubusercontent.com/Fantasy-Studio/Paint-by-Example/main/examples/mask/example_1.png" -example_url = "https://raw.githubusercontent.com/Fantasy-Studio/Paint-by-Example/main/examples/reference/example_1.jpg" - -init_image = download_image(img_url).resize((512, 512)) -mask_image = download_image(mask_url).resize((512, 512)) -example_image = download_image(example_url).resize((512, 512)) - -pipe = DiffusionPipeline.from_pretrained( - "Fantasy-Studio/Paint-by-Example", - torch_dtype=torch.float16, -) -pipe = pipe.to("cuda") - -image = pipe(image=init_image, mask_image=mask_image, example_image=example_image).images[0] -image -``` - -## PaintByExamplePipeline -[[autodoc]] PaintByExamplePipeline - - all - - 
__call__ diff --git a/diffusers/docs/source/en/api/pipelines/pndm.mdx b/diffusers/docs/source/en/api/pipelines/pndm.mdx deleted file mode 100644 index 43625fdfbe5206e01dcb11c85e86d31737d3c6ee..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/pndm.mdx +++ /dev/null @@ -1,35 +0,0 @@ - - -# PNDM - -## Overview - -[Pseudo Numerical methods for Diffusion Models on manifolds](https://arxiv.org/abs/2202.09778) (PNDM) by Luping Liu, Yi Ren, Zhijie Lin and Zhou Zhao. - -The abstract of the paper is the following: - -Denoising Diffusion Probabilistic Models (DDPMs) can generate high-quality samples such as image and audio samples. However, DDPMs require hundreds to thousands of iterations to produce final samples. Several prior works have successfully accelerated DDPMs through adjusting the variance schedule (e.g., Improved Denoising Diffusion Probabilistic Models) or the denoising equation (e.g., Denoising Diffusion Implicit Models (DDIMs)). However, these acceleration methods cannot maintain the quality of samples and even introduce new noise at a high speedup rate, which limit their practicability. To accelerate the inference process while keeping the sample quality, we provide a fresh perspective that DDPMs should be treated as solving differential equations on manifolds. Under such a perspective, we propose pseudo numerical methods for diffusion models (PNDMs). Specifically, we figure out how to solve differential equations on manifolds and show that DDIMs are simple cases of pseudo numerical methods. We change several classical numerical methods to corresponding pseudo numerical methods and find that the pseudo linear multi-step method is the best in most situations. 
According to our experiments, by directly using pre-trained models on Cifar10, CelebA and LSUN, PNDMs can generate higher quality synthetic images with only 50 steps compared with 1000-step DDIMs (20x speedup), significantly outperform DDIMs with 250 steps (by around 0.4 in FID) and have good generalization on different variance schedules. - -The original codebase can be found [here](https://github.com/luping-liu/PNDM). - -## Available Pipelines: - -| Pipeline | Tasks | Colab -|---|---|:---:| -| [pipeline_pndm.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pndm/pipeline_pndm.py) | *Unconditional Image Generation* | - | - - -## PNDMPipeline -[[autodoc]] PNDMPipeline - - all - - __call__ diff --git a/diffusers/docs/source/en/api/pipelines/repaint.mdx b/diffusers/docs/source/en/api/pipelines/repaint.mdx deleted file mode 100644 index 927398d0bf54119684dae652ce9d2c86ba34bc5c..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/repaint.mdx +++ /dev/null @@ -1,77 +0,0 @@ - - -# RePaint - -## Overview - -[RePaint: Inpainting using Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2201.09865) by Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, Luc Van Gool. - -The abstract of the paper is the following: - -Free-form inpainting is the task of adding new content to an image in the regions specified by an arbitrary binary mask. Most existing approaches train for a certain distribution of masks, which limits their generalization capabilities to unseen mask types. Furthermore, training with pixel-wise and perceptual losses often leads to simple textural extensions towards the missing areas instead of semantically meaningful generation. In this work, we propose RePaint: A Denoising Diffusion Probabilistic Model (DDPM) based inpainting approach that is applicable to even extreme masks. We employ a pretrained unconditional DDPM as the generative prior.
To condition the generation process, we only alter the reverse diffusion iterations by sampling the unmasked regions using the given image information. Since this technique does not modify or condition the original DDPM network itself, the model produces high-quality and diverse output images for any inpainting form. We validate our method for both faces and general-purpose image inpainting using standard and extreme masks. -RePaint outperforms state-of-the-art Autoregressive, and GAN approaches for at least five out of six mask distributions. - -The original codebase can be found [here](https://github.com/andreas128/RePaint). - -## Available Pipelines: - -| Pipeline | Tasks | Colab -|-------------------------------------------------------------------------------------------------------------------------------|--------------------|:---:| -| [pipeline_repaint.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/repaint/pipeline_repaint.py) | *Image Inpainting* | - | - -## Usage example - -```python -from io import BytesIO - -import torch - -import PIL -import requests -from diffusers import RePaintPipeline, RePaintScheduler - - -def download_image(url): - response = requests.get(url) - return PIL.Image.open(BytesIO(response.content)).convert("RGB") - - -img_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/celeba_hq_256.png" -mask_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/mask_256.png" - -# Load the original image and the mask as PIL images -original_image = download_image(img_url).resize((256, 256)) -mask_image = download_image(mask_url).resize((256, 256)) - -# Load the RePaint scheduler and pipeline based on a pretrained DDPM model -scheduler = RePaintScheduler.from_pretrained("google/ddpm-ema-celebahq-256") -pipe = RePaintPipeline.from_pretrained("google/ddpm-ema-celebahq-256", scheduler=scheduler) -pipe = pipe.to("cuda") - -generator = 
torch.Generator(device="cuda").manual_seed(0) -output = pipe( - original_image=original_image, - mask_image=mask_image, - num_inference_steps=250, - eta=0.0, - jump_length=10, - jump_n_sample=10, - generator=generator, -) -inpainted_image = output.images[0] -``` - -## RePaintPipeline -[[autodoc]] RePaintPipeline - - all - - __call__ diff --git a/diffusers/docs/source/en/api/pipelines/score_sde_ve.mdx b/diffusers/docs/source/en/api/pipelines/score_sde_ve.mdx deleted file mode 100644 index 42253e301f4eaf0d34976439e5539201eb257237..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/score_sde_ve.mdx +++ /dev/null @@ -1,36 +0,0 @@ - - -# Score SDE VE - -## Overview - -[Score-Based Generative Modeling through Stochastic Differential Equations](https://arxiv.org/abs/2011.13456) (Score SDE) by Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon and Ben Poole. - -The abstract of the paper is the following: - -Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. Crucially, the reverse-time SDE depends only on the time-dependent gradient field (a.k.a., score) of the perturbed data distribution. By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks, and use numerical SDE solvers to generate samples. We show that this framework encapsulates previous approaches in score-based generative modeling and diffusion probabilistic modeling, allowing for new sampling procedures and new modeling capabilities.
In particular, we introduce a predictor-corrector framework to correct errors in the evolution of the discretized reverse-time SDE. We also derive an equivalent neural ODE that samples from the same distribution as the SDE, but additionally enables exact likelihood computation, and improved sampling efficiency. In addition, we provide a new way to solve inverse problems with score-based models, as demonstrated with experiments on class-conditional generation, image inpainting, and colorization. Combined with multiple architectural improvements, we achieve record-breaking performance for unconditional image generation on CIFAR-10 with an Inception score of 9.89 and FID of 2.20, a competitive likelihood of 2.99 bits/dim, and demonstrate high fidelity generation of 1024 x 1024 images for the first time from a score-based generative model. - -The original codebase can be found [here](https://github.com/yang-song/score_sde_pytorch). - -This pipeline implements the Variance Expanding (VE) variant of the method. - -## Available Pipelines: - -| Pipeline | Tasks | Colab -|---|---|:---:| -| [pipeline_score_sde_ve.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/score_sde_ve/pipeline_score_sde_ve.py) | *Unconditional Image Generation* | - | - -## ScoreSdeVePipeline -[[autodoc]] ScoreSdeVePipeline - - all - - __call__ diff --git a/diffusers/docs/source/en/api/pipelines/semantic_stable_diffusion.mdx b/diffusers/docs/source/en/api/pipelines/semantic_stable_diffusion.mdx deleted file mode 100644 index b4562cf0c389bb917b3f075f279c347442dfdfa9..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/semantic_stable_diffusion.mdx +++ /dev/null @@ -1,79 +0,0 @@ - - -# Semantic Guidance - -Semantic Guidance for Diffusion Models was proposed in [SEGA: Instructing Diffusion using Semantic Dimensions](https://arxiv.org/abs/2301.12247) and provides strong semantic control over the image generation. 
-Small changes to the text prompt usually result in entirely different output images. However, with SEGA a variety of changes to the image are enabled that can be controlled easily and intuitively, and stay true to the original image composition. - -The abstract of the paper is the following: - -*Text-to-image diffusion models have recently received a lot of interest for their astonishing ability to produce high-fidelity images from text only. However, achieving one-shot generation that aligns with the user's intent is nearly impossible, yet small changes to the input prompt often result in very different images. This leaves the user with little semantic control. To put the user in control, we show how to interact with the diffusion process to flexibly steer it along semantic directions. This semantic guidance (SEGA) allows for subtle and extensive edits, changes in composition and style, as well as optimizing the overall artistic conception. We demonstrate SEGA's effectiveness on a variety of tasks and provide evidence for its versatility and flexibility.* - - -*Overview*: - -| Pipeline | Tasks | Colab | Demo -|---|---|:---:|:---:| -| [pipeline_semantic_stable_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/semantic_stable_diffusion/pipeline_semantic_stable_diffusion.py) | *Text-to-Image Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ml-research/semantic-image-editing/blob/main/examples/SemanticGuidance.ipynb) | [Coming Soon](https://huggingface.co/AIML-TUDA) - -## Tips - -- The Semantic Guidance pipeline can be used with any [Stable Diffusion](./stable_diffusion/text2img) checkpoint. - -### Run Semantic Guidance - -The interface of [`SemanticStableDiffusionPipeline`] provides several additional parameters to influence the image generation. 
-Example usage may look like this: - -```python -import torch -from diffusers import SemanticStableDiffusionPipeline - -pipe = SemanticStableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16) -pipe = pipe.to("cuda") - -out = pipe( - prompt="a photo of the face of a woman", - num_images_per_prompt=1, - guidance_scale=7, - editing_prompt=[ - "smiling, smile", # Concepts to apply - "glasses, wearing glasses", - "curls, wavy hair, curly hair", - "beard, full beard, mustache", - ], - reverse_editing_direction=[False, False, False, False], # Direction of guidance, i.e. increase all concepts - edit_warmup_steps=[10, 10, 10, 10], # Warmup period for each concept - edit_guidance_scale=[4, 5, 5, 5.4], # Guidance scale for each concept - edit_threshold=[ - 0.99, - 0.975, - 0.925, - 0.96, - ], # Threshold for each concept. The threshold equals the percentile of the latent space that will be discarded, e.g. threshold=0.99 uses 1% of the latent dimensions - edit_momentum_scale=0.3, # Momentum scale that will be added to the latent guidance - edit_mom_beta=0.6, # Momentum beta - edit_weights=[1, 1, 1, 1], # Weights of the individual concepts against each other (one per editing prompt) -) -``` - -For more examples, check the Colab notebook.
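The `edit_threshold` comment in the example describes a percentile cut-off on the guidance values. As a rough, library-independent illustration of that idea, the sketch below keeps only the largest-magnitude fraction of a flat list of values and zeroes out the rest. The helper name `apply_edit_threshold` and the toy data are hypothetical, and this is not the pipeline's actual implementation, which works on latent tensors.

```python
def apply_edit_threshold(guidance, threshold):
    """Keep the top (1 - threshold) fraction of values by magnitude, zero the rest.

    Hypothetical helper for illustration only.
    """
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must be in [0, 1)")
    n = len(guidance)
    # Number of entries that survive, e.g. threshold=0.99 on 100 values keeps 1.
    keep = max(1, round((1.0 - threshold) * n))
    # Indices of the `keep` largest-magnitude entries.
    ranked = sorted(range(n), key=lambda i: abs(guidance[i]), reverse=True)
    kept = set(ranked[:keep])
    return [value if i in kept else 0.0 for i, value in enumerate(guidance)]


# Toy "guidance" vector with 100 entries between -0.5 and 0.49.
guidance = [i / 100 for i in range(-50, 50)]
masked_99 = apply_edit_threshold(guidance, 0.99)
masked_90 = apply_edit_threshold(guidance, 0.90)
```

Under this reading, a higher threshold restricts the edit to fewer, stronger dimensions, which is presumably why the example uses values close to 1.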
- -## SemanticStableDiffusionPipelineOutput -[[autodoc]] pipelines.semantic_stable_diffusion.SemanticStableDiffusionPipelineOutput - - all - -## SemanticStableDiffusionPipeline -[[autodoc]] SemanticStableDiffusionPipeline - - all - - __call__ diff --git a/diffusers/docs/source/en/api/pipelines/spectrogram_diffusion.mdx b/diffusers/docs/source/en/api/pipelines/spectrogram_diffusion.mdx deleted file mode 100644 index c98300fe791f054807665373b70d6526b5219682..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/spectrogram_diffusion.mdx +++ /dev/null @@ -1,54 +0,0 @@ - - -# Multi-instrument Music Synthesis with Spectrogram Diffusion - -## Overview - -[Spectrogram Diffusion](https://arxiv.org/abs/2206.05408) by Curtis Hawthorne, Ian Simon, Adam Roberts, Neil Zeghidour, Josh Gardner, Ethan Manilow, and Jesse Engel. - -An ideal music synthesizer should be both interactive and expressive, generating high-fidelity audio in realtime for arbitrary combinations of instruments and notes. Recent neural synthesizers have exhibited a tradeoff between domain-specific models that offer detailed control of only specific instruments, or raw waveform models that can train on any music but with minimal control and slow generation. In this work, we focus on a middle ground of neural synthesizers that can generate audio from MIDI sequences with arbitrary combinations of instruments in realtime. This enables training on a wide range of transcription datasets with a single model, which in turn offers note-level control of composition and instrumentation across a wide range of instruments. We use a simple two-stage process: MIDI to spectrograms with an encoder-decoder Transformer, then spectrograms to audio with a generative adversarial network (GAN) spectrogram inverter.
We compare training the decoder as an autoregressive model and as a Denoising Diffusion Probabilistic Model (DDPM) and find that the DDPM approach is superior both qualitatively and as measured by audio reconstruction and Fréchet distance metrics. Given the interactivity and generality of this approach, we find this to be a promising first step towards interactive and expressive neural synthesis for arbitrary combinations of instruments and notes. - -The original codebase of this implementation can be found at [magenta/music-spectrogram-diffusion](https://github.com/magenta/music-spectrogram-diffusion). - -## Model - -![img](https://storage.googleapis.com/music-synthesis-with-spectrogram-diffusion/architecture.png) - -As depicted above, the model takes a MIDI file as input and tokenizes it into a sequence of 5-second intervals. Each tokenized interval, together with positional encodings, is passed through the Note Encoder, and its representation is concatenated with the representation of the previous window's generated spectrogram, obtained via the Context Encoder. For the initial 5-second window, this context is set to zero. The resulting context is then used as conditioning to sample the denoised spectrogram for the MIDI window. This spectrogram is appended to the final output and also serves as the context for the next MIDI window. The process repeats until all MIDI inputs have been processed. Finally, a MelGAN decoder converts the potentially long spectrogram to audio, which is the final result of this pipeline.
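The window-by-window procedure described in the Model section can be sketched as a plain loop. This is a toy illustration under stated assumptions, not the pipeline's code: `synthesize_spectrogram` and the three callables passed to it are hypothetical stand-ins for the Note Encoder, the Context Encoder and the diffusion sampler, and spectrograms are reduced to flat lists of floats. The final MelGAN step is left out.

```python
from typing import Callable, List, Sequence

Spectrogram = List[float]


def synthesize_spectrogram(
    token_windows: Sequence[Sequence[int]],
    encode_notes: Callable[[Sequence[int]], List[float]],
    encode_context: Callable[[Spectrogram], List[float]],
    sample_segment: Callable[[List[float]], Spectrogram],
    frames_per_window: int,
) -> Spectrogram:
    """Generate one spectrogram segment per MIDI window and concatenate them."""
    # The context for the very first window is all zeros.
    previous_segment: Spectrogram = [0.0] * frames_per_window
    full_spectrogram: Spectrogram = []
    for tokens in token_windows:
        # Conditioning = note representation concatenated with context representation.
        conditioning = encode_notes(tokens) + encode_context(previous_segment)
        segment = sample_segment(conditioning)
        full_spectrogram.extend(segment)
        # The new segment becomes the context for the next window.
        previous_segment = segment
    return full_spectrogram


# Toy stand-ins so the loop can be run end to end.
seen_contexts = []


def toy_encode_notes(tokens):
    return [float(sum(tokens))]


def toy_encode_context(segment):
    seen_contexts.append(list(segment))
    return [sum(segment)]


def toy_sample_segment(conditioning):
    return [sum(conditioning)] * 2


spectrogram = synthesize_spectrogram(
    [[1, 2], [3], [4, 5]],
    toy_encode_notes,
    toy_encode_context,
    toy_sample_segment,
    frames_per_window=2,
)
```

Passing the encoders and the sampler in as callables keeps the control flow visible without depending on any model weights.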
- -## Available Pipelines: - -| Pipeline | Tasks | Colab -|---|---|:---:| -| [pipeline_spectrogram_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/spectrogram_diffusion/pipeline_spectrogram_diffusion) | *Unconditional Audio Generation* | - | - - -## Example usage - -```python -from diffusers import SpectrogramDiffusionPipeline, MidiProcessor - -pipe = SpectrogramDiffusionPipeline.from_pretrained("google/music-spectrogram-diffusion") -pipe = pipe.to("cuda") -processor = MidiProcessor() - -# Download MIDI from: wget http://www.piano-midi.de/midis/beethoven/beethoven_hammerklavier_2.mid -output = pipe(processor("beethoven_hammerklavier_2.mid")) - -audio = output.audios[0] -``` - -## SpectrogramDiffusionPipeline -[[autodoc]] SpectrogramDiffusionPipeline - - all - - __call__ diff --git a/diffusers/docs/source/en/api/pipelines/stable_diffusion/attend_and_excite.mdx b/diffusers/docs/source/en/api/pipelines/stable_diffusion/attend_and_excite.mdx deleted file mode 100644 index 1a329bc442e7bb6f5e20d60a679c12acf1855c90..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/stable_diffusion/attend_and_excite.mdx +++ /dev/null @@ -1,75 +0,0 @@ - - -# Attend and Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models - -## Overview - -Attend and Excite for Stable Diffusion was proposed in [Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models](https://attendandexcite.github.io/Attend-and-Excite/) and provides textual attention control over the image generation. - -The abstract of the paper is the following: - -*Text-to-image diffusion models have recently received a lot of interest for their astonishing ability to produce high-fidelity images from text only. However, achieving one-shot generation that aligns with the user's intent is nearly impossible, yet small changes to the input prompt often result in very different images. 
This leaves the user with little semantic control. To put the user in control, we show how to interact with the diffusion process to flexibly steer it along semantic directions. This semantic guidance (SEGA) allows for subtle and extensive edits, changes in composition and style, as well as optimizing the overall artistic conception. We demonstrate SEGA's effectiveness on a variety of tasks and provide evidence for its versatility and flexibility.* - -Resources: - -* [Project Page](https://attendandexcite.github.io/Attend-and-Excite/) -* [Paper](https://arxiv.org/abs/2301.13826) -* [Original Code](https://github.com/AttendAndExcite/Attend-and-Excite) -* [Demo](https://huggingface.co/spaces/AttendAndExcite/Attend-and-Excite) - - -## Available Pipelines: - -| Pipeline | Tasks | Colab | Demo -|---|---|:---:|:---:| -| [pipeline_stable_diffusion_attend_and_excite.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_attend_and_excite.py) | *Text-to-Image Generation* | - | https://huggingface.co/spaces/AttendAndExcite/Attend-and-Excite - - -### Usage example - - -```python -import torch -from diffusers import StableDiffusionAttendAndExcitePipeline - -model_id = "CompVis/stable-diffusion-v1-4" -pipe = StableDiffusionAttendAndExcitePipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda") - -prompt = "a cat and a frog" - -# use get_indices function to find out indices of the tokens you want to alter -pipe.get_indices(prompt) - -token_indices = [2, 5] -seed = 6141 -generator = torch.Generator("cuda").manual_seed(seed) - -images = pipe( - prompt=prompt, - token_indices=token_indices, - guidance_scale=7.5, - generator=generator, - num_inference_steps=50, - max_iter_to_alter=25, -).images - -image = images[0] -image.save(f"../images/{prompt}_{seed}.png") -``` - - -## StableDiffusionAttendAndExcitePipeline -[[autodoc]] StableDiffusionAttendAndExcitePipeline -
- all - - __call__ diff --git a/diffusers/docs/source/en/api/pipelines/stable_diffusion/controlnet.mdx b/diffusers/docs/source/en/api/pipelines/stable_diffusion/controlnet.mdx deleted file mode 100644 index 5a4cfa41ca43d7fe0cf6f12fc7e8c155af92a960..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/stable_diffusion/controlnet.mdx +++ /dev/null @@ -1,280 +0,0 @@ - - -# Text-to-Image Generation with ControlNet Conditioning - -## Overview - -[Adding Conditional Control to Text-to-Image Diffusion Models](https://arxiv.org/abs/2302.05543) by Lvmin Zhang and Maneesh Agrawala. - -Using the pretrained models we can provide control images (for example, a depth map) to control Stable Diffusion text-to-image generation so that it follows the structure of the depth image and fills in the details. - -The abstract of the paper is the following: - -*We present a neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions. The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal devices. Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data. We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc. This may enrich the methods to control large diffusion models and further facilitate related applications.* - -This model was contributed by the amazing community contributor [takuma104](https://huggingface.co/takuma104) ❤️ . 
- -Resources: - -* [Paper](https://arxiv.org/abs/2302.05543) -* [Original Code](https://github.com/lllyasviel/ControlNet) - -## Available Pipelines: - -| Pipeline | Tasks | Demo -|---|---|:---:| -| [StableDiffusionControlNetPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py) | *Text-to-Image Generation with ControlNet Conditioning* | [Colab Example](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/controlnet.ipynb) - -## Usage example - -In the following we give a simple example of how to use a *ControlNet* checkpoint with Diffusers for inference. -The inference pipeline is the same for all pipelines: - -* 1. Take an image and run it through a pre-conditioning processor. -* 2. Run the pre-processed image through the [`StableDiffusionControlNetPipeline`]. - -Let's have a look at a simple example using the [Canny Edge ControlNet](https://huggingface.co/lllyasviel/sd-controlnet-canny). - -```python -from diffusers import StableDiffusionControlNetPipeline -from diffusers.utils import load_image - -# Let's load the popular vermeer image -image = load_image( - "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png" -) -``` - -![img](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png) - -Next, we process the image to get the canny image. This is step *1.* - running the pre-conditioning processor. The pre-conditioning processor is different for every ControlNet. Please see the model cards of the [official checkpoints](#controlnet-with-stable-diffusion-1.5) for more information about other models. 
- -First, we need to install OpenCV: - -``` -pip install opencv-contrib-python -``` - -Next, let's also install all required Hugging Face libraries: - -``` -pip install diffusers transformers git+https://github.com/huggingface/accelerate.git -``` - -Then we can retrieve the canny edges of the image. - -```python -import cv2 -from PIL import Image -import numpy as np - -image = np.array(image) - -low_threshold = 100 -high_threshold = 200 - -image = cv2.Canny(image, low_threshold, high_threshold) -image = image[:, :, None] -image = np.concatenate([image, image, image], axis=2) -canny_image = Image.fromarray(image) -``` - -Let's take a look at the processed image. - -![img](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/vermeer_canny_edged.png) - -Now, we load the official [Stable Diffusion 1.5 Model](https://huggingface.co/runwayml/stable-diffusion-v1-5) as well as the ControlNet for canny edges. - -```py -from diffusers import StableDiffusionControlNetPipeline, ControlNetModel -import torch - -controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16) -pipe = StableDiffusionControlNetPipeline.from_pretrained( - "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16 -) -``` - -To speed things up and reduce memory usage, let's enable model offloading and use the fast [`UniPCMultistepScheduler`]. - -```py -from diffusers import UniPCMultistepScheduler - -pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config) - -# this command loads the individual model components on GPU on-demand. -pipe.enable_model_cpu_offload() -``` - -Finally, we can run the pipeline: - -```py -generator = torch.manual_seed(0) - -out_image = pipe( - "disco dancer with colorful lights", num_inference_steps=20, generator=generator, image=canny_image -).images[0] -``` - -This should take only around 3-4 seconds on GPU (depending on hardware).
The output image then looks as follows: - -![img](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/vermeer_disco_dancing.png) - - -**Note**: To see how to run all other ControlNet checkpoints, please have a look at [ControlNet with Stable Diffusion 1.5](#controlnet-with-stable-diffusion-1.5). - - - -## Combining multiple conditionings - -Multiple ControlNet conditionings can be combined for a single image generation. Pass a list of ControlNets to the pipeline's constructor and a corresponding list of conditionings to `__call__`. - -When combining conditionings, it is helpful to mask conditionings such that they do not overlap. In the example, we mask the middle of the canny map where the pose conditioning is located. - -It can also be helpful to vary the `controlnet_conditioning_scale` values to emphasize one conditioning over the other. - -### Canny conditioning - -The original image: - - - -Prepare the conditioning: - -```python -from diffusers.utils import load_image -from PIL import Image -import cv2 -import numpy as np - -canny_image = load_image( - "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/landscape.png" -) -canny_image = np.array(canny_image) - -low_threshold = 100 -high_threshold = 200 - -canny_image = cv2.Canny(canny_image, low_threshold, high_threshold) - -# zero out middle columns of image where pose will be overlaid -zero_start = canny_image.shape[1] // 4 -zero_end = zero_start + canny_image.shape[1] // 2 -canny_image[:, zero_start:zero_end] = 0 - -canny_image = canny_image[:, :, None] -canny_image = np.concatenate([canny_image, canny_image, canny_image], axis=2) -canny_image = Image.fromarray(canny_image) -``` - - - -### Openpose conditioning - -The original image: - - - -Prepare the conditioning: - -```python -from controlnet_aux import OpenposeDetector -from diffusers.utils import load_image - -openpose =
OpenposeDetector.from_pretrained("lllyasviel/ControlNet") - -openpose_image = load_image( - "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/person.png" -) -openpose_image = openpose(openpose_image) -``` - - - -### Running ControlNet with multiple conditionings - -```python -from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler -import torch - -controlnet = [ - ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16), - ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16), -] - -pipe = StableDiffusionControlNetPipeline.from_pretrained( - "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16 -) -pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config) - -pipe.enable_xformers_memory_efficient_attention() -pipe.enable_model_cpu_offload() - -prompt = "a giant standing in a fantasy landscape, best quality" -negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality" - -generator = torch.Generator(device="cpu").manual_seed(1) - -images = [openpose_image, canny_image] - -image = pipe( - prompt, - images, - num_inference_steps=20, - generator=generator, - negative_prompt=negative_prompt, - controlnet_conditioning_scale=[1.0, 0.8], -).images[0] - -image.save("./multi_controlnet_output.png") -``` - - - -## Available checkpoints - -ControlNet requires a *control image* in addition to the text-to-image *prompt*. -Each pretrained model is trained using a different conditioning method that requires different images for conditioning the generated outputs. For example, Canny edge conditioning requires the control image to be the output of a Canny filter, while depth conditioning requires the control image to be a depth map. See the overview and image examples below to know more. 
- -All checkpoints can be found under the authors' namespace [lllyasviel](https://huggingface.co/lllyasviel). - -### ControlNet with Stable Diffusion 1.5 - -| Model Name | Control Image Overview| Control Image Example | Generated Image Example | -|---|---|---|---| -|[lllyasviel/sd-controlnet-canny](https://huggingface.co/lllyasviel/sd-controlnet-canny)
*Trained with canny edge detection* | A monochrome image with white edges on a black background.||| -|[lllyasviel/sd-controlnet-depth](https://huggingface.co/lllyasviel/sd-controlnet-depth)
*Trained with Midas depth estimation* |A grayscale image with black representing deep areas and white representing shallow areas.||| -|[lllyasviel/sd-controlnet-hed](https://huggingface.co/lllyasviel/sd-controlnet-hed)
*Trained with HED edge detection (soft edge)* |A monochrome image with white soft edges on a black background.|| | -|[lllyasviel/sd-controlnet-mlsd](https://huggingface.co/lllyasviel/sd-controlnet-mlsd)
*Trained with M-LSD line detection* |A monochrome image composed only of white straight lines on a black background.||| -|[lllyasviel/sd-controlnet-normal](https://huggingface.co/lllyasviel/sd-controlnet-normal)
*Trained with normal map* |A [normal mapped](https://en.wikipedia.org/wiki/Normal_mapping) image.||| -|[lllyasviel/sd-controlnet-openpose](https://huggingface.co/lllyasviel/sd-controlnet-openpose)
*Trained with OpenPose bone image* |An [OpenPose bone](https://github.com/CMU-Perceptual-Computing-Lab/openpose) image.||| -|[lllyasviel/sd-controlnet-scribble](https://huggingface.co/lllyasviel/sd-controlnet-scribble)
*Trained with human scribbles* |A hand-drawn monochrome image with white outlines on a black background.|| | -|[lllyasviel/sd-controlnet-seg](https://huggingface.co/lllyasviel/sd-controlnet-seg)
*Trained with semantic segmentation* |An image following the [ADE20K](https://groups.csail.mit.edu/vision/datasets/ADE20K/) segmentation protocol.|| | - -## StableDiffusionControlNetPipeline -[[autodoc]] StableDiffusionControlNetPipeline - - all - - __call__ - - enable_attention_slicing - - disable_attention_slicing - - enable_vae_slicing - - disable_vae_slicing - - enable_xformers_memory_efficient_attention - - disable_xformers_memory_efficient_attention - -## FlaxStableDiffusionControlNetPipeline -[[autodoc]] FlaxStableDiffusionControlNetPipeline - - all - - __call__ - diff --git a/diffusers/docs/source/en/api/pipelines/stable_diffusion/depth2img.mdx b/diffusers/docs/source/en/api/pipelines/stable_diffusion/depth2img.mdx deleted file mode 100644 index c46576ff288757a316a5efa0ec3b753fd9ce2bd4..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/stable_diffusion/depth2img.mdx +++ /dev/null @@ -1,33 +0,0 @@ - - -# Depth-to-Image Generation - -## StableDiffusionDepth2ImgPipeline - -The depth-guided Stable Diffusion model was created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/), and [LAION](https://laion.ai/), as part of Stable Diffusion 2.0. It uses [MiDaS](https://github.com/isl-org/MiDaS) to infer depth based on an image. - -[`StableDiffusionDepth2ImgPipeline`] lets you pass a text prompt and an initial image to condition the generation of new images as well as a `depth_map` to preserve the image's structure.
- -The original codebase can be found here: -- *Stable Diffusion v2*: [Stability-AI/stablediffusion](https://github.com/Stability-AI/stablediffusion#depth-conditional-stable-diffusion) - -Available Checkpoints are: -- *stable-diffusion-2-depth*: [stabilityai/stable-diffusion-2-depth](https://huggingface.co/stabilityai/stable-diffusion-2-depth) - -[[autodoc]] StableDiffusionDepth2ImgPipeline - - all - - __call__ - - enable_attention_slicing - - disable_attention_slicing - - enable_xformers_memory_efficient_attention - - disable_xformers_memory_efficient_attention \ No newline at end of file diff --git a/diffusers/docs/source/en/api/pipelines/stable_diffusion/image_variation.mdx b/diffusers/docs/source/en/api/pipelines/stable_diffusion/image_variation.mdx deleted file mode 100644 index 8ca69ff69aec6a74e22beade70b5ef2ef42a0e3c..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/stable_diffusion/image_variation.mdx +++ /dev/null @@ -1,31 +0,0 @@ - - -# Image Variation - -## StableDiffusionImageVariationPipeline - -[`StableDiffusionImageVariationPipeline`] lets you generate variations from an input image using Stable Diffusion. It uses a fine-tuned version of Stable Diffusion model, trained by [Justin Pinkney](https://www.justinpinkney.com/) (@Buntworthy) at [Lambda](https://lambdalabs.com/). 
- -The original codebase can be found here: -[Stable Diffusion Image Variations](https://github.com/LambdaLabsML/lambda-diffusers#stable-diffusion-image-variations) - -Available Checkpoints are: -- *sd-image-variations-diffusers*: [lambdalabs/sd-image-variations-diffusers](https://huggingface.co/lambdalabs/sd-image-variations-diffusers) - -[[autodoc]] StableDiffusionImageVariationPipeline - - all - - __call__ - - enable_attention_slicing - - disable_attention_slicing - - enable_xformers_memory_efficient_attention - - disable_xformers_memory_efficient_attention diff --git a/diffusers/docs/source/en/api/pipelines/stable_diffusion/img2img.mdx b/diffusers/docs/source/en/api/pipelines/stable_diffusion/img2img.mdx deleted file mode 100644 index 09bfb853f9c9bdce1fbd4b4ae3571557d2a5bfc1..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/stable_diffusion/img2img.mdx +++ /dev/null @@ -1,36 +0,0 @@ - - -# Image-to-Image Generation - -## StableDiffusionImg2ImgPipeline - -The Stable Diffusion model was created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/), [runway](https://github.com/runwayml), and [LAION](https://laion.ai/). The [`StableDiffusionImg2ImgPipeline`] lets you pass a text prompt and an initial image to condition the generation of new images using Stable Diffusion. - -The original codebase can be found here: [CompVis/stable-diffusion](https://github.com/CompVis/stable-diffusion/blob/main/scripts/img2img.py) - -[`StableDiffusionImg2ImgPipeline`] is compatible with all Stable Diffusion checkpoints for [Text-to-Image](./text2img). - -The pipeline uses the diffusion-denoising mechanism proposed in [SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations](https://arxiv.org/abs/2108.01073) -by Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon.
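In SDEdit-style image-to-image generation, the input image is noised up to an intermediate point of the schedule and then denoised from there, rather than starting from pure noise. A minimal sketch of that bookkeeping follows. It assumes a `strength` value between 0 and 1 that decides how much of the schedule is used; the function name `denoising_schedule` is hypothetical, and the pipeline's own logic may differ in detail.

```python
def denoising_schedule(num_inference_steps: int, strength: float):
    """Return (start_index, steps_to_run) for a partially noised input image.

    Hypothetical helper for illustration only.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # With strength=1.0 the whole schedule is run; with 0.0 nothing is denoised.
    steps_to_run = min(int(num_inference_steps * strength), num_inference_steps)
    start_index = max(num_inference_steps - steps_to_run, 0)
    return start_index, steps_to_run


start_index, steps_to_run = denoising_schedule(50, 0.5)
```

A lower `strength` skips more of the early, noisier steps, so the result stays closer to the initial image.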
- -[[autodoc]] StableDiffusionImg2ImgPipeline - - all - - __call__ - - enable_attention_slicing - - disable_attention_slicing - - enable_xformers_memory_efficient_attention - - disable_xformers_memory_efficient_attention - -[[autodoc]] FlaxStableDiffusionImg2ImgPipeline - - all - - __call__ \ No newline at end of file diff --git a/diffusers/docs/source/en/api/pipelines/stable_diffusion/inpaint.mdx b/diffusers/docs/source/en/api/pipelines/stable_diffusion/inpaint.mdx deleted file mode 100644 index 33e84a63261fbf9c370e2d5e22ffbf4a1256bbb4..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/stable_diffusion/inpaint.mdx +++ /dev/null @@ -1,37 +0,0 @@ - - -# Text-Guided Image Inpainting - -## StableDiffusionInpaintPipeline - -The Stable Diffusion model was created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/), [runway](https://github.com/runwayml), and [LAION](https://laion.ai/). The [`StableDiffusionInpaintPipeline`] lets you edit specific parts of an image by providing a mask and a text prompt using Stable Diffusion. 
- -The original codebase can be found here: -- *Stable Diffusion V1*: [runwayml/stable-diffusion](https://github.com/runwayml/stable-diffusion#inpainting-with-stable-diffusion) -- *Stable Diffusion V2*: [Stability-AI/stablediffusion](https://github.com/Stability-AI/stablediffusion#image-inpainting-with-stable-diffusion) - -Available checkpoints are: -- *stable-diffusion-inpainting (512x512 resolution)*: [runwayml/stable-diffusion-inpainting](https://huggingface.co/runwayml/stable-diffusion-inpainting) -- *stable-diffusion-2-inpainting (512x512 resolution)*: [stabilityai/stable-diffusion-2-inpainting](https://huggingface.co/stabilityai/stable-diffusion-2-inpainting) - -[[autodoc]] StableDiffusionInpaintPipeline - - all - - __call__ - - enable_attention_slicing - - disable_attention_slicing - - enable_xformers_memory_efficient_attention - - disable_xformers_memory_efficient_attention - -[[autodoc]] FlaxStableDiffusionInpaintPipeline - - all - - __call__ \ No newline at end of file diff --git a/diffusers/docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx b/diffusers/docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx deleted file mode 100644 index 61fd2f799114de345400a692c115811fbf222871..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx +++ /dev/null @@ -1,33 +0,0 @@ - - -# Stable Diffusion Latent Upscaler - -## StableDiffusionLatentUpscalePipeline - -The Stable Diffusion Latent Upscaler model was created by [Katherine Crowson](https://github.com/crowsonkb/k-diffusion) in collaboration with [Stability AI](https://stability.ai/). It can be used on top of any [`StableDiffusionUpscalePipeline`] checkpoint to enhance its output image resolution by a factor of 2.
- -A notebook that demonstrates the original implementation can be found here: -- [Stable Diffusion Upscaler Demo](https://colab.research.google.com/drive/1o1qYJcFeywzCIdkfKJy7cTpgZTCM2EI4) - -Available Checkpoints are: -- *stabilityai/latent-upscaler*: [stabilityai/sd-x2-latent-upscaler](https://huggingface.co/stabilityai/sd-x2-latent-upscaler) - - -[[autodoc]] StableDiffusionLatentUpscalePipeline - - all - - __call__ - - enable_sequential_cpu_offload - - enable_attention_slicing - - disable_attention_slicing - - enable_xformers_memory_efficient_attention - - disable_xformers_memory_efficient_attention \ No newline at end of file diff --git a/diffusers/docs/source/en/api/pipelines/stable_diffusion/model_editing.mdx b/diffusers/docs/source/en/api/pipelines/stable_diffusion/model_editing.mdx deleted file mode 100644 index 7aae35ba2a91774a4297ee7ada6d13a40fed6f32..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/stable_diffusion/model_editing.mdx +++ /dev/null @@ -1,61 +0,0 @@ - - -# Editing Implicit Assumptions in Text-to-Image Diffusion Models - -## Overview - -[Editing Implicit Assumptions in Text-to-Image Diffusion Models](https://arxiv.org/abs/2303.08084) by Hadas Orgad, Bahjat Kawar, and Yonatan Belinkov. - -The abstract of the paper is the following: - -*Text-to-image diffusion models often make implicit assumptions about the world when generating images. While some assumptions are useful (e.g., the sky is blue), they can also be outdated, incorrect, or reflective of social biases present in the training data. Thus, there is a need to control these assumptions without requiring explicit user input or costly re-training. In this work, we aim to edit a given implicit assumption in a pre-trained diffusion model. 
Our Text-to-Image Model Editing method, TIME for short, receives a pair of inputs: a "source" under-specified prompt for which the model makes an implicit assumption (e.g., "a pack of roses"), and a "destination" prompt that describes the same setting, but with a specified desired attribute (e.g., "a pack of blue roses"). TIME then updates the model's cross-attention layers, as these layers assign visual meaning to textual tokens. We edit the projection matrices in these layers such that the source prompt is projected close to the destination prompt. Our method is highly efficient, as it modifies a mere 2.2% of the model's parameters in under one second. To evaluate model editing approaches, we introduce TIMED (TIME Dataset), containing 147 source and destination prompt pairs from various domains. Our experiments (using Stable Diffusion) show that TIME is successful in model editing, generalizes well for related prompts unseen during editing, and imposes minimal effect on unrelated generations.* - -Resources: - -* [Project Page](https://time-diffusion.github.io/). -* [Paper](https://arxiv.org/abs/2303.08084). -* [Original Code](https://github.com/bahjat-kawar/time-diffusion). -* [Demo](https://huggingface.co/spaces/bahjat-kawar/time-diffusion). - -## Available Pipelines: - -| Pipeline | Tasks | Demo -|---|---|:---:| -| [StableDiffusionModelEditingPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_model_editing.py) | *Text-to-Image Model Editing* | [🤗 Space](https://huggingface.co/spaces/bahjat-kawar/time-diffusion) - -This pipeline edits the diffusion model's weights so that its assumptions about a given concept change. The resulting change is expected to take effect in all prompt generations pertaining to the edited concept.
- -## Usage example - -```python -import torch -from diffusers import StableDiffusionModelEditingPipeline - -model_ckpt = "CompVis/stable-diffusion-v1-4" -pipe = StableDiffusionModelEditingPipeline.from_pretrained(model_ckpt) - -pipe = pipe.to("cuda") - -source_prompt = "A pack of roses" -destination_prompt = "A pack of blue roses" -pipe.edit_model(source_prompt, destination_prompt) - -prompt = "A field of roses" -image = pipe(prompt).images[0] -image.save("field_of_roses.png") -``` - -## StableDiffusionModelEditingPipeline -[[autodoc]] StableDiffusionModelEditingPipeline - - __call__ - - all diff --git a/diffusers/docs/source/en/api/pipelines/stable_diffusion/overview.mdx b/diffusers/docs/source/en/api/pipelines/stable_diffusion/overview.mdx deleted file mode 100644 index 70731fd294b91c8bca9bb1726c14011507c22a4a..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/stable_diffusion/overview.mdx +++ /dev/null @@ -1,82 +0,0 @@ - - -# Stable diffusion pipelines - -Stable Diffusion is a text-to-image _latent diffusion_ model created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/) and [LAION](https://laion.ai/). It's trained on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) dataset. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and can run on consumer GPUs. - -Latent diffusion is the research on top of which Stable Diffusion was built. It was proposed in [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752) by Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer. You can learn more details about it in the [specific pipeline for latent diffusion](pipelines/latent_diffusion) that is part of 🤗 Diffusers. 
- -For more details about how Stable Diffusion works and how it differs from the base latent diffusion model, please refer to the official [launch announcement post](https://stability.ai/blog/stable-diffusion-announcement) and [this section of our own blog post](https://huggingface.co/blog/stable_diffusion#how-does-stable-diffusion-work). - -*Tips*: -- To tweak your prompts on a specific result you liked, you can generate your own latents, as demonstrated in the following notebook: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb) - -*Overview*: - -| Pipeline | Tasks | Colab | Demo -|---|---|:---:|:---:| -| [StableDiffusionPipeline](./text2img) | *Text-to-Image Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion.ipynb) | [🤗 Stable Diffusion](https://huggingface.co/spaces/stabilityai/stable-diffusion) -| [StableDiffusionImg2ImgPipeline](./img2img) | *Image-to-Image Text-Guided Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb) | [🤗 Diffuse the Rest](https://huggingface.co/spaces/huggingface/diffuse-the-rest) -| [StableDiffusionInpaintPipeline](./inpaint) | **Experimental** – *Text-Guided Image Inpainting* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/in_painting_with_stable_diffusion_using_diffusers.ipynb) | Coming soon -| [StableDiffusionDepth2ImgPipeline](./depth2img) | **Experimental** – *Depth-to-Image Text-Guided Generation* | | Coming soon -| [StableDiffusionImageVariationPipeline](./image_variation) | 
**Experimental** – *Image Variation Generation* | | [🤗 Stable Diffusion Image Variations](https://huggingface.co/spaces/lambdalabs/stable-diffusion-image-variations) -| [StableDiffusionUpscalePipeline](./upscale) | **Experimental** – *Text-Guided Image Super-Resolution* | | Coming soon -| [StableDiffusionLatentUpscalePipeline](./latent_upscale) | **Experimental** – *Text-Guided Image Super-Resolution* | | Coming soon -| [StableDiffusionInstructPix2PixPipeline](./pix2pix) | **Experimental** – *Text-Based Image Editing* | | [InstructPix2Pix: Learning to Follow Image Editing Instructions](https://huggingface.co/spaces/timbrooks/instruct-pix2pix) -| [StableDiffusionAttendAndExcitePipeline](./attend_and_excite) | **Experimental** – *Text-to-Image Generation* | | [Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models](https://huggingface.co/spaces/AttendAndExcite/Attend-and-Excite) -| [StableDiffusionPix2PixZeroPipeline](./pix2pix_zero) | **Experimental** – *Text-Based Image Editing* | | [Zero-shot Image-to-Image Translation](https://arxiv.org/abs/2302.03027) -| [StableDiffusionModelEditingPipeline](./model_editing) | **Experimental** – *Text-to-Image Model Editing* | | [Editing Implicit Assumptions in Text-to-Image Diffusion Models](https://arxiv.org/abs/2303.08084) - - - -## Tips - -### How to load and use different schedulers - -The Stable Diffusion pipeline uses the [`PNDMScheduler`] by default, but `diffusers` provides many other schedulers that can be used with it, such as [`DDIMScheduler`], [`LMSDiscreteScheduler`], [`EulerDiscreteScheduler`], and [`EulerAncestralDiscreteScheduler`]. -To use a different scheduler, you can either change it via the [`ConfigMixin.from_config`] method or pass the `scheduler` argument to the `from_pretrained` method of the pipeline. 
For example, to use the [`EulerDiscreteScheduler`], you can do the following: - -```python ->>> from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler - ->>> pipeline = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4") ->>> pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config) - ->>> # or ->>> euler_scheduler = EulerDiscreteScheduler.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="scheduler") ->>> pipeline = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", scheduler=euler_scheduler) -``` - - -### How to convert all use cases with multiple or single pipeline - -If you want to use all possible use cases in a single `DiffusionPipeline` you can either: -- Make use of the [Stable Diffusion Mega Pipeline](https://github.com/huggingface/diffusers/tree/main/examples/community#stable-diffusion-mega) or -- Make use of the `components` functionality to instantiate all components in the most memory-efficient way: - -```python ->>> from diffusers import ( -... StableDiffusionPipeline, -... StableDiffusionImg2ImgPipeline, -... StableDiffusionInpaintPipeline, -... ) - ->>> text2img = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4") ->>> img2img = StableDiffusionImg2ImgPipeline(**text2img.components) ->>> inpaint = StableDiffusionInpaintPipeline(**text2img.components) - ->>> # now you can use text2img(...), img2img(...), inpaint(...) 
just like the call methods of each respective pipeline -``` - -## StableDiffusionPipelineOutput -[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput diff --git a/diffusers/docs/source/en/api/pipelines/stable_diffusion/panorama.mdx b/diffusers/docs/source/en/api/pipelines/stable_diffusion/panorama.mdx deleted file mode 100644 index e0c7747a0193013507ccc28e3d48c7ee5ab8ca11..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/stable_diffusion/panorama.mdx +++ /dev/null @@ -1,58 +0,0 @@ - - -# MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation - -## Overview - -[MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation](https://arxiv.org/abs/2302.08113) by Omer Bar-Tal, Lior Yariv, Yaron Lipman, and Tali Dekel. - -The abstract of the paper is the following: - -*Recent advances in text-to-image generation with diffusion models present transformative capabilities in image quality. However, user controllability of the generated image, and fast adaptation to new tasks still remains an open challenge, currently mostly addressed by costly and long re-training and fine-tuning or ad-hoc adaptations to specific image generation tasks. In this work, we present MultiDiffusion, a unified framework that enables versatile and controllable image generation, using a pre-trained text-to-image diffusion model, without any further training or finetuning. At the center of our approach is a new generation process, based on an optimization task that binds together multiple diffusion generation processes with a shared set of parameters or constraints. We show that MultiDiffusion can be readily applied to generate high quality and diverse images that adhere to user-provided controls, such as desired aspect ratio (e.g., panorama), and spatial guiding signals, ranging from tight segmentation masks to bounding boxes.* - -Resources: - -* [Project Page](https://multidiffusion.github.io/). 
-* [Paper](https://arxiv.org/abs/2302.08113). -* [Original Code](https://github.com/omerbt/MultiDiffusion). -* [Demo](https://huggingface.co/spaces/weizmannscience/MultiDiffusion). - -## Available Pipelines: - -| Pipeline | Tasks | Demo -|---|---|:---:| -| [StableDiffusionPanoramaPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_panorama.py) | *Text-Guided Panorama View Generation* | [🤗 Space](https://huggingface.co/spaces/weizmannscience/MultiDiffusion) | - - - -## Usage example - -```python -import torch -from diffusers import StableDiffusionPanoramaPipeline, DDIMScheduler - -model_ckpt = "stabilityai/stable-diffusion-2-base" -scheduler = DDIMScheduler.from_pretrained(model_ckpt, subfolder="scheduler") -pipe = StableDiffusionPanoramaPipeline.from_pretrained(model_ckpt, scheduler=scheduler, torch_dtype=torch.float16) - -pipe = pipe.to("cuda") - -prompt = "a photo of the dolomites" -image = pipe(prompt).images[0] -image.save("dolomites.png") -``` - -## StableDiffusionPanoramaPipeline -[[autodoc]] StableDiffusionPanoramaPipeline - - __call__ - - all diff --git a/diffusers/docs/source/en/api/pipelines/stable_diffusion/pix2pix.mdx b/diffusers/docs/source/en/api/pipelines/stable_diffusion/pix2pix.mdx deleted file mode 100644 index 42cd4b896b2e4603aaf826efc7201672c016563f..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/stable_diffusion/pix2pix.mdx +++ /dev/null @@ -1,70 +0,0 @@ - - -# InstructPix2Pix: Learning to Follow Image Editing Instructions - -## Overview - -[InstructPix2Pix: Learning to Follow Image Editing Instructions](https://arxiv.org/abs/2211.09800) by Tim Brooks, Aleksander Holynski and Alexei A. Efros. 
- -The abstract of the paper is the following: - -*We propose a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, our model follows these instructions to edit the image. To obtain training data for this problem, we combine the knowledge of two large pretrained models -- a language model (GPT-3) and a text-to-image model (Stable Diffusion) -- to generate a large dataset of image editing examples. Our conditional diffusion model, InstructPix2Pix, is trained on our generated data, and generalizes to real images and user-written instructions at inference time. Since it performs edits in the forward pass and does not require per example fine-tuning or inversion, our model edits images quickly, in a matter of seconds. We show compelling editing results for a diverse collection of input images and written instructions.* - -Resources: - -* [Project Page](https://www.timothybrooks.com/instruct-pix2pix). -* [Paper](https://arxiv.org/abs/2211.09800). -* [Original Code](https://github.com/timothybrooks/instruct-pix2pix). -* [Demo](https://huggingface.co/spaces/timbrooks/instruct-pix2pix). 
- - -## Available Pipelines: - -| Pipeline | Tasks | Demo -|---|---|:---:| -| [StableDiffusionInstructPix2PixPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_instruct_pix2pix.py) | *Text-Based Image Editing* | [🤗 Space](https://huggingface.co/spaces/timbrooks/instruct-pix2pix) | - - - -## Usage example - -```python -import PIL -import requests -import torch -from diffusers import StableDiffusionInstructPix2PixPipeline - -model_id = "timbrooks/instruct-pix2pix" -pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda") - -url = "https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png" - - -def download_image(url): - image = PIL.Image.open(requests.get(url, stream=True).raw) - image = PIL.ImageOps.exif_transpose(image) - image = image.convert("RGB") - return image - - -image = download_image(url) - -prompt = "make the mountains snowy" -images = pipe(prompt, image=image, num_inference_steps=20, image_guidance_scale=1.5, guidance_scale=7).images -images[0].save("snowy_mountains.png") -``` - -## StableDiffusionInstructPix2PixPipeline -[[autodoc]] StableDiffusionInstructPix2PixPipeline - - __call__ - - all diff --git a/diffusers/docs/source/en/api/pipelines/stable_diffusion/pix2pix_zero.mdx b/diffusers/docs/source/en/api/pipelines/stable_diffusion/pix2pix_zero.mdx deleted file mode 100644 index f04a54f242acade990415a1ed7c240c37a828dd7..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/stable_diffusion/pix2pix_zero.mdx +++ /dev/null @@ -1,291 +0,0 @@ - - -# Zero-shot Image-to-Image Translation - -## Overview - -[Zero-shot Image-to-Image Translation](https://arxiv.org/abs/2302.03027). - -The abstract of the paper is the following: - -*Large-scale text-to-image generative models have shown their remarkable ability to synthesize diverse and high-quality images. 
However, it is still challenging to directly apply these models for editing real images for two reasons. First, it is hard for users to come up with a perfect text prompt that accurately describes every visual detail in the input image. Second, while existing models can introduce desirable changes in certain regions, they often dramatically alter the input content and introduce unexpected changes in unwanted regions. In this work, we propose pix2pix-zero, an image-to-image translation method that can preserve the content of the original image without manual prompting. We first automatically discover editing directions that reflect desired edits in the text embedding space. To preserve the general content structure after editing, we further propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process. In addition, our method does not need additional training for these edits and can directly use the existing pre-trained text-to-image diffusion model. We conduct extensive experiments and show that our method outperforms existing and concurrent works for both real and synthetic image editing.* - -Resources: - -* [Project Page](https://pix2pixzero.github.io/). -* [Paper](https://arxiv.org/abs/2302.03027). -* [Original Code](https://github.com/pix2pixzero/pix2pix-zero). -* [Demo](https://huggingface.co/spaces/pix2pix-zero-library/pix2pix-zero-demo). - -## Tips - -* The pipeline can be conditioned on real input images. Check out the code examples below to know more. -* The pipeline exposes two arguments namely `source_embeds` and `target_embeds` -that let you control the direction of the semantic edits in the final image to be generated. Let's say, -you wanted to translate from "cat" to "dog". In this case, the edit direction will be "cat -> dog". 
To reflect -this in the pipeline, you simply have to set the embeddings related to the phrases including "cat" to -`source_embeds` and "dog" to `target_embeds`. Refer to the code example below for more details. -* When you're using this pipeline from a prompt, specify the _source_ concept in the prompt. Taking -the above example, a valid input prompt would be: "a high resolution painting of a **cat** in the style of van gogh". -* If you want to reverse the direction in the example above, i.e., "dog -> cat", then it's recommended to: - * Swap the `source_embeds` and `target_embeds`. - * Change the input prompt to include "dog". -* To learn more about how the source and target embeddings are generated, refer to the [original -paper](https://arxiv.org/abs/2302.03027). Below, we also provide some directions on how to generate the embeddings. -* Note that the quality of the outputs generated with this pipeline depends on the quality of the `source_embeds` and `target_embeds`. Please refer to [this discussion](#generating-source-and-target-embeddings) for some suggestions on the topic. 
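To make the notion of an "edit direction" concrete, here is a minimal sketch that assumes the direction is the difference between the mean target embedding and the mean source embedding, as the pix2pix-zero paper describes. The helper names (`mean_embedding`, `edit_direction`) and the toy three-dimensional vectors are hypothetical and only illustrate the arithmetic; they are not part of the pipeline's API, which works on text-encoder tensors.

```python
from statistics import fmean


def mean_embedding(embeddings):
    """Average a list of equal-length embedding vectors element-wise."""
    return [fmean(column) for column in zip(*embeddings)]


def edit_direction(source_embeddings, target_embeddings):
    """Hypothetical helper: mean(target) - mean(source), element-wise."""
    source_mean = mean_embedding(source_embeddings)
    target_mean = mean_embedding(target_embeddings)
    return [t - s for s, t in zip(source_mean, target_mean)]


# Toy stand-ins for the embeddings of several "cat" and "dog" captions.
cat_embeddings = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
dog_embeddings = [[2.0, 1.0, 2.0], [4.0, 3.0, 2.0]]

cat_to_dog = edit_direction(cat_embeddings, dog_embeddings)
dog_to_cat = edit_direction(dog_embeddings, cat_embeddings)
print(cat_to_dog, dog_to_cat)
```

Swapping the two arguments negates the direction, which is why reversing an edit only requires swapping `source_embeds` and `target_embeds` and adjusting the prompt.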
- -## Available Pipelines: - -| Pipeline | Tasks | Demo -|---|---|:---:| -| [StableDiffusionPix2PixZeroPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_pix2pix_zero.py) | *Text-Based Image Editing* | [🤗 Space](https://huggingface.co/spaces/pix2pix-zero-library/pix2pix-zero-demo) | - - - -## Usage example - -### Based on an image generated with the input prompt - -```python -import requests -import torch - -from diffusers import DDIMScheduler, StableDiffusionPix2PixZeroPipeline - - -def download(embedding_url, local_filepath): - r = requests.get(embedding_url) - with open(local_filepath, "wb") as f: - f.write(r.content) - - -model_ckpt = "CompVis/stable-diffusion-v1-4" -pipeline = StableDiffusionPix2PixZeroPipeline.from_pretrained( - model_ckpt, conditions_input_image=False, torch_dtype=torch.float16 -) -pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config) -pipeline.to("cuda") - -prompt = "a high resolution painting of a cat in the style of van gogh" -src_embs_url = "https://github.com/pix2pixzero/pix2pix-zero/raw/main/assets/embeddings_sd_1.4/cat.pt" -target_embs_url = "https://github.com/pix2pixzero/pix2pix-zero/raw/main/assets/embeddings_sd_1.4/dog.pt" - -for url in [src_embs_url, target_embs_url]: - download(url, url.split("/")[-1]) - -src_embeds = torch.load(src_embs_url.split("/")[-1]) -target_embeds = torch.load(target_embs_url.split("/")[-1]) - -images = pipeline( - prompt, - source_embeds=src_embeds, - target_embeds=target_embeds, - num_inference_steps=50, - cross_attention_guidance_amount=0.15, -).images -images[0].save("edited_image_dog.png") -``` - -### Based on an input image - -When the pipeline is conditioned on an input image, we first obtain an inverted -noise from it using a `DDIMInverseScheduler` with the help of a generated caption. Then -the inverted noise is used to start the generation process. 
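The inversion idea can be illustrated with a scalar toy model. The sketch below assumes the standard deterministic DDIM update (eta = 0) and replaces the UNet's noise prediction with a fixed constant, so every name in it (`ddim_move`, `alpha_bars`, `toy_eps`) is hypothetical and unrelated to the `DDIMInverseScheduler` implementation. With a real model the noise prediction changes at every step, so the round trip is only approximate.

```python
import math


def predict_x0(x_t, eps, alpha_bar_t):
    """Estimate the clean sample from a noisy one and a noise prediction."""
    return (x_t - math.sqrt(1 - alpha_bar_t) * eps) / math.sqrt(alpha_bar_t)


def ddim_move(x_t, eps, alpha_bar_from, alpha_bar_to):
    """Deterministic DDIM update between two noise levels (either direction)."""
    x0 = predict_x0(x_t, eps, alpha_bar_from)
    return math.sqrt(alpha_bar_to) * x0 + math.sqrt(1 - alpha_bar_to) * eps


# Cumulative alphas ordered from almost clean to very noisy (toy values).
alpha_bars = [0.99, 0.7, 0.3, 0.05]
toy_eps = 0.25  # stand-in for the noise the UNet would predict


def invert(latent):
    """Walk towards higher noise levels: image latent -> inverted noise."""
    x = latent
    for a_from, a_to in zip(alpha_bars[:-1], alpha_bars[1:]):
        x = ddim_move(x, toy_eps, a_from, a_to)
    return x


def denoise(noisy):
    """Walk back towards low noise levels: inverted noise -> image latent."""
    x = noisy
    schedule = alpha_bars[::-1]
    for a_from, a_to in zip(schedule[:-1], schedule[1:]):
        x = ddim_move(x, toy_eps, a_from, a_to)
    return x


latent = 1.5
inverted = invert(latent)
recovered = denoise(inverted)
print(inverted, recovered)
```

Starting generation from the inverted noise rather than from random noise is what lets the edited image keep the structure of the input image.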
- -First, let's load our pipeline: - -```py -import torch -from transformers import BlipForConditionalGeneration, BlipProcessor -from diffusers import DDIMScheduler, DDIMInverseScheduler, StableDiffusionPix2PixZeroPipeline - -captioner_id = "Salesforce/blip-image-captioning-base" -processor = BlipProcessor.from_pretrained(captioner_id) -model = BlipForConditionalGeneration.from_pretrained(captioner_id, torch_dtype=torch.float16, low_cpu_mem_usage=True) - -sd_model_ckpt = "CompVis/stable-diffusion-v1-4" -pipeline = StableDiffusionPix2PixZeroPipeline.from_pretrained( - sd_model_ckpt, - caption_generator=model, - caption_processor=processor, - torch_dtype=torch.float16, - safety_checker=None, -) -pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config) -pipeline.inverse_scheduler = DDIMInverseScheduler.from_config(pipeline.scheduler.config) -pipeline.enable_model_cpu_offload() -``` - -Then, we load an input image for conditioning and obtain a suitable caption for it: - -```py -import requests -from PIL import Image - -img_url = "https://github.com/pix2pixzero/pix2pix-zero/raw/main/assets/test_images/cats/cat_6.png" -raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB").resize((512, 512)) -caption = pipeline.generate_caption(raw_image) -``` - -Then we employ the generated caption and the input image to get the inverted noise: - -```py -generator = torch.manual_seed(0) -inv_latents = pipeline.invert(caption, image=raw_image, generator=generator).latents -``` - -Now, generate the image with edit directions: - -```py -# See the "Generating source and target embeddings" section below to -# automate the generation of these captions with a pre-trained model like Flan-T5 as explained below. 
-source_prompts = ["a cat sitting on the street", "a cat playing in the field", "a face of a cat"] -target_prompts = ["a dog sitting on the street", "a dog playing in the field", "a face of a dog"] - -source_embeds = pipeline.get_embeds(source_prompts, batch_size=2) -target_embeds = pipeline.get_embeds(target_prompts, batch_size=2) - - -image = pipeline( - caption, - source_embeds=source_embeds, - target_embeds=target_embeds, - num_inference_steps=50, - cross_attention_guidance_amount=0.15, - generator=generator, - latents=inv_latents, - negative_prompt=caption, -).images[0] -image.save("edited_image.png") -``` - -## Generating source and target embeddings - -The authors originally used the [GPT-3 API](https://openai.com/api/) to generate the source and target captions for discovering -edit directions. However, we can also leverage open source and public models for the same purpose. -Below, we provide an end-to-end example with the [Flan-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5) model -for generating captions and [CLIP](https://huggingface.co/docs/transformers/model_doc/clip) for -computing embeddings on the generated captions. - -**1. Load the generation model**: - -```py -import torch -from transformers import AutoTokenizer, T5ForConditionalGeneration - -tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl") -model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xl", device_map="auto", torch_dtype=torch.float16) -``` - -**2. Construct a starting prompt**: - -```py -source_concept = "cat" -target_concept = "dog" - -# Parentheses are required so that the two string literals are joined into one prompt. -source_text = ( - f"Provide a caption for images containing a {source_concept}. " - "The captions should be in English and should be no longer than 150 characters." -) - -target_text = ( - f"Provide a caption for images containing a {target_concept}. " - "The captions should be in English and should be no longer than 150 characters." -) -``` - -Here, we're interested in the "cat -> dog" direction. - -**3. 
Generate captions**: - -We can use a small utility function for this purpose. - -```py -def generate_captions(input_prompt): - input_ids = tokenizer(input_prompt, return_tensors="pt").input_ids.to("cuda") - - outputs = model.generate( - input_ids, temperature=0.8, num_return_sequences=16, do_sample=True, max_new_tokens=128, top_k=10 - ) - return tokenizer.batch_decode(outputs, skip_special_tokens=True) -``` - -And then we just call it to generate our captions: - -```py -source_captions = generate_captions(source_text) -target_captions = generate_captions(target_text) -``` - -We encourage you to play around with the different parameters supported by the -`generate()` method ([documentation](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.generation_tf_utils.TFGenerationMixin.generate)) to get the generation quality you are looking for. - -**4. Load the embedding model**: - -Here, we need to use the same text encoder model used by the subsequent Stable Diffusion model. - -```py -from diffusers import StableDiffusionPix2PixZeroPipeline - -pipeline = StableDiffusionPix2PixZeroPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16 -) -pipeline = pipeline.to("cuda") -tokenizer = pipeline.tokenizer -text_encoder = pipeline.text_encoder -``` - -**5. 
Compute embeddings**: - -```py -import torch - -def embed_captions(sentences, tokenizer, text_encoder, device="cuda"): - with torch.no_grad(): - embeddings = [] - for sent in sentences: - text_inputs = tokenizer( - sent, - padding="max_length", - max_length=tokenizer.model_max_length, - truncation=True, - return_tensors="pt", - ) - text_input_ids = text_inputs.input_ids - prompt_embeds = text_encoder(text_input_ids.to(device), attention_mask=None)[0] - embeddings.append(prompt_embeds) - # Average over all captions and keep a leading batch dimension. - return torch.cat(embeddings, dim=0).mean(dim=0).unsqueeze(0) - -source_embeddings = embed_captions(source_captions, tokenizer, text_encoder) -target_embeddings = embed_captions(target_captions, tokenizer, text_encoder) -``` - -And you're done! [Here](https://colab.research.google.com/drive/1tz2C1EdfZYAPlzXXbTnf-5PRBiR8_R1F?usp=sharing) is a Colab Notebook that you can use to interact with the entire process. - -Now, you can use these embeddings directly while calling the pipeline: - -```py -from diffusers import DDIMScheduler - -pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config) - -# The prompt must mention the source concept ("cat"). -prompt = "a high resolution painting of a cat in the style of van gogh" - -images = pipeline( - prompt, - source_embeds=source_embeddings, - target_embeds=target_embeddings, - num_inference_steps=50, - cross_attention_guidance_amount=0.15, -).images -images[0].save("edited_image_dog.png") -``` - -## StableDiffusionPix2PixZeroPipeline -[[autodoc]] StableDiffusionPix2PixZeroPipeline - - __call__ - - all diff --git a/diffusers/docs/source/en/api/pipelines/stable_diffusion/self_attention_guidance.mdx b/diffusers/docs/source/en/api/pipelines/stable_diffusion/self_attention_guidance.mdx deleted file mode 100644 index b34c1f51cf668b289ca000719828addb88f6a20e..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/stable_diffusion/self_attention_guidance.mdx +++ /dev/null @@ -1,64 +0,0 @@ - - -# Self-Attention Guidance (SAG) - -## Overview - -[Self-Attention Guidance](https://arxiv.org/abs/2210.00939) by Susung Hong et al. 
- -The abstract of the paper is the following: - -*Denoising diffusion models (DDMs) have been drawing much attention for their appreciable sample quality and diversity. Despite their remarkable performance, DDMs remain black boxes on which further study is necessary to take a profound step. Motivated by this, we delve into the design of conventional U-shaped diffusion models. More specifically, we investigate the self-attention modules within these models through carefully designed experiments and explore their characteristics. In addition, inspired by the studies that substantiate the effectiveness of the guidance schemes, we present plug-and-play diffusion guidance, namely Self-Attention Guidance (SAG), that can drastically boost the performance of existing diffusion models. Our method, SAG, extracts the intermediate attention map from a diffusion model at every iteration and selects tokens above a certain attention score for masking and blurring to obtain a partially blurred input. Subsequently, we measure the dissimilarity between the predicted noises obtained from feeding the blurred and original input to the diffusion model and leverage it as guidance. With this guidance, we observe apparent improvements in a wide range of diffusion models, e.g., ADM, IDDPM, and Stable Diffusion, and show that the results further improve by combining our method with the conventional guidance scheme. We provide extensive ablation studies to verify our choices.* - -Resources: - -* [Project Page](https://ku-cvlab.github.io/Self-Attention-Guidance). -* [Paper](https://arxiv.org/abs/2210.00939). -* [Original Code](https://github.com/KU-CVLAB/Self-Attention-Guidance). -* [Demo](https://colab.research.google.com/github/SusungHong/Self-Attention-Guidance/blob/main/SAG_Stable.ipynb). 
- - -## Available Pipelines: - -| Pipeline | Tasks | Demo -|---|---|:---:| -| [StableDiffusionSAGPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py) | *Text-to-Image Generation* | [Colab](https://colab.research.google.com/github/SusungHong/Self-Attention-Guidance/blob/main/SAG_Stable.ipynb) | - -## Usage example - -```python -import torch -from diffusers import StableDiffusionSAGPipeline -from accelerate.utils import set_seed - -pipe = StableDiffusionSAGPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16) -pipe = pipe.to("cuda") - -seed = 8978 -prompt = "." -guidance_scale = 7.5 -num_images_per_prompt = 1 - -sag_scale = 1.0 - -set_seed(seed) -images = pipe( - prompt, num_images_per_prompt=num_images_per_prompt, guidance_scale=guidance_scale, sag_scale=sag_scale -).images -images[0].save("example.png") -``` - -## StableDiffusionSAGPipeline -[[autodoc]] StableDiffusionSAGPipeline - - __call__ - - all diff --git a/diffusers/docs/source/en/api/pipelines/stable_diffusion/text2img.mdx b/diffusers/docs/source/en/api/pipelines/stable_diffusion/text2img.mdx deleted file mode 100644 index 6b8d53bf6510a0b122529170e0de3cbddcc40690..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/stable_diffusion/text2img.mdx +++ /dev/null @@ -1,45 +0,0 @@ - - -# Text-to-Image Generation - -## StableDiffusionPipeline - -The Stable Diffusion model was created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/), [runway](https://github.com/runwayml), and [LAION](https://laion.ai/). The [`StableDiffusionPipeline`] is capable of generating photo-realistic images given any text input using Stable Diffusion. 
- -The original codebase can be found here: -- *Stable Diffusion V1*: [CompVis/stable-diffusion](https://github.com/CompVis/stable-diffusion) -- *Stable Diffusion v2*: [Stability-AI/stablediffusion](https://github.com/Stability-AI/stablediffusion) - -Available Checkpoints are: -- *stable-diffusion-v1-4 (512x512 resolution)* [CompVis/stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4) -- *stable-diffusion-v1-5 (512x512 resolution)* [runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) -- *stable-diffusion-2-base (512x512 resolution)*: [stabilityai/stable-diffusion-2-base](https://huggingface.co/stabilityai/stable-diffusion-2-base) -- *stable-diffusion-2 (768x768 resolution)*: [stabilityai/stable-diffusion-2](https://huggingface.co/stabilityai/stable-diffusion-2) -- *stable-diffusion-2-1-base (512x512 resolution)* [stabilityai/stable-diffusion-2-1-base](https://huggingface.co/stabilityai/stable-diffusion-2-1-base) -- *stable-diffusion-2-1 (768x768 resolution)*: [stabilityai/stable-diffusion-2-1](https://huggingface.co/stabilityai/stable-diffusion-2-1) - -[[autodoc]] StableDiffusionPipeline - - all - - __call__ - - enable_attention_slicing - - disable_attention_slicing - - enable_vae_slicing - - disable_vae_slicing - - enable_xformers_memory_efficient_attention - - disable_xformers_memory_efficient_attention - - enable_vae_tiling - - disable_vae_tiling - -[[autodoc]] FlaxStableDiffusionPipeline - - all - - __call__ diff --git a/diffusers/docs/source/en/api/pipelines/stable_diffusion/upscale.mdx b/diffusers/docs/source/en/api/pipelines/stable_diffusion/upscale.mdx deleted file mode 100644 index f70d8f445fd95fb49e7a92c7566951c40ec74933..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/stable_diffusion/upscale.mdx +++ /dev/null @@ -1,32 +0,0 @@ - - -# Super-Resolution - -## StableDiffusionUpscalePipeline - -The upscaler diffusion model was created by the researchers and 
engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/), and [LAION](https://laion.ai/), as part of Stable Diffusion 2.0. [`StableDiffusionUpscalePipeline`] can be used to enhance the resolution of input images by a factor of 4. - -The original codebase can be found here: -- *Stable Diffusion v2*: [Stability-AI/stablediffusion](https://github.com/Stability-AI/stablediffusion#image-upscaling-with-stable-diffusion) - -Available Checkpoints are: -- *stabilityai/stable-diffusion-x4-upscaler (x4 resolution)*: [stable-diffusion-x4-upscaler](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler) - - -[[autodoc]] StableDiffusionUpscalePipeline - - all - - __call__ - - enable_attention_slicing - - disable_attention_slicing - - enable_xformers_memory_efficient_attention - - disable_xformers_memory_efficient_attention \ No newline at end of file diff --git a/diffusers/docs/source/en/api/pipelines/stable_diffusion_2.mdx b/diffusers/docs/source/en/api/pipelines/stable_diffusion_2.mdx deleted file mode 100644 index e922072e4e3185f9de4a0d6e734e0c46a4fe3215..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/stable_diffusion_2.mdx +++ /dev/null @@ -1,176 +0,0 @@ - - -# Stable diffusion 2 - -Stable Diffusion 2 is a text-to-image _latent diffusion_ model built upon the work of [Stable Diffusion 1](https://stability.ai/blog/stable-diffusion-public-release). -The project to train Stable Diffusion 2 was led by Robin Rombach and Katherine Crowson from [Stability AI](https://stability.ai/) and [LAION](https://laion.ai/). - -*The Stable Diffusion 2.0 release includes robust text-to-image models trained using a brand new text encoder (OpenCLIP), developed by LAION with support from Stability AI, which greatly improves the quality of the generated images compared to earlier V1 releases. 
The text-to-image models in this release can generate images with default resolutions of both 512x512 pixels and 768x768 pixels. -These models are trained on an aesthetic subset of the [LAION-5B dataset](https://laion.ai/blog/laion-5b/) created by the DeepFloyd team at Stability AI, which is then further filtered to remove adult content using [LAION’s NSFW filter](https://openreview.net/forum?id=M3Y74vmsMcY).* - -For more details about how Stable Diffusion 2 works and how it differs from Stable Diffusion 1, please refer to the official [launch announcement post](https://stability.ai/blog/stable-diffusion-v2-release). - -## Tips - -### Available checkpoints: - -Note that the architecture is more or less identical to [Stable Diffusion 1](./stable_diffusion/overview) so please refer to [this page](./stable_diffusion/overview) for API documentation. - -- *Text-to-Image (512x512 resolution)*: [stabilityai/stable-diffusion-2-base](https://huggingface.co/stabilityai/stable-diffusion-2-base) with [`StableDiffusionPipeline`] -- *Text-to-Image (768x768 resolution)*: [stabilityai/stable-diffusion-2](https://huggingface.co/stabilityai/stable-diffusion-2) with [`StableDiffusionPipeline`] -- *Image Inpainting (512x512 resolution)*: [stabilityai/stable-diffusion-2-inpainting](https://huggingface.co/stabilityai/stable-diffusion-2-inpainting) with [`StableDiffusionInpaintPipeline`] -- *Super-Resolution (x4 resolution)*: [stable-diffusion-x4-upscaler](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler) with [`StableDiffusionUpscalePipeline`] -- *Depth-to-Image (512x512 resolution)*: [stabilityai/stable-diffusion-2-depth](https://huggingface.co/stabilityai/stable-diffusion-2-depth) with [`StableDiffusionDepth2ImgPipeline`] - -We recommend using the [`DPMSolverMultistepScheduler`] as it's currently the fastest scheduler available. 
- - -### Text-to-Image - -- *Text-to-Image (512x512 resolution)*: [stabilityai/stable-diffusion-2-base](https://huggingface.co/stabilityai/stable-diffusion-2-base) with [`StableDiffusionPipeline`] - -```python -from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler -import torch - -repo_id = "stabilityai/stable-diffusion-2-base" -pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, revision="fp16") - -pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config) -pipe = pipe.to("cuda") - -prompt = "High quality photo of an astronaut riding a horse in space" -image = pipe(prompt, num_inference_steps=25).images[0] -image.save("astronaut.png") -``` - -- *Text-to-Image (768x768 resolution)*: [stabilityai/stable-diffusion-2](https://huggingface.co/stabilityai/stable-diffusion-2) with [`StableDiffusionPipeline`] - -```python -from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler -import torch - -repo_id = "stabilityai/stable-diffusion-2" -pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, revision="fp16") - -pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config) -pipe = pipe.to("cuda") - -prompt = "High quality photo of an astronaut riding a horse in space" -image = pipe(prompt, guidance_scale=9, num_inference_steps=25).images[0] -image.save("astronaut.png") -``` - -### Image Inpainting - -- *Image Inpainting (512x512 resolution)*: [stabilityai/stable-diffusion-2-inpainting](https://huggingface.co/stabilityai/stable-diffusion-2-inpainting) with [`StableDiffusionInpaintPipeline`] - -```python -import PIL -import requests -import torch -from io import BytesIO - -from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler - - -def download_image(url): - response = requests.get(url) - return PIL.Image.open(BytesIO(response.content)).convert("RGB") - - -img_url = 
"https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png" -mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png" - -init_image = download_image(img_url).resize((512, 512)) -mask_image = download_image(mask_url).resize((512, 512)) - -repo_id = "stabilityai/stable-diffusion-2-inpainting" -pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, revision="fp16") - -pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config) -pipe = pipe.to("cuda") - -prompt = "Face of a yellow cat, high resolution, sitting on a park bench" -image = pipe(prompt=prompt, image=init_image, mask_image=mask_image, num_inference_steps=25).images[0] - -image.save("yellow_cat.png") -``` - -### Super-Resolution - -- *Image Upscaling (x4 resolution resolution)*: [stable-diffusion-x4-upscaler](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler) with [`StableDiffusionUpscalePipeline`] - - -```python -import requests -from PIL import Image -from io import BytesIO -from diffusers import StableDiffusionUpscalePipeline -import torch - -# load model and scheduler -model_id = "stabilityai/stable-diffusion-x4-upscaler" -pipeline = StableDiffusionUpscalePipeline.from_pretrained(model_id, torch_dtype=torch.float16) -pipeline = pipeline.to("cuda") - -# let's download an image -url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd2-upscale/low_res_cat.png" -response = requests.get(url) -low_res_img = Image.open(BytesIO(response.content)).convert("RGB") -low_res_img = low_res_img.resize((128, 128)) -prompt = "a white cat" -upscaled_image = pipeline(prompt=prompt, image=low_res_img).images[0] -upscaled_image.save("upsampled_cat.png") -``` - -### Depth-to-Image - -- *Depth-Guided Text-to-Image*: 
[stabilityai/stable-diffusion-2-depth](https://huggingface.co/stabilityai/stable-diffusion-2-depth) with [`StableDiffusionDepth2ImgPipeline`] - - -```python -import torch -import requests -from PIL import Image - -from diffusers import StableDiffusionDepth2ImgPipeline - -pipe = StableDiffusionDepth2ImgPipeline.from_pretrained( - "stabilityai/stable-diffusion-2-depth", - torch_dtype=torch.float16, -).to("cuda") - - -url = "http://images.cocodataset.org/val2017/000000039769.jpg" -init_image = Image.open(requests.get(url, stream=True).raw) -prompt = "two tigers" -n_prompt = "bad, deformed, ugly, bad anatomy" -image = pipe(prompt=prompt, image=init_image, negative_prompt=n_prompt, strength=0.7).images[0] -``` - -### How to load and use different schedulers. - -The stable diffusion pipeline uses the [`DDIMScheduler`] by default. But `diffusers` provides many other schedulers that can be used with the stable diffusion pipeline such as [`PNDMScheduler`], [`LMSDiscreteScheduler`], [`EulerDiscreteScheduler`], [`EulerAncestralDiscreteScheduler`] etc. -To use a different scheduler, you can either change it via the [`ConfigMixin.from_config`] method or pass the `scheduler` argument to the `from_pretrained` method of the pipeline.
For example, to use the [`EulerDiscreteScheduler`], you can do the following: - -```python ->>> from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler - ->>> pipeline = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2") ->>> pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config) - ->>> # or ->>> euler_scheduler = EulerDiscreteScheduler.from_pretrained("stabilityai/stable-diffusion-2", subfolder="scheduler") ->>> pipeline = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2", scheduler=euler_scheduler) -``` diff --git a/diffusers/docs/source/en/api/pipelines/stable_diffusion_safe.mdx b/diffusers/docs/source/en/api/pipelines/stable_diffusion_safe.mdx deleted file mode 100644 index 688eb5013c6a287c77722f006eea59bab73343e6..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/stable_diffusion_safe.mdx +++ /dev/null @@ -1,90 +0,0 @@ - - -# Safe Stable Diffusion - -Safe Stable Diffusion was proposed in [Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models](https://arxiv.org/abs/2211.05105) and mitigates the well known issue that models like Stable Diffusion that are trained on unfiltered, web-crawled datasets tend to suffer from inappropriate degeneration. For instance Stable Diffusion may unexpectedly generate nudity, violence, images depicting self-harm, or otherwise offensive content. -Safe Stable Diffusion is an extension to the Stable Diffusion that drastically reduces content like this. - -The abstract of the paper is the following: - -*Text-conditioned image generation models have recently achieved astonishing results in image quality and text alignment and are consequently employed in a fast-growing number of applications. Since they are highly data-driven, relying on billion-sized datasets randomly scraped from the internet, they also suffer, as we demonstrate, from degenerated and biased human behavior. 
In turn, they may even reinforce such biases. To help combat these undesired side effects, we present safe latent diffusion (SLD). Specifically, to measure the inappropriate degeneration due to unfiltered and imbalanced training sets, we establish a novel image generation test bed-inappropriate image prompts (I2P)-containing dedicated, real-world image-to-text prompts covering concepts such as nudity and violence. As our exhaustive empirical evaluation demonstrates, the introduced SLD removes and suppresses inappropriate image parts during the diffusion process, with no additional training required and no adverse effect on overall image quality or text alignment.* - - -*Overview*: - -| Pipeline | Tasks | Colab | Demo -|---|---|:---:|:---:| -| [pipeline_stable_diffusion_safe.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion_safe/pipeline_stable_diffusion_safe.py) | *Text-to-Image Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ml-research/safe-latent-diffusion/blob/main/examples/Safe%20Latent%20Diffusion.ipynb) | [![Huggingface Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/AIML-TUDA/unsafe-vs-safe-stable-diffusion) - -## Tips - -- Safe Stable Diffusion may also be used with weights of [Stable Diffusion](./api/pipelines/stable_diffusion/text2img). - -### Run Safe Stable Diffusion - -Safe Stable Diffusion can be tested very easily with the [`StableDiffusionPipelineSafe`], and the `"AIML-TUDA/stable-diffusion-safe"` checkpoint exactly in the same way it is shown in the [Conditional Image Generation Guide](./using-diffusers/conditional_image_generation). 
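
A minimal sketch of such a run is shown here, assuming `diffusers` is installed and the checkpoint can be downloaded. The helper name `generate_safe_image` and the example prompt are hypothetical; only `StableDiffusionPipelineSafe`, `from_pretrained`, and the `"AIML-TUDA/stable-diffusion-safe"` checkpoint come from this page.

```python
def generate_safe_image(prompt, model_id="AIML-TUDA/stable-diffusion-safe"):
    """Generate one image with Safe Stable Diffusion (illustrative helper)."""
    # Imported lazily so that defining the helper does not itself require diffusers.
    from diffusers import StableDiffusionPipelineSafe

    # Downloads the weights on first use.
    pipeline = StableDiffusionPipelineSafe.from_pretrained(model_id)
    # The pipeline output exposes the generated PIL images under `.images`.
    return pipeline(prompt=prompt).images[0]
```

For example, `generate_safe_image("a photo of an astronaut riding a horse").save("safe_astronaut.png")` would write the result to disk. Moving the pipeline to a GPU with `.to("cuda")`, as in the other examples in these docs, is likely to be much faster than the CPU default used in this sketch.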
- -### Interacting with the Safety Concept - -To check and edit the currently used safety concept, use the `safety_concept` property of [`StableDiffusionPipelineSafe`]: -```python ->>> from diffusers import StableDiffusionPipelineSafe - ->>> pipeline = StableDiffusionPipelineSafe.from_pretrained("AIML-TUDA/stable-diffusion-safe") ->>> pipeline.safety_concept -``` -For each image generation the active concept is also contained in [`StableDiffusionSafePipelineOutput`]. - -### Using pre-defined safety configurations - -You may use the 4 configurations defined in the [Safe Latent Diffusion paper](https://arxiv.org/abs/2211.05105) as follows: - -```python ->>> from diffusers import StableDiffusionPipelineSafe ->>> from diffusers.pipelines.stable_diffusion_safe import SafetyConfig - ->>> pipeline = StableDiffusionPipelineSafe.from_pretrained("AIML-TUDA/stable-diffusion-safe") ->>> prompt = "the four horsewomen of the apocalypse, painting by tom of finland, gaston bussiere, craig mullins, j. c. leyendecker" ->>> out = pipeline(prompt=prompt, **SafetyConfig.MAX) -``` - -The following configurations are available: `SafetyConfig.WEAK`, `SafetyConfig.MEDIUM`, `SafetyConfig.STRONG`, and `SafetyConfig.MAX`. - -### How to load and use different schedulers - -The safe stable diffusion pipeline uses [`PNDMScheduler`] scheduler by default. But `diffusers` provides many other schedulers that can be used with the stable diffusion pipeline such as [`DDIMScheduler`], [`LMSDiscreteScheduler`], [`EulerDiscreteScheduler`], [`EulerAncestralDiscreteScheduler`] etc. -To use a different scheduler, you can either change it via the [`ConfigMixin.from_config`] method or pass the `scheduler` argument to the `from_pretrained` method of the pipeline. 
For example, to use the [`EulerDiscreteScheduler`], you can do the following: - -```python ->>> from diffusers import StableDiffusionPipelineSafe, EulerDiscreteScheduler - ->>> pipeline = StableDiffusionPipelineSafe.from_pretrained("AIML-TUDA/stable-diffusion-safe") ->>> pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config) - ->>> # or ->>> euler_scheduler = EulerDiscreteScheduler.from_pretrained("AIML-TUDA/stable-diffusion-safe", subfolder="scheduler") ->>> pipeline = StableDiffusionPipelineSafe.from_pretrained( -... "AIML-TUDA/stable-diffusion-safe", scheduler=euler_scheduler -... ) -``` - - -## StableDiffusionSafePipelineOutput -[[autodoc]] pipelines.stable_diffusion_safe.StableDiffusionSafePipelineOutput - - all - - __call__ - -## StableDiffusionPipelineSafe -[[autodoc]] StableDiffusionPipelineSafe - - all - - __call__ diff --git a/diffusers/docs/source/en/api/pipelines/stable_unclip.mdx b/diffusers/docs/source/en/api/pipelines/stable_unclip.mdx deleted file mode 100644 index ee359d0ba486a30fb732fe3d191e7088c6c69a1e..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/stable_unclip.mdx +++ /dev/null @@ -1,175 +0,0 @@ - - -# Stable unCLIP - -Stable unCLIP checkpoints are finetuned from [stable diffusion 2.1](./stable_diffusion_2) checkpoints to condition on CLIP image embeddings. -Stable unCLIP also still conditions on text embeddings. Given the two separate conditionings, stable unCLIP can be used -for text guided image variation. When combined with an unCLIP prior, it can also be used for full text to image generation. - -To know more about the unCLIP process, check out the following paper: - -[Hierarchical Text-Conditional Image Generation with CLIP Latents](https://arxiv.org/abs/2204.06125) by Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen. - -## Tips - -Stable unCLIP takes a `noise_level` as input during inference. 
`noise_level` determines how much noise is added -to the image embeddings. A higher `noise_level` increases variation in the final un-noised images. By default, -we do not add any additional noise to the image embeddings i.e. `noise_level = 0`. - -### Available checkpoints: - -* Image variation - * [stabilityai/stable-diffusion-2-1-unclip](https://hf.co/stabilityai/stable-diffusion-2-1-unclip) - * [stabilityai/stable-diffusion-2-1-unclip-small](https://hf.co/stabilityai/stable-diffusion-2-1-unclip-small) -* Text-to-image - * [stabilityai/stable-diffusion-2-1-unclip-small](https://hf.co/stabilityai/stable-diffusion-2-1-unclip-small) - -### Text-to-Image Generation -Stable unCLIP can be leveraged for text-to-image generation by pipelining it with the prior model of KakaoBrain's open source DALL-E 2 replication [Karlo](https://huggingface.co/kakaobrain/karlo-v1-alpha) - -```python -import torch -from diffusers import UnCLIPScheduler, DDPMScheduler, StableUnCLIPPipeline -from diffusers.models import PriorTransformer -from transformers import CLIPTokenizer, CLIPTextModelWithProjection - -prior_model_id = "kakaobrain/karlo-v1-alpha" -data_type = torch.float16 -prior = PriorTransformer.from_pretrained(prior_model_id, subfolder="prior", torch_dtype=data_type) - -prior_text_model_id = "openai/clip-vit-large-patch14" -prior_tokenizer = CLIPTokenizer.from_pretrained(prior_text_model_id) -prior_text_model = CLIPTextModelWithProjection.from_pretrained(prior_text_model_id, torch_dtype=data_type) -prior_scheduler = UnCLIPScheduler.from_pretrained(prior_model_id, subfolder="prior_scheduler") -prior_scheduler = DDPMScheduler.from_config(prior_scheduler.config) - -stable_unclip_model_id = "stabilityai/stable-diffusion-2-1-unclip-small" - -pipe = StableUnCLIPPipeline.from_pretrained( - stable_unclip_model_id, - torch_dtype=data_type, - variant="fp16", - prior_tokenizer=prior_tokenizer, - prior_text_encoder=prior_text_model, - prior=prior, - prior_scheduler=prior_scheduler, -) - -pipe 
= pipe.to("cuda") -wave_prompt = "dramatic wave, the Oceans roar, Strong wave spiral across the oceans as the waves unfurl into roaring crests; perfect wave form; perfect wave shape; dramatic wave shape; wave shape unbelievable; wave; wave shape spectacular" - -images = pipe(prompt=wave_prompt).images -images[0].save("waves.png") -``` - - -For text-to-image we use `stabilityai/stable-diffusion-2-1-unclip-small` as it was trained on CLIP ViT-L/14 embeddings, the same as the Karlo model prior. [stabilityai/stable-diffusion-2-1-unclip](https://hf.co/stabilityai/stable-diffusion-2-1-unclip) was trained on OpenCLIP ViT-H, so we don't recommend its use. - - - -### Text guided Image-to-Image Variation - -```python -from diffusers import StableUnCLIPImg2ImgPipeline -from diffusers.utils import load_image -import torch - -pipe = StableUnCLIPImg2ImgPipeline.from_pretrained( - "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=torch.float16, variant="fp16" -) -pipe = pipe.to("cuda") - -url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/stable_unclip/tarsila_do_amaral.png" -init_image = load_image(url) - -images = pipe(init_image).images -images[0].save("variation_image.png") -``` - -Optionally, you can also pass a prompt to `pipe` such as: - -```python -prompt = "A fantasy landscape, trending on artstation" - -images = pipe(init_image, prompt=prompt).images -images[0].save("variation_image_two.png") -``` - -### Memory optimization - -If you are short on GPU memory, you can enable smart CPU offloading so that models that are not needed -immediately for a computation can be offloaded to CPU: - -```python -from diffusers import StableUnCLIPImg2ImgPipeline -from diffusers.utils import load_image -import torch - -pipe = StableUnCLIPImg2ImgPipeline.from_pretrained( - "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=torch.float16, variant="fp16" -) -# Offload to CPU.
-pipe.enable_model_cpu_offload() - -url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/stable_unclip/tarsila_do_amaral.png" -init_image = load_image(url) - -images = pipe(init_image).images -images[0] -``` - -Further memory optimizations are possible by enabling VAE slicing on the pipeline: - -```python -from diffusers import StableUnCLIPImg2ImgPipeline -from diffusers.utils import load_image -import torch - -pipe = StableUnCLIPImg2ImgPipeline.from_pretrained( - "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=torch.float16, variant="fp16" -) -pipe.enable_model_cpu_offload() -pipe.enable_vae_slicing() - -url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/stable_unclip/tarsila_do_amaral.png" -init_image = load_image(url) - -images = pipe(init_image).images -images[0] -``` - -### StableUnCLIPPipeline - -[[autodoc]] StableUnCLIPPipeline - - all - - __call__ - - enable_attention_slicing - - disable_attention_slicing - - enable_vae_slicing - - disable_vae_slicing - - enable_xformers_memory_efficient_attention - - disable_xformers_memory_efficient_attention - - -### StableUnCLIPImg2ImgPipeline - -[[autodoc]] StableUnCLIPImg2ImgPipeline - - all - - __call__ - - enable_attention_slicing - - disable_attention_slicing - - enable_vae_slicing - - disable_vae_slicing - - enable_xformers_memory_efficient_attention - - disable_xformers_memory_efficient_attention - \ No newline at end of file diff --git a/diffusers/docs/source/en/api/pipelines/stochastic_karras_ve.mdx b/diffusers/docs/source/en/api/pipelines/stochastic_karras_ve.mdx deleted file mode 100644 index 17a414303b9c8670361258e52047db4aff399cf7..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/stochastic_karras_ve.mdx +++ /dev/null @@ -1,36 +0,0 @@ - - -# Stochastic Karras VE - -## Overview - -[Elucidating the Design Space of Diffusion-Based Generative Models](https://arxiv.org/abs/2206.00364) by Tero
Karras, Miika Aittala, Timo Aila and Samuli Laine. - -The abstract of the paper is the following: - -We argue that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seek to remedy the situation by presenting a design space that clearly separates the concrete design choices. This lets us identify several changes to both the sampling and training processes, as well as preconditioning of the score networks. Together, our improvements yield new state-of-the-art FID of 1.79 for CIFAR-10 in a class-conditional setting and 1.97 in an unconditional setting, with much faster sampling (35 network evaluations per image) than prior designs. To further demonstrate their modular nature, we show that our design changes dramatically improve both the efficiency and quality obtainable with pre-trained score networks from previous work, including improving the FID of an existing ImageNet-64 model from 2.07 to near-SOTA 1.55. - -This pipeline implements the Stochastic sampling tailored to the Variance-Expanding (VE) models. - - -## Available Pipelines: - -| Pipeline | Tasks | Colab -|---|---|:---:| -| [pipeline_stochastic_karras_ve.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stochastic_karras_ve/pipeline_stochastic_karras_ve.py) | *Unconditional Image Generation* | - | - - -## KarrasVePipeline -[[autodoc]] KarrasVePipeline - - all - - __call__ diff --git a/diffusers/docs/source/en/api/pipelines/text_to_video.mdx b/diffusers/docs/source/en/api/pipelines/text_to_video.mdx deleted file mode 100644 index 82b2f19ce1b2eb0456906ecf9ed1dfde4f6a0d26..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/text_to_video.mdx +++ /dev/null @@ -1,130 +0,0 @@ - - - - -This pipeline is for research purposes only. 
- - - -# Text-to-video synthesis - -## Overview - -[VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation](https://arxiv.org/abs/2303.08320) by Zhengxiong Luo, Dayou Chen, Yingya Zhang, Yan Huang, Liang Wang, Yujun Shen, Deli Zhao, Jingren Zhou, Tieniu Tan. - -The abstract of the paper is the following: - -*A diffusion probabilistic model (DPM), which constructs a forward diffusion process by gradually adding noise to data points and learns the reverse denoising process to generate new samples, has been shown to handle complex data distribution. Despite its recent success in image synthesis, applying DPMs to video generation is still challenging due to high-dimensional data spaces. Previous methods usually adopt a standard diffusion process, where frames in the same video clip are destroyed with independent noises, ignoring the content redundancy and temporal correlation. This work presents a decomposed diffusion process via resolving the per-frame noise into a base noise that is shared among all frames and a residual noise that varies along the time axis. The denoising pipeline employs two jointly-learned networks to match the noise decomposition accordingly. Experiments on various datasets confirm that our approach, termed as VideoFusion, surpasses both GAN-based and diffusion-based alternatives in high-quality video generation. 
We further show that our decomposed formulation can benefit from pre-trained image diffusion models and well-support text-conditioned video creation.* - -Resources: - -* [Website](https://modelscope.cn/models/damo/text-to-video-synthesis/summary) -* [GitHub repository](https://github.com/modelscope/modelscope/) -* [🤗 Spaces](https://huggingface.co/spaces/damo-vilab/modelscope-text-to-video-synthesis) - -## Available Pipelines: - -| Pipeline | Tasks | Demo -|---|---|:---:| -| [TextToVideoSDPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/text_to_video_synthesis/pipeline_text_to_video_synth.py) | *Text-to-Video Generation* | [🤗 Spaces](https://huggingface.co/spaces/damo-vilab/modelscope-text-to-video-synthesis) - -## Usage example - -Let's start by generating a short video with the default length of 16 frames (2s at 8 fps): - -```python -import torch -from diffusers import DiffusionPipeline -from diffusers.utils import export_to_video - -pipe = DiffusionPipeline.from_pretrained("damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16") -pipe = pipe.to("cuda") - -prompt = "Spiderman is surfing" -video_frames = pipe(prompt).frames -video_path = export_to_video(video_frames) -video_path -``` - -Diffusers supports different optimization techniques to improve the latency -and memory footprint of a pipeline. Since videos are often more memory-heavy than images, -we can enable CPU offloading and VAE slicing to keep the memory footprint at bay. 
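
The relationship between frame count and clip length mentioned above (16 frames is 2 s at 8 fps) is simple arithmetic. The sketch below makes it explicit; both helper names are hypothetical, and the 8 fps default is taken from the figure quoted in this section rather than from the pipeline's API.

```python
def clip_duration_seconds(num_frames: int, fps: int = 8) -> float:
    """Length in seconds of a clip with `num_frames` frames played at `fps`."""
    return num_frames / fps


def frames_for_duration(seconds: float, fps: int = 8) -> int:
    """Number of frames needed for a clip of `seconds` seconds at `fps`."""
    return round(seconds * fps)


default_clip = clip_duration_seconds(16)  # the default 16-frame clip
long_clip_frames = frames_for_duration(8)  # frames needed for an 8-second clip
```

The second value is the kind of number one would pass as `num_frames` when asking the pipeline for a longer video.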
- -Let's generate a video of 8 seconds (64 frames) on the same GPU using CPU offloading and VAE slicing: - -```python -import torch -from diffusers import DiffusionPipeline -from diffusers.utils import export_to_video - -pipe = DiffusionPipeline.from_pretrained("damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16") -pipe.enable_model_cpu_offload() - -# memory optimization -pipe.enable_vae_slicing() - -prompt = "Darth Vader surfing a wave" -video_frames = pipe(prompt, num_frames=64).frames -video_path = export_to_video(video_frames) -video_path -``` - -It just takes **7 GBs of GPU memory** to generate the 64 video frames using PyTorch 2.0, "fp16" precision and the techniques mentioned above. - -We can also use a different scheduler easily, using the same method we'd use for Stable Diffusion: - -```python -import torch -from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler -from diffusers.utils import export_to_video - -pipe = DiffusionPipeline.from_pretrained("damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16") -pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config) -pipe.enable_model_cpu_offload() - -prompt = "Spiderman is surfing" -video_frames = pipe(prompt, num_inference_steps=25).frames -video_path = export_to_video(video_frames) -video_path -``` - -Here are some sample outputs: - - - - - - -
- [Sample output: An astronaut riding a horse.] -
- [Sample output: Darth Vader surfing in waves.] -
- -## Available checkpoints - -* [damo-vilab/text-to-video-ms-1.7b](https://huggingface.co/damo-vilab/text-to-video-ms-1.7b/) -* [damo-vilab/text-to-video-ms-1.7b-legacy](https://huggingface.co/damo-vilab/text-to-video-ms-1.7b-legacy) - -## TextToVideoSDPipeline -[[autodoc]] TextToVideoSDPipeline - - all - - __call__ diff --git a/diffusers/docs/source/en/api/pipelines/unclip.mdx b/diffusers/docs/source/en/api/pipelines/unclip.mdx deleted file mode 100644 index 13a578a0ab4857c38dd37598b334c731ba184f46..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/unclip.mdx +++ /dev/null @@ -1,37 +0,0 @@ - - -# unCLIP - -## Overview - -[Hierarchical Text-Conditional Image Generation with CLIP Latents](https://arxiv.org/abs/2204.06125) by Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen - -The abstract of the paper is the following: - -Contrastive models like CLIP have been shown to learn robust representations of images that capture both semantics and style. To leverage these representations for image generation, we propose a two-stage model: a prior that generates a CLIP image embedding given a text caption, and a decoder that generates an image conditioned on the image embedding. We show that explicitly generating image representations improves image diversity with minimal loss in photorealism and caption similarity. Our decoders conditioned on image representations can also produce variations of an image that preserve both its semantics and style, while varying the non-essential details absent from the image representation. Moreover, the joint embedding space of CLIP enables language-guided image manipulations in a zero-shot fashion. We use diffusion models for the decoder and experiment with both autoregressive and diffusion models for the prior, finding that the latter are computationally more efficient and produce higher-quality samples. 
- -The unCLIP model in diffusers comes from kakaobrain's karlo and the original codebase can be found [here](https://github.com/kakaobrain/karlo). Additionally, lucidrains has a DALL-E 2 recreation [here](https://github.com/lucidrains/DALLE2-pytorch). - -## Available Pipelines: - -| Pipeline | Tasks | Colab -|---|---|:---:| -| [pipeline_unclip.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/unclip/pipeline_unclip.py) | *Text-to-Image Generation* | - | -| [pipeline_unclip_image_variation.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/unclip/pipeline_unclip_image_variation.py) | *Image-Guided Image Generation* | - | - - -## UnCLIPPipeline -[[autodoc]] UnCLIPPipeline - - all - - __call__ - -[[autodoc]] UnCLIPImageVariationPipeline - - all - - __call__ diff --git a/diffusers/docs/source/en/api/pipelines/versatile_diffusion.mdx b/diffusers/docs/source/en/api/pipelines/versatile_diffusion.mdx deleted file mode 100644 index bfafa8e8f1fc8b36e1488b917922ff676222db98..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/versatile_diffusion.mdx +++ /dev/null @@ -1,70 +0,0 @@ - - -# VersatileDiffusion - -VersatileDiffusion was proposed in [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) by Xingqian Xu, Zhangyang Wang, Eric Zhang, Kai Wang, Humphrey Shi . - -The abstract of the paper is the following: - -*The recent advances in diffusion models have set an impressive milestone in many generation tasks. Trending works such as DALL-E2, Imagen, and Stable Diffusion have attracted great interest in academia and industry. Despite the rapid landscape changes, recent new approaches focus on extensions and performance rather than capacity, thus requiring separate models for separate tasks. 
In this work, we expand the existing single-flow diffusion pipeline into a multi-flow network, dubbed Versatile Diffusion (VD), that handles text-to-image, image-to-text, image-variation, and text-variation in one unified model. Moreover, we generalize VD to a unified multi-flow multimodal diffusion framework with grouped layers, swappable streams, and other propositions that can process modalities beyond images and text. Through our experiments, we demonstrate that VD and its underlying framework have the following merits: a) VD handles all subtasks with competitive quality; b) VD initiates novel extensions and applications such as disentanglement of style and semantic, image-text dual-guided generation, etc.; c) Through these experiments and applications, VD provides more semantic insights of the generated outputs.* - -## Tips - -- VersatileDiffusion is conceptually very similar to [Stable Diffusion](./api/pipelines/stable_diffusion/overview), but instead of providing just an image data stream conditioned on text, VersatileDiffusion provides both an image and a text data stream and can be conditioned on both text and image. - -### *Run VersatileDiffusion* - -You can either load the memory-intensive "all-in-one" [`VersatileDiffusionPipeline`], which can run all tasks -with the same class as shown in [`VersatileDiffusionPipeline.text_to_image`], [`VersatileDiffusionPipeline.image_variation`], and [`VersatileDiffusionPipeline.dual_guided`] - -**or** - -You can run the individual pipelines which are much more memory efficient: - -- *Text-to-Image*: [`VersatileDiffusionTextToImagePipeline.__call__`] -- *Image Variation*: [`VersatileDiffusionImageVariationPipeline.__call__`] -- *Dual Text and Image Guided Generation*: [`VersatileDiffusionDualGuidedPipeline.__call__`] - -### *How to load and use different schedulers.* - -The versatile diffusion pipelines use the [`DDIMScheduler`] by default.
But `diffusers` provides many other schedulers that can be used with the versatile diffusion pipelines such as [`PNDMScheduler`], [`LMSDiscreteScheduler`], [`EulerDiscreteScheduler`], [`EulerAncestralDiscreteScheduler`] etc. -To use a different scheduler, you can either change it via the [`ConfigMixin.from_config`] method or pass the `scheduler` argument to the `from_pretrained` method of the pipeline. For example, to use the [`EulerDiscreteScheduler`], you can do the following: - -```python ->>> from diffusers import VersatileDiffusionPipeline, EulerDiscreteScheduler - ->>> pipeline = VersatileDiffusionPipeline.from_pretrained("shi-labs/versatile-diffusion") ->>> pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config) - ->>> # or ->>> euler_scheduler = EulerDiscreteScheduler.from_pretrained("shi-labs/versatile-diffusion", subfolder="scheduler") ->>> pipeline = VersatileDiffusionPipeline.from_pretrained("shi-labs/versatile-diffusion", scheduler=euler_scheduler) -``` - -## VersatileDiffusionPipeline -[[autodoc]] VersatileDiffusionPipeline - -## VersatileDiffusionTextToImagePipeline -[[autodoc]] VersatileDiffusionTextToImagePipeline - - all - - __call__ - -## VersatileDiffusionImageVariationPipeline -[[autodoc]] VersatileDiffusionImageVariationPipeline - - all - - __call__ - -## VersatileDiffusionDualGuidedPipeline -[[autodoc]] VersatileDiffusionDualGuidedPipeline - - all - - __call__ diff --git a/diffusers/docs/source/en/api/pipelines/vq_diffusion.mdx b/diffusers/docs/source/en/api/pipelines/vq_diffusion.mdx deleted file mode 100644 index f8182c674f7a75eff8bb9276d191a156c0ba6741..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/pipelines/vq_diffusion.mdx +++ /dev/null @@ -1,35 +0,0 @@ - - -# VQDiffusion - -## Overview - -[Vector Quantized Diffusion Model for Text-to-Image Synthesis](https://arxiv.org/abs/2111.14822) by Shuyang Gu, Dong Chen, Jianmin Bao, Fang Wen, Bo Zhang, Dongdong Chen, Lu Yuan, Baining Guo - -The
abstract of the paper is the following: - -We present the vector quantized diffusion (VQ-Diffusion) model for text-to-image generation. This method is based on a vector quantized variational autoencoder (VQ-VAE) whose latent space is modeled by a conditional variant of the recently developed Denoising Diffusion Probabilistic Model (DDPM). We find that this latent-space method is well-suited for text-to-image generation tasks because it not only eliminates the unidirectional bias with existing methods but also allows us to incorporate a mask-and-replace diffusion strategy to avoid the accumulation of errors, which is a serious problem with existing methods. Our experiments show that the VQ-Diffusion produces significantly better text-to-image generation results when compared with conventional autoregressive (AR) models with similar numbers of parameters. Compared with previous GAN-based text-to-image methods, our VQ-Diffusion can handle more complex scenes and improve the synthesized image quality by a large margin. Finally, we show that the image generation computation in our method can be made highly efficient by reparameterization. With traditional AR methods, the text-to-image generation time increases linearly with the output image resolution and hence is quite time consuming even for normal size images. The VQ-Diffusion allows us to achieve a better trade-off between quality and speed. Our experiments indicate that the VQ-Diffusion model with the reparameterization is fifteen times faster than traditional AR methods while achieving a better image quality. - -The original codebase can be found [here](https://github.com/microsoft/VQ-Diffusion). 
- -## Available Pipelines: - -| Pipeline | Tasks | Colab -|---|---|:---:| -| [pipeline_vq_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/vq_diffusion/pipeline_vq_diffusion.py) | *Text-to-Image Generation* | - | - - -## VQDiffusionPipeline -[[autodoc]] VQDiffusionPipeline - - all - - __call__ diff --git a/diffusers/docs/source/en/api/schedulers/ddim.mdx b/diffusers/docs/source/en/api/schedulers/ddim.mdx deleted file mode 100644 index 51b0cc3e9a09c85215b03f2af18430962cd2ba88..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/schedulers/ddim.mdx +++ /dev/null @@ -1,27 +0,0 @@ - - -# Denoising Diffusion Implicit Models (DDIM) - -## Overview - -[Denoising Diffusion Implicit Models](https://arxiv.org/abs/2010.02502) (DDIM) by Jiaming Song, Chenlin Meng and Stefano Ermon. - -The abstract of the paper is the following: - -Denoising diffusion probabilistic models (DDPMs) have achieved high quality image generation without adversarial training, yet they require simulating a Markov chain for many steps to produce a sample. To accelerate sampling, we present denoising diffusion implicit models (DDIMs), a more efficient class of iterative implicit probabilistic models with the same training procedure as DDPMs. In DDPMs, the generative process is defined as the reverse of a Markovian diffusion process. We construct a class of non-Markovian diffusion processes that lead to the same training objective, but whose reverse process can be much faster to sample from. We empirically demonstrate that DDIMs can produce high quality samples 10× to 50× faster in terms of wall-clock time compared to DDPMs, allow us to trade off computation for sample quality, and can perform semantically meaningful image interpolation directly in the latent space. - -The original codebase of this paper can be found here: [ermongroup/ddim](https://github.com/ermongroup/ddim). 
-For questions, feel free to contact the author on [tsong.me](https://tsong.me/). - -## DDIMScheduler -[[autodoc]] DDIMScheduler diff --git a/diffusers/docs/source/en/api/schedulers/ddim_inverse.mdx b/diffusers/docs/source/en/api/schedulers/ddim_inverse.mdx deleted file mode 100644 index 5096a3cee283d7a59eeedc48b1dea5080c46aa21..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/schedulers/ddim_inverse.mdx +++ /dev/null @@ -1,21 +0,0 @@ - - -# Inverse Denoising Diffusion Implicit Models (DDIMInverse) - -## Overview - -This scheduler is the inverted scheduler of [Denoising Diffusion Implicit Models](https://arxiv.org/abs/2010.02502) (DDIM) by Jiaming Song, Chenlin Meng and Stefano Ermon. -The implementation is mostly based on the DDIM inversion definition of [Null-text Inversion for Editing Real Images using Guided Diffusion Models](https://arxiv.org/pdf/2211.09794.pdf) - -## DDIMInverseScheduler -[[autodoc]] DDIMInverseScheduler diff --git a/diffusers/docs/source/en/api/schedulers/ddpm.mdx b/diffusers/docs/source/en/api/schedulers/ddpm.mdx deleted file mode 100644 index 6c4058b941fab8ec7177f9635aecc7b924b39d68..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/schedulers/ddpm.mdx +++ /dev/null @@ -1,27 +0,0 @@ - - -# Denoising Diffusion Probabilistic Models (DDPM) - -## Overview - -[Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239) - (DDPM) by Jonathan Ho, Ajay Jain and Pieter Abbeel proposes the diffusion based model of the same name, but in the context of the 🤗 Diffusers library, DDPM refers to the discrete denoising scheduler from the paper as well as the pipeline. - -The abstract of the paper is the following: - -We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. 
Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our models naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding. On the unconditional CIFAR10 dataset, we obtain an Inception score of 9.46 and a state-of-the-art FID score of 3.17. On 256x256 LSUN, we obtain sample quality similar to ProgressiveGAN. - -The original paper can be found [here](https://arxiv.org/abs/2006.11239). - -## DDPMScheduler -[[autodoc]] DDPMScheduler diff --git a/diffusers/docs/source/en/api/schedulers/deis.mdx b/diffusers/docs/source/en/api/schedulers/deis.mdx deleted file mode 100644 index 9ab8418210983d4920c677de1aa4a865ab2bfca8..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/schedulers/deis.mdx +++ /dev/null @@ -1,22 +0,0 @@ - - -# DEIS - -Fast Sampling of Diffusion Models with Exponential Integrator. - -## Overview - -Original paper can be found [here](https://arxiv.org/abs/2204.13902). The original implementation can be found [here](https://github.com/qsh-zh/deis). - -## DEISMultistepScheduler -[[autodoc]] DEISMultistepScheduler diff --git a/diffusers/docs/source/en/api/schedulers/dpm_discrete.mdx b/diffusers/docs/source/en/api/schedulers/dpm_discrete.mdx deleted file mode 100644 index b57c478adf0c97373279b5ad834dd01bd30a6b13..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/schedulers/dpm_discrete.mdx +++ /dev/null @@ -1,22 +0,0 @@ - - -# DPM Discrete Scheduler inspired by Karras et. al paper - -## Overview - -Inspired by [Karras et. al](https://arxiv.org/abs/2206.00364).
Scheduler ported from @crowsonkb's https://github.com/crowsonkb/k-diffusion library: - -All credit for making this scheduler work goes to [Katherine Crowson](https://github.com/crowsonkb/) - -## KDPM2DiscreteScheduler -[[autodoc]] KDPM2DiscreteScheduler \ No newline at end of file diff --git a/diffusers/docs/source/en/api/schedulers/dpm_discrete_ancestral.mdx b/diffusers/docs/source/en/api/schedulers/dpm_discrete_ancestral.mdx deleted file mode 100644 index e341a68b553b53601d22e61df35dd58aca00fdfc..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/schedulers/dpm_discrete_ancestral.mdx +++ /dev/null @@ -1,22 +0,0 @@ - - -# DPM Discrete Scheduler with ancestral sampling inspired by Karras et. al paper - -## Overview - -Inspired by [Karras et. al](https://arxiv.org/abs/2206.00364). Scheduler ported from @crowsonkb's https://github.com/crowsonkb/k-diffusion library: - -All credit for making this scheduler work goes to [Katherine Crowson](https://github.com/crowsonkb/) - -## KDPM2AncestralDiscreteScheduler -[[autodoc]] KDPM2AncestralDiscreteScheduler \ No newline at end of file diff --git a/diffusers/docs/source/en/api/schedulers/euler.mdx b/diffusers/docs/source/en/api/schedulers/euler.mdx deleted file mode 100644 index f107623363bf49763fc0552bbccd70f7529592f7..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/schedulers/euler.mdx +++ /dev/null @@ -1,21 +0,0 @@ - - -# Euler scheduler - -## Overview - -Euler scheduler (Algorithm 2) from the paper [Elucidating the Design Space of Diffusion-Based Generative Models](https://arxiv.org/abs/2206.00364) by Karras et al. (2022). Based on the original [k-diffusion](https://github.com/crowsonkb/k-diffusion/blob/481677d114f6ea445aa009cf5bd7a9cdee909e47/k_diffusion/sampling.py#L51) implementation by Katherine Crowson. -Fast scheduler which often times generates good outputs with 20-30 steps. 
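The Euler update mentioned above can be sketched in a few lines. This is a simplified, deterministic illustration only (no added noise), not the Diffusers or k-diffusion implementation; `euler_step`, `toy_denoiser`, and the noise levels in `sigmas` are hypothetical names and values chosen for this example, and the toy denoiser assumes all clean data equals `2.0`.

```python
from typing import Callable


def euler_step(
    x: float,
    sigma: float,
    sigma_next: float,
    denoise: Callable[[float, float], float],
) -> float:
    """One deterministic Euler step from noise level sigma to sigma_next."""
    # Slope of the sampling ODE at the current noise level: (x - D(x; sigma)) / sigma.
    d = (x - denoise(x, sigma)) / sigma
    # Plain Euler update along the noise-level axis.
    return x + d * (sigma_next - sigma)


def toy_denoiser(x: float, sigma: float) -> float:
    # Hypothetical denoiser: pretends every clean sample is exactly 2.0.
    return 2.0


# Hypothetical decreasing noise levels, ending at zero noise.
sigmas = [10.0, 5.0, 1.0, 0.0]
x = 7.0  # arbitrary noisy starting point
for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
    x = euler_step(x, sigma, sigma_next, toy_denoiser)
```

Because the last step lands on a noise level of zero, the sample ends at whatever the denoiser predicted at that step, which is why a handful of steps can already be enough in this toy setting.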
- -## EulerDiscreteScheduler -[[autodoc]] EulerDiscreteScheduler \ No newline at end of file diff --git a/diffusers/docs/source/en/api/schedulers/euler_ancestral.mdx b/diffusers/docs/source/en/api/schedulers/euler_ancestral.mdx deleted file mode 100644 index 60fd524b195593608f1d2a900ad86756f8fd25ba..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/schedulers/euler_ancestral.mdx +++ /dev/null @@ -1,21 +0,0 @@ - - -# Euler Ancestral scheduler - -## Overview - -Ancestral sampling with Euler method steps. Based on the original [k-diffusion](https://github.com/crowsonkb/k-diffusion/blob/481677d114f6ea445aa009cf5bd7a9cdee909e47/k_diffusion/sampling.py#L72) implementation by Katherine Crowson. -Fast scheduler which often times generates good outputs with 20-30 steps. - -## EulerAncestralDiscreteScheduler -[[autodoc]] EulerAncestralDiscreteScheduler diff --git a/diffusers/docs/source/en/api/schedulers/heun.mdx b/diffusers/docs/source/en/api/schedulers/heun.mdx deleted file mode 100644 index 245c20584c6d4e35e2f0f12afd6ea5da7c220ffe..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/schedulers/heun.mdx +++ /dev/null @@ -1,23 +0,0 @@ - - -# Heun scheduler inspired by Karras et. al paper - -## Overview - -Algorithm 1 of [Karras et. al](https://arxiv.org/abs/2206.00364). 
-Scheduler ported from @crowsonkb's https://github.com/crowsonkb/k-diffusion library: - -All credit for making this scheduler work goes to [Katherine Crowson](https://github.com/crowsonkb/) - -## HeunDiscreteScheduler -[[autodoc]] HeunDiscreteScheduler \ No newline at end of file diff --git a/diffusers/docs/source/en/api/schedulers/ipndm.mdx b/diffusers/docs/source/en/api/schedulers/ipndm.mdx deleted file mode 100644 index 854713d22d77b5d179eb93a97b7a7e0082c7b543..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/schedulers/ipndm.mdx +++ /dev/null @@ -1,20 +0,0 @@ - - -# improved pseudo numerical methods for diffusion models (iPNDM) - -## Overview - -Original implementation can be found [here](https://github.com/crowsonkb/v-diffusion-pytorch/blob/987f8985e38208345c1959b0ea767a625831cc9b/diffusion/sampling.py#L296). - -## IPNDMScheduler -[[autodoc]] IPNDMScheduler \ No newline at end of file diff --git a/diffusers/docs/source/en/api/schedulers/lms_discrete.mdx b/diffusers/docs/source/en/api/schedulers/lms_discrete.mdx deleted file mode 100644 index a7a6e87c85daed0ba5024ff2474c444ab6171068..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/schedulers/lms_discrete.mdx +++ /dev/null @@ -1,20 +0,0 @@ - - -# Linear multistep scheduler for discrete beta schedules - -## Overview - -Original implementation can be found [here](https://arxiv.org/abs/2206.00364). 
- -## LMSDiscreteScheduler -[[autodoc]] LMSDiscreteScheduler \ No newline at end of file diff --git a/diffusers/docs/source/en/api/schedulers/multistep_dpm_solver.mdx b/diffusers/docs/source/en/api/schedulers/multistep_dpm_solver.mdx deleted file mode 100644 index 588b453a0b00627315db8daa96582d754661c21e..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/schedulers/multistep_dpm_solver.mdx +++ /dev/null @@ -1,20 +0,0 @@ - - -# Multistep DPM-Solver - -## Overview - -Original paper can be found [here](https://arxiv.org/abs/2206.00927) and the [improved version](https://arxiv.org/abs/2211.01095). The original implementation can be found [here](https://github.com/LuChengTHU/dpm-solver). - -## DPMSolverMultistepScheduler -[[autodoc]] DPMSolverMultistepScheduler \ No newline at end of file diff --git a/diffusers/docs/source/en/api/schedulers/overview.mdx b/diffusers/docs/source/en/api/schedulers/overview.mdx deleted file mode 100644 index a8f4dcd4d0b06023ff3c4526416cc7947f271e15..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/schedulers/overview.mdx +++ /dev/null @@ -1,92 +0,0 @@ - - -# Schedulers - -Diffusers contains multiple pre-built schedule functions for the diffusion process. - -## What is a scheduler? - -The schedule functions, called *Schedulers* in the library, take in the output of a trained model, a sample which the diffusion process is iterating on, and a timestep, and return a denoised sample. That's why schedulers may also be called *Samplers* in other diffusion model implementations. - -- Schedulers define the methodology for iteratively adding noise to an image or for updating a sample based on model outputs. - - adding noise in different manners represents the algorithmic processes used to train a diffusion model by adding noise to images. - - for inference, the scheduler defines how to update a sample based on an output from a pretrained model.
-- Schedulers are often defined by a *noise schedule* and an *update rule* used to solve the underlying differential equation. - -### Discrete versus continuous schedulers - -All schedulers take in a timestep to predict the updated version of the sample being diffused. -The timestep indicates where in the diffusion process the step is: noisy data is generated by iterating forward in time, and inference is executed by propagating backwards through the timesteps. -Different algorithms use timesteps that can be discrete (accepting `int` inputs), such as the [`DDPMScheduler`] or [`PNDMScheduler`], or continuous (accepting `float` inputs), such as the score-based schedulers [`ScoreSdeVeScheduler`] or [`ScoreSdeVpScheduler`]. - -## Designing Re-usable schedulers - -The core design principle behind the schedule functions is to be model, system, and framework independent. -This allows for rapid experimentation and cleaner abstractions in the code, where the model prediction is separated from the sample update. -To this end, the design of schedulers is such that: - -- Schedulers can be used interchangeably between diffusion models in inference to find the preferred trade-off between speed and generation quality. -- Schedulers are currently implemented in PyTorch by default, but are designed to be framework independent (partial Jax support currently exists).
-- Many diffusion pipelines, such as [`StableDiffusionPipeline`] and [`DiTPipeline`] can use any of [`KarrasDiffusionSchedulers`] - -## Schedulers Summary - -The following table summarizes all officially supported schedulers, their corresponding paper - -| Scheduler | Paper | -|---|---| -| [ddim](./ddim) | [**Denoising Diffusion Implicit Models**](https://arxiv.org/abs/2010.02502) | -| [ddim_inverse](./ddim_inverse) | [**Denoising Diffusion Implicit Models**](https://arxiv.org/abs/2010.02502) | -| [ddpm](./ddpm) | [**Denoising Diffusion Probabilistic Models**](https://arxiv.org/abs/2006.11239) | -| [deis](./deis) | [**DEISMultistepScheduler**](https://arxiv.org/abs/2204.13902) | -| [singlestep_dpm_solver](./singlestep_dpm_solver) | [**Singlestep DPM-Solver**](https://arxiv.org/abs/2206.00927) | -| [multistep_dpm_solver](./multistep_dpm_solver) | [**Multistep DPM-Solver**](https://arxiv.org/abs/2206.00927) | -| [heun](./heun) | [**Heun scheduler inspired by Karras et. al paper**](https://arxiv.org/abs/2206.00364) | -| [dpm_discrete](./dpm_discrete) | [**DPM Discrete Scheduler inspired by Karras et. al paper**](https://arxiv.org/abs/2206.00364) | -| [dpm_discrete_ancestral](./dpm_discrete_ancestral) | [**DPM Discrete Scheduler with ancestral sampling inspired by Karras et. al paper**](https://arxiv.org/abs/2206.00364) | -| [stochastic_karras_ve](./stochastic_karras_ve) | [**Variance exploding, stochastic sampling from Karras et. 
al**](https://arxiv.org/abs/2206.00364) | -| [lms_discrete](./lms_discrete) | [**Linear multistep scheduler for discrete beta schedules**](https://arxiv.org/abs/2206.00364) | -| [pndm](./pndm) | [**Pseudo numerical methods for diffusion models (PNDM)**](https://github.com/crowsonkb/k-diffusion/blob/481677d114f6ea445aa009cf5bd7a9cdee909e47/k_diffusion/sampling.py#L181) | -| [score_sde_ve](./score_sde_ve) | [**variance exploding stochastic differential equation (VE-SDE) scheduler**](https://arxiv.org/abs/2011.13456) | -| [ipndm](./ipndm) | [**improved pseudo numerical methods for diffusion models (iPNDM)**](https://github.com/crowsonkb/v-diffusion-pytorch/blob/987f8985e38208345c1959b0ea767a625831cc9b/diffusion/sampling.py#L296) | -| [score_sde_vp](./score_sde_vp) | [**Variance preserving stochastic differential equation (VP-SDE) scheduler**](https://arxiv.org/abs/2011.13456) | -| [euler](./euler) | [**Euler scheduler**](https://arxiv.org/abs/2206.00364) | -| [euler_ancestral](./euler_ancestral) | [**Euler Ancestral scheduler**](https://github.com/crowsonkb/k-diffusion/blob/481677d114f6ea445aa009cf5bd7a9cdee909e47/k_diffusion/sampling.py#L72) | -| [vq_diffusion](./vq_diffusion) | [**VQDiffusionScheduler**](https://arxiv.org/abs/2111.14822) | -| [unipc](./unipc) | [**UniPCMultistepScheduler**](https://arxiv.org/abs/2302.04867) | -| [repaint](./repaint) | [**RePaint scheduler**](https://arxiv.org/abs/2201.09865) | - -## API - -The core API for any new scheduler must follow a limited structure. -- Schedulers should provide one or more `def step(...)` functions that should be called to update the generated sample iteratively. -- Schedulers should provide a `set_timesteps(...)` method that configures the parameters of a schedule function for a specific inference task. -- Schedulers should be framework-specific. - -The base class [`SchedulerMixin`] implements low level utilities used by multiple schedulers. 
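As a rough illustration of the contract described above, here is a minimal, framework-free sketch. `ToyScheduler` and `ToySchedulerOutput` are hypothetical classes written for this example and are not part of Diffusers; they only mirror the `set_timesteps(...)` / `step(...)` shape, and the update rule inside `step` is a made-up placeholder rather than a real sampler.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToySchedulerOutput:
    # Hypothetical stand-in for a scheduler output; holds the updated sample.
    prev_sample: List[float]


class ToyScheduler:
    """Hypothetical scheduler used only to illustrate the two-method contract."""

    def __init__(self, num_train_timesteps: int = 1000):
        self.num_train_timesteps = num_train_timesteps
        self.timesteps: List[int] = []

    def set_timesteps(self, num_inference_steps: int) -> None:
        # Pick evenly spaced timesteps, ordered from most to least noisy.
        stride = self.num_train_timesteps // num_inference_steps
        self.timesteps = [i * stride for i in range(num_inference_steps)][::-1]

    def step(self, model_output, timestep, sample) -> ToySchedulerOutput:
        # Placeholder update rule: subtract a fixed fraction of the model output.
        scale = 1.0 / max(len(self.timesteps), 1)
        prev = [x - scale * eps for x, eps in zip(sample, model_output)]
        return ToySchedulerOutput(prev_sample=prev)


scheduler = ToyScheduler(num_train_timesteps=1000)
scheduler.set_timesteps(4)  # configure the schedule for this inference run

sample = [1.0, -2.0]
for t in scheduler.timesteps:
    model_output = list(sample)  # stand-in for a model's prediction
    sample = scheduler.step(model_output, t, sample).prev_sample
```

The point of the split is that the loop never needs to know how the update is computed: swapping in a different scheduler only changes what `set_timesteps` and `step` do internally.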
- -### SchedulerMixin -[[autodoc]] SchedulerMixin - -### SchedulerOutput -The class [`SchedulerOutput`] contains the outputs from any scheduler's `step(...)` call. - -[[autodoc]] schedulers.scheduling_utils.SchedulerOutput - -### KarrasDiffusionSchedulers - -`KarrasDiffusionSchedulers` encompasses the main generalization of schedulers in Diffusers. The schedulers in this class are distinguished, at a high level, by their noise sampling strategy; the type of network and scaling; and finally the training strategy or how the loss is weighted. - -The different schedulers, depending on the type of ODE solver, fall into the above taxonomy and provide a good abstraction for the design of the main schedulers implemented in Diffusers. The schedulers in this class are given below: - -[[autodoc]] schedulers.scheduling_utils.KarrasDiffusionSchedulers diff --git a/diffusers/docs/source/en/api/schedulers/pndm.mdx b/diffusers/docs/source/en/api/schedulers/pndm.mdx deleted file mode 100644 index 6670914b7ac0a0fd77224b06805fed2e463866e4..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/schedulers/pndm.mdx +++ /dev/null @@ -1,20 +0,0 @@ - - -# Pseudo numerical methods for diffusion models (PNDM) - -## Overview - -Original implementation can be found [here](https://github.com/crowsonkb/k-diffusion/blob/481677d114f6ea445aa009cf5bd7a9cdee909e47/k_diffusion/sampling.py#L181). - -## PNDMScheduler -[[autodoc]] PNDMScheduler \ No newline at end of file diff --git a/diffusers/docs/source/en/api/schedulers/repaint.mdx b/diffusers/docs/source/en/api/schedulers/repaint.mdx deleted file mode 100644 index b7e2bcf119c12ce63fde95a2c5c689bb97da8db5..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/schedulers/repaint.mdx +++ /dev/null @@ -1,23 +0,0 @@ - - -# RePaint scheduler - -## Overview - -DDPM-based inpainting scheduler for unsupervised inpainting with extreme masks. -Intended for use with [`RePaintPipeline`].
-Based on the paper [RePaint: Inpainting using Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2201.09865) -and the original implementation by Andreas Lugmayr et al.: https://github.com/andreas128/RePaint - -## RePaintScheduler -[[autodoc]] RePaintScheduler \ No newline at end of file diff --git a/diffusers/docs/source/en/api/schedulers/score_sde_ve.mdx b/diffusers/docs/source/en/api/schedulers/score_sde_ve.mdx deleted file mode 100644 index 66a00c69e3b42d42093ca0434e0b56f9cb9aae52..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/schedulers/score_sde_ve.mdx +++ /dev/null @@ -1,20 +0,0 @@ - - -# Variance Exploding Stochastic Differential Equation (VE-SDE) scheduler - -## Overview - -Original paper can be found [here](https://arxiv.org/abs/2011.13456). - -## ScoreSdeVeScheduler -[[autodoc]] ScoreSdeVeScheduler diff --git a/diffusers/docs/source/en/api/schedulers/score_sde_vp.mdx b/diffusers/docs/source/en/api/schedulers/score_sde_vp.mdx deleted file mode 100644 index ac1d2f109c81d1ab81b2b1d87e5280c6f870dc43..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/schedulers/score_sde_vp.mdx +++ /dev/null @@ -1,26 +0,0 @@ - - -# Variance Preserving Stochastic Differential Equation (VP-SDE) scheduler - -## Overview - -Original paper can be found [here](https://arxiv.org/abs/2011.13456). - - - -Score SDE-VP is under construction. 
- - - -## ScoreSdeVpScheduler -[[autodoc]] schedulers.scheduling_sde_vp.ScoreSdeVpScheduler diff --git a/diffusers/docs/source/en/api/schedulers/singlestep_dpm_solver.mdx b/diffusers/docs/source/en/api/schedulers/singlestep_dpm_solver.mdx deleted file mode 100644 index 7142e0ded5a7833fd61bcbc1ae7018e0472c6fde..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/schedulers/singlestep_dpm_solver.mdx +++ /dev/null @@ -1,20 +0,0 @@ - - -# Singlestep DPM-Solver - -## Overview - -Original paper can be found [here](https://arxiv.org/abs/2206.00927) and the [improved version](https://arxiv.org/abs/2211.01095). The original implementation can be found [here](https://github.com/LuChengTHU/dpm-solver). - -## DPMSolverSinglestepScheduler -[[autodoc]] DPMSolverSinglestepScheduler \ No newline at end of file diff --git a/diffusers/docs/source/en/api/schedulers/stochastic_karras_ve.mdx b/diffusers/docs/source/en/api/schedulers/stochastic_karras_ve.mdx deleted file mode 100644 index b8e4f9ff7e99c897c78a2a43e50ae047564460e9..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/schedulers/stochastic_karras_ve.mdx +++ /dev/null @@ -1,20 +0,0 @@ - - -# Variance exploding, stochastic sampling from Karras et. al - -## Overview - -Original paper can be found [here](https://arxiv.org/abs/2206.00364). - -## KarrasVeScheduler -[[autodoc]] KarrasVeScheduler \ No newline at end of file diff --git a/diffusers/docs/source/en/api/schedulers/unipc.mdx b/diffusers/docs/source/en/api/schedulers/unipc.mdx deleted file mode 100644 index 134dc1ef3170b7ee15b9af2c98eedec719ea8c98..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/schedulers/unipc.mdx +++ /dev/null @@ -1,24 +0,0 @@ - - -# UniPC - -## Overview - -UniPC is a training-free framework designed for the fast sampling of diffusion models, which consists of a corrector (UniC) and a predictor (UniP) that share a unified analytical form and support arbitrary orders. 
- -For more details about the method, please refer to the [paper](https://arxiv.org/abs/2302.04867) and the [code](https://github.com/wl-zhao/UniPC). - -Fast Sampling of Diffusion Models with Exponential Integrator. - -## UniPCMultistepScheduler -[[autodoc]] UniPCMultistepScheduler diff --git a/diffusers/docs/source/en/api/schedulers/vq_diffusion.mdx b/diffusers/docs/source/en/api/schedulers/vq_diffusion.mdx deleted file mode 100644 index 0ed145119fd2b513a4a1e33af894ae1c0f71df49..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/api/schedulers/vq_diffusion.mdx +++ /dev/null @@ -1,20 +0,0 @@ - - -# VQDiffusionScheduler - -## Overview - -Original paper can be found [here](https://arxiv.org/abs/2111.14822) - -## VQDiffusionScheduler -[[autodoc]] VQDiffusionScheduler \ No newline at end of file diff --git a/diffusers/docs/source/en/conceptual/contribution.mdx b/diffusers/docs/source/en/conceptual/contribution.mdx deleted file mode 100644 index e9aa10a871d3afff3dbb9426db05baf6a0be3817..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/conceptual/contribution.mdx +++ /dev/null @@ -1,498 +0,0 @@ - - -# How to contribute to Diffusers 🧨 - -We ❤️ contributions from the open-source community! Everyone is welcome, and all types of participation –not just code– are valued and appreciated. Answering questions, helping others, reaching out, and improving the documentation are all immensely valuable to the community, so don't be afraid and get involved if you're up for it! - -Everyone is encouraged to start by saying 👋 in our public Discord channel. We discuss the latest trends in diffusion models, ask questions, show off personal projects, help each other with contributions, or just hang out ☕. Join us on Discord - -Whichever way you choose to contribute, we strive to be part of an open, welcoming, and kind community. 
Please, read our [code of conduct](https://github.com/huggingface/diffusers/blob/main/CODE_OF_CONDUCT.md) and be mindful to respect it during your interactions. We also recommend you become familiar with the [ethical guidelines](https://huggingface.co/docs/diffusers/conceptual/ethical_guidelines) that guide our project and ask you to adhere to the same principles of transparency and responsibility. - -We enormously value feedback from the community, so please do not be afraid to speak up if you believe you have valuable feedback that can help improve the library - every message, comment, issue, and pull request (PR) is read and considered. - -## Overview - -You can contribute in many ways ranging from answering questions on issues to adding new diffusion models to -the core library. - -In the following, we give an overview of different ways to contribute, ranked by difficulty in ascending order. All of them are valuable to the community. - -* 1. Asking and answering questions on [the Diffusers discussion forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers) or on [Discord](https://discord.gg/G7tWnz98XR). -* 2. Opening new issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues/new/choose) -* 3. Answering issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues) -* 4. Fix a simple issue, marked by the "Good first issue" label, see [here](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22). -* 5. Contribute to the [documentation](https://github.com/huggingface/diffusers/tree/main/docs/source). -* 6. Contribute a [Community Pipeline](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3Acommunity-examples) -* 7. Contribute to the [examples](https://github.com/huggingface/diffusers/tree/main/examples). -* 8. 
Fix a more difficult issue, marked by the "Good second issue" label, see [here](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22Good+second+issue%22). -* 9. Add a new pipeline, model, or scheduler, see ["New Pipeline/Model"](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+pipeline%2Fmodel%22) and ["New scheduler"](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+scheduler%22) issues. For this contribution, please have a look at [Design Philosophy](https://github.com/huggingface/diffusers/blob/main/PHILOSOPHY.md). - -As said before, **all contributions are valuable to the community**. -In the following, we will explain each contribution in a bit more detail. - -For contributions 4-9, you will need to open a PR. How to do so is explained in detail in [Opening a pull request](#how-to-open-a-pr) - -### 1. Asking and answering questions on the Diffusers discussion forum or on the Diffusers Discord - -Any question or comment related to the Diffusers library can be asked on the [discussion forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/) or on [Discord](https://discord.gg/G7tWnz98XR). Such questions and comments include (but are not limited to): -- Reports of training or inference experiments in an attempt to share knowledge -- Presentation of personal projects -- Questions about non-official training examples -- Project proposals -- General feedback -- Paper summaries -- Asking for help on personal projects that build on top of the Diffusers library -- General questions -- Ethical questions regarding diffusion models -- ... - -Every question that is asked on the forum or on Discord actively encourages the community to publicly -share knowledge and might very well help a beginner in the future who has the same question you're -having. Please do pose any questions you might have.
-In the same spirit, you are of immense help to the community by answering such questions because this way you are publicly documenting knowledge for everybody to learn from. - -**Please** keep in mind that the more effort you put into asking or answering a question, the higher -the quality of the publicly documented knowledge. In the same way, well-posed and well-answered questions create a high-quality knowledge database accessible to everybody, while badly posed questions or answers reduce the overall quality of the public knowledge database. -In short, a high quality question or answer is *precise*, *concise*, *relevant*, *easy-to-understand*, *accessible*, and *well-formatted/well-posed*. For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section. - -**NOTE about channels**: -[*The forum*](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) is much better indexed by search engines, such as Google. Posts are ranked by popularity rather than chronologically. Hence, it's easier to look up questions and answers that were posted some time ago. -In addition, questions and answers posted in the forum can easily be linked to. -In contrast, *Discord* has a chat-like format that invites fast back-and-forth communication. -While it will most likely take less time for you to get an answer to your question on Discord, your -question won't be visible anymore over time. Also, it's much harder to find information that was posted a while back on Discord. We therefore strongly recommend using the forum for high-quality questions and answers in an attempt to create long-lasting knowledge for the community. If discussions on Discord lead to very interesting answers and conclusions, we recommend posting the results on the forum to make the information more available for future readers. - -### 2.
Opening new issues on the GitHub issues tab - -The 🧨 Diffusers library is robust and reliable thanks to the users who notify us of -the problems they encounter. So thank you for reporting an issue. - -Remember, GitHub issues are reserved for technical questions directly related to the Diffusers library, bug reports, feature requests, or feedback on the library design. - -In a nutshell, this means that everything that is **not** related to the **code of the Diffusers library** (including the documentation) should **not** be asked on GitHub, but rather on either the [forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) or [Discord](https://discord.gg/G7tWnz98XR). - -**Please consider the following guidelines when opening a new issue**: -- Make sure you have searched whether your issue has already been asked before (use the search bar on GitHub under Issues). -- Please never report a new issue on another (related) issue. If another issue is highly related, please -open a new issue nevertheless and link to the related issue. -- Make sure your issue is written in English. Please use one of the great, free online translation services, such as [DeepL](https://www.deepl.com/translator) to translate from your native language to English if you are not comfortable in English. -- Check whether your issue might be solved by updating to the newest Diffusers version. Before posting your issue, please make sure that `python -c "import diffusers; print(diffusers.__version__)"` is higher or matches the latest Diffusers version. -- Remember that the more effort you put into opening a new issue, the higher the quality of your answer will be and the better the overall quality of the Diffusers issues. - -New issues usually include the following. - -#### 2.1. Reproducible, minimal bug reports. - -A bug report should always have a reproducible code snippet and be as minimal and concise as possible. 
-This means in more detail: -- Narrow the bug down as much as you can, **do not just dump your whole code file** -- Format your code -- Do not include any external libraries except for Diffusers depending on them. -- **Always** provide all necessary information about your environment; for this, you can run: `diffusers-cli env` in your shell and copy-paste the displayed information to the issue. -- Explain the issue. If the reader doesn't know what the issue is and why it is an issue, she cannot solve it. -- **Always** make sure the reader can reproduce your issue with as little effort as possible. If your code snippet cannot be run because of missing libraries or undefined variables, the reader cannot help you. Make sure your reproducible code snippet is as minimal as possible and can be copy-pasted into a simple Python shell. -- If in order to reproduce your issue a model and/or dataset is required, make sure the reader has access to that model or dataset. You can always upload your model or dataset to the [Hub](https://huggingface.co) to make it easily downloadable. Try to keep your model and dataset as small as possible, to make the reproduction of your issue as effortless as possible. - -For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section. - -You can open a bug report [here](https://github.com/huggingface/diffusers/issues/new/choose). - -#### 2.2. Feature requests. - -A world-class feature request addresses the following points: - -1. Motivation first: -* Is it related to a problem/frustration with the library? If so, please explain -why. Providing a code snippet that demonstrates the problem is best. -* Is it related to something you would need for a project? We'd love to hear -about it! -* Is it something you worked on and think could benefit the community? -Awesome! Tell us what problem it solved for you. -2. Write a *full paragraph* describing the feature; -3. 
Provide a **code snippet** that demonstrates its future use; -4. In case this is related to a paper, please attach a link; -5. Attach any additional information (drawings, screenshots, etc.) you think may help. - -You can open a feature request [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feature_request.md&title=). - -#### 2.3 Feedback. - -Feedback about the library design and why it is good or not good helps the core maintainers immensely to build a user-friendly library. To understand the reasoning behind the current design philosophy, please have a look [here](https://huggingface.co/docs/diffusers/conceptual/philosophy). If you feel like a certain design choice does not fit with the current design philosophy, please explain why and how it should be changed. If a certain design choice follows the design philosophy too rigidly and thereby restricts use cases, explain why and how it should be changed. -If a certain design choice is very useful for you, please also leave a note as this is great feedback for future design decisions. - -You can open an issue about feedback [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feedback.md&title=). - -#### 2.4 Technical questions. - -Technical questions are mainly about why certain code of the library was written in a certain way, or what a certain part of the code does. Please make sure to link to the code in question and explain in detail -why this part of the code is difficult to understand. - -You can open an issue about a technical question [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=bug&template=bug-report.yml). - -#### 2.5 Proposal to add a new model, scheduler, or pipeline.
- -If the diffusion model community has released a new model, pipeline, or scheduler that you would like to see in the Diffusers library, please provide the following information: - -* Short description of the diffusion pipeline, model, or scheduler and link to the paper or public release. -* Link to any of its open-source implementations. -* Link to the model weights if they are available. - -If you are willing to contribute to the model yourself, let us know so we can best guide you. Also, don't forget -to tag the original author of the component (model, scheduler, pipeline, etc.) by GitHub handle if you can find it. - -You can open a request for a model/pipeline/scheduler [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=New+model%2Fpipeline%2Fscheduler&template=new-model-addition.yml). - -### 3. Answering issues on the GitHub issues tab - -Answering issues on GitHub might require some technical knowledge of Diffusers, but we encourage everybody to give it a try even if you are not 100% certain that your answer is correct. -Some tips to give a high-quality answer to an issue: -- Be as concise and minimal as possible. -- Stay on topic. An answer to the issue should concern the issue and only the issue. -- Provide links to code, papers, or other sources that prove or support your point. -- Answer in code. If a simple code snippet is the answer to the issue or shows how the issue can be solved, please provide a fully reproducible code snippet. - -Also, many issues tend to be simply off-topic, duplicates of other issues, or irrelevant.
It is of great -help to the maintainers if you can answer such issues by encouraging the author of the issue to be -more precise, providing the link to a duplicate issue, or redirecting them to [the forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) or [Discord](https://discord.gg/G7tWnz98XR). - -If you have verified that the reported bug is real and requires a correction in the source code, -please have a look at the next sections. - -For all of the following contributions, you will need to open a PR. The [How to open a PR](#how-to-open-a-pr) section explains in detail how to do so. - -### 4. Fixing a "Good first issue" - -*Good first issues* are marked by the [Good first issue](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22) label. Usually, the issue already -explains how a potential solution should look so that it is easier to fix. -If the issue hasn't been closed and you would like to try to fix this issue, you can just leave a message such as "I would like to try this issue." There are usually three scenarios: -- a.) The issue description already proposes a fix. In this case and if the solution makes sense to you, you can open a PR or draft PR to fix it. -- b.) The issue description does not propose a fix. In this case, you can ask what a proposed fix could look like and someone from the Diffusers team should answer shortly. If you have a good idea of how to fix it, feel free to directly open a PR. -- c.) There is already an open PR to fix the issue, but the issue hasn't been closed yet. If the PR has gone stale, you can simply open a new PR and link to the stale PR. PRs often go stale if the original contributor who wanted to fix the issue suddenly cannot find the time anymore to proceed. This often happens in open-source and is very normal.
In this case, the community will be very happy if you give it a new try and leverage the knowledge of the existing PR. If there is already a PR and it is active, you can help the author by giving suggestions, reviewing the PR or even asking whether you can contribute to the PR. - - -### 5. Contribute to the documentation - -A good library **always** has good documentation! The official documentation is often one of the first points of contact for new users of the library, and therefore contributing to the documentation is a **highly -valuable contribution**. - -Contributing to the documentation can take many forms: - -- Correcting spelling or grammatical errors. -- Correcting incorrect formatting of the docstring. If you see that the official documentation is weirdly displayed or a link is broken, we are very happy if you take some time to correct it. -- Correcting the shape or dimensions of a docstring input or output tensor. -- Clarifying documentation that is hard to understand or incorrect. -- Updating outdated code examples. -- Translating the documentation to another language. - -Anything displayed on [the official Diffusers doc page](https://huggingface.co/docs/diffusers/index) is part of the official documentation and can be corrected or adjusted in the respective [documentation source](https://github.com/huggingface/diffusers/tree/main/docs/source). - -Please have a look at [this page](https://github.com/huggingface/diffusers/tree/main/docs) on how to verify changes made to the documentation locally. - - -### 6. Contribute a community pipeline - -[Pipelines](https://huggingface.co/docs/diffusers/api/pipelines/overview) are usually the first point of contact between the Diffusers library and the user. -Pipelines are examples of how to use Diffusers [models](https://huggingface.co/docs/diffusers/api/models) and [schedulers](https://huggingface.co/docs/diffusers/api/schedulers/overview).
-We support two types of pipelines: - -- Official Pipelines -- Community Pipelines - -Both official and community pipelines follow the same design and consist of the same type of components. - -Official pipelines are tested and maintained by the core maintainers of Diffusers. Their code -resides in [src/diffusers/pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines). -In contrast, community pipelines are contributed and maintained purely by the **community** and are **not** tested. -They reside in [examples/community](https://github.com/huggingface/diffusers/tree/main/examples/community) and while they can be accessed via the [PyPI diffusers package](https://pypi.org/project/diffusers/), their code is not part of the PyPI distribution. - -The reason for the distinction is that the core maintainers of the Diffusers library cannot maintain and test all -possible ways diffusion models can be used for inference, but some of them may be of interest to the community. -Officially released diffusion pipelines, -such as Stable Diffusion are added to the core src/diffusers/pipelines package which ensures -high quality of maintenance, no backward-breaking code changes, and testing. -More bleeding edge pipelines should be added as community pipelines. If usage for a community pipeline is high, the pipeline can be moved to the official pipelines upon request from the community. This is one of the ways we strive to be a community-driven library. - -To add a community pipeline, one should add a .py file to [examples/community](https://github.com/huggingface/diffusers/tree/main/examples/community) and adapt the [examples/community/README.md](https://github.com/huggingface/diffusers/tree/main/examples/community/README.md) to include an example of the new pipeline. - -An example can be seen [here](https://github.com/huggingface/diffusers/pull/2400). 
- -Community pipeline PRs are only checked at a superficial level and ideally they should be maintained by their original authors. - -Contributing a community pipeline is a great way to understand how Diffusers models and schedulers work. Having contributed a community pipeline is usually the first stepping stone to contributing an official pipeline to the -core package. - -### 7. Contribute to training examples - -Diffusers examples are a collection of training scripts that reside in [examples](https://github.com/huggingface/diffusers/tree/main/examples). - -We support two types of training examples: - -- Official training examples -- Research training examples - -Research training examples are located in [examples/research_projects](https://github.com/huggingface/diffusers/tree/main/examples/research_projects) whereas official training examples include all folders under [examples](https://github.com/huggingface/diffusers/tree/main/examples) except the `research_projects` and `community` folders. -The official training examples are maintained by the Diffusers' core maintainers whereas the research training examples are maintained by the community. -This is because of the same reasons put forward in [6. Contribute a community pipeline](#contribute-a-community-pipeline) for official pipelines vs. community pipelines: It is not feasible for the core maintainers to maintain all possible training methods for diffusion models. -If the Diffusers core maintainers and the community consider a certain training paradigm to be too experimental or not popular enough, the corresponding training code should be put in the `research_projects` folder and maintained by the author. - -Both official training and research examples consist of a directory that contains one or more training scripts, a requirements.txt file, and a README.md file. 
In order for the user to make use of the -training examples, it is required to clone the repository: - -``` -git clone https://github.com/huggingface/diffusers -``` - -as well as to install all additional dependencies required for training: - -``` -pip install -r /examples//requirements.txt -``` - -Therefore when adding an example, the `requirements.txt` file shall define all pip dependencies required for your training example so that once all those are installed, the user can run the example's training script. See, for example, the [DreamBooth `requirements.txt` file](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/requirements.txt). - -Training examples of the Diffusers library should adhere to the following philosophy: -- All the code necessary to run the examples should be found in a single Python file -- One should be able to run the example from the command line with `python .py --args` -- Examples should be kept simple and serve as **an example** on how to use Diffusers for training. The purpose of example scripts is **not** to create state-of-the-art diffusion models, but rather to reproduce known training schemes without adding too much custom logic. As a byproduct of this point, our examples also strive to serve as good educational materials. - -To contribute an example, it is highly recommended to look at already existing examples such as [dreambooth](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py) to get an idea of how they should look like. -We strongly advise contributors to make use of the [Accelerate library](https://github.com/huggingface/accelerate) as it's tightly integrated -with Diffusers. -Once an example script works, please make sure to add a comprehensive `README.md` that states how to use the example exactly. 
This README should include: -- An example command on how to run the example script, as shown for example [here](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth#running-locally-with-pytorch). -- A link to some training results (logs, models, ...) that show what the user can expect, as shown for example [here](https://api.wandb.ai/report/patrickvonplaten/xm6cd5q5). -- If you are adding a non-official/research training example, **please don't forget** to add a sentence stating that you are maintaining this training example, including your GitHub handle, as shown [here](https://github.com/huggingface/diffusers/tree/main/examples/research_projects/intel_opts#diffusers-examples-with-intel-optimizations). - -If you are contributing to the official training examples, please also make sure to add a test to [examples/test_examples.py](https://github.com/huggingface/diffusers/blob/main/examples/test_examples.py). This is not necessary for non-official training examples. - -### 8. Fixing a "Good second issue" - -*Good second issues* are marked by the [Good second issue](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22Good+second+issue%22) label. Good second issues are -usually more complicated to solve than [Good first issues](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22). -The issue description usually gives less guidance on how to fix the issue and requires -a decent understanding of the library by the interested contributor. -If you are interested in tackling a good second issue, feel free to open a PR to fix it and link the PR to the issue. If you see that a PR has already been opened for this issue but did not get merged, have a look to understand why it wasn't merged and try to open an improved PR. -Good second issues are usually more difficult to get merged compared to good first issues, so don't hesitate to ask for help from the core maintainers.
If your PR is almost finished, the core maintainers can also jump into your PR and commit to it in order to get it merged. - -### 9. Adding pipelines, models, schedulers - -Pipelines, models, and schedulers are the most important pieces of the Diffusers library. -They provide easy access to state-of-the-art diffusion technologies and thus allow the community to -build powerful generative AI applications. - -By adding a new model, pipeline, or scheduler you might enable a new powerful use case for any of the user interfaces relying on Diffusers which can be of immense value for the whole generative AI ecosystem. - -Diffusers has a couple of open feature requests for all three components - feel free to look through them -if you don't know yet what specific component you would like to add: -- [Model or pipeline](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+pipeline%2Fmodel%22) -- [Scheduler](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+scheduler%22) - -Before adding any of the three components, it is strongly recommended that you give the [Philosophy guide](https://huggingface.co/docs/diffusers/conceptual/philosophy) a read to better understand the design of any of the three components. Please be aware that -we cannot merge model, scheduler, or pipeline additions that strongly diverge from our design philosophy -as it will lead to API inconsistencies. If you fundamentally disagree with a design choice, please -open a [Feedback issue](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feedback.md&title=) instead so that it can be discussed whether a certain design -pattern/design choice shall be changed everywhere in the library and whether we shall update our design philosophy. Consistency across the library is very important for us.
- -Please make sure to add links to the original codebase/paper to the PR and ideally also ping the -original author directly on the PR so that they can follow the progress and potentially help with questions. - -If you are unsure or stuck in the PR, don't hesitate to leave a message to ask for a first review or help. - -## How to write a good issue - -**The better your issue is written, the higher the chances that it will be quickly resolved.** - -1. Make sure that you've used the correct template for your issue. You can pick between *Bug Report*, *Feature Request*, *Feedback about API Design*, *New model/pipeline/scheduler addition*, *Forum*, or a blank issue. Make sure to pick the correct one when opening [a new issue](https://github.com/huggingface/diffusers/issues/new/choose). -2. **Be precise**: Give your issue a fitting title. Try to formulate your issue description as simple as possible. The more precise you are when submitting an issue, the less time it takes to understand the issue and potentially solve it. Make sure to open an issue for one issue only and not for multiple issues. If you found multiple issues, simply open multiple issues. If your issue is a bug, try to be as precise as possible about what bug it is - you should not just write "Error in diffusers". -3. **Reproducibility**: No reproducible code snippet == no solution. If you encounter a bug, maintainers **have to be able to reproduce** it. Make sure that you include a code snippet that can be copy-pasted into a Python interpreter to reproduce the issue. Make sure that your code snippet works, *i.e.* that there are no missing imports or missing links to images, ... Your issue should contain an error message **and** a code snippet that can be copy-pasted without any changes to reproduce the exact same error message. If your issue is using local model weights or local data that cannot be accessed by the reader, the issue cannot be solved. 
If you cannot share your data or model, try to make a dummy model or dummy data. -4. **Minimalistic**: Try to help the reader as much as you can to understand the issue as quickly as possible by staying as concise as possible. Remove all code / all information that is irrelevant to the issue. If you have found a bug, try to create the easiest code example you can to demonstrate your issue, do not just dump your whole workflow into the issue as soon as you have found a bug. E.g., if you train a model and get an error at some point during the training, you should first try to understand what part of the training code is responsible for the error and try to reproduce it with a couple of lines. Try to use dummy data instead of full datasets. -5. Add links. If you are referring to a certain naming, method, or model make sure to provide a link so that the reader can better understand what you mean. If you are referring to a specific PR or issue, make sure to link it to your issue. Do not assume that the reader knows what you are talking about. The more links you add to your issue the better. -6. Formatting. Make sure to nicely format your issue by formatting code into Python code syntax, and error messages into normal code syntax. See the [official GitHub formatting docs](https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax) for more information. -7. Think of your issue not as a ticket to be solved, but rather as a beautiful entry to a well-written encyclopedia. Every added issue is a contribution to publicly available knowledge. By adding a nicely written issue you not only make it easier for maintainers to solve your issue, but you are helping the whole community to better understand a certain aspect of the library. - -## How to write a good PR - -1. Be a chameleon. 
Understand existing design patterns and syntax and make sure your code additions flow seamlessly into the existing code base. Pull requests that significantly diverge from existing design patterns or user interfaces will not be merged. -2. Be laser focused. A pull request should solve one problem and one problem only. Make sure to not fall into the trap of "also fixing another problem while we're adding it". It is much more difficult to review pull requests that solve multiple, unrelated problems at once. -3. If helpful, try to add a code snippet that displays an example of how your addition can be used. -4. The title of your pull request should be a summary of its contribution. -5. If your pull request addresses an issue, please mention the issue number in -the pull request description to make sure they are linked (and people -consulting the issue know you are working on it); -6. To indicate a work in progress please prefix the title with `[WIP]`. These -are useful to avoid duplicated work, and to differentiate it from PRs ready -to be merged; -7. Try to formulate and format your text as explained in [How to write a good issue](#how-to-write-a-good-issue). -8. Make sure existing tests pass; -9. Add high-coverage tests. No quality testing = no merge. -- If you are adding new `@slow` tests, make sure they pass using -`RUN_SLOW=1 python -m pytest tests/test_my_new_model.py`. -CircleCI does not run the slow tests, but GitHub actions does every night! -10. All public methods must have informative docstrings that work nicely with markdown. See `[pipeline_latent_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion/pipeline_latent_diffusion.py)` for an example. -11. Due to the rapidly growing repository, it is important to make sure that no files that would significantly weigh down the repository are added. This includes images, videos, and other non-text files. 
We prefer to leverage a hf.co hosted `dataset` like -[`hf-internal-testing`](https://huggingface.co/hf-internal-testing) or [huggingface/documentation-images](https://huggingface.co/datasets/huggingface/documentation-images) to place these files. -If an external contribution, feel free to add the images to your PR and ask a Hugging Face member to migrate your images -to this dataset. - -## How to open a PR - -Before writing code, we strongly advise you to search through the existing PRs or -issues to make sure that nobody is already working on the same thing. If you are -unsure, it is always a good idea to open an issue to get some feedback. - -You will need basic `git` proficiency to be able to contribute to -🧨 Diffusers. `git` is not the easiest tool to use but it has the greatest -manual. Type `git --help` in a shell and enjoy. If you prefer books, [Pro -Git](https://git-scm.com/book/en/v2) is a very good reference. - -Follow these steps to start contributing ([supported Python versions](https://github.com/huggingface/diffusers/blob/main/setup.py#L244)): - -1. Fork the [repository](https://github.com/huggingface/diffusers) by -clicking on the 'Fork' button on the repository's page. This creates a copy of the code -under your GitHub user account. - -2. Clone your fork to your local disk, and add the base repository as a remote: - - ```bash - $ git clone git@github.com:/diffusers.git - $ cd diffusers - $ git remote add upstream https://github.com/huggingface/diffusers.git - ``` - -3. Create a new branch to hold your development changes: - - ```bash - $ git checkout -b a-descriptive-name-for-my-changes - ``` - -**Do not** work on the `main` branch. - -4. Set up a development environment by running the following command in a virtual environment: - - ```bash - $ pip install -e ".[dev]" - ``` - -If you have already cloned the repo, you might need to `git pull` to get the most recent changes in the -library. - -5. Develop the features on your branch. 
- -As you work on the features, you should make sure that the test suite -passes. You should run the tests impacted by your changes like this: - - ```bash - $ pytest tests/.py - ``` - -You can also run the full suite with the following command, but it takes -a beefy machine to produce a result in a decent amount of time now that -Diffusers has grown a lot. Here is the command for it: - - ```bash - $ make test - ``` - -🧨 Diffusers relies on `black` and `isort` to format its source code -consistently. After you make changes, apply automatic style corrections and code verifications -that can't be automated in one go with: - - ```bash - $ make style - ``` - -🧨 Diffusers also uses `ruff` and a few custom scripts to check for coding mistakes. Quality -control runs in CI, however, you can also run the same checks with: - - ```bash - $ make quality - ``` - -Once you're happy with your changes, add changed files using `git add` and -make a commit with `git commit` to record your changes locally: - - ```bash - $ git add modified_file.py - $ git commit - ``` - -It is a good idea to sync your copy of the code with the original -repository regularly. This way you can quickly account for changes: - - ```bash - $ git pull upstream main - ``` - -Push the changes to your account using: - - ```bash - $ git push -u origin a-descriptive-name-for-my-changes - ``` - -6. Once you are satisfied, go to the -webpage of your fork on GitHub. Click on 'Pull request' to send your changes -to the project maintainers for review. - -7. It's ok if maintainers ask you for changes. It happens to core contributors -too! So everyone can see the changes in the Pull request, work in your local -branch and push the changes to your fork. They will automatically appear in -the pull request. - -### Tests - -An extensive test suite is included to test the library behavior and several examples. Library tests can be found in -the [tests folder](https://github.com/huggingface/diffusers/tree/main/tests). 
- -We like `pytest` and `pytest-xdist` because it's faster. From the root of the -repository, here's how to run tests with `pytest` for the library: - -```bash -$ python -m pytest -n auto --dist=loadfile -s -v ./tests/ -``` - -In fact, that's how `make test` is implemented! - -You can specify a smaller set of tests in order to test only the feature -you're working on. - -By default, slow tests are skipped. Set the `RUN_SLOW` environment variable to -`yes` to run them. This will download many gigabytes of models — make sure you -have enough disk space and a good Internet connection, or a lot of patience! - -```bash -$ RUN_SLOW=yes python -m pytest -n auto --dist=loadfile -s -v ./tests/ -``` - -`unittest` is fully supported, here's how to run tests with it: - -```bash -$ python -m unittest discover -s tests -t . -v -$ python -m unittest discover -s examples -t examples -v -``` - -### Syncing forked main with upstream (HuggingFace) main - -To avoid pinging the upstream repository which adds reference notes to each upstream PR and sends unnecessary notifications to the developers involved in these PRs, -when syncing the main branch of a forked repository, please, follow these steps: -1. When possible, avoid syncing with the upstream using a branch and PR on the forked repository. Instead, merge directly into the forked main. -2. If a PR is absolutely necessary, use the following steps after checking out your branch: -``` -$ git checkout -b your-branch-for-syncing -$ git pull --squash --no-commit upstream main -$ git commit -m '' -$ git push --set-upstream origin your-branch-for-syncing -``` - -### Style guide - -For documentation strings, 🧨 Diffusers follows the [google style](https://google.github.io/styleguide/pyguide.html). 
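As an illustration of that style, here is a minimal sketch of a Google-style docstring with Markdown-formatted types. The function `rescale_noise` and its parameters are hypothetical and do not exist in the library. Only the layout of the `Args:`, `Returns:`, and `Raises:` sections is the point, and the exact conventions used in the codebase may differ in detail.

```python
from typing import List


def rescale_noise(noise: List[float], scale: float = 1.0) -> List[float]:
    """Multiply every noise value by a constant factor.

    Hypothetical helper used only to show the docstring layout.

    Args:
        noise (`List[float]`):
            The noise values to rescale.
        scale (`float`, *optional*, defaults to 1.0):
            The factor applied to every value. Must be non-negative.

    Returns:
        `List[float]`: A new list holding the rescaled values.

    Raises:
        ValueError: If `scale` is negative.
    """
    if scale < 0:
        raise ValueError("`scale` must be non-negative.")
    return [value * scale for value in noise]
```

Keeping one section per concern (arguments, return value, raised errors) lets a documentation builder render each part consistently, and reviewers can check the docstring against the signature at a glance.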
diff --git a/diffusers/docs/source/en/conceptual/ethical_guidelines.mdx b/diffusers/docs/source/en/conceptual/ethical_guidelines.mdx deleted file mode 100644 index 0c1a7b789203e02ee8e1e43b78311940e197540e..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/conceptual/ethical_guidelines.mdx +++ /dev/null @@ -1,51 +0,0 @@ -# 🧨 Diffusers’ Ethical Guidelines - -## Preamble - -[Diffusers](https://huggingface.co/docs/diffusers/index) provides pre-trained diffusion models and serves as a modular toolbox for inference and training. - -Given its real case applications in the world and potential negative impacts on society, we think it is important to provide the project with ethical guidelines to guide the development, users’ contributions, and usage of the Diffusers library. - -The risks associated with using this technology are still being examined, but to name a few: copyrights issues for artists; deep-fake exploitation; sexual content generation in inappropriate contexts; non-consensual impersonation; harmful social biases perpetuating the oppression of marginalized groups. -We will keep tracking risks and adapt the following guidelines based on the community's responsiveness and valuable feedback. - - -## Scope - -The Diffusers community will apply the following ethical guidelines to the project’s development and help coordinate how the community will integrate the contributions, especially concerning sensitive topics related to ethical concerns. - - -## Ethical guidelines - -The following ethical guidelines apply generally, but we will primarily implement them when dealing with ethically sensitive issues while making a technical choice. Furthermore, we commit to adapting those ethical principles over time following emerging harms related to the state of the art of the technology in question. - -- **Transparency**: we are committed to being transparent in managing PRs, explaining our choices to users, and making technical decisions. 
- -- **Consistency**: we are committed to guaranteeing our users the same level of attention in project management, keeping it technically stable and consistent. - -- **Simplicity**: with a desire to make it easy to use and exploit the Diffusers library, we are committed to keeping the project’s goals lean and coherent. - -- **Accessibility**: the Diffusers project helps lower the entry bar for contributors who can help run it even without technical expertise. Doing so makes research artifacts more accessible to the community. - -- **Reproducibility**: we aim to be transparent about the reproducibility of upstream code, models, and datasets when made available through the Diffusers library. - -- **Responsibility**: as a community and through teamwork, we hold a collective responsibility to our users by anticipating and mitigating this technology's potential risks and dangers. - - -## Examples of implementations: Safety features and Mechanisms - -The team works daily to make the technical and non-technical tools available to deal with the potential ethical and social risks associated with diffusion technology. Moreover, the community's input is invaluable in ensuring these features' implementation and raising awareness with us. - -- [**Community tab**](https://huggingface.co/docs/hub/repositories-pull-requests-discussions): it enables the community to discuss and better collaborate on a project. - -- **Bias exploration and evaluation**: the Hugging Face team provides a [space](https://huggingface.co/spaces/society-ethics/DiffusionBiasExplorer) to demonstrate the biases in Stable Diffusion interactively. In this sense, we support and encourage bias explorers and evaluations. 
 - -- **Encouraging safety in deployment** - - - [**Safe Stable Diffusion**](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion_safe): It mitigates the well-known issue that models, like Stable Diffusion, that are trained on unfiltered, web-crawled datasets tend to suffer from inappropriate degeneration. Related paper: [Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models](https://arxiv.org/abs/2211.05105). - - - [**Safety Checker**](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/safety_checker.py): It checks and compares the class probability of a set of hard-coded harmful concepts in the embedding space against an image after it has been generated. The harmful concepts are intentionally hidden to prevent reverse engineering of the checker. - -- **Staged release on the Hub**: in particularly sensitive situations, access to some repositories should be restricted. This staged release is an intermediary step that allows the repository’s authors to have more control over its use. - -- **Licensing**: [OpenRAILs](https://huggingface.co/blog/open_rail), a new type of licensing, allow us to ensure free access while having a set of restrictions that ensure more responsible use. diff --git a/diffusers/docs/source/en/conceptual/evaluation.mdx b/diffusers/docs/source/en/conceptual/evaluation.mdx deleted file mode 100644 index 2721adea0c160bfb0d80dd078364df60d8e19e10..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/conceptual/evaluation.mdx +++ /dev/null @@ -1,565 +0,0 @@ - - -# Evaluating Diffusion Models - -Evaluation of generative models like [Stable Diffusion](https://huggingface.co/docs/diffusers/stable_diffusion) is subjective in nature. But as practitioners and researchers, we often have to make careful choices amongst many different possibilities. 
So, when working with different generative models (like GANs, Diffusion, etc.), how do we choose one over the other? - -Qualitative evaluation of such models can be error-prone and might incorrectly influence a decision. -However, quantitative metrics don't necessarily correspond to image quality. So, usually, a combination -of both qualitative and quantitative evaluations provides a stronger signal when choosing one model -over the other. - -In this document, we provide a non-exhaustive overview of qualitative and quantitative methods to evaluate Diffusion models. For quantitative methods, we specifically focus on how to implement them alongside `diffusers`. - -The methods shown in this document can also be used to evaluate different [noise schedulers](https://huggingface.co/docs/diffusers/main/en/api/schedulers/overview) keeping the underlying generation model fixed. - -## Scenarios - -We cover Diffusion models with the following pipelines: - -- Text-guided image generation (such as the [`StableDiffusionPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/text2img)). -- Text-guided image generation, additionally conditioned on an input image (such as the [`StableDiffusionImg2ImgPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/img2img), and [`StableDiffusionInstructPix2PixPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/pix2pix)). -- Class-conditioned image generation models (such as the [`DiTPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/dit)). - -## Qualitative Evaluation - -Qualitative evaluation typically involves human assessment of generated images. Quality is measured across aspects such as compositionality, image-text alignment, and spatial relations. Common prompts provide a degree of uniformity for subjective metrics. DrawBench and PartiPrompts are prompt datasets used for qualitative benchmarking. 
DrawBench and PartiPrompts were introduced by [Imagen](https://imagen.research.google/) and [Parti](https://parti.research.google/) respectively. - -From the [official Parti website](https://parti.research.google/): - -> PartiPrompts (P2) is a rich set of over 1600 prompts in English that we release as part of this work. P2 can be used to measure model capabilities across various categories and challenge aspects. - -![parti-prompts](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/parti-prompts.png) - -PartiPrompts has the following columns: - -- Prompt -- Category of the prompt (such as “Abstract”, “World Knowledge”, etc.) -- Challenge reflecting the difficulty (such as “Basic”, “Complex”, “Writing & Symbols”, etc.) - -These benchmarks allow for side-by-side human evaluation of different image generation models. Let’s see how we can use `diffusers` on a couple of PartiPrompts. - -Below we show some prompts sampled across different challenges: Basic, Complex, Linguistic Structures, Imagination, and Writing & Symbols. Here we are using PartiPrompts as a [dataset](https://huggingface.co/datasets/nateraw/parti-prompts). - -```python -from datasets import load_dataset - -# prompts = load_dataset("nateraw/parti-prompts", split="train") -# prompts = prompts.shuffle() -# sample_prompts = [prompts[i]["Prompt"] for i in range(5)] - -# Fixing these sample prompts in the interest of reproducibility. -sample_prompts = [ - "a corgi", - "a hot air balloon with a yin-yang symbol, with the moon visible in the daytime sky", - "a car with no windows", - "a cube made of porcupine", - 'The saying "BE EXCELLENT TO EACH OTHER" written on a red brick wall with a graffiti image of a green alien wearing a tuxedo. 
A yellow fire hydrant is on a sidewalk in the foreground.', -] -``` - -Now we can use these prompts to generate some images using Stable Diffusion ([v1-4 checkpoint](https://huggingface.co/CompVis/stable-diffusion-v1-4)): - -```python -import torch - -seed = 0 -generator = torch.manual_seed(seed) - -images = sd_pipeline(sample_prompts, num_images_per_prompt=1, generator=generator, output_type="numpy").images -``` - -![parti-prompts-14](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/parti-prompts-14.png) - -We can also set `num_images_per_prompt` accordingly to compare different images for the same prompt. Running the same pipeline but with a different checkpoint ([v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5)), yields: - -![parti-prompts-15](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/parti-prompts-15.png) - -Once several images are generated from all the prompts using multiple models (under evaluation), these results are presented to human evaluators for scoring. For -more details on the DrawBench and PartiPrompts benchmarks, refer to their respective papers. - - - -It is useful to look at some inference samples while a model is training to measure the -training progress. In our [training scripts](https://github.com/huggingface/diffusers/tree/main/examples/), we support this utility with additional support for -logging to TensorBoard and Weights & Biases. - - - -## Quantitative Evaluation - -In this section, we will walk you through how to evaluate three different diffusion pipelines using: - -- CLIP score -- CLIP directional similarity -- FID - -### Text-guided image generation - -[CLIP score](https://arxiv.org/abs/2104.08718) measures the compatibility of image-caption pairs. Higher CLIP scores imply higher compatibility 🔼. The CLIP score is a quantitative measurement of the qualitative concept "compatibility". 
Image-caption pair compatibility can also be thought of as the semantic similarity between the image and the caption. CLIP score was found to have high correlation with human judgement. - -Let's first load a [`StableDiffusionPipeline`]: - -```python -from diffusers import StableDiffusionPipeline -import torch - -model_ckpt = "CompVis/stable-diffusion-v1-4" -sd_pipeline = StableDiffusionPipeline.from_pretrained(model_ckpt, torch_dtype=torch.float16).to("cuda") -``` - -Generate some images with multiple prompts: - -```python -prompts = [ - "a photo of an astronaut riding a horse on mars", - "A high tech solarpunk utopia in the Amazon rainforest", - "A pikachu fine dining with a view to the Eiffel Tower", - "A mecha robot in a favela in expressionist style", - "an insect robot preparing a delicious meal", - "A small cabin on top of a snowy mountain in the style of Disney, artstation", -] - -images = sd_pipeline(prompts, num_images_per_prompt=1, output_type="numpy").images - -print(images.shape) -# (6, 512, 512, 3) -``` - -And then, we calculate the CLIP score. - -```python -from torchmetrics.functional.multimodal import clip_score -from functools import partial - -clip_score_fn = partial(clip_score, model_name_or_path="openai/clip-vit-base-patch16") - - -def calculate_clip_score(images, prompts): - images_int = (images * 255).astype("uint8") - clip_score = clip_score_fn(torch.from_numpy(images_int).permute(0, 3, 1, 2), prompts).detach() - return round(float(clip_score), 4) - - -sd_clip_score = calculate_clip_score(images, prompts) -print(f"CLIP score: {sd_clip_score}") -# CLIP score: 35.7038 -``` - -In the above example, we generated one image per prompt. If we generated multiple images per prompt, we would have to take the average score from the generated images per prompt. - -Now, if we wanted to compare two checkpoints compatible with the [`StableDiffusionPipeline`] we should pass a generator while calling the pipeline. 
 First, we generate images with a -fixed seed with the [v1-4 Stable Diffusion checkpoint](https://huggingface.co/CompVis/stable-diffusion-v1-4): - -```python -seed = 0 -generator = torch.manual_seed(seed) - -images = sd_pipeline(prompts, num_images_per_prompt=1, generator=generator, output_type="numpy").images -``` - -Then we load the [v1-5 checkpoint](https://huggingface.co/runwayml/stable-diffusion-v1-5) to generate images: - -```python -model_ckpt_1_5 = "runwayml/stable-diffusion-v1-5" -# Same dtype and device as the v1-4 pipeline loaded earlier. -sd_pipeline_1_5 = StableDiffusionPipeline.from_pretrained(model_ckpt_1_5, torch_dtype=torch.float16).to("cuda") - -images_1_5 = sd_pipeline_1_5(prompts, num_images_per_prompt=1, generator=generator, output_type="numpy").images -``` - -And finally, we compare their CLIP scores: - -```python -sd_clip_score_1_4 = calculate_clip_score(images, prompts) -print(f"CLIP Score with v-1-4: {sd_clip_score_1_4}") -# CLIP Score with v-1-4: 34.9102 - -sd_clip_score_1_5 = calculate_clip_score(images_1_5, prompts) -print(f"CLIP Score with v-1-5: {sd_clip_score_1_5}") -# CLIP Score with v-1-5: 36.2137 -``` - -It seems like the [v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) checkpoint performs better than its predecessor. Note, however, that the number of prompts we used to compute the CLIP scores is quite low. For a more practical evaluation, this number should be much higher, and the prompts should be diverse. - - - -By construction, there are some limitations in this score. The captions in the training dataset -were crawled from the web and extracted from `alt` and similar tags associated with an image on the internet. -They are not necessarily representative of what a human being would use to describe an image. Hence we -had to "engineer" some prompts here. - - - -### Image-conditioned text-to-image generation - -In this case, we condition the generation pipeline with an input image as well as a text prompt. Let's take the [`StableDiffusionInstructPix2PixPipeline`] as an example. 
It takes an edit instruction as an input prompt and an input image to be edited. - -Here is one example: - -![edit-instruction](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/edit-instruction.png) - -One strategy to evaluate such a model is to measure the consistency of the change between the two images (in [CLIP](https://huggingface.co/docs/transformers/model_doc/clip) space) with the change between the two image captions (as shown in [CLIP-Guided Domain Adaptation of Image Generators](https://arxiv.org/abs/2108.00946)). This is referred to as the "**CLIP directional similarity**". - -- Caption 1 corresponds to the input image (image 1) that is to be edited. -- Caption 2 corresponds to the edited image (image 2). It should reflect the edit instruction. - -Following is a pictorial overview: - -![edit-consistency](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/edit-consistency.png) - -We have prepared a mini dataset to implement this metric. Let's first load the dataset. - -```python -from datasets import load_dataset - -dataset = load_dataset("sayakpaul/instructpix2pix-demo", split="train") -dataset.features -``` - -```bash -{'input': Value(dtype='string', id=None), - 'edit': Value(dtype='string', id=None), - 'output': Value(dtype='string', id=None), - 'image': Image(decode=True, id=None)} -``` - -Here we have: - -- `input` is a caption corresponding to the `image`. -- `edit` denotes the edit instruction. -- `output` denotes the modified caption reflecting the `edit` instruction. - -Let's take a look at a sample. - -```python -idx = 0 -print(f"Original caption: {dataset[idx]['input']}") -print(f"Edit instruction: {dataset[idx]['edit']}") -print(f"Modified caption: {dataset[idx]['output']}") -``` - -```bash -Original caption: 2. 
FAROE ISLANDS: An archipelago of 18 mountainous isles in the North Atlantic Ocean between Norway and Iceland, the Faroe Islands has 'everything you could hope for', according to Big 7 Travel. It boasts 'crystal clear waterfalls, rocky cliffs that seem to jut out of nowhere and velvety green hills' -Edit instruction: make the isles all white marble -Modified caption: 2. WHITE MARBLE ISLANDS: An archipelago of 18 mountainous white marble isles in the North Atlantic Ocean between Norway and Iceland, the White Marble Islands has 'everything you could hope for', according to Big 7 Travel. It boasts 'crystal clear waterfalls, rocky cliffs that seem to jut out of nowhere and velvety green hills' -``` - -And here is the image: - -```python -dataset[idx]["image"] -``` - -![edit-dataset](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/edit-dataset.png) - -We will first edit the images of our dataset with the edit instruction and compute the directional similarity. 
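 Before running the full pipeline, the directional similarity itself can be sketched on toy vectors. This is only an illustration under stated assumptions: the 3-D lists below are hypothetical stand-ins for CLIP embeddings, and the helper names `cosine_similarity` and `directional_similarity` are made up for this sketch rather than taken from any library. ```python import math def cosine_similarity(u, v): # Cosine of the angle between two vectors given as plain lists. dot = sum(a * b for a, b in zip(u, v)) norm_u = math.sqrt(sum(a * a for a in u)) norm_v = math.sqrt(sum(b * b for b in v)) return dot / (norm_u * norm_v) def directional_similarity(img_one, img_two, text_one, text_two): # Compare the change between the images with the change between the captions. image_direction = [b - a for a, b in zip(img_one, img_two)] text_direction = [b - a for a, b in zip(text_one, text_two)] return cosine_similarity(image_direction, text_direction) # Hypothetical 3-D "embeddings"; real CLIP embeddings have many more dimensions. img_one, img_two = [1.0, 0.0, 0.0], [1.0, 1.0, 0.0] text_one, text_two = [0.0, 0.0, 1.0], [0.0, 2.0, 1.0] # The image and the caption both move along the second axis. aligned = directional_similarity(img_one, img_two, text_one, text_two) # Here the image moves in the opposite direction to the caption. opposed = directional_similarity(img_one, [1.0, -1.0, 0.0], text_one, text_two) ``` The score depends only on the direction of the two changes, not on their magnitude, which is why an edit that follows the caption change scores high even when the embeddings themselves are far apart.   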
 - -Let's first load the [`StableDiffusionInstructPix2PixPipeline`]: - -```python -from diffusers import StableDiffusionInstructPix2PixPipeline - -device = "cuda" # same device as the pipeline loaded earlier; reused in the blocks below - -instruct_pix2pix_pipeline = StableDiffusionInstructPix2PixPipeline.from_pretrained( - "timbrooks/instruct-pix2pix", torch_dtype=torch.float16 -).to(device) -``` - -Now, we perform the edits: - -```python -import numpy as np - - -def edit_image(input_image, instruction): - image = instruct_pix2pix_pipeline( - instruction, - image=input_image, - output_type="numpy", - generator=generator, - ).images[0] - return image - - -input_images = [] -original_captions = [] -modified_captions = [] -edited_images = [] - -for idx in range(len(dataset)): - input_image = dataset[idx]["image"] - edit_instruction = dataset[idx]["edit"] - edited_image = edit_image(input_image, edit_instruction) - - input_images.append(np.array(input_image)) - original_captions.append(dataset[idx]["input"]) - modified_captions.append(dataset[idx]["output"]) - edited_images.append(edited_image) -``` - -To measure the directional similarity, we first load CLIP's image and text encoders: - -```python -from transformers import ( - CLIPTokenizer, - CLIPTextModelWithProjection, - CLIPVisionModelWithProjection, - CLIPImageProcessor, -) - -clip_id = "openai/clip-vit-large-patch14" -tokenizer = CLIPTokenizer.from_pretrained(clip_id) -text_encoder = CLIPTextModelWithProjection.from_pretrained(clip_id).to(device) -image_processor = CLIPImageProcessor.from_pretrained(clip_id) -image_encoder = CLIPVisionModelWithProjection.from_pretrained(clip_id).to(device) -``` - -Notice that we are using a particular CLIP checkpoint, i.e., `openai/clip-vit-large-patch14`. This is because the Stable Diffusion pre-training was performed with this CLIP variant. For more details, refer to the [documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/pix2pix#diffusers.StableDiffusionInstructPix2PixPipeline.text_encoder). 
- -Next, we prepare a PyTorch `nn.Module` to compute directional similarity: - -```python -import torch.nn as nn -import torch.nn.functional as F - - -class DirectionalSimilarity(nn.Module): - def __init__(self, tokenizer, text_encoder, image_processor, image_encoder): - super().__init__() - self.tokenizer = tokenizer - self.text_encoder = text_encoder - self.image_processor = image_processor - self.image_encoder = image_encoder - - def preprocess_image(self, image): - image = self.image_processor(image, return_tensors="pt")["pixel_values"] - return {"pixel_values": image.to(device)} - - def tokenize_text(self, text): - inputs = self.tokenizer( - text, - max_length=self.tokenizer.model_max_length, - padding="max_length", - truncation=True, - return_tensors="pt", - ) - return {"input_ids": inputs.input_ids.to(device)} - - def encode_image(self, image): - preprocessed_image = self.preprocess_image(image) - image_features = self.image_encoder(**preprocessed_image).image_embeds - image_features = image_features / image_features.norm(dim=1, keepdim=True) - return image_features - - def encode_text(self, text): - tokenized_text = self.tokenize_text(text) - text_features = self.text_encoder(**tokenized_text).text_embeds - text_features = text_features / text_features.norm(dim=1, keepdim=True) - return text_features - - def compute_directional_similarity(self, img_feat_one, img_feat_two, text_feat_one, text_feat_two): - sim_direction = F.cosine_similarity(img_feat_two - img_feat_one, text_feat_two - text_feat_one) - return sim_direction - - def forward(self, image_one, image_two, caption_one, caption_two): - img_feat_one = self.encode_image(image_one) - img_feat_two = self.encode_image(image_two) - text_feat_one = self.encode_text(caption_one) - text_feat_two = self.encode_text(caption_two) - directional_similarity = self.compute_directional_similarity( - img_feat_one, img_feat_two, text_feat_one, text_feat_two - ) - return directional_similarity -``` - -Let's put 
 `DirectionalSimilarity` to use now. - -```python -dir_similarity = DirectionalSimilarity(tokenizer, text_encoder, image_processor, image_encoder) -scores = [] - -for i in range(len(input_images)): - original_image = input_images[i] - original_caption = original_captions[i] - edited_image = edited_images[i] - modified_caption = modified_captions[i] - - similarity_score = dir_similarity(original_image, edited_image, original_caption, modified_caption) - scores.append(float(similarity_score.detach().cpu())) - -print(f"CLIP directional similarity: {np.mean(scores)}") -# CLIP directional similarity: 0.0797976553440094 -``` - -As with the CLIP score, a higher CLIP directional similarity is better. - -It should be noted that the `StableDiffusionInstructPix2PixPipeline` exposes two arguments, namely, `image_guidance_scale` and `guidance_scale` that let you control the quality of the final edited image. We encourage you to experiment with these two arguments and see how they affect the directional similarity. - -We can extend the idea of this metric to measure how similar the original image and edited version are. To do that, we can just do `F.cosine_similarity(img_feat_two, img_feat_one)`. For these kinds of edits, we would still want the primary semantics of the images to be preserved as much as possible, i.e., a high similarity score. - -We can use these metrics for similar pipelines such as the [`StableDiffusionPix2PixZeroPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/pix2pix_zero#diffusers.StableDiffusionPix2PixZeroPipeline). - - - -Both CLIP score and CLIP directional similarity rely on the CLIP model, which can make the evaluations biased. - - - -***Extending metrics like IS, FID (discussed later), or KID can be difficult*** when the model under evaluation was pre-trained on a large image-captioning dataset (such as the [LAION-5B dataset](https://laion.ai/blog/laion-5b/)). 
 This is because underlying these metrics is an InceptionNet (pre-trained on the ImageNet-1k dataset) used for extracting intermediate image features. The pre-training dataset of Stable Diffusion may have limited overlap with the pre-training dataset of InceptionNet, so InceptionNet is not a good candidate here for feature extraction. - -***Using the above metrics helps evaluate models that are class-conditioned. For example, [DiT](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/overview) was pre-trained conditioned on the ImageNet-1k classes.*** - -### Class-conditioned image generation - -Class-conditioned generative models are usually pre-trained on a class-labeled dataset such as [ImageNet-1k](https://huggingface.co/datasets/imagenet-1k). Popular metrics for evaluating these models include Fréchet Inception Distance (FID), Kernel Inception Distance (KID), and Inception Score (IS). In this document, we focus on FID ([Heusel et al.](https://arxiv.org/abs/1706.08500)). We show how to compute it with the [`DiTPipeline`](https://huggingface.co/docs/diffusers/api/pipelines/dit), which uses the [DiT model](https://arxiv.org/abs/2212.09748) under the hood. - -FID aims to measure how similar two datasets of images are. As per [this resource](https://mmgeneration.readthedocs.io/en/latest/quick_run.html#fid): - -> Fréchet Inception Distance is a measure of similarity between two datasets of images. It was shown to correlate well with the human judgment of visual quality and is most often used to evaluate the quality of samples of Generative Adversarial Networks. FID is calculated by computing the Fréchet distance between two Gaussians fitted to feature representations of the Inception network. - -These two datasets are essentially the dataset of real images and the dataset of fake images (generated images in our case). FID is usually calculated with two large datasets. However, for this document, we will work with two mini datasets. 
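 To make the "Fréchet distance between two Gaussians" concrete, here is a minimal one-dimensional sketch. It is not how FID is computed in practice: real FID fits multivariate Gaussians to Inception features and uses `||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2))`. In one dimension that expression reduces to the squared difference of the means plus the squared difference of the standard deviations. The function name `frechet_distance_1d`, the toy feature values, and the use of the population standard deviation are all assumptions made for this illustration. ```python from statistics import fmean, pstdev def frechet_distance_1d(real, fake): # Fit a 1-D Gaussian (mean, standard deviation) to each set of scalar features. mu_r, mu_f = fmean(real), fmean(fake) sigma_r, sigma_f = pstdev(real), pstdev(fake) # Squared Fréchet distance between the two fitted 1-D Gaussians. return (mu_r - mu_f) ** 2 + (sigma_r - sigma_f) ** 2 # Hypothetical scalar "features" standing in for Inception activations. real_features = [0.0, 2.0, 4.0] fake_features = [1.0, 3.0, 5.0] identical = frechet_distance_1d(real_features, real_features) shifted = frechet_distance_1d(real_features, fake_features) ``` Two identical sets give a distance of zero, and shifting every feature by one raises the distance through the mean term alone, since the spread is unchanged.   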
- -Let's first download a few images from the ImageNet-1k training set: - -```python -from zipfile import ZipFile -import requests - - -def download(url, local_filepath): - r = requests.get(url) - with open(local_filepath, "wb") as f: - f.write(r.content) - return local_filepath - - -dummy_dataset_url = "https://hf.co/datasets/sayakpaul/sample-datasets/resolve/main/sample-imagenet-images.zip" -local_filepath = download(dummy_dataset_url, dummy_dataset_url.split("/")[-1]) - -with ZipFile(local_filepath, "r") as zipper: - zipper.extractall(".") -``` - -```python -from PIL import Image -import os - -dataset_path = "sample-imagenet-images" -image_paths = sorted([os.path.join(dataset_path, x) for x in os.listdir(dataset_path)]) - -real_images = [np.array(Image.open(path).convert("RGB")) for path in image_paths] -``` - -These are 10 images from the following Imagenet-1k classes: "cassette_player", "chain_saw" (x2), "church", "gas_pump" (x3), "parachute" (x2), and "tench". - -

 - [Figure: Real images.] - 

- -Now that the images are loaded, let's apply some lightweight pre-processing on them to use them for FID calculation. - -```python -from torchvision.transforms import functional as F - - -def preprocess_image(image): - image = torch.tensor(image).unsqueeze(0) - image = image.permute(0, 3, 1, 2) / 255.0 - return F.center_crop(image, (256, 256)) - - -real_images = torch.cat([preprocess_image(image) for image in real_images]) -print(real_images.shape) -# torch.Size([10, 3, 256, 256]) -``` - -We now load the [`DiTPipeline`](https://huggingface.co/docs/diffusers/api/pipelines/dit) to generate images conditioned on the above-mentioned classes. - -```python -from diffusers import DiTPipeline, DPMSolverMultistepScheduler - -dit_pipeline = DiTPipeline.from_pretrained("facebook/DiT-XL-2-256", torch_dtype=torch.float16) -dit_pipeline.scheduler = DPMSolverMultistepScheduler.from_config(dit_pipeline.scheduler.config) -dit_pipeline = dit_pipeline.to("cuda") - -words = [ - "cassette player", - "chainsaw", - "chainsaw", - "church", - "gas pump", - "gas pump", - "gas pump", - "parachute", - "parachute", - "tench", -] - -class_ids = dit_pipeline.get_label_ids(words) -output = dit_pipeline(class_labels=class_ids, generator=generator, output_type="numpy") - -fake_images = output.images -fake_images = torch.tensor(fake_images) -fake_images = fake_images.permute(0, 3, 1, 2) -print(fake_images.shape) -# torch.Size([10, 3, 256, 256]) -``` - -Now, we can compute the FID using [`torchmetrics`](https://torchmetrics.readthedocs.io/). - -```python -from torchmetrics.image.fid import FrechetInceptionDistance - -fid = FrechetInceptionDistance(normalize=True) -fid.update(real_images, real=True) -fid.update(fake_images, real=False) - -print(f"FID: {float(fid.compute())}") -# FID: 177.7147216796875 -``` - -The lower the FID, the better it is. 
Several things can influence FID here: - -- Number of images (both real and fake) -- Randomness induced in the diffusion process -- Number of inference steps in the diffusion process -- The scheduler being used in the diffusion process - -For the last two points, it is, therefore, a good practice to run the evaluation across different seeds and inference steps, and then report an average result. - - - -FID results tend to be fragile as they depend on a lot of factors: - -* The specific Inception model used during computation. -* The implementation accuracy of the computation. -* The image format (not the same if we start from PNGs vs JPGs). - -Keeping that in mind, FID is often most useful when comparing similar runs, but it is -hard to reproduce paper results unless the authors carefully disclose the FID -measurement code. - -These points apply to other related metrics too, such as KID and IS. - - - -As a final step, let's visually inspect the `fake_images`. - -

 - [Figure: Fake images.] - 

 diff --git a/diffusers/docs/source/en/conceptual/philosophy.mdx b/diffusers/docs/source/en/conceptual/philosophy.mdx deleted file mode 100644 index 564530f2cb489f652f8a0870f313659c8469cfcf..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/conceptual/philosophy.mdx +++ /dev/null @@ -1,110 +0,0 @@ - - -# Philosophy - -🧨 Diffusers provides **state-of-the-art** pretrained diffusion models across multiple modalities. -Its purpose is to serve as a **modular toolbox** for both inference and training. - -We aim to build a library that stands the test of time and therefore take API design very seriously. - -In a nutshell, Diffusers is built to be a natural extension of PyTorch. Therefore, most of our design choices are based on [PyTorch's Design Principles](https://pytorch.org/docs/stable/community/design.html#pytorch-design-philosophy). Let's go over the most important ones: - -## Usability over Performance - -- While Diffusers has many built-in performance-enhancing features (see [Memory and Speed](https://huggingface.co/docs/diffusers/optimization/fp16)), models are always loaded with the highest precision and lowest optimization. Therefore, by default diffusion pipelines are always instantiated on CPU with float32 precision if not otherwise defined by the user. This ensures usability across different platforms and accelerators and means that no complex installations are required to run the library. -- Diffusers aims to be a **lightweight** package and therefore has very few required dependencies, but many soft dependencies that can improve performance (such as `accelerate`, `safetensors`, `onnx`, etc.). We strive to keep the library as lightweight as possible so that it can be added without much concern as a dependency on other packages. -- Diffusers prefers simple, self-explanatory code over condensed, magic code. This means that shorthand syntax such as lambda functions and advanced PyTorch operators is often avoided. 
 - -## Simple over easy - -As PyTorch states, **explicit is better than implicit** and **simple is better than complex**. This design philosophy is reflected in multiple parts of the library: -- We follow PyTorch's API with methods like [`DiffusionPipeline.to`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.to) to let the user handle device management. -- Raising concise error messages is preferred to silently correcting erroneous input. Diffusers aims to teach the user, rather than making the library as easy to use as possible. -- Complex model vs. scheduler logic is exposed instead of magically handled inside. Schedulers/Samplers are separated from diffusion models with minimal dependencies on each other. This forces the user to write the unrolled denoising loop. However, the separation allows for easier debugging and gives the user more control over adapting the denoising process or switching out diffusion models or schedulers. -- Separately trained components of the diffusion pipeline, *e.g.* the text encoder, the UNet, and the variational autoencoder, each have their own model class. This forces the user to handle the interaction between the different model components, and the serialization format separates the model components into different files. However, this allows for easier debugging and customization. Dreambooth or textual inversion training -is very simple thanks to Diffusers' ability to separate single components of the diffusion pipeline. - -## Tweakable, contributor-friendly over abstraction - -For large parts of the library, Diffusers adopts an important design principle of the [Transformers library](https://github.com/huggingface/transformers), which is to prefer copy-pasted code over hasty abstractions. This design principle is very opinionated and stands in stark contrast to popular design principles such as [Don't repeat yourself (DRY)](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself). 
-In short, just like Transformers does for modeling files, diffusers prefers to keep an extremely low level of abstraction and very self-contained code for pipelines and schedulers. -Functions, long code blocks, and even classes can be copied across multiple files which at first can look like a bad, sloppy design choice that makes the library unmaintainable. -**However**, this design has proven to be extremely successful for Transformers and makes a lot of sense for community-driven, open-source machine learning libraries because: -- Machine Learning is an extremely fast-moving field in which paradigms, model architectures, and algorithms are changing rapidly, which therefore makes it very difficult to define long-lasting code abstractions. -- Machine Learning practitioners like to be able to quickly tweak existing code for ideation and research and therefore prefer self-contained code over one that contains many abstractions. -- Open-source libraries rely on community contributions and therefore must build a library that is easy to contribute to. The more abstract the code, the more dependencies, the harder to read, and the harder to contribute to. Contributors simply stop contributing to very abstract libraries out of fear of breaking vital functionality. If contributing to a library cannot break other fundamental code, not only is it more inviting for potential new contributors, but it is also easier to review and contribute to multiple parts in parallel. - -At Hugging Face, we call this design the **single-file policy** which means that almost all of the code of a certain class should be written in a single, self-contained file. To read more about the philosophy, you can have a look -at [this blog post](https://huggingface.co/blog/transformers-design-philosophy). - -In diffusers, we follow this philosophy for both pipelines and schedulers, but only partly for diffusion models. 
The reason we don't follow this design fully for diffusion models is because almost all diffusion pipelines, such -as [DDPM](https://huggingface.co/docs/diffusers/v0.12.0/en/api/pipelines/ddpm), [Stable Diffusion](https://huggingface.co/docs/diffusers/v0.12.0/en/api/pipelines/stable_diffusion/overview#stable-diffusion-pipelines), [UnCLIP (Dalle-2)](https://huggingface.co/docs/diffusers/v0.12.0/en/api/pipelines/unclip#overview) and [Imagen](https://imagen.research.google/) all rely on the same diffusion model, the [UNet](https://huggingface.co/docs/diffusers/api/models#diffusers.UNet2DConditionModel). - -Great, now you should have generally understood why 🧨 Diffusers is designed the way it is 🤗. -We try to apply these design principles consistently across the library. Nevertheless, there are some minor exceptions to the philosophy or some unlucky design choices. If you have feedback regarding the design, we would ❤️ to hear it [directly on GitHub](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feedback.md&title=). - -## Design Philosophy in Details - -Now, let's look a bit into the nitty-gritty details of the design philosophy. Diffusers essentially consist of three major classes, [pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines), [models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models), and [schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers). -Let's walk through more in-detail design decisions for each class. - -### Pipelines - -Pipelines are designed to be easy to use (therefore do not follow [*Simple over easy*](#simple-over-easy) 100%), are not feature complete, and should loosely be seen as examples of how to use [models](#models) and [schedulers](#schedulers) for inference. - -The following design principles are followed: -- Pipelines follow the single-file policy. 
All pipelines can be found in individual directories under `src/diffusers/pipelines`. One pipeline folder corresponds to one diffusion paper/project/release. Multiple pipeline files can be gathered in one pipeline folder, as is done for [`src/diffusers/pipelines/stable_diffusion`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/stable_diffusion). If pipelines share similar functionality, one can make use of the [#Copied from mechanism](https://github.com/huggingface/diffusers/blob/125d783076e5bd9785beb05367a2d2566843a271/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py#L251). -- Pipelines all inherit from [`DiffusionPipeline`]. -- Every pipeline consists of different model and scheduler components that are documented in the [`model_index.json` file](https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/model_index.json), are accessible under the same name as attributes of the pipeline, and can be shared between pipelines with the [`DiffusionPipeline.components`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.components) function. -- Every pipeline should be loadable via the [`DiffusionPipeline.from_pretrained`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained) function. -- Pipelines should be used **only** for inference. -- Pipelines should be very readable, self-explanatory, and easy to tweak. -- Pipelines should be designed to build on top of each other and be easy to integrate into higher-level APIs. -- Pipelines are **not** intended to be feature-complete user interfaces. For feature-complete user interfaces, one should rather have a look at [InvokeAI](https://github.com/invoke-ai/InvokeAI), [Diffuzers](https://github.com/abhishekkrthakur/diffuzers), and [lama-cleaner](https://github.com/Sanster/lama-cleaner).
-- Every pipeline should have one and only one way to run it via a `__call__` method. The naming of the `__call__` arguments should be shared across all pipelines. -- Pipelines should be named after the task they are intended to solve. -- In almost all cases, novel diffusion pipelines shall be implemented in a new pipeline folder/file. - -### Models - -Models are designed as configurable toolboxes that are natural extensions of [PyTorch's Module class](https://pytorch.org/docs/stable/generated/torch.nn.Module.html). They only partly follow the **single-file policy**. - -The following design principles are followed: -- Models correspond to **a type of model architecture**. *E.g.* the [`UNet2DConditionModel`] class is used for all UNet variations that expect 2D image inputs and are conditioned on some context. -- All models can be found in [`src/diffusers/models`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and every model architecture shall be defined in its file, e.g. [`unet_2d_condition.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unet_2d_condition.py), [`transformer_2d.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/transformer_2d.py), etc... -- Models **do not** follow the single-file policy and should make use of smaller model building blocks, such as [`attention.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py), [`resnet.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/resnet.py), [`embeddings.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/embeddings.py), etc... **Note**: This is in stark contrast to Transformers' modeling files and shows that models do not really follow the single-file policy. -- Models intend to expose complexity, just like PyTorch's module does, and give clear error messages. -- Models all inherit from `ModelMixin` and `ConfigMixin`. 
-- Models can be optimized for performance when the optimization doesn’t demand major code changes, keeps backward compatibility, and gives a significant memory or compute gain. -- Models should by default have the highest precision and lowest performance setting. -- To integrate new model checkpoints whose general architecture can be classified as an architecture that already exists in Diffusers, the existing model architecture shall be adapted to make it work with the new checkpoint. One should only create a new file if the model architecture is fundamentally different. -- Models should be designed to be easily extendable to future changes. This can be achieved by limiting public function arguments, configuration arguments, and "foreseeing" future changes, *e.g.* it is usually better to add `string` "...type" arguments that can easily be extended to new future types instead of boolean `is_..._type` arguments. Only the minimum amount of changes shall be made to existing architectures to make a new model checkpoint work. -- The model design is a difficult trade-off between keeping code readable and concise and supporting many model checkpoints. For most parts of the modeling code, classes shall be adapted for new model checkpoints, while there are some exceptions where it is preferred to add new classes to make sure the code is kept concise and -readable in the long term, such as [UNet blocks](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unet_2d_blocks.py) and [Attention processors](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/cross_attention.py). - -### Schedulers - -Schedulers are responsible for guiding the denoising process for inference as well as for defining a noise schedule for training. They are designed as individual classes with loadable configuration files and strongly follow the **single-file policy**.
- -The following design principles are followed: -- All schedulers are found in [`src/diffusers/schedulers`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers). -- Schedulers are **not** allowed to import from large utils files and shall be kept very self-contained. -- One scheduler python file corresponds to one scheduler algorithm (as might be defined in a paper). -- If schedulers share similar functionalities, we can make use of the `#Copied from` mechanism. -- Schedulers all inherit from `SchedulerMixin` and `ConfigMixin`. -- Schedulers can be easily swapped out with the [`ConfigMixin.from_config`](https://huggingface.co/docs/diffusers/main/en/api/configuration#diffusers.ConfigMixin.from_config) method as explained in detail [here](./using-diffusers/schedulers.mdx). -- Every scheduler has to have a `set_num_inference_steps`, and a `step` function. `set_num_inference_steps(...)` has to be called before every denoising process, *i.e.* before `step(...)` is called. -- Every scheduler exposes the timesteps to be "looped over" via a `timesteps` attribute, which is an array of timesteps the model will be called upon. -- The `step(...)` function takes a predicted model output and the "current" sample (x_t) and returns the "previous", slightly more denoised sample (x_t-1). -- Given the complexity of diffusion schedulers, the `step` function does not expose all the complexity and can be a bit of a "black box". -- In almost all cases, novel schedulers shall be implemented in a new scheduling file. 
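The scheduler contract listed above can be sketched with a toy example. This is a minimal illustration in plain Python, not the library's implementation: `ToyScheduler`, `toy_model`, and the update rule inside `step` are hypothetical, and only the method and attribute names (`set_num_inference_steps`, `timesteps`, `step`) follow the list above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyScheduler:
    """Hypothetical scheduler, written only to illustrate the contract above."""

    num_train_timesteps: int = 1000
    timesteps: List[int] = field(default_factory=list)

    def set_num_inference_steps(self, num_inference_steps: int) -> None:
        # Pick evenly spaced timesteps, ordered from most noisy to least noisy.
        stride = self.num_train_timesteps // num_inference_steps
        self.timesteps = [i * stride for i in range(num_inference_steps)][::-1]

    def step(self, model_output: float, timestep: int, sample: float) -> float:
        # Toy update rule (an assumption, not a real sampler): subtract an
        # equal share of the predicted noise at every step.
        return sample - model_output / len(self.timesteps)


def toy_model(sample: float, timestep: int) -> float:
    # Stand-in for a diffusion model: it "predicts" the noise as the sample itself.
    return sample


scheduler = ToyScheduler()
scheduler.set_num_inference_steps(4)  # must be called before the first step(...)

sample = 1.0  # x_T, the starting "noisy" sample
for t in scheduler.timesteps:  # the scheduler exposes what to loop over
    noise_pred = toy_model(sample, t)
    sample = scheduler.step(noise_pred, t, sample)  # x_t -> x_{t-1}

print(scheduler.timesteps, sample)
```

The loop itself stays in user code, which is the "unrolled denoising loop" the philosophy asks for: swapping in a different scheduler only changes how `timesteps` is built and what `step` computes.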
diff --git a/diffusers/docs/source/en/imgs/access_request.png b/diffusers/docs/source/en/imgs/access_request.png deleted file mode 100644 index 33c6abc88dfb226e929b44c30c173c787b407045..0000000000000000000000000000000000000000 Binary files a/diffusers/docs/source/en/imgs/access_request.png and /dev/null differ diff --git a/diffusers/docs/source/en/imgs/diffusers_library.jpg b/diffusers/docs/source/en/imgs/diffusers_library.jpg deleted file mode 100644 index 07ba9c6571a3f070d9d10b78dccfd4d4537dd539..0000000000000000000000000000000000000000 Binary files a/diffusers/docs/source/en/imgs/diffusers_library.jpg and /dev/null differ diff --git a/diffusers/docs/source/en/index.mdx b/diffusers/docs/source/en/index.mdx deleted file mode 100644 index d020eb5d7d174da5a0d291b1efd4e810d3c1dc90..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/index.mdx +++ /dev/null @@ -1,93 +0,0 @@ - - -


- -# Diffusers - -🤗 Diffusers is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. Whether you're looking for a simple inference solution or want to train your own diffusion model, 🤗 Diffusers is a modular toolbox that supports both. Our library is designed with a focus on [usability over performance](conceptual/philosophy#usability-over-performance), [simple over easy](conceptual/philosophy#simple-over-easy), and [customizability over abstractions](conceptual/philosophy#tweakable-contributorfriendly-over-abstraction). - -The library has three main components: - -- State-of-the-art [diffusion pipelines](api/pipelines/overview) for inference with just a few lines of code. -- Interchangeable [noise schedulers](api/schedulers/overview) for balancing trade-offs between generation speed and quality. -- Pretrained [models](api/models) that can be used as building blocks, and combined with schedulers, for creating your own end-to-end diffusion systems. 
- - - -## Supported pipelines - -| Pipeline | Paper/Repository | Tasks | -|---|---|:---:| -| [alt_diffusion](./api/pipelines/alt_diffusion) | [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) | Image-to-Image Text-Guided Generation | -| [audio_diffusion](./api/pipelines/audio_diffusion) | [Audio Diffusion](https://github.com/teticio/audio-diffusion.git) | Unconditional Audio Generation | -| [controlnet](./api/pipelines/stable_diffusion/controlnet) | [Adding Conditional Control to Text-to-Image Diffusion Models](https://arxiv.org/abs/2302.05543) | Image-to-Image Text-Guided Generation | -| [cycle_diffusion](./api/pipelines/cycle_diffusion) | [Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance](https://arxiv.org/abs/2210.05559) | Image-to-Image Text-Guided Generation | -| [dance_diffusion](./api/pipelines/dance_diffusion) | [Dance Diffusion](https://github.com/williamberman/diffusers.git) | Unconditional Audio Generation | -| [ddpm](./api/pipelines/ddpm) | [Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239) | Unconditional Image Generation | -| [ddim](./api/pipelines/ddim) | [Denoising Diffusion Implicit Models](https://arxiv.org/abs/2010.02502) | Unconditional Image Generation | -| [latent_diffusion](./api/pipelines/latent_diffusion) | [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752)| Text-to-Image Generation | -| [latent_diffusion](./api/pipelines/latent_diffusion) | [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752)| Super Resolution Image-to-Image | -| [latent_diffusion_uncond](./api/pipelines/latent_diffusion_uncond) | [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752) | Unconditional Image Generation | -| [paint_by_example](./api/pipelines/paint_by_example) | [Paint by Example: Exemplar-based 
Image Editing with Diffusion Models](https://arxiv.org/abs/2211.13227) | Image-Guided Image Inpainting | -| [pndm](./api/pipelines/pndm) | [Pseudo Numerical Methods for Diffusion Models on Manifolds](https://arxiv.org/abs/2202.09778) | Unconditional Image Generation | -| [score_sde_ve](./api/pipelines/score_sde_ve) | [Score-Based Generative Modeling through Stochastic Differential Equations](https://openreview.net/forum?id=PxTIG12RRHS) | Unconditional Image Generation | -| [score_sde_vp](./api/pipelines/score_sde_vp) | [Score-Based Generative Modeling through Stochastic Differential Equations](https://openreview.net/forum?id=PxTIG12RRHS) | Unconditional Image Generation | -| [semantic_stable_diffusion](./api/pipelines/semantic_stable_diffusion) | [Semantic Guidance](https://arxiv.org/abs/2301.12247) | Text-Guided Generation | -| [stable_diffusion_text2img](./api/pipelines/stable_diffusion/text2img) | [Stable Diffusion](https://stability.ai/blog/stable-diffusion-public-release) | Text-to-Image Generation | -| [stable_diffusion_img2img](./api/pipelines/stable_diffusion/img2img) | [Stable Diffusion](https://stability.ai/blog/stable-diffusion-public-release) | Image-to-Image Text-Guided Generation | -| [stable_diffusion_inpaint](./api/pipelines/stable_diffusion/inpaint) | [Stable Diffusion](https://stability.ai/blog/stable-diffusion-public-release) | Text-Guided Image Inpainting | -| [stable_diffusion_panorama](./api/pipelines/stable_diffusion/panorama) | [MultiDiffusion](https://multidiffusion.github.io/) | Text-to-Panorama Generation | -| [stable_diffusion_pix2pix](./api/pipelines/stable_diffusion/pix2pix) | [InstructPix2Pix: Learning to Follow Image Editing Instructions](https://arxiv.org/abs/2211.09800) | Text-Guided Image Editing| -| [stable_diffusion_pix2pix_zero](./api/pipelines/stable_diffusion/pix2pix_zero) | [Zero-shot Image-to-Image Translation](https://pix2pixzero.github.io/) | Text-Guided Image Editing | -| 
[stable_diffusion_attend_and_excite](./api/pipelines/stable_diffusion/attend_and_excite) | [Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models](https://arxiv.org/abs/2301.13826) | Text-to-Image Generation | -| [stable_diffusion_self_attention_guidance](./api/pipelines/stable_diffusion/self_attention_guidance) | [Improving Sample Quality of Diffusion Models Using Self-Attention Guidance](https://arxiv.org/abs/2210.00939) | Text-to-Image Generation | -| [stable_diffusion_image_variation](./stable_diffusion/image_variation) | [Stable Diffusion Image Variations](https://github.com/LambdaLabsML/lambda-diffusers#stable-diffusion-image-variations) | Image-to-Image Generation | -| [stable_diffusion_latent_upscale](./stable_diffusion/latent_upscale) | [Stable Diffusion Latent Upscaler](https://twitter.com/StabilityAI/status/1590531958815064065) | Text-Guided Super Resolution Image-to-Image | -| [stable_diffusion_model_editing](./api/pipelines/stable_diffusion/model_editing) | [Editing Implicit Assumptions in Text-to-Image Diffusion Models](https://time-diffusion.github.io/) | Text-to-Image Model Editing | -| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [Stable Diffusion 2](https://stability.ai/blog/stable-diffusion-v2-release) | Text-to-Image Generation | -| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [Stable Diffusion 2](https://stability.ai/blog/stable-diffusion-v2-release) | Text-Guided Image Inpainting | -| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [Depth-Conditional Stable Diffusion](https://github.com/Stability-AI/stablediffusion#depth-conditional-stable-diffusion) | Depth-to-Image Generation | -| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [Stable Diffusion 2](https://stability.ai/blog/stable-diffusion-v2-release) | Text-Guided Super Resolution Image-to-Image | -| [stable_diffusion_safe](./api/pipelines/stable_diffusion_safe) | [Safe Stable 
Diffusion](https://arxiv.org/abs/2211.05105) | Text-Guided Generation | -| [stable_unclip](./stable_unclip) | Stable unCLIP | Text-to-Image Generation | -| [stable_unclip](./stable_unclip) | Stable unCLIP | Image-to-Image Text-Guided Generation | -| [stochastic_karras_ve](./api/pipelines/stochastic_karras_ve) | [Elucidating the Design Space of Diffusion-Based Generative Models](https://arxiv.org/abs/2206.00364) | Unconditional Image Generation | -| [text_to_video_sd](./api/pipelines/text_to_video) | [Modelscope's Text-to-video-synthesis Model in Open Domain](https://modelscope.cn/models/damo/text-to-video-synthesis/summary) | Text-to-Video Generation | -| [unclip](./api/pipelines/unclip) | [Hierarchical Text-Conditional Image Generation with CLIP Latents](https://arxiv.org/abs/2204.06125)(implementation by [kakaobrain](https://github.com/kakaobrain/karlo)) | Text-to-Image Generation | -| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Text-to-Image Generation | -| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Image Variations Generation | -| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Dual Image and Text Guided Generation | -| [vq_diffusion](./api/pipelines/vq_diffusion) | [Vector Quantized Diffusion Model for Text-to-Image Synthesis](https://arxiv.org/abs/2111.14822) | Text-to-Image Generation | \ No newline at end of file diff --git a/diffusers/docs/source/en/installation.mdx b/diffusers/docs/source/en/installation.mdx deleted file mode 100644 index 8639bcfca95b47cfa9d0116c4fae4f3f3cbe888a..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/installation.mdx +++ 
/dev/null @@ -1,144 +0,0 @@ - - -# Installation - -Install 🤗 Diffusers for whichever deep learning library you’re working with. - -🤗 Diffusers is tested on Python 3.7+, PyTorch 1.7.0+ and Flax. Follow the installation instructions below for the deep learning library you are using: - -- [PyTorch](https://pytorch.org/get-started/locally/) installation instructions. -- [Flax](https://flax.readthedocs.io/en/latest/) installation instructions. - -## Install with pip - -You should install 🤗 Diffusers in a [virtual environment](https://docs.python.org/3/library/venv.html). -If you're unfamiliar with Python virtual environments, take a look at this [guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/). -A virtual environment makes it easier to manage different projects, and avoid compatibility issues between dependencies. - -Start by creating a virtual environment in your project directory: - -```bash -python -m venv .env -``` - -Activate the virtual environment: - -```bash -source .env/bin/activate -``` - -Now you're ready to install 🤗 Diffusers with the following command: - -**For PyTorch** - -```bash -pip install diffusers["torch"] -``` - -**For Flax** - -```bash -pip install diffusers["flax"] -``` - -## Install from source - -Before installing `diffusers` from source, make sure you have `torch` and `accelerate` installed. - -For `torch` installation, refer to the `torch` [docs](https://pytorch.org/get-started/locally/#start-locally). - -To install `accelerate`: - -```bash -pip install accelerate -``` - -Install 🤗 Diffusers from source with the following command: - -```bash -pip install git+https://github.com/huggingface/diffusers -``` - -This command installs the bleeding edge `main` version rather than the latest `stable` version. -The `main` version is useful for staying up-to-date with the latest developments. -For instance, it helps when a bug has been fixed since the last official release but a new release hasn't been rolled out yet.
-However, this means the `main` version may not always be stable. -We strive to keep the `main` version operational, and most issues are usually resolved within a few hours or a day. -If you run into a problem, please open an [Issue](https://github.com/huggingface/transformers/issues), so we can fix it even sooner! - -## Editable install - -You will need an editable install if you'd like to: - -* Use the `main` version of the source code. -* Contribute to 🤗 Diffusers and need to test changes in the code. - -Clone the repository and install 🤗 Diffusers with the following commands: - -```bash -git clone https://github.com/huggingface/diffusers.git -cd diffusers -``` - -**For PyTorch** - -``` -pip install -e ".[torch]" -``` - -**For Flax** - -``` -pip install -e ".[flax]" -``` - -These commands will link the folder you cloned the repository to and your Python library paths. -Python will now look inside the folder you cloned to in addition to the normal library paths. -For example, if your Python packages are typically installed in `~/anaconda3/envs/main/lib/python3.7/site-packages/`, Python will also search the folder you cloned to: `~/diffusers/`. - - - -You must keep the `diffusers` folder if you want to keep using the library. - - - -Now you can easily update your clone to the latest version of 🤗 Diffusers with the following command: - -```bash -cd ~/diffusers/ -git pull -``` - -Your Python environment will find the `main` version of 🤗 Diffusers on the next run. - -## Notice on telemetry logging - -Our library gathers telemetry information during `from_pretrained()` requests. -This data includes the version of Diffusers and PyTorch/Flax, the requested model or pipeline class, -and the path to a pretrained checkpoint if it is hosted on the Hub. -This usage data helps us debug issues and prioritize new features. -Telemetry is only sent when loading models and pipelines from the HuggingFace Hub, -and is not collected during local usage. 
- -We understand that not everyone wants to share additional information, and we respect your privacy, -so you can disable telemetry collection by setting the `DISABLE_TELEMETRY` environment variable from your terminal: - -On Linux/MacOS: -```bash -export DISABLE_TELEMETRY=YES -``` - -On Windows: -```bash -set DISABLE_TELEMETRY=YES -``` \ No newline at end of file diff --git a/diffusers/docs/source/en/optimization/fp16.mdx b/diffusers/docs/source/en/optimization/fp16.mdx deleted file mode 100644 index d05c5aabea2b473ee0398eb331470c344826859a..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/optimization/fp16.mdx +++ /dev/null @@ -1,423 +0,0 @@ - - -# Memory and speed - -We present some techniques and ideas to optimize 🤗 Diffusers _inference_ for memory or speed. As a general rule, we recommend the use of [xFormers](https://github.com/facebookresearch/xformers) for memory efficient attention, please see the recommended [installation instructions](xformers). - -We'll discuss how the following settings impact performance and memory. - -| | Latency | Speedup | -| ---------------- | ------- | ------- | -| original | 9.50s | x1 | -| fp16 | 3.61s | x2.63 | -| channels last | 3.30s | x2.88 | -| traced UNet | 3.21s | x2.96 | -| memory efficient attention | 2.63s | x3.61 | - - - obtained on NVIDIA TITAN RTX by generating a single image of size 512x512 from - the prompt "a photo of an astronaut riding a horse on mars" with 50 DDIM - steps. - - -### Use tf32 instead of fp32 (on Ampere and later CUDA devices) - -On Ampere and later CUDA devices matrix multiplications and convolutions can use the TensorFloat32 (TF32) mode for faster but slightly less accurate computations. By default PyTorch enables TF32 mode for convolutions but not matrix multiplications, and unless a network requires full float32 precision we recommend enabling this setting for matrix multiplications, too. 
It can significantly speed up computations with typically negligible loss of numerical accuracy. You can read more about it [here](https://huggingface.co/docs/transformers/v4.18.0/en/performance#tf32). All you need to do is to add this before your inference: - -```python -import torch - -torch.backends.cuda.matmul.allow_tf32 = True -``` - -## Half precision weights - -To save more GPU memory and get more speed, you can load and run the model weights directly in half precision. This involves loading the float16 version of the weights, which was saved to a branch named `fp16`, and telling PyTorch to use the `float16` type when loading them: - -```Python -import torch -from diffusers import DiffusionPipeline - -pipe = DiffusionPipeline.from_pretrained( - "runwayml/stable-diffusion-v1-5", - - torch_dtype=torch.float16, -) -pipe = pipe.to("cuda") - -prompt = "a photo of an astronaut riding a horse on mars" -image = pipe(prompt).images[0] -``` - - - It is strongly discouraged to make use of [`torch.autocast`](https://pytorch.org/docs/stable/amp.html#torch.autocast) in any of the pipelines as it can lead to black images and is always slower than using pure - float16 precision. - - -## Sliced attention for additional memory savings - -For additional memory savings, you can use a sliced version of attention that performs the computation in steps instead of all at once. - - - Attention slicing is useful even if a batch size of just 1 is used - as long - as the model uses more than one attention head. If there is more than one - attention head, the *QK^T* attention matrix can be computed sequentially for - each head, which can save a significant amount of memory.
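To get a feel for why computing *QK^T* one head at a time helps, here is a rough back-of-the-envelope sketch. The sequence length (4096 tokens, as for a 64x64 latent) and the head count (8) are illustrative assumptions, and the helper `attention_scores_bytes` is hypothetical. It counts only the score matrices and ignores every other activation.

```python
def attention_scores_bytes(seq_len: int, num_heads: int, bytes_per_element: int = 2) -> int:
    """Bytes needed to hold the QK^T score matrices for `num_heads` heads at once."""
    return num_heads * seq_len * seq_len * bytes_per_element


# Assumed, illustrative values: 4096 tokens, 8 attention heads, float16 (2 bytes).
seq_len, num_heads = 4096, 8
MIB = 1024 ** 2

all_heads = attention_scores_bytes(seq_len, num_heads)  # every head materialized together
one_head = attention_scores_bytes(seq_len, 1)           # sliced: one head at a time

print(f"all heads at once:  {all_heads / MIB:.0f} MiB")  # 256 MiB
print(f"one head at a time: {one_head / MIB:.0f} MiB")   # 32 MiB
```

Under these assumptions the peak size of the score matrix shrinks by a factor equal to the number of heads, at the cost of running the heads one after another.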
- - -To perform the attention computation sequentially over each head, you only need to invoke [`~DiffusionPipeline.enable_attention_slicing`] in your pipeline before inference, like here: - -```Python -import torch -from diffusers import DiffusionPipeline - -pipe = DiffusionPipeline.from_pretrained( - "runwayml/stable-diffusion-v1-5", - - torch_dtype=torch.float16, -) -pipe = pipe.to("cuda") - -prompt = "a photo of an astronaut riding a horse on mars" -pipe.enable_attention_slicing() -image = pipe(prompt).images[0] -``` - -There's a small performance penalty of about 10% slower inference times, but this method allows you to use Stable Diffusion in as little as 3.2 GB of VRAM! - - -## Sliced VAE decode for larger batches - -To decode large batches of images with limited VRAM, or to enable batches with 32 images or more, you can use sliced VAE decode that decodes the batch latents one image at a time. - -You likely want to couple this with [`~StableDiffusionPipeline.enable_attention_slicing`] or [`~StableDiffusionPipeline.enable_xformers_memory_efficient_attention`] to further minimize memory use. - -To perform the VAE decode one image at a time, invoke [`~StableDiffusionPipeline.enable_vae_slicing`] in your pipeline before inference. For example: - -```Python -import torch -from diffusers import StableDiffusionPipeline - -pipe = StableDiffusionPipeline.from_pretrained( - "runwayml/stable-diffusion-v1-5", - - torch_dtype=torch.float16, -) -pipe = pipe.to("cuda") - -prompt = "a photo of an astronaut riding a horse on mars" -pipe.enable_vae_slicing() -images = pipe([prompt] * 32).images -``` - -You may see a small performance boost in VAE decode on multi-image batches. There should be no performance impact on single-image batches. - - -## Tiled VAE decode and encode for large images - -Tiled VAE processing makes it possible to work with large images on limited VRAM. For example, generating 4k images in 8GB of VRAM. 
Tiled VAE decoder splits the image into overlapping tiles, decodes the tiles, and blends the outputs to make the final image. - -You want to couple this with [`~StableDiffusionPipeline.enable_attention_slicing`] or [`~StableDiffusionPipeline.enable_xformers_memory_efficient_attention`] to further minimize memory use. - -To use tiled VAE processing, invoke [`~StableDiffusionPipeline.enable_vae_tiling`] in your pipeline before inference. For example: - -```python -import torch -from diffusers import StableDiffusionPipeline, UniPCMultistepScheduler - -pipe = StableDiffusionPipeline.from_pretrained( - "runwayml/stable-diffusion-v1-5", - torch_dtype=torch.float16, -) -pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config) -pipe = pipe.to("cuda") -prompt = "a beautiful landscape photograph" -pipe.enable_vae_tiling() -pipe.enable_xformers_memory_efficient_attention() - -image = pipe([prompt], width=3840, height=2224, num_inference_steps=20).images[0] -``` - -The output image will have some tile-to-tile tone variation from the tiles having separate decoders, but you shouldn't see sharp seams between the tiles. The tiling is turned off for images that are 512x512 or smaller. - - - -## Offloading to CPU with accelerate for memory savings - -For additional memory savings, you can offload the weights to CPU and only load them to GPU when performing the forward pass. - -To perform CPU offloading, all you have to do is invoke [`~StableDiffusionPipeline.enable_sequential_cpu_offload`]: - -```Python -import torch -from diffusers import StableDiffusionPipeline - -pipe = StableDiffusionPipeline.from_pretrained( - "runwayml/stable-diffusion-v1-5", - - torch_dtype=torch.float16, -) - -prompt = "a photo of an astronaut riding a horse on mars" -pipe.enable_sequential_cpu_offload() -image = pipe(prompt).images[0] -``` - -And you can get the memory consumption to < 3GB. - -Note that this method works at the submodule level, not on whole models. 
This is the best way to minimize memory consumption, but inference is much slower due to the iterative nature of the process. The UNet component of the pipeline runs several times (as many as `num_inference_steps`); each time, the different submodules of the UNet are sequentially loaded onto the GPU and then offloaded as they are needed, so the number of memory transfers is large. - - -Consider using model offloading as another point in the optimization space: it will be much faster, but memory savings won't be as large. - - -It is also possible to chain offloading with attention slicing for minimal memory consumption (< 2GB). - -```Python -import torch -from diffusers import StableDiffusionPipeline - -pipe = StableDiffusionPipeline.from_pretrained( - "runwayml/stable-diffusion-v1-5", - - torch_dtype=torch.float16, -) - -prompt = "a photo of an astronaut riding a horse on mars" -pipe.enable_sequential_cpu_offload() -pipe.enable_attention_slicing(1) - -image = pipe(prompt).images[0] -``` - -**Note**: When using `enable_sequential_cpu_offload()`, it is important to **not** move the pipeline to CUDA beforehand or else the gain in memory consumption will only be minimal. See [this issue](https://github.com/huggingface/diffusers/issues/1934) for more information. - - - -## Model offloading for fast inference and memory savings - -[Sequential CPU offloading](#sequential_offloading), as discussed in the previous section, saves a lot of memory but makes inference slower, because submodules are moved to GPU as needed, and immediately returned to CPU when a new module runs. - -Full-model offloading is an alternative that moves whole models to the GPU, instead of handling each model's constituent _modules_. This results in a negligible impact on inference time (compared with moving the pipeline to `cuda`), while still providing some memory savings.
- -In this scenario, only one of the main components of the pipeline (typically: text encoder, unet and vae) -will be on the GPU while the others wait on the CPU. Components like the UNet that run for multiple iterations will stay on GPU until they are no longer needed. - -This feature can be enabled by invoking `enable_model_cpu_offload()` on the pipeline, as shown below. - -```Python -import torch -from diffusers import StableDiffusionPipeline - -pipe = StableDiffusionPipeline.from_pretrained( - "runwayml/stable-diffusion-v1-5", - torch_dtype=torch.float16, -) - -prompt = "a photo of an astronaut riding a horse on mars" -pipe.enable_model_cpu_offload() -image = pipe(prompt).images[0] -``` - -This is also compatible with attention slicing for additional memory savings. - -```Python -import torch -from diffusers import StableDiffusionPipeline - -pipe = StableDiffusionPipeline.from_pretrained( - "runwayml/stable-diffusion-v1-5", - torch_dtype=torch.float16, -) - -prompt = "a photo of an astronaut riding a horse on mars" -pipe.enable_model_cpu_offload() -pipe.enable_attention_slicing(1) - -image = pipe(prompt).images[0] -``` - - -This feature requires `accelerate` version 0.17.0 or later. - - -## Using Channels Last memory format - -Channels last memory format is an alternative way of ordering NCHW tensors in memory while preserving the dimension ordering. Channels last tensors are ordered in such a way that channels become the densest dimension (aka storing images pixel-per-pixel). Since not all operators currently support the channels last format, it may result in worse performance, so it's better to try it and see if it works for your model.
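To make "channels become the densest dimension" concrete, here is a minimal pure-Python sketch (no PyTorch required) that computes the element strides of a tensor with logical shape `(N, C, H, W)` in both layouts. The helper names `contiguous_strides` and `channels_last_strides` are hypothetical and exist only for illustration. The shape `(4, 320, 3, 3)` is an assumption, chosen because it reproduces the stride values printed in this guide's UNet example.

```python
def contiguous_strides(shape):
    """Strides (in elements) of a default, NCHW-contiguous tensor."""
    _, c, h, w = shape
    return (c * h * w, h * w, w, 1)


def channels_last_strides(shape):
    """Strides (in elements) when the same NCHW-shaped tensor is stored as NHWC."""
    _, c, h, w = shape
    # The channel index now moves fastest (stride 1), then width, then height.
    return (h * w * c, 1, w * c, c)


shape = (4, 320, 3, 3)  # assumed example shape: (N, C, H, W)
print(contiguous_strides(shape))     # (2880, 9, 3, 1)
print(channels_last_strides(shape))  # (2880, 1, 960, 320)
```

A stride of 1 in the second position means that neighbouring channel values of the same pixel sit next to each other in memory, which is what the stride check in the UNet example looks for.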
- -For example, in order to set the UNet model in our pipeline to use channels last format, we can use the following: - -```python -print(pipe.unet.conv_out.state_dict()["weight"].stride()) # (2880, 9, 3, 1) -pipe.unet.to(memory_format=torch.channels_last) # in-place operation -print( - pipe.unet.conv_out.state_dict()["weight"].stride() -) # (2880, 1, 960, 320) having a stride of 1 for the 2nd dimension proves that it works -``` - -## Tracing - -Tracing runs an example input tensor through your model, and captures the operations that are invoked as that input makes its way through the model's layers so that an executable or `ScriptFunction` is returned that will be optimized using just-in-time compilation. - -To trace our UNet model, we can use the following: - -```python -import time -import torch -from diffusers import StableDiffusionPipeline -import functools - -# torch disable grad -torch.set_grad_enabled(False) - -# set variables -n_experiments = 2 -unet_runs_per_experiment = 50 - - -# load inputs -def generate_inputs(): - sample = torch.randn(2, 4, 64, 64).half().cuda() - timestep = torch.rand(1).half().cuda() * 999 - encoder_hidden_states = torch.randn(2, 77, 768).half().cuda() - return sample, timestep, encoder_hidden_states - - -pipe = StableDiffusionPipeline.from_pretrained( - "runwayml/stable-diffusion-v1-5", - torch_dtype=torch.float16, -).to("cuda") -unet = pipe.unet -unet.eval() -unet.to(memory_format=torch.channels_last) # use channels_last memory format -unet.forward = functools.partial(unet.forward, return_dict=False) # set return_dict=False as default - -# warmup -for _ in range(3): - with torch.inference_mode(): - inputs = generate_inputs() - orig_output = unet(*inputs) - -# trace -print("tracing..") -unet_traced = torch.jit.trace(unet, inputs) -unet_traced.eval() -print("done tracing") - - -# warmup and optimize graph -for _ in range(5): - with torch.inference_mode(): - inputs = generate_inputs() - orig_output = unet_traced(*inputs) - - -# 
benchmarking -with torch.inference_mode(): - for _ in range(n_experiments): - torch.cuda.synchronize() - start_time = time.time() - for _ in range(unet_runs_per_experiment): - orig_output = unet_traced(*inputs) - torch.cuda.synchronize() - print(f"unet traced inference took {time.time() - start_time:.2f} seconds") - for _ in range(n_experiments): - torch.cuda.synchronize() - start_time = time.time() - for _ in range(unet_runs_per_experiment): - orig_output = unet(*inputs) - torch.cuda.synchronize() - print(f"unet inference took {time.time() - start_time:.2f} seconds") - -# save the model -unet_traced.save("unet_traced.pt") -``` - -Then we can replace the `unet` attribute of the pipeline with the traced model, as follows: - -```python -from diffusers import StableDiffusionPipeline -import torch -from dataclasses import dataclass - - -@dataclass -class UNet2DConditionOutput: - sample: torch.FloatTensor - - -pipe = StableDiffusionPipeline.from_pretrained( - "runwayml/stable-diffusion-v1-5", - torch_dtype=torch.float16, -).to("cuda") - -# use jitted unet -unet_traced = torch.jit.load("unet_traced.pt") - - -# del pipe.unet -class TracedUNet(torch.nn.Module): - def __init__(self): - super().__init__() - self.in_channels = pipe.unet.in_channels - self.device = pipe.unet.device - - def forward(self, latent_model_input, t, encoder_hidden_states): - sample = unet_traced(latent_model_input, t, encoder_hidden_states)[0] - return UNet2DConditionOutput(sample=sample) - - -pipe.unet = TracedUNet() - -# `prompt` was not defined in the original snippet; this value is only an example -prompt = "a photo of an astronaut riding a horse on mars" - -with torch.inference_mode(): - image = pipe([prompt] * 1, num_inference_steps=50).images[0] -``` - - -## Memory Efficient Attention - -Recent work on optimizing the bandwidth in the attention block has produced large speed-ups and reductions in GPU memory usage. The most recent is Flash Attention from @tridao: [code](https://github.com/HazyResearch/flash-attention), [paper](https://arxiv.org/pdf/2205.14135.pdf).
- -Here are the speedups we obtain on a few Nvidia GPUs when running the inference at 512x512 with a batch size of 1 (one prompt): - -| GPU | Base Attention FP16 | Memory Efficient Attention FP16 | -|------------------ |--------------------- |--------------------------------- | -| NVIDIA Tesla T4 | 3.5it/s | 5.5it/s | -| NVIDIA 3060 RTX | 4.6it/s | 7.8it/s | -| NVIDIA A10G | 8.88it/s | 15.6it/s | -| NVIDIA RTX A6000 | 11.7it/s | 21.09it/s | -| NVIDIA TITAN RTX | 12.51it/s | 18.22it/s | -| A100-SXM4-40GB | 18.6it/s | 29.it/s | -| A100-SXM-80GB | 18.7it/s | 29.5it/s | - -To leverage it just make sure you have: - - PyTorch > 1.12 - - Cuda available - - [Installed the xformers library](xformers). -```python -from diffusers import DiffusionPipeline -import torch - -pipe = DiffusionPipeline.from_pretrained( - "runwayml/stable-diffusion-v1-5", - torch_dtype=torch.float16, -).to("cuda") - -pipe.enable_xformers_memory_efficient_attention() - -with torch.inference_mode(): - sample = pipe("a small cat") - -# optional: You can disable it via -# pipe.disable_xformers_memory_efficient_attention() -``` diff --git a/diffusers/docs/source/en/optimization/habana.mdx b/diffusers/docs/source/en/optimization/habana.mdx deleted file mode 100644 index a5f476b0cef2ad8ddb457ef1dc4b10a9da072a59..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/optimization/habana.mdx +++ /dev/null @@ -1,70 +0,0 @@ - - -# How to use Stable Diffusion on Habana Gaudi - -🤗 Diffusers is compatible with Habana Gaudi through 🤗 [Optimum Habana](https://huggingface.co/docs/optimum/habana/usage_guides/stable_diffusion). - -## Requirements - -- Optimum Habana 1.4 or later, [here](https://huggingface.co/docs/optimum/habana/installation) is how to install it. -- SynapseAI 1.8. 
- - -## Inference Pipeline - -To generate images with Stable Diffusion 1 and 2 on Gaudi, you need to instantiate two instances: -- A pipeline with [`GaudiStableDiffusionPipeline`](https://huggingface.co/docs/optimum/habana/package_reference/stable_diffusion_pipeline). This pipeline supports *text-to-image generation*. -- A scheduler with [`GaudiDDIMScheduler`](https://huggingface.co/docs/optimum/habana/package_reference/stable_diffusion_pipeline#optimum.habana.diffusers.GaudiDDIMScheduler). This scheduler has been optimized for Habana Gaudi. - -When initializing the pipeline, you have to specify `use_habana=True` to deploy it on HPUs. -Furthermore, in order to get the fastest possible generations you should enable **HPU graphs** with `use_hpu_graphs=True`. -Finally, you will need to specify a [Gaudi configuration](https://huggingface.co/docs/optimum/habana/package_reference/gaudi_config) which can be downloaded from the [Hugging Face Hub](https://huggingface.co/Habana). - -```python -from optimum.habana import GaudiConfig -from optimum.habana.diffusers import GaudiDDIMScheduler, GaudiStableDiffusionPipeline - -model_name = "stabilityai/stable-diffusion-2-base" -scheduler = GaudiDDIMScheduler.from_pretrained(model_name, subfolder="scheduler") -pipeline = GaudiStableDiffusionPipeline.from_pretrained( - model_name, - scheduler=scheduler, - use_habana=True, - use_hpu_graphs=True, - gaudi_config="Habana/stable-diffusion", -) -``` - -You can then call the pipeline to generate images by batches from one or several prompts: -```python -outputs = pipeline( - prompt=[ - "High quality photo of an astronaut riding a horse in space", - "Face of a yellow cat, high resolution, sitting on a park bench", - ], - num_images_per_prompt=10, - batch_size=4, -) -``` - -For more information, check out Optimum Habana's [documentation](https://huggingface.co/docs/optimum/habana/usage_guides/stable_diffusion) and the 
[example](https://github.com/huggingface/optimum-habana/tree/main/examples/stable-diffusion) provided in the official Github repository. - - -## Benchmark - -Here are the latencies for Habana first-generation Gaudi and Gaudi2 with the [Habana/stable-diffusion](https://huggingface.co/Habana/stable-diffusion) Gaudi configuration (mixed precision bf16/fp32): - -| | Latency (batch size = 1) | Throughput (batch size = 8) | -| ---------------------- |:------------------------:|:---------------------------:| -| first-generation Gaudi | 4.29s | 0.283 images/s | -| Gaudi2 | 1.54s | 0.904 images/s | diff --git a/diffusers/docs/source/en/optimization/mps.mdx b/diffusers/docs/source/en/optimization/mps.mdx deleted file mode 100644 index 3be8c621ee3e27b019f8cbe87d5718aebd310f19..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/optimization/mps.mdx +++ /dev/null @@ -1,67 +0,0 @@ - - -# How to use Stable Diffusion in Apple Silicon (M1/M2) - -🤗 Diffusers is compatible with Apple silicon for Stable Diffusion inference, using the PyTorch `mps` device. These are the steps you need to follow to use your M1 or M2 computer with Stable Diffusion. - -## Requirements - -- Mac computer with Apple silicon (M1/M2) hardware. -- macOS 12.6 or later (13.0 or later recommended). -- arm64 version of Python. -- PyTorch 2.0 (recommended) or 1.13 (minimum version supported for `mps`). You can install it with `pip` or `conda` using the instructions in https://pytorch.org/get-started/locally/. - - -## Inference Pipeline - -The snippet below demonstrates how to use the `mps` backend using the familiar `to()` interface to move the Stable Diffusion pipeline to your M1 or M2 device. - - - -**If you are using PyTorch 1.13** you need to "prime" the pipeline using an additional one-time pass through it. This is a temporary workaround for a weird issue we detected: the first inference pass produces slightly different results than subsequent ones. 
You only need to do this pass once, and it's ok to use just one inference step and discard the result. - - - -We strongly recommend you use PyTorch 2 or better, as it solves a number of problems like the one described in the previous tip. - -```python -from diffusers import DiffusionPipeline - -pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5") -pipe = pipe.to("mps") - -# Recommended if your computer has < 64 GB of RAM -pipe.enable_attention_slicing() - -prompt = "a photo of an astronaut riding a horse on mars" - -# First-time "warmup" pass if PyTorch version is 1.13 (see explanation above) -_ = pipe(prompt, num_inference_steps=1) - -# Results match those from the CPU device after the warmup pass. -image = pipe(prompt).images[0] -``` - -## Performance Recommendations - -M1/M2 performance is very sensitive to memory pressure. The system will automatically swap if it needs to, but performance will degrade significantly when it does. - -We recommend you use _attention slicing_ to reduce memory pressure during inference and prevent swapping, particularly if your computer has less than 64 GB of system RAM, or if you generate images at non-standard resolutions larger than 512 × 512 pixels. Attention slicing performs the costly attention operation in multiple steps instead of all at once. It usually has a performance impact of ~20% in computers without universal memory, but we have observed _better performance_ in most Apple Silicon computers, unless you have 64 GB or more. - -```python -pipeline.enable_attention_slicing() -``` - -## Known Issues - -- Generating multiple prompts in a batch [crashes or doesn't work reliably](https://github.com/huggingface/diffusers/issues/363). We believe this is related to the [`mps` backend in PyTorch](https://github.com/pytorch/pytorch/issues/84039). This is being resolved, but for now we recommend to iterate instead of batching. 
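The "iterate instead of batching" workaround can be sketched as a small helper that calls the pipeline once per prompt. This is only an illustration: `generate_one_by_one` is a hypothetical name, and the sketch assumes `pipe` is a callable that returns an object with an `.images` list, as the pipelines in this guide do.

```python
def generate_one_by_one(pipe, prompts, **pipe_kwargs):
    """Run `pipe` once per prompt instead of passing the whole list as a batch."""
    images = []
    for prompt in prompts:
        # Each call handles a single prompt, so no batched inference is triggered.
        result = pipe(prompt, **pipe_kwargs)
        images.extend(result.images)
    return images
```

With a pipeline already moved to `mps`, a call such as `generate_one_by_one(pipe, ["a cat", "a dog"], num_inference_steps=20)` would return one image per prompt, at the cost of running the prompts sequentially.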
diff --git a/diffusers/docs/source/en/optimization/onnx.mdx b/diffusers/docs/source/en/optimization/onnx.mdx deleted file mode 100644 index 6f96ba0cc1941ba0709d3ee672fa09725ff459a2..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/optimization/onnx.mdx +++ /dev/null @@ -1,65 +0,0 @@ - - - -# How to use the ONNX Runtime for inference - -🤗 [Optimum](https://github.com/huggingface/optimum) provides a Stable Diffusion pipeline compatible with ONNX Runtime. - -## Installation - -Install 🤗 Optimum with the following command for ONNX Runtime support: - -``` -pip install optimum["onnxruntime"] -``` - -## Stable Diffusion Inference - -To load an ONNX model and run inference with the ONNX Runtime, you need to replace [`StableDiffusionPipeline`] with `ORTStableDiffusionPipeline`. In case you want to load -a PyTorch model and convert it to the ONNX format on-the-fly, you can set `export=True`. - -```python -from optimum.onnxruntime import ORTStableDiffusionPipeline - -model_id = "runwayml/stable-diffusion-v1-5" -pipe = ORTStableDiffusionPipeline.from_pretrained(model_id, export=True) -prompt = "a photo of an astronaut riding a horse on mars" -images = pipe(prompt).images[0] -pipe.save_pretrained("./onnx-stable-diffusion-v1-5") -``` - -If you want to export the pipeline in the ONNX format offline and later use it for inference, -you can use the [`optimum-cli export`](https://huggingface.co/docs/optimum/main/en/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli) command: - -```bash -optimum-cli export onnx --model runwayml/stable-diffusion-v1-5 sd_v15_onnx/ -``` - -Then perform inference: - -```python -from optimum.onnxruntime import ORTStableDiffusionPipeline - -model_id = "sd_v15_onnx" -pipe = ORTStableDiffusionPipeline.from_pretrained(model_id) -prompt = "a photo of an astronaut riding a horse on mars" -images = pipe(prompt).images[0] -``` - -Notice that we didn't have to specify `export=True` above. 
- -You can find more examples in [optimum documentation](https://huggingface.co/docs/optimum/). - -## Known Issues - -- Generating multiple prompts in a batch seems to take too much memory. While we look into it, you may need to iterate instead of batching. diff --git a/diffusers/docs/source/en/optimization/open_vino.mdx b/diffusers/docs/source/en/optimization/open_vino.mdx deleted file mode 100644 index 5366e86b4a54d110805df5aa5b400662f4b4bfaa..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/optimization/open_vino.mdx +++ /dev/null @@ -1,39 +0,0 @@ - - - -# How to use OpenVINO for inference - -🤗 [Optimum](https://github.com/huggingface/optimum-intel) provides a Stable Diffusion pipeline compatible with OpenVINO. You can now easily perform inference with OpenVINO Runtime on a variety of Intel processors ([see](https://docs.openvino.ai/latest/openvino_docs_OV_UG_supported_plugins_Supported_Devices.html) the full list of supported devices). - -## Installation - -Install 🤗 Optimum Intel with the following command: - -``` -pip install optimum["openvino"] -``` - -## Stable Diffusion Inference - -To load an OpenVINO model and run inference with OpenVINO Runtime, you need to replace `StableDiffusionPipeline` with `OVStableDiffusionPipeline`. In case you want to load a PyTorch model and convert it to the OpenVINO format on-the-fly, you can set `export=True`. - -```python -from optimum.intel.openvino import OVStableDiffusionPipeline - -model_id = "runwayml/stable-diffusion-v1-5" -pipe = OVStableDiffusionPipeline.from_pretrained(model_id, export=True) -prompt = "a photo of an astronaut riding a horse on mars" -images = pipe(prompt).images[0] -``` - -You can find more examples (such as static reshaping and model compilation) in [optimum documentation](https://huggingface.co/docs/optimum/intel/inference#export-and-inference-of-stable-diffusion-models). 
diff --git a/diffusers/docs/source/en/optimization/opt_overview.mdx b/diffusers/docs/source/en/optimization/opt_overview.mdx deleted file mode 100644 index 8d8386f85f43df2d22c00a9b54df5de59e07fe01..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/optimization/opt_overview.mdx +++ /dev/null @@ -1,17 +0,0 @@ - - -# Overview - -Generating high-quality outputs is computationally intensive, especially during each iterative step where you go from a noisy output to a less noisy output. One of 🧨 Diffusers' goals is to make this technology widely accessible to everyone, which includes enabling fast inference on consumer and specialized hardware. - -This section will cover tips and tricks - like half-precision weights and sliced attention - for optimizing inference speed and reducing memory consumption. You can also learn how to speed up your PyTorch code with [`torch.compile`](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) or [ONNX Runtime](https://onnxruntime.ai/docs/), and enable memory-efficient attention with [xFormers](https://facebookresearch.github.io/xformers/). There are also guides for running inference on specific hardware like Apple Silicon, and Intel or Habana processors. \ No newline at end of file diff --git a/diffusers/docs/source/en/optimization/torch2.0.mdx b/diffusers/docs/source/en/optimization/torch2.0.mdx deleted file mode 100644 index 206ac4e447ccae0dbed2c269587be3c98e5829f1..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/optimization/torch2.0.mdx +++ /dev/null @@ -1,210 +0,0 @@ - - -# Accelerated PyTorch 2.0 support in Diffusers - -Starting from version `0.13.0`, Diffusers supports the latest optimizations from the upcoming [PyTorch 2.0](https://pytorch.org/get-started/pytorch-2.0/) release. These include: -1. Support for an accelerated transformers implementation with memory-efficient attention – no extra dependencies required. -2. 
[torch.compile](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) support for extra performance boost when individual models are compiled. - - -## Installation -To benefit from the accelerated attention implementation and `torch.compile`, you just need to install the latest versions of PyTorch 2.0 from `pip`, and make sure you are on diffusers 0.13.0 or later. As explained below, `diffusers` automatically uses the attention optimizations (but not `torch.compile`) when available. - -```bash -pip install --upgrade torch torchvision diffusers -``` - -## Using accelerated transformers and torch.compile. - - -1. **Accelerated Transformers implementation** - - PyTorch 2.0 includes an optimized and memory-efficient attention implementation through the [`torch.nn.functional.scaled_dot_product_attention`](https://pytorch.org/docs/master/generated/torch.nn.functional.scaled_dot_product_attention) function, which automatically enables several optimizations depending on the inputs and the GPU type. This is similar to the `memory_efficient_attention` from [xFormers](https://github.com/facebookresearch/xformers), but built natively into PyTorch. - - These optimizations will be enabled by default in Diffusers if PyTorch 2.0 is installed and if `torch.nn.functional.scaled_dot_product_attention` is available. To use it, just install `torch 2.0` as suggested above and simply use the pipeline. For example: - - ```Python - import torch - from diffusers import DiffusionPipeline - - pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16) - pipe = pipe.to("cuda") - - prompt = "a photo of an astronaut riding a horse on mars" - image = pipe(prompt).images[0] - ``` - - If you want to enable it explicitly (which is not required), you can do so as shown below. 
- - ```Python - import torch - from diffusers import DiffusionPipeline - from diffusers.models.attention_processor import AttnProcessor2_0 - - pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda") - pipe.unet.set_attn_processor(AttnProcessor2_0()) - - prompt = "a photo of an astronaut riding a horse on mars" - image = pipe(prompt).images[0] - ``` - - This should be as fast and memory efficient as `xFormers`. More details [in our benchmark](#benchmark). - - -2. **torch.compile** - - To get an additional speedup, we can use the new `torch.compile` feature. To do so, we simply wrap our `unet` with `torch.compile`. For more information and different options, refer to the - [torch compile docs](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html). - - ```python - import torch - from diffusers import DiffusionPipeline - - pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda") - pipe.unet = torch.compile(pipe.unet) - - batch_size = 10 - steps = 50  # `steps` was not defined in the original snippet; 50 is an assumed value - prompt = "A photo of an astronaut riding a horse on mars." - images = pipe(prompt, num_inference_steps=steps, num_images_per_prompt=batch_size).images - ``` - - Depending on the type of GPU, `compile()` can yield an _additional speed-up_ of 2-9% over the accelerated transformer optimizations. Note, however, that compilation is able to squeeze more performance improvements from more recent GPU architectures such as Ampere (A100, 3090), Ada (4090) and Hopper (H100). - - Compilation takes some time to complete, so it is best suited for situations where you need to prepare your pipeline once and then perform the same type of inference operations multiple times. - - -## Benchmark - -We conducted a simple benchmark on different GPUs to compare vanilla attention, xFormers, `torch.nn.functional.scaled_dot_product_attention` and `torch.compile+torch.nn.functional.scaled_dot_product_attention`.
-For the benchmark we used the [stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4) model with 50 steps. The `xFormers` benchmark is done using the `torch==1.13.1` version, while the accelerated transformers optimizations are tested using nightly versions of PyTorch 2.0. The tables below summarize the results we got. - -Please refer to [our featured blog post on the PyTorch site](https://pytorch.org/blog/accelerated-diffusers-pt-20/) for more details. - -### FP16 benchmark - -The table below shows the benchmark results for inference using `fp16`. As we can see, `torch.nn.functional.scaled_dot_product_attention` is as fast as `xFormers` (sometimes slightly faster/slower) on all the GPUs we tested. -And using `torch.compile` gives a further speed-up of up to 10% over `xFormers`, but it's mostly noticeable on the A100 GPU. - -___The time reported is in seconds.___ - -| GPU | Batch Size | Vanilla Attention | xFormers | PyTorch2.0 SDPA | SDPA + torch.compile | Speed over xformers (%) | -| --- | --- | --- | --- | --- | --- | --- | -| A100 | 1 | 2.69 | 2.7 | 1.98 | 2.47 | 8.52 | -| A100 | 2 | 3.21 | 3.04 | 2.38 | 2.78 | 8.55 | -| A100 | 4 | 5.27 | 3.91 | 3.89 | 3.53 | 9.72 | -| A100 | 8 | 9.74 | 7.03 | 7.04 | 6.62 | 5.83 | -| A100 | 10 | 12.02 | 8.7 | 8.67 | 8.45 | 2.87 | -| A100 | 16 | 18.95 | 13.57 | 13.55 | 13.20 | 2.73 | -| A100 | 32 (1) | OOM | 26.56 | 26.68 | 25.85 | 2.67 | -| A100 | 64 | | 52.51 | 53.03 | 50.93 | 3.01 | -| | | | | | | | -| A10 | 4 | 13.94 | 9.81 | 10.01 | 9.35 | 4.69 | -| A10 | 8 | 27.09 | 19 | 19.53 | 18.33 | 3.53 | -| A10 | 10 | 33.69 | 23.53 | 24.19 | 22.52 | 4.29 | -| A10 | 16 | OOM | 37.55 | 38.31 | 36.81 | 1.97 | -| A10 | 32 (1) | | 77.19 | 78.43 | 76.64 | 0.71 | -| A10 | 64 (1) | | 173.59 | 158.99 | 155.14 | 10.63 | -| | | | | | | | -| T4 | 4 | 38.81 | 30.09 | 29.74 | 27.55 | 8.44 | -| T4 | 8 | OOM | 55.71 | 55.99 | 53.85 | 3.34 | -| T4 | 10 | OOM | 68.96 | 69.86 | 65.35 | 5.23 | -| T4 | 16 | OOM | 111.47 | 113.26 | 
106.93 | 4.07 | -| | | | | | | | -| V100 | 4 | 9.84 | 8.16 | 8.09 | 7.65 | 6.25 | -| V100 | 8 | OOM | 15.62 | 15.44 | 14.59 | 6.59 | -| V100 | 10 | OOM | 19.52 | 19.28 | 18.18 | 6.86 | -| V100 | 16 | OOM | 30.29 | 29.84 | 28.22 | 6.83 | -| | | | | | | | -| 3090 | 1 | 2.94 | 2.5 | 2.42 | 2.33 | 6.80 | -| 3090 | 4 | 10.04 | 7.82 | 7.72 | 7.38 | 5.63 | -| 3090 | 8 | 19.27 | 14.97 | 14.88 | 14.15 | 5.48 | -| 3090 | 10| 24.08 | 18.7 | 18.62 | 18.12 | 3.10 | -| 3090 | 16 | OOM | 29.06 | 28.88 | 28.2 | 2.96 | -| 3090 | 32 (1) | | 58.05 | 57.42 | 56.28 | 3.05 | -| 3090 | 64 (1) | | 126.54 | 114.27 | 112.21 | 11.32 | -| | | | | | | | -| 3090 Ti | 1 | 2.7 | 2.26 | 2.19 | 2.12 | 6.19 | -| 3090 Ti | 4 | 9.07 | 7.14 | 7.00 | 6.71 | 6.02 | -| 3090 Ti | 8 | 17.51 | 13.65 | 13.53 | 12.94 | 5.20 | -| 3090 Ti | 10 (2) | 21.79 | 16.85 | 16.77 | 16.44 | 2.43 | -| 3090 Ti | 16 | OOM | 26.1 | 26.04 | 25.53 | 2.18 | -| 3090 Ti | 32 (1) | | 51.78 | 51.71 | 50.91 | 1.68 | -| 3090 Ti | 64 (1) | | 112.02 | 102.78 | 100.89 | 9.94 | -| | | | | | | | -| 4090 | 1 | 4.47 | 3.98 | 1.28 | 1.21 | 69.60 | -| 4090 | 4 | 10.48 | 8.37 | 3.76 | 3.56 | 57.47 | -| 4090 | 8 | 14.33 | 10.22 | 7.43 | 6.99 | 31.60 | -| 4090 | 16 | | 17.07 | 14.98 | 14.58 | 14.59 | -| 4090 | 32 (1) | | 39.03 | 30.18 | 29.49 | 24.44 | -| 4090 | 64 (1) | | 77.29 | 61.34 | 59.96 | 22.42 | - - - -### FP32 benchmark - -The table below shows the benchmark results for inference using `fp32`. In this case, `torch.nn.functional.scaled_dot_product_attention` is faster than `xFormers` on all the GPUs we tested. - -Using `torch.compile` in addition to the accelerated transformers implementation can yield up to 19% performance improvement over `xFormers` in Ampere and Ada cards, and up to 20% (Ampere) or 28% (Ada) over vanilla attention. 
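The text does not define how the percentage columns in these tables are computed. Judging from the reported numbers, they appear to be the relative reduction in run time of `SDPA + torch.compile` with respect to the baseline column. The sketch below works under that assumption; `speedup_percent` is a hypothetical helper, not part of any library.

```python
def speedup_percent(baseline_seconds, optimized_seconds):
    """Relative reduction in run time, as a percentage of the baseline."""
    return (baseline_seconds - optimized_seconds) / baseline_seconds * 100


# A100, batch size 1, fp32 row of the table:
# vanilla = 4.97 s, xFormers = 3.86 s, SDPA + torch.compile = 2.86 s
print(round(speedup_percent(3.86, 2.86), 2))  # 25.91 (speed over xFormers)
print(round(speedup_percent(4.97, 2.86), 2))  # 42.45 (speed over vanilla)
```

Under this reading, a value of 25.91 means the compiled run takes about a quarter less time than the xFormers run. It does not mean that it is 25.91% "faster" in the throughput sense.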
- -| GPU | Batch Size | Vanilla Attention | xFormers | PyTorch2.0 SDPA | SDPA + torch.compile | Speed over xformers (%) | Speed over vanilla (%) | -| --- | --- | --- | --- | --- | --- | --- | --- | -| A100 | 1 | 4.97 | 3.86 | 2.6 | 2.86 | 25.91 | 42.45 | -| A100 | 2 | 9.03 | 6.76 | 4.41 | 4.21 | 37.72 | 53.38 | -| A100 | 4 | 16.70 | 12.42 | 7.94 | 7.54 | 39.29 | 54.85 | -| A100 | 10 | OOM | 29.93 | 18.70 | 18.46 | 38.32 | | -| A100 | 16 | | 47.08 | 29.41 | 29.04 | 38.32 | | -| A100 | 32 | | 92.89 | 57.55 | 56.67 | 38.99 | | -| A100 | 64 | | 185.3 | 114.8 | 112.98 | 39.03 | | -| | | | | | | | -| A10 | 1 | 10.59 | 8.81 | 7.51 | 7.35 | 16.57 | 30.59 | -| A10 | 4 | 34.77 | 27.63 | 22.77 | 22.07 | 20.12 | 36.53 | -| A10 | 8 | | 56.19 | 43.53 | 43.86 | 21.94 | | -| A10 | 16 | | 116.49 | 88.56 | 86.64 | 25.62 | | -| A10 | 32 | | 221.95 | 175.74 | 168.18 | 24.23 | | -| A10 | 48 | | 333.23 | 264.84 | | 20.52 | | -| | | | | | | | -| T4 | 1 | 28.2 | 24.49 | 23.93 | 23.56 | 3.80 | 16.45 | -| T4 | 2 | 52.77 | 45.7 | 45.88 | 45.06 | 1.40 | 14.61 | -| T4 | 4 | OOM | 85.72 | 85.78 | 84.48 | 1.45 | | -| T4 | 8 | | 149.64 | 150.75 | 148.4 | 0.83 | | -| | | | | | | | -| V100 | 1 | 7.4 | 6.84 | 6.8 | 6.66 | 2.63 | 10.00 | -| V100 | 2 | 13.85 | 12.81 | 12.66 | 12.35 | 3.59 | 10.83 | -| V100 | 4 | OOM | 25.73 | 25.31 | 24.78 | 3.69 | | -| V100 | 8 | | 43.95 | 43.37 | 42.25 | 3.87 | | -| V100 | 16 | | 84.99 | 84.73 | 82.55 | 2.87 | | -| | | | | | | | -| 3090 | 1 | 7.09 | 6.78 | 5.34 | 5.35 | 21.09 | 24.54 | -| 3090 | 4 | 22.69 | 21.45 | 18.56 | 18.18 | 15.24 | 19.88 | -| 3090 | 8 | | 42.59 | 36.68 | 35.61 | 16.39 | | -| 3090 | 16 | | 85.35 | 72.93 | 70.18 | 17.77 | | -| 3090 | 32 (1) | | 162.05 | 143.46 | 138.67 | 14.43 | | -| | | | | | | | -| 3090 Ti | 1 | 6.45 | 6.19 | 4.99 | 4.89 | 21.00 | 24.19 | -| 3090 Ti | 4 | 20.32 | 19.31 | 17.02 | 16.48 | 14.66 | 18.90 | -| 3090 Ti | 8 | | 37.93 | 33.21 | 32.24 | 15.00 | | -| 3090 Ti | 16 | | 75.37 | 66.63 | 64.5 | 14.42 | | -| 3090 Ti | 32 (1) 
| | 142.55 | 128.89 | 124.92 | 12.37 | | -| | | | | | | | -| 4090 | 1 | 5.54 | 4.99 | 2.66 | 2.58 | 48.30 | 53.43 | -| 4090 | 4 | 13.67 | 11.4 | 8.81 | 8.46 | 25.79 | 38.11 | -| 4090 | 8 | | 19.79 | 17.55 | 16.62 | 16.02 | | -| 4090 | 16 | | 38.62 | 35.65 | 34.07 | 11.78 | | -| 4090 | 32 (1) | | 76.57 | 69.48 | 65.35 | 14.65 | | -| 4090 | 48 | | 114.44 | 106.3 | | 7.11 | | - - -(1) Batch Size >= 32 requires `enable_vae_slicing()` because of https://github.com/pytorch/pytorch/issues/81665. -This is required for PyTorch 1.13.1, and also for PyTorch 2.0 and large batch sizes. - -For more details about how this benchmark was run, please refer to [this PR](https://github.com/huggingface/diffusers/pull/2303) and to [the blog post](https://pytorch.org/blog/accelerated-diffusers-pt-20/). diff --git a/diffusers/docs/source/en/optimization/xformers.mdx b/diffusers/docs/source/en/optimization/xformers.mdx deleted file mode 100644 index ede074a59fa9e05d216a01801042a342a24ca254..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/optimization/xformers.mdx +++ /dev/null @@ -1,35 +0,0 @@ - - -# Installing xFormers - -We recommend the use of [xFormers](https://github.com/facebookresearch/xformers) for both inference and training. In our tests, the optimizations performed in the attention blocks allow for both faster speed and reduced memory consumption. - -Starting from version `0.0.16` of xFormers, released in January 2023, installation can be easily performed using pre-built pip wheels: - -```bash -pip install xformers -``` - - - -The xFormers pip package requires the latest version of PyTorch (1.13.1 as of xFormers 0.0.16). If you need to use a previous version of PyTorch, then we recommend you install xFormers from source using [the project instructions](https://github.com/facebookresearch/xformers#installing-xformers).
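After installing, it can be useful to confirm that the package is importable in the current environment before enabling it on a pipeline. The following is a minimal sketch using only the Python standard library; `is_package_available` is a hypothetical helper name, not a diffusers or xFormers API.

```python
import importlib.util


def is_package_available(name):
    """Return True if a top-level package called `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


if is_package_available("xformers"):
    print("xformers found")
else:
    print("xformers not found; install it with `pip install xformers`")
```

This only checks that the package can be located. It does not verify that the installed build matches your PyTorch or CUDA version.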
- - - -After xFormers is installed, you can use `enable_xformers_memory_efficient_attention()` for faster inference and reduced memory consumption, as discussed [here](fp16#memory-efficient-attention). - - - -According to [this issue](https://github.com/huggingface/diffusers/issues/2234#issuecomment-1416931212), xFormers `v0.0.16` cannot be used for training (fine-tune or Dreambooth) in some GPUs. If you observe that problem, please install a development version as indicated in that comment. - - diff --git a/diffusers/docs/source/en/quicktour.mdx b/diffusers/docs/source/en/quicktour.mdx deleted file mode 100644 index d494b79dccd567e8fae61b23d88743e6a5e7d019..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/quicktour.mdx +++ /dev/null @@ -1,313 +0,0 @@ - - -[[open-in-colab]] - -# Quicktour - -Diffusion models are trained to denoise random Gaussian noise step-by-step to generate a sample of interest, such as an image or audio. This has sparked a tremendous amount of interest in generative AI, and you have probably seen examples of diffusion generated images on the internet. 🧨 Diffusers is a library aimed at making diffusion models widely accessible to everyone. - -Whether you're a developer or an everyday user, this quicktour will introduce you to 🧨 Diffusers and help you get up and generating quickly! There are three main components of the library to know about: - -* The [`DiffusionPipeline`] is a high-level end-to-end class designed to rapidly generate samples from pretrained diffusion models for inference. -* Popular pretrained [model](./api/models) architectures and modules that can be used as building blocks for creating diffusion systems. -* Many different [schedulers](./api/schedulers/overview) - algorithms that control how noise is added for training, and how to generate denoised images during inference. 
-
-The quicktour will show you how to use the [`DiffusionPipeline`] for inference, and then walk you through how to combine a model and scheduler to replicate what's happening inside the [`DiffusionPipeline`].
-
-
-
-The quicktour is a simplified version of the introductory 🧨 Diffusers [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb) to help you get started quickly. If you want to learn more about 🧨 Diffusers' goal, design philosophy, and additional details about its core API, check out the notebook!
-
-
-
-Before you begin, make sure you have all the necessary libraries installed:
-
-```bash
-pip install --upgrade diffusers accelerate transformers
-```
-
-- [🤗 Accelerate](https://huggingface.co/docs/accelerate/index) speeds up model loading for inference and training.
-- [🤗 Transformers](https://huggingface.co/docs/transformers/index) is required to run the most popular diffusion models, such as [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview).
-
-## DiffusionPipeline
-
-The [`DiffusionPipeline`] is the easiest way to use a pretrained diffusion system for inference. It is an end-to-end system containing the model and the scheduler. You can use the [`DiffusionPipeline`] out-of-the-box for many tasks. Take a look at the table below for some supported tasks, and for a complete list of supported tasks, check out the [🧨 Diffusers Summary](./api/pipelines/overview#diffusers-summary) table.
- -| **Task** | **Description** | **Pipeline** -|------------------------------|--------------------------------------------------------------------------------------------------------------|-----------------| -| Unconditional Image Generation | generate an image from Gaussian noise | [unconditional_image_generation](./using-diffusers/unconditional_image_generation) | -| Text-Guided Image Generation | generate an image given a text prompt | [conditional_image_generation](./using-diffusers/conditional_image_generation) | -| Text-Guided Image-to-Image Translation | adapt an image guided by a text prompt | [img2img](./using-diffusers/img2img) | -| Text-Guided Image-Inpainting | fill the masked part of an image given the image, the mask and a text prompt | [inpaint](./using-diffusers/inpaint) | -| Text-Guided Depth-to-Image Translation | adapt parts of an image guided by a text prompt while preserving structure via depth estimation | [depth2img](./using-diffusers/depth2img) | - -Start by creating an instance of a [`DiffusionPipeline`] and specify which pipeline checkpoint you would like to download. -You can use the [`DiffusionPipeline`] for any [checkpoint](https://huggingface.co/models?library=diffusers&sort=downloads) stored on the Hugging Face Hub. -In this quicktour, you'll load the [`stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) checkpoint for text-to-image generation. - - - -For [Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion) models, please carefully read the [license](https://huggingface.co/spaces/CompVis/stable-diffusion-license) first before running the model. 🧨 Diffusers implements a [`safety_checker`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/safety_checker.py) to prevent offensive or harmful content, but the model's improved image generation capabilities can still produce potentially harmful content. 
- - - -Load the model with the [`~DiffusionPipeline.from_pretrained`] method: - -```python ->>> from diffusers import DiffusionPipeline - ->>> pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5") -``` - -The [`DiffusionPipeline`] downloads and caches all modeling, tokenization, and scheduling components. You'll see that the Stable Diffusion pipeline is composed of the [`UNet2DConditionModel`] and [`PNDMScheduler`] among other things: - -```py ->>> pipeline -StableDiffusionPipeline { - "_class_name": "StableDiffusionPipeline", - "_diffusers_version": "0.13.1", - ..., - "scheduler": [ - "diffusers", - "PNDMScheduler" - ], - ..., - "unet": [ - "diffusers", - "UNet2DConditionModel" - ], - "vae": [ - "diffusers", - "AutoencoderKL" - ] -} -``` - -We strongly recommend running the pipeline on a GPU because the model consists of roughly 1.4 billion parameters. -You can move the generator object to a GPU, just like you would in PyTorch: - -```python ->>> pipeline.to("cuda") -``` - -Now you can pass a text prompt to the `pipeline` to generate an image, and then access the denoised image. By default, the image output is wrapped in a [`PIL.Image`](https://pillow.readthedocs.io/en/stable/reference/Image.html?highlight=image#the-image-class) object. - -```python ->>> image = pipeline("An image of a squirrel in Picasso style").images[0] ->>> image -``` - -
- -
- -Save the image by calling `save`: - -```python ->>> image.save("image_of_squirrel_painting.png") -``` - -### Local pipeline - -You can also use the pipeline locally. The only difference is you need to download the weights first: - -``` -git lfs install -git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 -``` - -Then load the saved weights into the pipeline: - -```python ->>> pipeline = DiffusionPipeline.from_pretrained("./stable-diffusion-v1-5") -``` - -Now you can run the pipeline as you would in the section above. - -### Swapping schedulers - -Different schedulers come with different denoising speeds and quality trade-offs. The best way to find out which one works best for you is to try them out! One of the main features of 🧨 Diffusers is to allow you to easily switch between schedulers. For example, to replace the default [`PNDMScheduler`] with the [`EulerDiscreteScheduler`], load it with the [`~diffusers.ConfigMixin.from_config`] method: - -```py ->>> from diffusers import EulerDiscreteScheduler - ->>> pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5") ->>> pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config) -``` - -Try generating an image with the new scheduler and see if you notice a difference! - -In the next section, you'll take a closer look at the components - the model and scheduler - that make up the [`DiffusionPipeline`] and learn how to use these components to generate an image of a cat. - -## Models - -Most models take a noisy sample, and at each timestep it predicts the *noise residual* (other models learn to predict the previous sample directly or the velocity or [`v-prediction`](https://github.com/huggingface/diffusers/blob/5e5ce13e2f89ac45a0066cb3f369462a3cf1d9ef/src/diffusers/schedulers/scheduling_ddim.py#L110)), the difference between a less noisy image and the input image. You can mix and match models to create other diffusion systems. 
- -Models are initiated with the [`~ModelMixin.from_pretrained`] method which also locally caches the model weights so it is faster the next time you load the model. For the quicktour, you'll load the [`UNet2DModel`], a basic unconditional image generation model with a checkpoint trained on cat images: - -```py ->>> from diffusers import UNet2DModel - ->>> repo_id = "google/ddpm-cat-256" ->>> model = UNet2DModel.from_pretrained(repo_id) -``` - -To access the model parameters, call `model.config`: - -```py ->>> model.config -``` - -The model configuration is a 🧊 frozen 🧊 dictionary, which means those parameters can't be changed after the model is created. This is intentional and ensures that the parameters used to define the model architecture at the start remain the same, while other parameters can still be adjusted during inference. - -Some of the most important parameters are: - -* `sample_size`: the height and width dimension of the input sample. -* `in_channels`: the number of input channels of the input sample. -* `down_block_types` and `up_block_types`: the type of down- and upsampling blocks used to create the UNet architecture. -* `block_out_channels`: the number of output channels of the downsampling blocks; also used in reverse order for the number of input channels of the upsampling blocks. -* `layers_per_block`: the number of ResNet blocks present in each UNet block. - -To use the model for inference, create the image shape with random Gaussian noise. It should have a `batch` axis because the model can receive multiple random noises, a `channel` axis corresponding to the number of input channels, and a `sample_size` axis for the height and width of the image: - -```py ->>> import torch - ->>> torch.manual_seed(0) - ->>> noisy_sample = torch.randn(1, model.config.in_channels, model.config.sample_size, model.config.sample_size) ->>> noisy_sample.shape -torch.Size([1, 3, 256, 256]) -``` - -For inference, pass the noisy image to the model and a `timestep`. 
The `timestep` indicates how noisy the input image is, with more noise at the beginning and less at the end. This helps the model determine its position in the diffusion process, whether it is closer to the start or the end. Access the `sample` attribute of the result to get the model output:
-
-```py
->>> with torch.no_grad():
-...     noisy_residual = model(sample=noisy_sample, timestep=2).sample
-```
-
-To generate actual examples though, you'll need a scheduler to guide the denoising process. In the next section, you'll learn how to couple a model with a scheduler.
-
-## Schedulers
-
-Schedulers manage going from a noisy sample to a less noisy sample given the model output - in this case, it is the `noisy_residual`.
-
-
-
-🧨 Diffusers is a toolbox for building diffusion systems. While the [`DiffusionPipeline`] is a convenient way to get started with a pre-built diffusion system, you can also choose your own model and scheduler components separately to build a custom diffusion system.
-
-
-
-For the quicktour, you'll instantiate the [`DDPMScheduler`] with its [`~diffusers.ConfigMixin.from_config`] method:
-
-```py
->>> from diffusers import DDPMScheduler
-
->>> scheduler = DDPMScheduler.from_config(repo_id)
->>> scheduler
-DDPMScheduler {
-  "_class_name": "DDPMScheduler",
-  "_diffusers_version": "0.13.1",
-  "beta_end": 0.02,
-  "beta_schedule": "linear",
-  "beta_start": 0.0001,
-  "clip_sample": true,
-  "clip_sample_range": 1.0,
-  "num_train_timesteps": 1000,
-  "prediction_type": "epsilon",
-  "trained_betas": null,
-  "variance_type": "fixed_small"
-}
-```
-
-
-
-💡 Notice how the scheduler is instantiated from a configuration. Unlike a model, a scheduler does not have trainable weights and is parameter-free!
-
-
-
-Some of the most important parameters are:
-
-* `num_train_timesteps`: the length of the denoising process or in other words, the number of timesteps required to process random Gaussian noise into a data sample.
-* `beta_schedule`: the type of noise schedule to use for inference and training.
-* `beta_start` and `beta_end`: the start and end noise values for the noise schedule.
-
-To predict a slightly less noisy image, pass the following to the scheduler's [`~diffusers.DDPMScheduler.step`] method: model output, `timestep`, and current `sample`.
-
-```py
->>> less_noisy_sample = scheduler.step(model_output=noisy_residual, timestep=2, sample=noisy_sample).prev_sample
->>> less_noisy_sample.shape
-```
-
-The `less_noisy_sample` can be passed to the next `timestep` where it'll get even less noisy! Let's bring it all together now and visualize the entire denoising process.
-
-First, create a function that postprocesses and displays the denoised image as a `PIL.Image`:
-
-```py
->>> import PIL.Image
->>> import numpy as np
-
-
->>> def display_sample(sample, i):
-...     image_processed = sample.cpu().permute(0, 2, 3, 1)
-...     image_processed = (image_processed + 1.0) * 127.5
-...     image_processed = image_processed.numpy().astype(np.uint8)
-
-...     image_pil = PIL.Image.fromarray(image_processed[0])
-...     display(f"Image at step {i}")
-...     display(image_pil)
-```
-
-To speed up the denoising process, move the input and model to a GPU:
-
-```py
->>> model.to("cuda")
->>> noisy_sample = noisy_sample.to("cuda")
-```
-
-Now create a denoising loop that predicts the residual of the less noisy sample, and computes the less noisy sample with the scheduler:
-
-```py
->>> import tqdm
-
->>> sample = noisy_sample
-
->>> for i, t in enumerate(tqdm.tqdm(scheduler.timesteps)):
-...     # 1. predict noise residual
-...     with torch.no_grad():
-...         residual = model(sample, t).sample
-
-...     # 2. compute less noisy image and set x_t -> x_t-1
-...     sample = scheduler.step(residual, t, sample).prev_sample
-
-...     # 3. optionally look at image
-...     if (i + 1) % 50 == 0:
-...         display_sample(sample, i + 1)
-```
-
-Sit back and watch as a cat is generated from nothing but noise! 😻
-
-
- -
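To make the `beta_start`, `beta_end`, and `num_train_timesteps` values from the scheduler configuration above more concrete, here is a minimal pure-Python sketch of a linear noise schedule. It assumes that `"beta_schedule": "linear"` means evenly spaced betas between the two endpoints. The helper names `linear_betas` and `cumulative_alphas` are hypothetical and are not part of 🧨 Diffusers.

```python
def linear_betas(beta_start=0.0001, beta_end=0.02, num_train_timesteps=1000):
    """Evenly spaced noise variances from beta_start to beta_end (inclusive)."""
    step = (beta_end - beta_start) / (num_train_timesteps - 1)
    return [beta_start + step * t for t in range(num_train_timesteps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta): the share of the original signal left at each timestep."""
    alphas_cumprod, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alphas_cumprod.append(running)
    return alphas_cumprod


betas = linear_betas()
alphas_cumprod = cumulative_alphas(betas)

# Very little noise at the first timestep, almost no signal left at the last one.
print(f"first: {alphas_cumprod[0]:.4f}, last: {alphas_cumprod[-1]:.6f}")
```

The cumulative product shrinks monotonically, which is one way to see why a sample at the final training timestep is treated as nearly pure Gaussian noise.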
- -## Next steps - -Hopefully you generated some cool images with 🧨 Diffusers in this quicktour! For your next steps, you can: - -* Train or finetune a model to generate your own images in the [training](./tutorials/basic_training) tutorial. -* See example official and community [training or finetuning scripts](https://github.com/huggingface/diffusers/tree/main/examples#-diffusers-examples) for a variety of use cases. -* Learn more about loading, accessing, changing and comparing schedulers in the [Using different Schedulers](./using-diffusers/schedulers) guide. -* Explore prompt engineering, speed and memory optimizations, and tips and tricks for generating higher quality images with the [Stable Diffusion](./stable_diffusion) guide. -* Dive deeper into speeding up 🧨 Diffusers with guides on [optimized PyTorch on a GPU](./optimization/fp16), and inference guides for running [Stable Diffusion on Apple Silicon (M1/M2)](./optimization/mps) and [ONNX Runtime](./optimization/onnx). diff --git a/diffusers/docs/source/en/stable_diffusion.mdx b/diffusers/docs/source/en/stable_diffusion.mdx deleted file mode 100644 index eebe0ec660f2dd2d0ed73108f7ec4eb590b12e6c..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/stable_diffusion.mdx +++ /dev/null @@ -1,271 +0,0 @@ - - -# Effective and efficient diffusion - -[[open-in-colab]] - -Getting the [`DiffusionPipeline`] to generate images in a certain style or include what you want can be tricky. Often times, you have to run the [`DiffusionPipeline`] several times before you end up with an image you're happy with. But generating something out of nothing is a computationally intensive process, especially if you're running inference over and over again. - -This is why it's important to get the most *computational* (speed) and *memory* (GPU RAM) efficiency from the pipeline to reduce the time between inference cycles so you can iterate faster. 
-
-This tutorial walks you through how to generate faster and better with the [`DiffusionPipeline`].
-
-Begin by loading the [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) model:
-
-```python
-from diffusers import DiffusionPipeline
-
-model_id = "runwayml/stable-diffusion-v1-5"
-pipeline = DiffusionPipeline.from_pretrained(model_id)
-```
-
-The example prompt you'll use is a portrait of an old warrior chief, but feel free to use your own prompt:
-
-```python
-prompt = "portrait photo of a old warrior chief"
-```
-
-## Speed
-
-
-
-💡 If you don't have access to a GPU, you can use one for free from a GPU provider like [Colab](https://colab.research.google.com/)!
-
-
-
-One of the simplest ways to speed up inference is to place the pipeline on a GPU the same way you would with any PyTorch module:
-
-```python
-pipeline = pipeline.to("cuda")
-```
-
-To make sure you can use the same image and improve on it, use a [`Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) and set a seed for [reproducibility](./using-diffusers/reproducibility):
-
-```python
-import torch
-
-generator = torch.Generator("cuda").manual_seed(0)
-```
-
-Now you can generate an image:
-
-```python
-image = pipeline(prompt, generator=generator).images[0]
-image
-```
-
-
- -
- -This process took ~30 seconds on a T4 GPU (it might be faster if your allocated GPU is better than a T4). By default, the [`DiffusionPipeline`] runs inference with full `float32` precision for 50 inference steps. You can speed this up by switching to a lower precision like `float16` or running fewer inference steps. - -Let's start by loading the model in `float16` and generate an image: - -```python -import torch - -pipeline = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16) -pipeline = pipeline.to("cuda") -generator = torch.Generator("cuda").manual_seed(0) -image = pipeline(prompt, generator=generator).images[0] -image -``` - -
- -
- -This time, it only took ~11 seconds to generate the image, which is almost 3x faster than before! - - - -💡 We strongly suggest always running your pipelines in `float16`, and so far, we've rarely seen any degradation in output quality. - - - -Another option is to reduce the number of inference steps. Choosing a more efficient scheduler could help decrease the number of steps without sacrificing output quality. You can find which schedulers are compatible with the current model in the [`DiffusionPipeline`] by calling the `compatibles` method: - -```python -pipeline.scheduler.compatibles -[ - diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler, - diffusers.schedulers.scheduling_unipc_multistep.UniPCMultistepScheduler, - diffusers.schedulers.scheduling_k_dpm_2_discrete.KDPM2DiscreteScheduler, - diffusers.schedulers.scheduling_deis_multistep.DEISMultistepScheduler, - diffusers.schedulers.scheduling_euler_discrete.EulerDiscreteScheduler, - diffusers.schedulers.scheduling_dpmsolver_multistep.DPMSolverMultistepScheduler, - diffusers.schedulers.scheduling_ddpm.DDPMScheduler, - diffusers.schedulers.scheduling_dpmsolver_singlestep.DPMSolverSinglestepScheduler, - diffusers.schedulers.scheduling_k_dpm_2_ancestral_discrete.KDPM2AncestralDiscreteScheduler, - diffusers.schedulers.scheduling_heun_discrete.HeunDiscreteScheduler, - diffusers.schedulers.scheduling_pndm.PNDMScheduler, - diffusers.schedulers.scheduling_euler_ancestral_discrete.EulerAncestralDiscreteScheduler, - diffusers.schedulers.scheduling_ddim.DDIMScheduler, -] -``` - -The Stable Diffusion model uses the [`PNDMScheduler`] by default which usually requires ~50 inference steps, but more performant schedulers like [`DPMSolverMultistepScheduler`], require only ~20 or 25 inference steps. 
Use the [`ConfigMixin.from_config`] method to load a new scheduler: - -```python -from diffusers import DPMSolverMultistepScheduler - -pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config) -``` - -Now set the `num_inference_steps` to 20: - -```python -generator = torch.Generator("cuda").manual_seed(0) -image = pipeline(prompt, generator=generator, num_inference_steps=20).images[0] -image -``` - -
- -
-
-Great, you've managed to cut the inference time to just 4 seconds! ⚡️
-
-## Memory
-
-The other key to improving pipeline performance is consuming less memory, which indirectly implies more speed, since you're often trying to maximize the number of images generated per second. The easiest way to see how many images you can generate at once is to try out different batch sizes until you get an `OutOfMemoryError` (OOM).
-
-Create a function that'll generate a batch of images from a list of prompts and `Generators`. Make sure to assign each `Generator` a seed so you can reuse it if it produces a good result.
-
-```python
-def get_inputs(batch_size=1):
-    generator = [torch.Generator("cuda").manual_seed(i) for i in range(batch_size)]
-    prompts = batch_size * [prompt]
-    num_inference_steps = 20
-
-    return {"prompt": prompts, "generator": generator, "num_inference_steps": num_inference_steps}
-```
-
-You'll also need a function that'll display each batch of images:
-
-```python
-from PIL import Image
-
-
-def image_grid(imgs, rows=2, cols=2):
-    w, h = imgs[0].size
-    grid = Image.new("RGB", size=(cols * w, rows * h))
-
-    for i, img in enumerate(imgs):
-        grid.paste(img, box=(i % cols * w, i // cols * h))
-    return grid
-```
-
-Start with `batch_size=4` and see how much memory you've consumed:
-
-```python
-images = pipeline(**get_inputs(batch_size=4)).images
-image_grid(images)
-```
-
-Unless you have a GPU with more RAM, the code above probably returned an `OOM` error! Most of the memory is taken up by the cross-attention layers. Instead of running this operation in a batch, you can run it sequentially to save a significant amount of memory. All you have to do is configure the pipeline to use the [`~DiffusionPipeline.enable_attention_slicing`] function:
-
-```python
-pipeline.enable_attention_slicing()
-```
-
-Now try increasing the `batch_size` to 8!
-
-```python
-images = pipeline(**get_inputs(batch_size=8)).images
-image_grid(images, rows=2, cols=4)
-```
-
-
- -
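The paste offsets that `image_grid` computes can be checked without a GPU. The following is a minimal sketch in plain Python: `grid_offsets` is a hypothetical helper introduced only for illustration, and the 512-pixel tile size is an assumption about the generated images.

```python
def grid_offsets(num_images, cols, width, height):
    """Top-left (x, y) paste position of each image, filling the grid row by row."""
    return [(i % cols * width, i // cols * height) for i in range(num_images)]


# Eight 512x512 images in a 2x4 grid, matching rows=2, cols=4 above.
for index, offset in enumerate(grid_offsets(8, cols=4, width=512, height=512)):
    print(index, offset)
```

The column index comes from `i % cols` and the row index from `i // cols`, so `rows * cols` should be at least the number of images, otherwise the extra images are pasted outside the canvas.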
- -Whereas before you couldn't even generate a batch of 4 images, now you can generate a batch of 8 images at ~3.5 seconds per image! This is probably the fastest you can go on a T4 GPU without sacrificing quality. - -## Quality - -In the last two sections, you learned how to optimize the speed of your pipeline by using `fp16`, reducing the number of inference steps by using a more performant scheduler, and enabling attention slicing to reduce memory consumption. Now you're going to focus on how to improve the quality of generated images. - -### Better checkpoints - -The most obvious step is to use better checkpoints. The Stable Diffusion model is a good starting point, and since its official launch, several improved versions have also been released. However, using a newer version doesn't automatically mean you'll get better results. You'll still have to experiment with different checkpoints yourself, and do a little research (such as using [negative prompts](https://minimaxir.com/2022/11/stable-diffusion-negative-prompt/)) to get the best results. - -As the field grows, there are more and more high-quality checkpoints finetuned to produce certain styles. Try exploring the [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) and [Diffusers Gallery](https://huggingface.co/spaces/huggingface-projects/diffusers-gallery) to find one you're interested in! - -### Better pipeline components - -You can also try replacing the current pipeline components with a newer version. Let's try loading the latest [autodecoder](https://huggingface.co/stabilityai/stable-diffusion-2-1/tree/main/vae) from Stability AI into the pipeline, and generate some images: - -```python -from diffusers import AutoencoderKL - -vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16).to("cuda") -pipeline.vae = vae -images = pipeline(**get_inputs(batch_size=8)).images -image_grid(images, rows=2, cols=4) -``` - -
- -
- -### Better prompt engineering - -The text prompt you use to generate an image is super important, so much so that it is called *prompt engineering*. Some considerations to keep during prompt engineering are: - -- How is the image or similar images of the one I want to generate stored on the internet? -- What additional detail can I give that steers the model towards the style I want? - -With this in mind, let's improve the prompt to include color and higher quality details: - -```python -prompt += ", tribal panther make up, blue on red, side profile, looking away, serious eyes" -prompt += " 50mm portrait photography, hard rim lighting photography--beta --ar 2:3 --beta --upbeta" -``` - -Generate a batch of images with the new prompt: - -```python -images = pipeline(**get_inputs(batch_size=8)).images -image_grid(images, rows=2, cols=4) -``` - -
- -
-
-Pretty impressive! Let's tweak the second image - corresponding to the `Generator` with a seed of `1` - a bit more by adding some text about the age of the subject:
-
-```python
-prompts = [
-    "portrait photo of the oldest warrior chief, tribal panther make up, blue on red, side profile, looking away, serious eyes 50mm portrait photography, hard rim lighting photography--beta --ar 2:3 --beta --upbeta",
-    "portrait photo of a old warrior chief, tribal panther make up, blue on red, side profile, looking away, serious eyes 50mm portrait photography, hard rim lighting photography--beta --ar 2:3 --beta --upbeta",
-    "portrait photo of a warrior chief, tribal panther make up, blue on red, side profile, looking away, serious eyes 50mm portrait photography, hard rim lighting photography--beta --ar 2:3 --beta --upbeta",
-    "portrait photo of a young warrior chief, tribal panther make up, blue on red, side profile, looking away, serious eyes 50mm portrait photography, hard rim lighting photography--beta --ar 2:3 --beta --upbeta",
-]
-
-generator = [torch.Generator("cuda").manual_seed(1) for _ in range(len(prompts))]
-images = pipeline(prompt=prompts, generator=generator, num_inference_steps=25).images
-image_grid(images)
-```
-
-
- -
- -## Next steps - -In this tutorial, you learned how to optimize a [`DiffusionPipeline`] for computational and memory efficiency as well as improving the quality of generated outputs. If you're interested in making your pipeline even faster, take a look at the following resources: - -- Enable [xFormers](./optimization/xformers) memory efficient attention mechanism for faster speed and reduced memory consumption. -- Learn how in [PyTorch 2.0](./optimization/torch2.0), [`torch.compile`](https://pytorch.org/docs/stable/generated/torch.compile.html) can yield 2-9% faster inference speed. -- Many optimization techniques for inference are also included in this memory and speed [guide](./optimization/fp16), such as memory offloading. \ No newline at end of file diff --git a/diffusers/docs/source/en/training/controlnet.mdx b/diffusers/docs/source/en/training/controlnet.mdx deleted file mode 100644 index 6b7539b89b07a2771627d426023ccc58185044e1..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/training/controlnet.mdx +++ /dev/null @@ -1,290 +0,0 @@ - - -# ControlNet - -[Adding Conditional Control to Text-to-Image Diffusion Models](https://arxiv.org/abs/2302.05543) (ControlNet) by Lvmin Zhang and Maneesh Agrawala. - -This example is based on the [training example in the original ControlNet repository](https://github.com/lllyasviel/ControlNet/blob/main/docs/train.md). It trains a ControlNet to fill circles using a [small synthetic dataset](https://huggingface.co/datasets/fusing/fill50k). - -## Installing the dependencies - -Before running the scripts, make sure to install the library's training dependencies. - - - -To successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the installation up to date. We update the example scripts frequently and install example-specific requirements. 
- - - -To do this, execute the following steps in a new virtual environment: -```bash -git clone https://github.com/huggingface/diffusers -cd diffusers -pip install -e . -``` - -Then navigate into the example folder and run: -```bash -pip install -r requirements.txt -``` - -And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with: - -```bash -accelerate config -``` - -Or for a default 🤗Accelerate configuration without answering questions about your environment: - -```bash -accelerate config default -``` - -Or if your environment doesn't support an interactive shell like a notebook: - -```python -from accelerate.utils import write_basic_config - -write_basic_config() -``` - -## Circle filling dataset - -The original dataset is hosted in the ControlNet [repo](https://huggingface.co/lllyasviel/ControlNet/blob/main/training/fill50k.zip), but we re-uploaded it [here](https://huggingface.co/datasets/fusing/fill50k) to be compatible with 🤗 Datasets so that it can handle the data loading within the training script. - -Our training examples use [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) because that is what the original set of ControlNet models was trained on. However, ControlNet can be trained to augment any compatible Stable Diffusion model (such as [`CompVis/stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4)) or [`stabilityai/stable-diffusion-2-1`](https://huggingface.co/stabilityai/stable-diffusion-2-1). 
- -## Training - -Download the following images to condition our training with: - -```sh -wget https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_1.png - -wget https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_2.png -``` - - -```bash -export MODEL_DIR="runwayml/stable-diffusion-v1-5" -export OUTPUT_DIR="path to save model" - -accelerate launch train_controlnet.py \ - --pretrained_model_name_or_path=$MODEL_DIR \ - --output_dir=$OUTPUT_DIR \ - --dataset_name=fusing/fill50k \ - --resolution=512 \ - --learning_rate=1e-5 \ - --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \ - --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \ - --train_batch_size=4 -``` - -This default configuration requires ~38GB VRAM. - -By default, the training script logs outputs to tensorboard. Pass `--report_to wandb` to use Weights & -Biases. - -Gradient accumulation with a smaller batch size can be used to reduce training requirements to ~20 GB VRAM. 
- -```bash -export MODEL_DIR="runwayml/stable-diffusion-v1-5" -export OUTPUT_DIR="path to save model" - -accelerate launch train_controlnet.py \ - --pretrained_model_name_or_path=$MODEL_DIR \ - --output_dir=$OUTPUT_DIR \ - --dataset_name=fusing/fill50k \ - --resolution=512 \ - --learning_rate=1e-5 \ - --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \ - --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \ - --train_batch_size=1 \ - --gradient_accumulation_steps=4 -``` - -## Example results - -#### After 300 steps with batch size 8 - -| | | -|-------------------|:-------------------------:| -| | red circle with blue background | -![conditioning image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_1.png) | ![red circle with blue background](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/red_circle_with_blue_background_300_steps.png) | -| | cyan circle with brown floral background | -![conditioning image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_2.png) | ![cyan circle with brown floral background](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/cyan_circle_with_brown_floral_background_300_steps.png) | - - -#### After 6000 steps with batch size 8: - -| | | -|-------------------|:-------------------------:| -| | red circle with blue background | -![conditioning image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_1.png) | ![red circle with blue background](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/red_circle_with_blue_background_6000_steps.png) | -| | cyan circle with 
brown floral background |
-![conditioning image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_2.png) | ![cyan circle with brown floral background](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/cyan_circle_with_brown_floral_background_6000_steps.png) |
-
-## Training on a 16 GB GPU
-
-Enable the following optimizations to train on a 16GB GPU:
-
-- Gradient checkpointing
-- bitsandbytes' 8-bit optimizer (take a look at the [installation](https://github.com/TimDettmers/bitsandbytes#requirements--installation) instructions if you don't already have it installed)
-
-Now you can launch the training script:
-
-```bash
-export MODEL_DIR="runwayml/stable-diffusion-v1-5"
-export OUTPUT_DIR="path to save model"
-
-accelerate launch train_controlnet.py \
- --pretrained_model_name_or_path=$MODEL_DIR \
- --output_dir=$OUTPUT_DIR \
- --dataset_name=fusing/fill50k \
- --resolution=512 \
- --learning_rate=1e-5 \
- --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
- --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
- --train_batch_size=1 \
- --gradient_accumulation_steps=4 \
- --gradient_checkpointing \
- --use_8bit_adam
-```
-
-## Training on a 12 GB GPU
-
-Enable the following optimizations to train on a 12GB GPU:
-- Gradient checkpointing
-- bitsandbytes' 8-bit optimizer (take a look at the [installation](https://github.com/TimDettmers/bitsandbytes#requirements--installation) instructions if you don't already have it installed)
-- xFormers (take a look at the [installation](https://huggingface.co/docs/diffusers/training/optimization/xformers) instructions if you don't already have it installed)
-- set gradients to `None`
-
-```bash
-export MODEL_DIR="runwayml/stable-diffusion-v1-5"
-export OUTPUT_DIR="path to save model"
-
-accelerate launch 
train_controlnet.py \ - --pretrained_model_name_or_path=$MODEL_DIR \ - --output_dir=$OUTPUT_DIR \ - --dataset_name=fusing/fill50k \ - --resolution=512 \ - --learning_rate=1e-5 \ - --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \ - --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \ - --train_batch_size=1 \ - --gradient_accumulation_steps=4 \ - --gradient_checkpointing \ - --use_8bit_adam \ - --enable_xformers_memory_efficient_attention \ - --set_grads_to_none -``` - -When using `enable_xformers_memory_efficient_attention`, please make sure to install `xformers` with `pip install xformers`. - -## Training on an 8 GB GPU - -We have not exhaustively tested DeepSpeed support for ControlNet. While the configuration does -save memory, we have not confirmed whether the configuration trains successfully. You will very likely -have to make changes to the config to have a successful training run. - -Enable the following optimizations to train on an 8GB GPU: -- Gradient checkpointing -- bitsandbytes' 8-bit optimizer (take a look at the [installation](https://github.com/TimDettmers/bitsandbytes#requirements--installation) instructions if you don't already have it installed) -- xFormers (take a look at the [installation](https://huggingface.co/docs/diffusers/training/optimization/xformers) instructions if you don't already have it installed) -- set gradients to `None` -- DeepSpeed stage 2 with parameter and optimizer offloading -- fp16 mixed precision - -[DeepSpeed](https://www.deepspeed.ai/) can offload tensors from VRAM to either -CPU or NVMe. This requires significantly more system RAM (about 25 GB). - -You'll have to configure your environment with `accelerate config` to enable DeepSpeed stage 2.
- -The configuration file should look like this: - -```yaml -compute_environment: LOCAL_MACHINE -deepspeed_config: - gradient_accumulation_steps: 4 - offload_optimizer_device: cpu - offload_param_device: cpu - zero3_init_flag: false - zero_stage: 2 -distributed_type: DEEPSPEED -``` - - - -See [documentation](https://huggingface.co/docs/accelerate/usage_guides/deepspeed) for more DeepSpeed configuration options. - - - -Changing the default Adam optimizer to DeepSpeed's Adam -`deepspeed.ops.adam.DeepSpeedCPUAdam` gives a substantial speedup but -it requires a CUDA toolchain with the same version as PyTorch. 8-bit optimizer -does not seem to be compatible with DeepSpeed at the moment. - -```bash -export MODEL_DIR="runwayml/stable-diffusion-v1-5" -export OUTPUT_DIR="path to save model" - -accelerate launch train_controlnet.py \ - --pretrained_model_name_or_path=$MODEL_DIR \ - --output_dir=$OUTPUT_DIR \ - --dataset_name=fusing/fill50k \ - --resolution=512 \ - --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \ - --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \ - --train_batch_size=1 \ - --gradient_accumulation_steps=4 \ - --gradient_checkpointing \ - --enable_xformers_memory_efficient_attention \ - --set_grads_to_none \ - --mixed_precision fp16 -``` - -## Inference - -The trained model can be run with the [`StableDiffusionControlNetPipeline`]. -Set `base_model_path` and `controlnet_path` to the values `--pretrained_model_name_or_path` and -`--output_dir` were respectively set to in the training script. 
- -```py -from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler -from diffusers.utils import load_image -import torch - -base_model_path = "path to model" -controlnet_path = "path to controlnet" - -controlnet = ControlNetModel.from_pretrained(controlnet_path, torch_dtype=torch.float16) -pipe = StableDiffusionControlNetPipeline.from_pretrained( - base_model_path, controlnet=controlnet, torch_dtype=torch.float16 -) - -# speed up diffusion process with faster scheduler and memory optimization -pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config) -# remove following line if xformers is not installed -pipe.enable_xformers_memory_efficient_attention() - -pipe.enable_model_cpu_offload() - -control_image = load_image("./conditioning_image_1.png") -prompt = "pale golden rod circle with old lace background" - -# generate image -generator = torch.manual_seed(0) -image = pipe(prompt, num_inference_steps=20, generator=generator, image=control_image).images[0] - -image.save("./output.png") -``` diff --git a/diffusers/docs/source/en/training/dreambooth.mdx b/diffusers/docs/source/en/training/dreambooth.mdx deleted file mode 100644 index 908355e496dcb6d68cf26b7109f0dedb168a8ddb..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/training/dreambooth.mdx +++ /dev/null @@ -1,472 +0,0 @@ - - -# DreamBooth - -[[open-in-colab]] - -[DreamBooth](https://arxiv.org/abs/2208.12242) is a method to personalize text-to-image models like Stable Diffusion given just a few (3-5) images of a subject. It allows the model to generate contextualized images of the subject in different scenes, poses, and views. - -![Dreambooth examples from the project's blog](https://dreambooth.github.io/DreamBooth_files/teaser_static.jpg) -Dreambooth examples from the project's blog. 
- -This guide will show you how to finetune DreamBooth with the [`CompVis/stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4) model for various GPU sizes, and with Flax. All the training scripts for DreamBooth used in this guide can be found [here](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth) if you're interested in digging deeper and seeing how things work. - -Before running the scripts, make sure you install the library's training dependencies. We also recommend installing 🧨 Diffusers from the `main` GitHub branch: - -```bash -pip install git+https://github.com/huggingface/diffusers -pip install -U -r diffusers/examples/dreambooth/requirements.txt -``` - -xFormers is not part of the training requirements, but we recommend you [install](../optimization/xformers) it if you can because it could make your training faster and less memory intensive. - -After all the dependencies have been set up, initialize a [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment with: - -```bash -accelerate config -``` - -To setup a default 🤗 Accelerate environment without choosing any configurations: - -```bash -accelerate config default -``` - -Or if your environment doesn't support an interactive shell like a notebook, you can use: - -```py -from accelerate.utils import write_basic_config - -write_basic_config() -``` - -## Finetuning - - - -DreamBooth finetuning is very sensitive to hyperparameters and easy to overfit. We recommend you take a look at our [in-depth analysis](https://huggingface.co/blog/dreambooth) with recommended settings for different subjects to help you choose the appropriate hyperparameters. 
- - - - - -Let's try DreamBooth with a [few images of a dog](https://drive.google.com/drive/folders/1BO_dyz-p65qhBRRMRA4TbZ8qW4rB99JZ); download and save them to a directory and then set the `INSTANCE_DIR` environment variable to that path: - -```bash -export MODEL_NAME="CompVis/stable-diffusion-v1-4" -export INSTANCE_DIR="path_to_training_images" -export OUTPUT_DIR="path_to_saved_model" -``` - -Then you can launch the training script (you can find the full training script [here](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py)) with the following command: - -```bash -accelerate launch train_dreambooth.py \ - --pretrained_model_name_or_path=$MODEL_NAME \ - --instance_data_dir=$INSTANCE_DIR \ - --output_dir=$OUTPUT_DIR \ - --instance_prompt="a photo of sks dog" \ - --resolution=512 \ - --train_batch_size=1 \ - --gradient_accumulation_steps=1 \ - --learning_rate=5e-6 \ - --lr_scheduler="constant" \ - --lr_warmup_steps=0 \ - --max_train_steps=400 -``` - - -If you have access to TPUs or want to train even faster, you can try out the [Flax training script](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_flax.py). The Flax training script doesn't support gradient checkpointing or gradient accumulation, so you'll need a GPU with at least 30GB of memory. 
- -Before running the script, make sure you have the requirements installed: - -```bash -pip install -U -r requirements.txt -``` - -Now you can launch the training script with the following command: - -```bash -export MODEL_NAME="duongna/stable-diffusion-v1-4-flax" -export INSTANCE_DIR="path-to-instance-images" -export OUTPUT_DIR="path-to-save-model" - -python train_dreambooth_flax.py \ - --pretrained_model_name_or_path=$MODEL_NAME \ - --instance_data_dir=$INSTANCE_DIR \ - --output_dir=$OUTPUT_DIR \ - --instance_prompt="a photo of sks dog" \ - --resolution=512 \ - --train_batch_size=1 \ - --learning_rate=5e-6 \ - --max_train_steps=400 -``` - - - -## Finetuning with prior-preserving loss - -Prior preservation is used to avoid overfitting and language-drift (check out the [paper](https://arxiv.org/abs/2208.12242) to learn more if you're interested). For prior preservation, you use other images of the same class as part of the training process. The nice thing is that you can generate those images using the Stable Diffusion model itself! The training script will save the generated images to a local path you specify. - -The authors recommend generating `num_epochs * num_samples` images for prior preservation. In most cases, 200-300 images work well. 
- - - -```bash -export MODEL_NAME="CompVis/stable-diffusion-v1-4" -export INSTANCE_DIR="path_to_training_images" -export CLASS_DIR="path_to_class_images" -export OUTPUT_DIR="path_to_saved_model" - -accelerate launch train_dreambooth.py \ - --pretrained_model_name_or_path=$MODEL_NAME \ - --instance_data_dir=$INSTANCE_DIR \ - --class_data_dir=$CLASS_DIR \ - --output_dir=$OUTPUT_DIR \ - --with_prior_preservation --prior_loss_weight=1.0 \ - --instance_prompt="a photo of sks dog" \ - --class_prompt="a photo of dog" \ - --resolution=512 \ - --train_batch_size=1 \ - --gradient_accumulation_steps=1 \ - --learning_rate=5e-6 \ - --lr_scheduler="constant" \ - --lr_warmup_steps=0 \ - --num_class_images=200 \ - --max_train_steps=800 -``` - - -```bash -export MODEL_NAME="duongna/stable-diffusion-v1-4-flax" -export INSTANCE_DIR="path-to-instance-images" -export CLASS_DIR="path-to-class-images" -export OUTPUT_DIR="path-to-save-model" - -python train_dreambooth_flax.py \ - --pretrained_model_name_or_path=$MODEL_NAME \ - --instance_data_dir=$INSTANCE_DIR \ - --class_data_dir=$CLASS_DIR \ - --output_dir=$OUTPUT_DIR \ - --with_prior_preservation --prior_loss_weight=1.0 \ - --instance_prompt="a photo of sks dog" \ - --class_prompt="a photo of dog" \ - --resolution=512 \ - --train_batch_size=1 \ - --learning_rate=5e-6 \ - --num_class_images=200 \ - --max_train_steps=800 -``` - - - -## Finetuning the text encoder and UNet - -The script also allows you to finetune the `text_encoder` along with the `unet`. In our experiments (check out the [Training Stable Diffusion with DreamBooth using 🧨 Diffusers](https://huggingface.co/blog/dreambooth) post for more details), this yields much better results, especially when generating images of faces. - - - -Training the text encoder requires additional memory and it won't fit on a 16GB GPU. You'll need at least 24GB VRAM to use this option. 
- - - -Pass the `--train_text_encoder` argument to the training script to enable finetuning the `text_encoder` and `unet`: - - - -```bash -export MODEL_NAME="CompVis/stable-diffusion-v1-4" -export INSTANCE_DIR="path_to_training_images" -export CLASS_DIR="path_to_class_images" -export OUTPUT_DIR="path_to_saved_model" - -accelerate launch train_dreambooth.py \ - --pretrained_model_name_or_path=$MODEL_NAME \ - --train_text_encoder \ - --instance_data_dir=$INSTANCE_DIR \ - --class_data_dir=$CLASS_DIR \ - --output_dir=$OUTPUT_DIR \ - --with_prior_preservation --prior_loss_weight=1.0 \ - --instance_prompt="a photo of sks dog" \ - --class_prompt="a photo of dog" \ - --resolution=512 \ - --train_batch_size=1 \ - --use_8bit_adam \ - --gradient_checkpointing \ - --learning_rate=2e-6 \ - --lr_scheduler="constant" \ - --lr_warmup_steps=0 \ - --num_class_images=200 \ - --max_train_steps=800 -``` - - -```bash -export MODEL_NAME="duongna/stable-diffusion-v1-4-flax" -export INSTANCE_DIR="path-to-instance-images" -export CLASS_DIR="path-to-class-images" -export OUTPUT_DIR="path-to-save-model" - -python train_dreambooth_flax.py \ - --pretrained_model_name_or_path=$MODEL_NAME \ - --train_text_encoder \ - --instance_data_dir=$INSTANCE_DIR \ - --class_data_dir=$CLASS_DIR \ - --output_dir=$OUTPUT_DIR \ - --with_prior_preservation --prior_loss_weight=1.0 \ - --instance_prompt="a photo of sks dog" \ - --class_prompt="a photo of dog" \ - --resolution=512 \ - --train_batch_size=1 \ - --learning_rate=2e-6 \ - --num_class_images=200 \ - --max_train_steps=800 -``` - - - -## Finetuning with LoRA - -You can also use Low-Rank Adaptation of Large Language Models (LoRA), a fine-tuning technique for accelerating the training of large models, on DreamBooth. For more details, take a look at the [LoRA training](./lora#dreambooth) guide. - -## Saving checkpoints while training - -It's easy to overfit while training with DreamBooth, so sometimes it's useful to save regular checkpoints during the training process.
One of the intermediate checkpoints might actually work better than the final model! Pass the following argument to the training script to enable saving checkpoints: - -```bash - --checkpointing_steps=500 -``` - -This saves the full training state in subfolders of your `output_dir`. Subfolder names begin with the prefix `checkpoint-`, followed by the number of steps performed so far; for example, `checkpoint-1500` would be a checkpoint saved after 1500 training steps. - -### Resume training from a saved checkpoint - -If you want to resume training from any of the saved checkpoints, you can pass the argument `--resume_from_checkpoint` to the script and specify the name of the checkpoint you want to use. You can also use the special string `"latest"` to resume from the last saved checkpoint (the one with the largest number of steps). For example, the following would resume training from the checkpoint saved after 1500 steps: - -```bash - --resume_from_checkpoint="checkpoint-1500" -``` - -This is a good opportunity to tweak some of your hyperparameters if you wish. - -### Inference from a saved checkpoint - -Saved checkpoints are stored in a format suitable for resuming training. They not only include the model weights, but also the state of the optimizer, data loaders, and learning rate. - -If you have **`"accelerate>=0.16.0"`** installed, use the following code to run -inference from an intermediate checkpoint. 
- -```python -from diffusers import DiffusionPipeline, UNet2DConditionModel -from transformers import CLIPTextModel -import torch - -# Load the pipeline with the same arguments (model, revision) that were used for training -model_id = "CompVis/stable-diffusion-v1-4" - -unet = UNet2DConditionModel.from_pretrained("/sddata/dreambooth/daruma-v2-1/checkpoint-100/unet") - -# if you have trained with `--train_text_encoder` make sure to also load the text encoder -text_encoder = CLIPTextModel.from_pretrained("/sddata/dreambooth/daruma-v2-1/checkpoint-100/text_encoder") - -pipeline = DiffusionPipeline.from_pretrained(model_id, unet=unet, text_encoder=text_encoder, dtype=torch.float16) -pipeline.to("cuda") - -# Perform inference, or save, or push to the hub -pipeline.save_pretrained("dreambooth-pipeline") -``` - -If you have **`"accelerate<0.16.0"`** installed, you need to convert the checkpoint to an inference pipeline first: - -```python -from accelerate import Accelerator -from diffusers import DiffusionPipeline - -# Load the pipeline with the same arguments (model, revision) that were used for training -model_id = "CompVis/stable-diffusion-v1-4" -pipeline = DiffusionPipeline.from_pretrained(model_id) - -accelerator = Accelerator() - -# Use text_encoder if `--train_text_encoder` was used for the initial training -unet, text_encoder = accelerator.prepare(pipeline.unet, pipeline.text_encoder) - -# Restore state from a checkpoint path. You have to use the absolute path here.
-accelerator.load_state("/sddata/dreambooth/daruma-v2-1/checkpoint-100") - -# Rebuild the pipeline with the unwrapped models (assignment to .unet and .text_encoder should work too) -pipeline = DiffusionPipeline.from_pretrained( - model_id, - unet=accelerator.unwrap_model(unet), - text_encoder=accelerator.unwrap_model(text_encoder), -) - -# Perform inference, or save, or push to the hub -pipeline.save_pretrained("dreambooth-pipeline") -``` - -## Optimizations for different GPU sizes - -Depending on your hardware, there are a few different ways to optimize DreamBooth on GPUs from 16GB to just 8GB! - -### xFormers - -[xFormers](https://github.com/facebookresearch/xformers) is a toolbox for optimizing Transformers, and it includes a [memory-efficient attention](https://facebookresearch.github.io/xformers/components/ops.html#module-xformers.ops) mechanism that is used in 🧨 Diffusers. You'll need to [install xFormers](./optimization/xformers) and then add the following argument to your training script: - -```bash - --enable_xformers_memory_efficient_attention -``` - -xFormers is not available in Flax. - -### Set gradients to none - -Another way you can lower your memory footprint is to [set the gradients](https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html) to `None` instead of zero. However, this may change certain behaviors, so if you run into any issues, try removing this argument. Add the following argument to your training script to set the gradients to `None`: - -```bash - --set_grads_to_none -``` - -### 16GB GPU - -With the help of gradient checkpointing and [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) 8-bit optimizer, it's possible to train DreamBooth on a 16GB GPU. 
Make sure you have bitsandbytes installed: - -```bash -pip install bitsandbytes -``` - -Then pass the `--use_8bit_adam` option to the training script: - -```bash -export MODEL_NAME="CompVis/stable-diffusion-v1-4" -export INSTANCE_DIR="path_to_training_images" -export CLASS_DIR="path_to_class_images" -export OUTPUT_DIR="path_to_saved_model" - -accelerate launch train_dreambooth.py \ - --pretrained_model_name_or_path=$MODEL_NAME \ - --instance_data_dir=$INSTANCE_DIR \ - --class_data_dir=$CLASS_DIR \ - --output_dir=$OUTPUT_DIR \ - --with_prior_preservation --prior_loss_weight=1.0 \ - --instance_prompt="a photo of sks dog" \ - --class_prompt="a photo of dog" \ - --resolution=512 \ - --train_batch_size=1 \ - --gradient_accumulation_steps=2 --gradient_checkpointing \ - --use_8bit_adam \ - --learning_rate=5e-6 \ - --lr_scheduler="constant" \ - --lr_warmup_steps=0 \ - --num_class_images=200 \ - --max_train_steps=800 -``` - -### 12GB GPU - -To run DreamBooth on a 12GB GPU, you'll need to enable gradient checkpointing, the 8-bit optimizer, xFormers, and set the gradients to `None`: - -```bash -export MODEL_NAME="CompVis/stable-diffusion-v1-4" -export INSTANCE_DIR="path-to-instance-images" -export CLASS_DIR="path-to-class-images" -export OUTPUT_DIR="path-to-save-model" - -accelerate launch train_dreambooth.py \ - --pretrained_model_name_or_path=$MODEL_NAME \ - --instance_data_dir=$INSTANCE_DIR \ - --class_data_dir=$CLASS_DIR \ - --output_dir=$OUTPUT_DIR \ - --with_prior_preservation --prior_loss_weight=1.0 \ - --instance_prompt="a photo of sks dog" \ - --class_prompt="a photo of dog" \ - --resolution=512 \ - --train_batch_size=1 \ - --gradient_accumulation_steps=1 --gradient_checkpointing \ - --use_8bit_adam \ - --enable_xformers_memory_efficient_attention \ - --set_grads_to_none \ - --learning_rate=2e-6 \ - --lr_scheduler="constant" \ - --lr_warmup_steps=0 \ - --num_class_images=200 \ - --max_train_steps=800 -``` - -### 8 GB GPU - -For 8GB GPUs, you'll need the help of 
[DeepSpeed](https://www.deepspeed.ai/) to offload some -tensors from the VRAM to either the CPU or NVME, enabling training with less GPU memory. - -Run the following command to configure your 🤗 Accelerate environment: - -```bash -accelerate config -``` - -During configuration, confirm that you want to use DeepSpeed. Now it's possible to train on under 8GB VRAM by combining DeepSpeed stage 2, fp16 mixed precision, and offloading the model parameters and the optimizer state to the CPU. The drawback is that this requires more system RAM, about 25 GB. See [the DeepSpeed documentation](https://huggingface.co/docs/accelerate/usage_guides/deepspeed) for more configuration options. - -You should also change the default Adam optimizer to DeepSpeed's optimized version of Adam -[`deepspeed.ops.adam.DeepSpeedCPUAdam`](https://deepspeed.readthedocs.io/en/latest/optimizers.html#adam-cpu) for a substantial speedup. Enabling `DeepSpeedCPUAdam` requires your system's CUDA toolchain version to be the same as the one installed with PyTorch. - -8-bit optimizers don't seem to be compatible with DeepSpeed at the moment. 
- -Launch training with the following command: - -```bash -export MODEL_NAME="CompVis/stable-diffusion-v1-4" -export INSTANCE_DIR="path_to_training_images" -export CLASS_DIR="path_to_class_images" -export OUTPUT_DIR="path_to_saved_model" - -accelerate launch train_dreambooth.py \ - --pretrained_model_name_or_path=$MODEL_NAME \ - --instance_data_dir=$INSTANCE_DIR \ - --class_data_dir=$CLASS_DIR \ - --output_dir=$OUTPUT_DIR \ - --with_prior_preservation --prior_loss_weight=1.0 \ - --instance_prompt="a photo of sks dog" \ - --class_prompt="a photo of dog" \ - --resolution=512 \ - --train_batch_size=1 \ - --sample_batch_size=1 \ - --gradient_accumulation_steps=1 --gradient_checkpointing \ - --learning_rate=5e-6 \ - --lr_scheduler="constant" \ - --lr_warmup_steps=0 \ - --num_class_images=200 \ - --max_train_steps=800 \ - --mixed_precision=fp16 -``` - -## Inference - -Once you have trained a model, specify the path to where the model is saved, and use it for inference in the [`StableDiffusionPipeline`]. Make sure your prompts include the special `identifier` used during training (`sks` in the previous examples). - -If you have **`"accelerate>=0.16.0"`** installed, you can use the following code to run -inference from an intermediate checkpoint: - -```python -from diffusers import DiffusionPipeline -import torch - -model_id = "path_to_saved_model" -pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda") - -prompt = "A photo of sks dog in a bucket" -image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0] - -image.save("dog-bucket.png") -``` - -You may also run inference from any of the [saved training checkpoints](#inference-from-a-saved-checkpoint). 
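The checkpoint subfolders described earlier follow the `checkpoint-<steps>` naming scheme, so choosing one for inference can be scripted. The following is a minimal sketch and not part of the training script: the helper name `find_latest_checkpoint` is hypothetical, and it assumes that checkpoint folders inside `output_dir` use exactly that prefix.

```python
import os
import re
from typing import Optional

# Hypothetical helper (not a diffusers or accelerate API).
# Matches folder names such as "checkpoint-1500" and captures the step count.
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_latest_checkpoint(output_dir: str) -> Optional[str]:
    """Return the path of the `checkpoint-<steps>` subfolder with the most steps.

    Returns None when `output_dir` contains no matching subfolder.
    """
    best_steps = -1
    best_path = None
    for name in os.listdir(output_dir):
        match = _CHECKPOINT_RE.match(name)
        path = os.path.join(output_dir, name)
        # Skip files and folders that do not follow the naming scheme.
        if match is None or not os.path.isdir(path):
            continue
        steps = int(match.group(1))
        if steps > best_steps:
            best_steps = steps
            best_path = path
    return best_path
```

With `accelerate>=0.16.0`, the `unet` subfolder of the returned path could then be passed to `UNet2DConditionModel.from_pretrained`, as in the saved-checkpoint example above.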
diff --git a/diffusers/docs/source/en/training/instructpix2pix.mdx b/diffusers/docs/source/en/training/instructpix2pix.mdx deleted file mode 100644 index e6f050b34acf5077cc1c5f0009632e7ca9d9f280..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/training/instructpix2pix.mdx +++ /dev/null @@ -1,181 +0,0 @@ - - -# InstructPix2Pix - -[InstructPix2Pix](https://arxiv.org/abs/2211.09800) is a method to fine-tune text-conditioned diffusion models such that they can follow an edit instruction for an input image. Models fine-tuned using this method take the following as inputs: - -

- [Image: instructpix2pix-inputs]

- -The output is an "edited" image that reflects the edit instruction applied on the input image: - -

- [Image: instructpix2pix-output]

- -The `train_instruct_pix2pix.py` script shows how to implement the training procedure and adapt it for Stable Diffusion. - -***Disclaimer: Even though `train_instruct_pix2pix.py` implements the InstructPix2Pix -training procedure while being faithful to the [original implementation](https://github.com/timothybrooks/instruct-pix2pix) we have only tested it on a [small-scale dataset](https://huggingface.co/datasets/fusing/instructpix2pix-1000-samples). This can impact the end results. For better results, we recommend longer training runs with a larger dataset. [Here](https://huggingface.co/datasets/timbrooks/instructpix2pix-clip-filtered) you can find a large dataset for InstructPix2Pix training.*** - -## Running locally with PyTorch - -### Installing the dependencies - -Before running the scripts, make sure to install the library's training dependencies: - -**Important** - -To make sure you can successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the install up to date as we update the example scripts frequently and install some example-specific requirements. To do this, execute the following steps in a new virtual environment: -```bash -git clone https://github.com/huggingface/diffusers -cd diffusers -pip install -e . -``` - -Then cd in the example folder and run -```bash -pip install -r requirements.txt -``` - -And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with: - -```bash -accelerate config -``` - -Or for a default accelerate configuration without answering questions about your environment - -```bash -accelerate config default -``` - -Or if your environment doesn't support an interactive shell e.g. a notebook - -```python -from accelerate.utils import write_basic_config - -write_basic_config() -``` - -### Toy example - -As mentioned before, we'll use a [small toy dataset](https://huggingface.co/datasets/fusing/instructpix2pix-1000-samples) for training. 
The dataset -is a smaller version of the [original dataset](https://huggingface.co/datasets/timbrooks/instructpix2pix-clip-filtered) used in the InstructPix2Pix paper. - -Configure environment variables such as the dataset identifier and the Stable Diffusion -checkpoint: - -```bash -export MODEL_NAME="runwayml/stable-diffusion-v1-5" -export DATASET_ID="fusing/instructpix2pix-1000-samples" -``` - -Now, we can launch training: - -```bash -accelerate launch --mixed_precision="fp16" train_instruct_pix2pix.py \ - --pretrained_model_name_or_path=$MODEL_NAME \ - --dataset_name=$DATASET_ID \ - --enable_xformers_memory_efficient_attention \ - --resolution=256 --random_flip \ - --train_batch_size=4 --gradient_accumulation_steps=4 --gradient_checkpointing \ - --max_train_steps=15000 \ - --checkpointing_steps=5000 --checkpoints_total_limit=1 \ - --learning_rate=5e-05 --max_grad_norm=1 --lr_warmup_steps=0 \ - --conditioning_dropout_prob=0.05 \ - --mixed_precision=fp16 \ - --seed=42 -``` - -Additionally, we support performing validation inference to monitor training progress -with Weights and Biases. You can enable this feature with `report_to="wandb"`: - -```bash -accelerate launch --mixed_precision="fp16" train_instruct_pix2pix.py \ - --pretrained_model_name_or_path=$MODEL_NAME \ - --dataset_name=$DATASET_ID \ - --enable_xformers_memory_efficient_attention \ - --resolution=256 --random_flip \ - --train_batch_size=4 --gradient_accumulation_steps=4 --gradient_checkpointing \ - --max_train_steps=15000 \ - --checkpointing_steps=5000 --checkpoints_total_limit=1 \ - --learning_rate=5e-05 --max_grad_norm=1 --lr_warmup_steps=0 \ - --conditioning_dropout_prob=0.05 \ - --mixed_precision=fp16 \ - --val_image_url="https://hf.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png" \ - --validation_prompt="make the mountains snowy" \ - --seed=42 \ - --report_to=wandb - ``` - - We recommend this type of validation as it can be useful for model debugging. 
Note that you need `wandb` installed to use this. You can install `wandb` by running `pip install wandb`. - - [Here](https://wandb.ai/sayakpaul/instruct-pix2pix/runs/ctr3kovq), you can find an example training run that includes some validation samples and the training hyperparameters. - - ***Note: In the original paper, the authors observed that even when the model is trained with an image resolution of 256x256, it generalizes well to bigger resolutions such as 512x512. This is likely because of the larger dataset they used during training.*** - - ## Inference - - Once training is complete, we can perform inference: - - ```python -import PIL -import requests -import torch -from diffusers import StableDiffusionInstructPix2PixPipeline - -model_id = "your_model_id" # <- replace this -pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda") -generator = torch.Generator("cuda").manual_seed(0) - -url = "https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/test_pix2pix_4.png" - - -def download_image(url): - image = PIL.Image.open(requests.get(url, stream=True).raw) - image = PIL.ImageOps.exif_transpose(image) - image = image.convert("RGB") - return image - - -image = download_image(url) -prompt = "wipe out the lake" -num_inference_steps = 20 -image_guidance_scale = 1.5 -guidance_scale = 10 - -edited_image = pipe( - prompt, - image=image, - num_inference_steps=num_inference_steps, - image_guidance_scale=image_guidance_scale, - guidance_scale=guidance_scale, - generator=generator, -).images[0] -edited_image.save("edited_image.png") -``` - -An example model repo obtained using this training script can be found -here - [sayakpaul/instruct-pix2pix](https://huggingface.co/sayakpaul/instruct-pix2pix). 
- -We encourage you to play with the following three parameters to control -speed and quality during inference: - -* `num_inference_steps` -* `image_guidance_scale` -* `guidance_scale` - -Particularly, `image_guidance_scale` and `guidance_scale` can have a profound impact -on the generated ("edited") image (see [here](https://twitter.com/RisingSayak/status/1628392199196151808?s=20) for an example). diff --git a/diffusers/docs/source/en/training/lora.mdx b/diffusers/docs/source/en/training/lora.mdx deleted file mode 100644 index 1c72fbbc8d584128fabcfd8e29226df2db86d527..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/training/lora.mdx +++ /dev/null @@ -1,214 +0,0 @@ - - -# Low-Rank Adaptation of Large Language Models (LoRA) - -[[open-in-colab]] - - - -Currently, LoRA is only supported for the attention layers of the [`UNet2DConditionModel`]. - - - -[Low-Rank Adaptation of Large Language Models (LoRA)](https://arxiv.org/abs/2106.09685) is a training method that accelerates the training of large models while consuming less memory. It adds pairs of rank-decomposition weight matrices (called **update matrices**) to existing weights, and **only** trains those newly added weights. This has several advantages: - -- Previous pretrained weights are kept frozen so the model is not as prone to [catastrophic forgetting](https://www.pnas.org/doi/10.1073/pnas.1611835114). -- Rank-decomposition matrices have significantly fewer parameters than the original model, which means that trained LoRA weights are easily portable. -- LoRA matrices are generally added to the attention layers of the original model. 🧨 Diffusers provides the [`~diffusers.loaders.UNet2DConditionLoadersMixin.load_attn_procs`] method to load the LoRA weights into a model's attention layers. You can control the extent to which the model is adapted toward new training images via a `scale` parameter.
-- The greater memory-efficiency allows you to run fine-tuning on consumer GPUs like the Tesla T4, RTX 3080 or even the RTX 2080 Ti! GPUs like the T4 are free and readily accessible in Kaggle or Google Colab notebooks. - - - -💡 LoRA is not limited to attention layers. The authors found that amending -the attention layers of a language model is sufficient to obtain good downstream performance with great efficiency. This is why it's common to just add the LoRA weights to the attention layers of a model. Check out the [Using LoRA for efficient Stable Diffusion fine-tuning](https://huggingface.co/blog/lora) blog for more information about how LoRA works! - - - -[cloneofsimo](https://github.com/cloneofsimo) was the first to try out LoRA training for Stable Diffusion in the popular [lora](https://github.com/cloneofsimo/lora) GitHub repository. 🧨 Diffusers now supports finetuning with LoRA for [text-to-image generation](https://github.com/huggingface/diffusers/tree/main/examples/text_to_image#training-with-lora) and [DreamBooth](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth#training-with-low-rank-adaptation-of-large-language-models-lora). This guide will show you how to do both. - -If you'd like to store or share your model with the community, log in to your Hugging Face account (create [one](https://hf.co/join) if you don't have one already): - -```bash -huggingface-cli login -``` - -## Text-to-image - -Finetuning a model like Stable Diffusion, which has billions of parameters, can be slow and difficult. With LoRA, it is much easier and faster to finetune a diffusion model. It can run on hardware with as little as 11GB of GPU RAM without resorting to tricks such as 8-bit optimizers.
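To see where the savings come from, here is a rough sketch of the low-rank update described at the top of this guide, written in plain Python on nested lists. It is an illustration under stated assumptions, not the Diffusers implementation: the helper names `lora_param_count` and `merge_lora` are hypothetical, and the 768x768 layer size and rank 4 are example values chosen only for the arithmetic.

```python
# Illustrative sketch only: these helpers are hypothetical, not diffusers APIs.


def lora_param_count(d_out, d_in, rank):
    """Parameters in the update matrices B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, b, a, scale=1.0):
    """Return W + scale * (B @ A) as a new matrix; the frozen W is not modified."""
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Assumed example sizes: one 768x768 projection adapted with rank 4.
full = 768 * 768
low_rank = lora_param_count(768, 768, rank=4)
print(full, low_rank, full // low_rank)  # prints: 589824 6144 96
```

Under these assumptions, the trainable update holds roughly 1/96 of the parameters of the layer it adapts. Setting `scale=0.0` in `merge_lora` returns the base weights unchanged, which mirrors the role of the `scale` parameter mentioned above.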
- -### Training[[text-to-image-training]] - -Let's finetune [`stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) on the [Pokémon BLIP captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) dataset to generate your own Pokémon. - -To start, make sure you have the `MODEL_NAME` and `DATASET_NAME` environment variables set. The `OUTPUT_DIR` and `HUB_MODEL_ID` variables are optional and specify where to save the model locally and on the Hub, respectively: - -```bash -export MODEL_NAME="runwayml/stable-diffusion-v1-5" -export OUTPUT_DIR="/sddata/finetune/lora/pokemon" -export HUB_MODEL_ID="pokemon-lora" -export DATASET_NAME="lambdalabs/pokemon-blip-captions" -``` - -There are some flags to be aware of before you start training: - -* `--push_to_hub` stores the trained LoRA weights on the Hub. -* `--report_to=wandb` reports and logs the training results to your Weights & Biases dashboard (as an example, take a look at this [report](https://wandb.ai/pcuenq/text2image-fine-tune/runs/b4k1w0tn?workspace=user-pcuenq)). -* `--learning_rate=1e-04`: with LoRA, you can afford to use a higher learning rate than you normally would. - -Now you're ready to launch the training (you can find the full training script [here](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_lora.py)): - -```bash -accelerate launch --mixed_precision="fp16" train_text_to_image_lora.py \ - --pretrained_model_name_or_path=$MODEL_NAME \ - --dataset_name=$DATASET_NAME \ - --dataloader_num_workers=8 \ - --resolution=512 --center_crop --random_flip \ - --train_batch_size=1 \ - --gradient_accumulation_steps=4 \ - --max_train_steps=15000 \ - --learning_rate=1e-04 \ - --max_grad_norm=1 \ - --lr_scheduler="cosine" --lr_warmup_steps=0 \ - --output_dir=${OUTPUT_DIR} \ - --push_to_hub \ - --hub_model_id=${HUB_MODEL_ID} \ - --report_to=wandb \ - --checkpointing_steps=500 \ - --validation_prompt="A pokemon with blue eyes."
\ - --seed=1337 -``` - -### Inference[[text-to-image-inference]] - -Now you can use the model for inference by loading the base model in the [`StableDiffusionPipeline`] and then switching its scheduler to the [`DPMSolverMultistepScheduler`]: - -```py ->>> import torch ->>> from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler - ->>> model_base = "runwayml/stable-diffusion-v1-5" - ->>> pipe = StableDiffusionPipeline.from_pretrained(model_base, torch_dtype=torch.float16) ->>> pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config) -``` - -Load the LoRA weights from your finetuned model *on top of the base model weights*, and then move the pipeline to a GPU for faster inference (below, `model_path` stands for the local directory or Hub repository where your LoRA weights were saved). When you merge the LoRA weights with the frozen pretrained model weights, you can optionally adjust how much of the weights to merge with the `scale` parameter: - - - -💡 A `scale` value of `0` is the same as not using your LoRA weights and you're only using the base model weights, and a `scale` value of `1` means you're only using the fully finetuned LoRA weights. Values between `0` and `1` interpolate between the two weights. - - - -```py ->>> pipe.unet.load_attn_procs(model_path) ->>> pipe.to("cuda") -# use half the weights from the LoRA finetuned model and half the weights from the base model - ->>> image = pipe( -... "A pokemon with blue eyes.", num_inference_steps=25, guidance_scale=7.5, cross_attention_kwargs={"scale": 0.5} -... ).images[0] -# use the weights from the fully finetuned LoRA model - ->>> image = pipe("A pokemon with blue eyes.", num_inference_steps=25, guidance_scale=7.5).images[0] ->>> image.save("blue_pokemon.png") -``` - -## DreamBooth - -[DreamBooth](https://arxiv.org/abs/2208.12242) is a finetuning technique for personalizing a text-to-image model like Stable Diffusion to generate photorealistic images of a subject in different contexts, given a few images of the subject.
However, DreamBooth is very sensitive to hyperparameters and it is easy to overfit. Some important hyperparameters to consider include those that affect the training time (learning rate, number of training steps), and inference time (number of steps, scheduler type). - - - -💡 Take a look at the [Training Stable Diffusion with DreamBooth using 🧨 Diffusers](https://huggingface.co/blog/dreambooth) blog for an in-depth analysis of DreamBooth experiments and recommended settings. - - - -### Training[[dreambooth-training]] - -Let's finetune [`stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) with DreamBooth and LoRA with some 🐶 [dog images](https://drive.google.com/drive/folders/1BO_dyz-p65qhBRRMRA4TbZ8qW4rB99JZ). Download and save these images to a directory. - -To start, make sure you have the `MODEL_NAME` and `INSTANCE_DIR` (path to directory containing images) environment variables set. The `OUTPUT_DIR` variable is optional and specifies where to save the model to on the Hub: - -```bash -export MODEL_NAME="runwayml/stable-diffusion-v1-5" -export INSTANCE_DIR="path-to-instance-images" -export OUTPUT_DIR="path-to-save-model" -``` - -There are some flags to be aware of before you start training: - -* `--push_to_hub` stores the trained LoRA weights on the Hub. -* `--report_to=wandb` reports and logs the training results to your Weights & Biases dashboard (as an example, take a look at this [report](https://wandb.ai/pcuenq/text2image-fine-tune/runs/b4k1w0tn?workspace=user-pcuenq)). -* `--learning_rate=1e-04`: with LoRA, you can afford to use a higher learning rate than you normally would.
- -Now you're ready to launch the training (you can find the full training script [here](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_lora.py)): - -```bash -accelerate launch train_dreambooth_lora.py \ - --pretrained_model_name_or_path=$MODEL_NAME \ - --instance_data_dir=$INSTANCE_DIR \ - --output_dir=$OUTPUT_DIR \ - --instance_prompt="a photo of sks dog" \ - --resolution=512 \ - --train_batch_size=1 \ - --gradient_accumulation_steps=1 \ - --checkpointing_steps=100 \ - --learning_rate=1e-4 \ - --report_to="wandb" \ - --lr_scheduler="constant" \ - --lr_warmup_steps=0 \ - --max_train_steps=500 \ - --validation_prompt="A photo of sks dog in a bucket" \ - --validation_epochs=50 \ - --seed="0" \ - --push_to_hub -``` - -### Inference[[dreambooth-inference]] - -Now you can use the model for inference by loading the base model in the [`StableDiffusionPipeline`]: - -```py ->>> import torch ->>> from diffusers import StableDiffusionPipeline - ->>> model_base = "runwayml/stable-diffusion-v1-5" - ->>> pipe = StableDiffusionPipeline.from_pretrained(model_base, torch_dtype=torch.float16) -``` - -Load the LoRA weights from your finetuned DreamBooth model *on top of the base model weights*, and then move the pipeline to a GPU for faster inference (below, `model_path` stands for the local directory or Hub repository where your LoRA weights were saved). When you merge the LoRA weights with the frozen pretrained model weights, you can optionally adjust how much of the weights to merge with the `scale` parameter: - - - -💡 A `scale` value of `0` is the same as not using your LoRA weights and you're only using the base model weights, and a `scale` value of `1` means you're only using the fully finetuned LoRA weights. Values between `0` and `1` interpolate between the two weights. - - - -```py ->>> pipe.unet.load_attn_procs(model_path) ->>> pipe.to("cuda") -# use half the weights from the LoRA finetuned model and half the weights from the base model - ->>> image = pipe( -... "A picture of a sks dog in a bucket.", -... num_inference_steps=25, -...
guidance_scale=7.5, -... cross_attention_kwargs={"scale": 0.5}, -... ).images[0] -# use the weights from the fully finetuned LoRA model - ->>> image = pipe("A picture of a sks dog in a bucket.", num_inference_steps=25, guidance_scale=7.5).images[0] ->>> image.save("bucket-dog.png") -``` \ No newline at end of file diff --git a/diffusers/docs/source/en/training/overview.mdx b/diffusers/docs/source/en/training/overview.mdx deleted file mode 100644 index 5ad3a1f06cc1cd7c4ec8b66923186f80e714790a..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/training/overview.mdx +++ /dev/null @@ -1,76 +0,0 @@ - - -# 🧨 Diffusers Training Examples - -Diffusers training examples are a collection of scripts to demonstrate how to effectively use the `diffusers` library -for a variety of use cases. - -**Note**: If you are looking for **official** examples on how to use `diffusers` for inference, -please have a look at [src/diffusers/pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines) - -Our examples aspire to be **self-contained**, **easy-to-tweak**, **beginner-friendly** and for **one-purpose-only**. -More specifically, this means: - -- **Self-contained**: An example script shall only depend on "pip-install-able" Python packages that can be found in a `requirements.txt` file. Example scripts shall **not** depend on any local files. This means that one can simply download an example script, *e.g.* [train_unconditional.py](https://github.com/huggingface/diffusers/blob/main/examples/unconditional_image_generation/train_unconditional.py), install the required dependencies, *e.g.* [requirements.txt](https://github.com/huggingface/diffusers/blob/main/examples/unconditional_image_generation/requirements.txt) and execute the example script. -- **Easy-to-tweak**: While we strive to present as many use cases as possible, the example scripts are just that - examples. 
It is expected that they won't work out-of-the box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs. To help you with that, most of the examples fully expose the preprocessing of the data and the training loop to allow you to tweak and edit them as required. -- **Beginner-friendly**: We do not aim for providing state-of-the-art training scripts for the newest models, but rather examples that can be used as a way to better understand diffusion models and how to use them with the `diffusers` library. We often purposefully leave out certain state-of-the-art methods if we consider them too complex for beginners. -- **One-purpose-only**: Examples should show one task and one task only. Even if a task is from a modeling -point of view very similar, *e.g.* image super-resolution and image modification tend to use the same model and training method, we want examples to showcase only one task to keep them as readable and easy-to-understand as possible. - -We provide **official** examples that cover the most popular tasks of diffusion models. -*Official* examples are **actively** maintained by the `diffusers` maintainers and we try to rigorously follow our example philosophy as defined above. -If you feel like another important example should exist, we are more than happy to welcome a [Feature Request](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feature_request.md&title=) or directly a [Pull Request](https://github.com/huggingface/diffusers/compare) from you! - -Training examples show how to pretrain or fine-tune diffusion models for a variety of tasks. 
Currently we support: - -- [Unconditional Training](./unconditional_training) -- [Text-to-Image Training](./text2image) -- [Text Inversion](./text_inversion) -- [Dreambooth](./dreambooth) -- [LoRA Support](./lora) -- [ControlNet](./controlnet) - -If possible, please [install xFormers](../optimization/xformers) for memory efficient attention. This could help make your training faster and less memory intensive. - -| Task | 🤗 Accelerate | 🤗 Datasets | Colab -|---|---|:---:|:---:| -| [**Unconditional Image Generation**](./unconditional_training) | ✅ | ✅ | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) -| [**Text-to-Image fine-tuning**](./text2image) | ✅ | ✅ | -| [**Textual Inversion**](./text_inversion) | ✅ | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb) -| [**Dreambooth**](./dreambooth) | ✅ | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_dreambooth_training.ipynb) -| [**Training with LoRA**](./lora) | ✅ | - | - | -| [**ControlNet**](./controlnet) | ✅ | ✅ | - | - -## Community - -In addition, we provide **community** examples, which are examples added and maintained by our community. -Community examples can consist of both *training* examples or *inference* pipelines. -For such examples, we are more lenient regarding the philosophy defined above and also cannot guarantee to provide maintenance for every issue. -Examples that are useful for the community, but are either not yet deemed popular or not yet following our above philosophy should go into the [community examples](https://github.com/huggingface/diffusers/tree/main/examples/community) folder. 
The community folder therefore includes training examples and inference pipelines. -**Note**: Community examples can be a [great first contribution](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22) to show to the community how you like to use `diffusers` 🪄. - -## Important note - -To make sure you can successfully run the latest versions of the example scripts, you have to **install the library from source** and install some example-specific requirements. To do this, execute the following steps in a new virtual environment: - -```bash -git clone https://github.com/huggingface/diffusers -cd diffusers -pip install . -``` - -Then cd in the example folder of your choice and run - -```bash -pip install -r requirements.txt -``` diff --git a/diffusers/docs/source/en/training/text2image.mdx b/diffusers/docs/source/en/training/text2image.mdx deleted file mode 100644 index 851be61bcf973d46d7a57bd6efd39802899ab46b..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/training/text2image.mdx +++ /dev/null @@ -1,208 +0,0 @@ - - - -# Text-to-image - - - -The text-to-image fine-tuning script is experimental. It's easy to overfit and run into issues like catastrophic forgetting. We recommend you explore different hyperparameters to get the best results on your dataset. - - - -Text-to-image models like Stable Diffusion generate an image from a text prompt. This guide will show you how to finetune the [`CompVis/stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4) model on your own dataset with PyTorch and Flax. All the training scripts for text-to-image finetuning used in this guide can be found in this [repository](https://github.com/huggingface/diffusers/tree/main/examples/text_to_image) if you're interested in taking a closer look. 
- -Before running the scripts, make sure to install the library's training dependencies: - -```bash -pip install git+https://github.com/huggingface/diffusers.git -pip install -U -r requirements.txt -``` - -And initialize an [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment with: - -```bash -accelerate config -``` - -If you have already cloned the repo, then you won't need to go through these steps. Instead, you can pass the path to your local checkout to the training script and it will be loaded from there. - -## Hardware requirements - -Using `gradient_checkpointing` and `mixed_precision`, it should be possible to finetune the model on a single 24GB GPU. For higher `batch_size`'s and faster training, it's better to use GPUs with more than 30GB of GPU memory. You can also use JAX/Flax for fine-tuning on TPUs or GPUs, which will be covered [below](#flax-jax-finetuning). - -You can reduce your memory footprint even more by enabling memory efficient attention with xFormers. Make sure you have [xFormers installed](./optimization/xformers) and pass the `--enable_xformers_memory_efficient_attention` flag to the training script. - -xFormers is not available for Flax. - -## Upload model to Hub - -Store your model on the Hub by adding the following argument to the training script: - -```bash - --push_to_hub -``` - -## Save and load checkpoints - -It is a good idea to regularly save checkpoints in case anything happens during training. To save a checkpoint, pass the following argument to the training script: - -```bash - --checkpointing_steps=500 -``` - -Every 500 steps, the full training state is saved in a subfolder in the `output_dir`. The checkpoint has the format `checkpoint-` followed by the number of steps trained so far. For example, `checkpoint-1500` is a checkpoint saved after 1500 training steps. 
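
The `checkpoint-<steps>` naming scheme described above makes it straightforward to locate the most recent checkpoint programmatically. The following is a minimal sketch, assuming the checkpoints are plain subfolders of `output_dir`; the helper name `latest_checkpoint` is hypothetical and not part of the training script. Note that the step counts are compared as integers, since a plain string sort would place `checkpoint-500` after `checkpoint-1500`.

```python
import re
import tempfile
from pathlib import Path


def latest_checkpoint(output_dir):
    """Return the name of the `checkpoint-<steps>` subfolder with the most steps, or None."""
    pattern = re.compile(r"checkpoint-(\d+)")
    found = []
    for path in Path(output_dir).iterdir():
        match = pattern.fullmatch(path.name)
        if path.is_dir() and match:
            # Store the step count as an int so that 1500 sorts after 500.
            found.append((int(match.group(1)), path.name))
    return max(found)[1] if found else None


# Demo on a throwaway directory standing in for `output_dir`.
with tempfile.TemporaryDirectory() as tmp:
    for steps in (500, 1000, 1500):
        (Path(tmp) / f"checkpoint-{steps}").mkdir()
    newest = latest_checkpoint(tmp)
    print(newest)  # checkpoint-1500

# An empty directory has no checkpoints to pick from.
with tempfile.TemporaryDirectory() as empty:
    nothing = latest_checkpoint(empty)
```

The returned folder name is in the same `checkpoint-<steps>` form that identifies a saved training state in `output_dir`.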
- -To load a checkpoint to resume training, pass the argument `--resume_from_checkpoint` to the training script and specify the checkpoint you want to resume from. For example, the following argument resumes training from the checkpoint saved after 1500 training steps: - -```bash - --resume_from_checkpoint="checkpoint-1500" -``` - -## Fine-tuning - - - -Launch the [PyTorch training script](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py) for a fine-tuning run on the [Pokémon BLIP captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) dataset like this: - - -{"path": "../../../../examples/text_to_image/README.md", -"language": "bash", -"start-after": "accelerate_snippet_start", -"end-before": "accelerate_snippet_end", -"dedent": 0} - - -To finetune on your own dataset, prepare the dataset according to the format required by 🤗 [Datasets](https://huggingface.co/docs/datasets/index). You can [upload your dataset to the Hub](https://huggingface.co/docs/datasets/image_dataset#upload-dataset-to-the-hub), or you can [prepare a local folder with your files](https://huggingface.co/docs/datasets/image_dataset#imagefolder). - -Modify the script if you want to use custom loading logic. We left pointers in the code in the appropriate places to help you. 
🤗 The example script below shows how to finetune on a local dataset in `TRAIN_DIR` and save the model to `OUTPUT_DIR`: - -```bash -export MODEL_NAME="CompVis/stable-diffusion-v1-4" -export TRAIN_DIR="path_to_your_dataset" -export OUTPUT_DIR="path_to_save_model" - -accelerate launch train_text_to_image.py \ - --pretrained_model_name_or_path=$MODEL_NAME \ - --train_data_dir=$TRAIN_DIR \ - --use_ema \ - --resolution=512 --center_crop --random_flip \ - --train_batch_size=1 \ - --gradient_accumulation_steps=4 \ - --gradient_checkpointing \ - --mixed_precision="fp16" \ - --max_train_steps=15000 \ - --learning_rate=1e-05 \ - --max_grad_norm=1 \ - --lr_scheduler="constant" --lr_warmup_steps=0 \ - --output_dir=${OUTPUT_DIR} -``` - - -With Flax, it's possible to train a Stable Diffusion model faster on TPUs and GPUs thanks to [@duongna21](https://github.com/duongna21). This is very efficient on TPU hardware but works great on GPUs too. The Flax training script doesn't support features like gradient checkpointing or gradient accumulation yet, so you'll need a GPU with at least 30GB of memory or a TPU v3. - -Before running the script, make sure you have the requirements installed: - -```bash -pip install -U -r requirements_flax.txt -``` - -Now you can launch the [Flax training script](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_flax.py) like this: - -```bash -export MODEL_NAME="runwayml/stable-diffusion-v1-5" -export dataset_name="lambdalabs/pokemon-blip-captions" - -python train_text_to_image_flax.py \ - --pretrained_model_name_or_path=$MODEL_NAME \ - --dataset_name=$dataset_name \ - --resolution=512 --center_crop --random_flip \ - --train_batch_size=1 \ - --max_train_steps=15000 \ - --learning_rate=1e-05 \ - --max_grad_norm=1 \ - --output_dir="sd-pokemon-model" -``` - -To finetune on your own dataset, prepare the dataset according to the format required by 🤗 [Datasets](https://huggingface.co/docs/datasets/index).
You can [upload your dataset to the Hub](https://huggingface.co/docs/datasets/image_dataset#upload-dataset-to-the-hub), or you can [prepare a local folder with your files](https://huggingface.co/docs/datasets/image_dataset#imagefolder). - -Modify the script if you want to use custom loading logic. We left pointers in the code in the appropriate places to help you. 🤗 The example script below shows how to finetune on a local dataset in `TRAIN_DIR`: - -```bash -export MODEL_NAME="duongna/stable-diffusion-v1-4-flax" -export TRAIN_DIR="path_to_your_dataset" - -python train_text_to_image_flax.py \ - --pretrained_model_name_or_path=$MODEL_NAME \ - --train_data_dir=$TRAIN_DIR \ - --resolution=512 --center_crop --random_flip \ - --train_batch_size=1 \ - --mixed_precision="fp16" \ - --max_train_steps=15000 \ - --learning_rate=1e-05 \ - --max_grad_norm=1 \ - --output_dir="sd-pokemon-model" -``` - - - -## LoRA - -You can also use Low-Rank Adaptation of Large Language Models (LoRA), a fine-tuning technique for accelerating training large models, for fine-tuning text-to-image models. For more details, take a look at the [LoRA training](lora#text-to-image) guide. 
- -## Inference - -Now you can load the fine-tuned model for inference by passing the model path or model name on the Hub to the [`StableDiffusionPipeline`]: - - - -```python -import torch -from diffusers import StableDiffusionPipeline - -model_path = "path_to_saved_model" -pipe = StableDiffusionPipeline.from_pretrained(model_path, torch_dtype=torch.float16) -pipe.to("cuda") - -image = pipe(prompt="yoda").images[0] -image.save("yoda-pokemon.png") -``` - - -```python -import jax -import numpy as np -from flax.jax_utils import replicate -from flax.training.common_utils import shard -from diffusers import FlaxStableDiffusionPipeline - -model_path = "path_to_saved_model" -pipeline, params = FlaxStableDiffusionPipeline.from_pretrained(model_path, dtype=jax.numpy.bfloat16) - -prompt = "yoda pokemon" -prng_seed = jax.random.PRNGKey(0) -num_inference_steps = 50 - -num_samples = jax.device_count() -prompt = num_samples * [prompt] -prompt_ids = pipeline.prepare_inputs(prompt) - -# shard inputs and rng -params = replicate(params) -prng_seed = jax.random.split(prng_seed, jax.device_count()) -prompt_ids = shard(prompt_ids) - -images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).images -images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:]))) -# numpy_to_pil returns a list of PIL images; save the first one -images[0].save("yoda-pokemon.png") -``` - - diff --git a/diffusers/docs/source/en/training/text_inversion.mdx b/diffusers/docs/source/en/training/text_inversion.mdx deleted file mode 100644 index 68c613849301934579d5b2353b7ec902e6cad76f..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/training/text_inversion.mdx +++ /dev/null @@ -1,215 +0,0 @@ - - - - -# Textual Inversion - -[[open-in-colab]] - -[Textual Inversion](https://arxiv.org/abs/2208.01618) is a technique for capturing novel concepts from a small number of example images.
While the technique was originally demonstrated with a [latent diffusion model](https://github.com/CompVis/latent-diffusion), it has since been applied to other model variants like [Stable Diffusion](https://huggingface.co/docs/diffusers/main/en/conceptual/stable_diffusion). The learned concepts can be used to better control the images generated from text-to-image pipelines. It learns new "words" in the text encoder's embedding space, which are used within text prompts for personalized image generation. - -![Textual Inversion example](https://textual-inversion.github.io/static/images/editing/colorful_teapot.JPG) -By using just 3-5 images you can teach new concepts to a model such as Stable Diffusion for personalized image generation (image source). - -This guide will show you how to train a [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) model with Textual Inversion. All the training scripts for Textual Inversion used in this guide can be found [here](https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion) if you're interested in taking a closer look at how things work under the hood. - - - -There is a community-created collection of trained Textual Inversion models in the [Stable Diffusion Textual Inversion Concepts Library](https://huggingface.co/sd-concepts-library) which are readily available for inference. Over time, this'll hopefully grow into a useful resource as more concepts are added! 
- - - -Before you begin, make sure you install the library's training dependencies: - -```bash -pip install diffusers accelerate transformers -``` - -After all the dependencies have been set up, initialize a [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment with: - -```bash -accelerate config -``` - -To set up a default 🤗 Accelerate environment without choosing any configurations: - -```bash -accelerate config default -``` - -Or if your environment doesn't support an interactive shell, like a notebook, you can use: - -```py -from accelerate.utils import write_basic_config - -write_basic_config() -``` - -Finally, try to [install xFormers](https://huggingface.co/docs/diffusers/main/en/training/optimization/xformers) to reduce your memory footprint with xFormers memory-efficient attention. Once you have xFormers installed, add the `--enable_xformers_memory_efficient_attention` argument to the training script. xFormers is not supported for Flax. - -## Upload model to Hub - -If you want to store your model on the Hub, add the following argument to the training script: - -```bash ---push_to_hub -``` - -## Save and load checkpoints - -It is often a good idea to regularly save checkpoints of your model during training. This way, you can resume training from a saved checkpoint if your training is interrupted for any reason. To save a checkpoint, pass the following argument to the training script to save the full training state in a subfolder in `output_dir` every 500 steps: - -```bash ---checkpointing_steps=500 -``` - -To resume training from a saved checkpoint, pass the following argument to the training script, specifying the checkpoint you'd like to resume from: - -```bash ---resume_from_checkpoint="checkpoint-1500" -``` - -## Finetuning - -For your training dataset, download these [images of a cat statue](https://drive.google.com/drive/folders/1fmJMs25nxS_rSNqS5hTcRdLem_YQXbq5) and store them in a directory.
- -Set the `MODEL_NAME` environment variable to the model repository id, and the `DATA_DIR` environment variable to the path of the directory containing the images. Now you can launch the [training script](https://github.com/huggingface/diffusers/blob/main/examples/textual_inversion/textual_inversion.py): - - - -💡 A full training run takes ~1 hour on one V100 GPU. While you're waiting for the training to complete, feel free to check out [how Textual Inversion works](#how-it-works) in the section below if you're curious! - - - - - -```bash -export MODEL_NAME="runwayml/stable-diffusion-v1-5" -export DATA_DIR="path-to-dir-containing-images" - -accelerate launch textual_inversion.py \ - --pretrained_model_name_or_path=$MODEL_NAME \ - --train_data_dir=$DATA_DIR \ - --learnable_property="object" \ - --placeholder_token="" --initializer_token="toy" \ - --resolution=512 \ - --train_batch_size=1 \ - --gradient_accumulation_steps=4 \ - --max_train_steps=3000 \ - --learning_rate=5.0e-04 --scale_lr \ - --lr_scheduler="constant" \ - --lr_warmup_steps=0 \ - --output_dir="textual_inversion_cat" -``` - - -If you have access to TPUs, try out the [Flax training script](https://github.com/huggingface/diffusers/blob/main/examples/textual_inversion/textual_inversion_flax.py) to train even faster (this'll also work for GPUs). With the same configuration settings, the Flax training script should be at least 70% faster than the PyTorch training script! 
⚡️ - -Before you begin, make sure you install the Flax specific dependencies: - -```bash -pip install -U -r requirements_flax.txt -``` - -Then you can launch the [training script](https://github.com/huggingface/diffusers/blob/main/examples/textual_inversion/textual_inversion_flax.py): - -```bash -export MODEL_NAME="duongna/stable-diffusion-v1-4-flax" -export DATA_DIR="path-to-dir-containing-images" - -python textual_inversion_flax.py \ - --pretrained_model_name_or_path=$MODEL_NAME \ - --train_data_dir=$DATA_DIR \ - --learnable_property="object" \ - --placeholder_token="" --initializer_token="toy" \ - --resolution=512 \ - --train_batch_size=1 \ - --max_train_steps=3000 \ - --learning_rate=5.0e-04 --scale_lr \ - --output_dir="textual_inversion_cat" -``` - - - -### Intermediate logging - -If you're interested in following along with your model training progress, you can save the generated images from the training process. Add the following arguments to the training script to enable intermediate logging: - -- `validation_prompt`, the prompt used to generate samples (this is set to `None` by default and intermediate logging is disabled) -- `num_validation_images`, the number of sample images to generate -- `validation_steps`, the number of steps before generating `num_validation_images` from the `validation_prompt` - -```bash ---validation_prompt="A backpack" ---num_validation_images=4 ---validation_steps=100 -``` - -## Inference - -Once you have trained a model, you can use it for inference with the [`StableDiffusionPipeline`]. Make sure you include the `placeholder_token` in your prompt, in this case, it is ``. 
- - - -```python -import torch -from diffusers import StableDiffusionPipeline - -model_id = "path-to-your-trained-model" -pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda") - -prompt = "A backpack" - -image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0] - -image.save("cat-backpack.png") -``` - - -```python -import jax -import numpy as np -from flax.jax_utils import replicate -from flax.training.common_utils import shard -from diffusers import FlaxStableDiffusionPipeline - -model_path = "path-to-your-trained-model" -pipeline, params = FlaxStableDiffusionPipeline.from_pretrained(model_path, dtype=jax.numpy.bfloat16) - -prompt = "A backpack" -prng_seed = jax.random.PRNGKey(0) -num_inference_steps = 50 - -num_samples = jax.device_count() -prompt = num_samples * [prompt] -prompt_ids = pipeline.prepare_inputs(prompt) - -# shard inputs and rng -params = replicate(params) -prng_seed = jax.random.split(prng_seed, jax.device_count()) -prompt_ids = shard(prompt_ids) - -images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).images -images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:]))) -# numpy_to_pil returns a list of PIL images; save the first one -images[0].save("cat-backpack.png") -``` - - - -## How it works - -![Diagram from the paper showing overview](https://textual-inversion.github.io/static/images/training/training.JPG) -Architecture overview from the Textual Inversion blog post. - -Usually, text prompts are tokenized into an embedding before being passed to a model, which is often a transformer. Textual Inversion does something similar, but it learns a new token embedding, `v*`, from a special token `S*` in the diagram above. The model output is used to condition the diffusion model, which helps the diffusion model understand the prompt and new concepts from just a few example images. - -To do this, Textual Inversion uses a generator model and noisy versions of the training images.
The generator tries to predict less noisy versions of the images, and the token embedding `v*` is optimized based on how well the generator does. If the token embedding successfully captures the new concept, it gives more useful information to the diffusion model and helps create clearer images with less noise. This optimization process typically occurs after several thousand steps of exposure to a variety of prompt and image variants. diff --git a/diffusers/docs/source/en/training/unconditional_training.mdx b/diffusers/docs/source/en/training/unconditional_training.mdx deleted file mode 100644 index 26517fd1fcf8cf00819817a3630b570be041affd..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/training/unconditional_training.mdx +++ /dev/null @@ -1,201 +0,0 @@ - - -# Unconditional image generation - -Unconditional image generation is not conditioned on any text or images, unlike text- or image-to-image models. It only generates images that resemble its training data distribution. - - - - -This guide will show you how to train an unconditional image generation model on existing datasets as well as your own custom dataset. All the training scripts for unconditional image generation can be found [here](https://github.com/huggingface/diffusers/tree/main/examples/unconditional_image_generation) if you're interested in learning more about the training details. 
- -Before running the script, make sure you install the library's training dependencies: - -```bash -pip install diffusers[training] accelerate datasets -``` - -Next, initialize a 🤗 [Accelerate](https://github.com/huggingface/accelerate/) environment with: - -```bash -accelerate config -``` - -To set up a default 🤗 Accelerate environment without choosing any configurations: - -```bash -accelerate config default -``` - -Or, if your environment doesn't support an interactive shell (a notebook, for example), you can use: - -```python -from accelerate.utils import write_basic_config - -write_basic_config() -``` - -## Upload model to Hub - -You can upload your model to the Hub by adding the following argument to the training script: - -```bash ---push_to_hub -``` - -## Save and load checkpoints - -It is a good idea to save checkpoints regularly in case training is interrupted. To save a checkpoint, pass the following argument to the training script: - -```bash ---checkpointing_steps=500 -``` - -The full training state is saved in a subfolder of the `output_dir` every 500 steps. To load a checkpoint and resume training, pass the `--resume_from_checkpoint` argument to the training script: - -```bash ---resume_from_checkpoint="checkpoint-1500" -``` - -## Finetuning - -You're ready to launch the [training script](https://github.com/huggingface/diffusers/blob/main/examples/unconditional_image_generation/train_unconditional.py) now! Specify the dataset to finetune on with the `--dataset_name` argument, and the path where the trained model is saved with `--output_dir`. - - - -💡 A full training run takes 2 hours on 4xV100 GPUs.
- - - -For example, to finetune on the [Oxford Flowers](https://huggingface.co/datasets/huggan/flowers-102-categories) dataset: - -```bash -accelerate launch train_unconditional.py \ - --dataset_name="huggan/flowers-102-categories" \ - --resolution=64 \ - --output_dir="ddpm-ema-flowers-64" \ - --train_batch_size=16 \ - --num_epochs=100 \ - --gradient_accumulation_steps=1 \ - --learning_rate=1e-4 \ - --lr_warmup_steps=500 \ - --mixed_precision=no \ - --push_to_hub -``` - -
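To get a feel for how the flags in the command above interact, the following minimal sketch estimates the number of optimizer updates in a run. It assumes a single process and that the last partial batch is kept. The dataset size is a hypothetical placeholder (not taken from the dataset card), and `count_optimizer_steps` is an illustrative helper, not part of `train_unconditional.py`.

```python
import math


def count_optimizer_steps(num_images, train_batch_size, num_epochs, gradient_accumulation_steps=1):
    """Estimate optimizer updates for a run (illustrative helper, not part of the training script)."""
    # Batches per epoch, assuming the final partial batch is kept.
    batches_per_epoch = math.ceil(num_images / train_batch_size)
    # With gradient accumulation, several batches feed a single optimizer update.
    updates_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
    return updates_per_epoch * num_epochs


# Hypothetical dataset size; substitute the real number of training images.
num_images = 8000
total_steps = count_optimizer_steps(num_images, train_batch_size=16, num_epochs=100)
# Share of the run spent warming up when --lr_warmup_steps=500.
warmup_fraction = 500 / total_steps
print(total_steps, warmup_fraction)
```

If the warmup covers a large share of the run (for example on a very small dataset), it may be worth lowering `--lr_warmup_steps` or raising `--num_epochs`.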
- -Or if you want to train your model on the [Pokemon](https://huggingface.co/datasets/huggan/pokemon) dataset: - -```bash -accelerate launch train_unconditional.py \ - --dataset_name="huggan/pokemon" \ - --resolution=64 \ - --output_dir="ddpm-ema-pokemon-64" \ - --train_batch_size=16 \ - --num_epochs=100 \ - --gradient_accumulation_steps=1 \ - --learning_rate=1e-4 \ - --lr_warmup_steps=500 \ - --mixed_precision=no \ - --push_to_hub -``` - -
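When resuming a run like the ones above, you usually want the most recent checkpoint folder. Here is a minimal sketch, assuming the folders follow the `checkpoint-<step>` naming shown in the `--resume_from_checkpoint` example; `latest_checkpoint` is a hypothetical helper and not part of the training script.

```python
import re
import tempfile
from pathlib import Path


def latest_checkpoint(output_dir):
    """Return the name of the `checkpoint-<step>` subfolder with the highest step, or None."""
    pattern = re.compile(r"^checkpoint-(\d+)$")
    candidates = []
    for path in Path(output_dir).iterdir():
        match = pattern.match(path.name)
        if path.is_dir() and match:
            candidates.append((int(match.group(1)), path.name))
    # Compare by the numeric step so that checkpoint-1500 ranks above checkpoint-500.
    return max(candidates)[1] if candidates else None


# Demo on a throwaway directory standing in for --output_dir.
with tempfile.TemporaryDirectory() as tmp:
    for step in (500, 1000, 1500):
        (Path(tmp) / f"checkpoint-{step}").mkdir()
    newest = latest_checkpoint(tmp)
    print(newest)
```

Sorting by the parsed integer rather than by the folder name matters because a plain string sort would place `checkpoint-500` after `checkpoint-1500`.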
- -## Finetuning with your own data - -There are two ways to finetune a model on your own dataset: - -- provide your own folder of images to the `--train_data_dir` argument -- upload your dataset to the Hub and pass the dataset repository id to the `--dataset_name` argument. - - - -💡 Learn more about how to create an image dataset for training in the [Create an image dataset](https://huggingface.co/docs/datasets/image_dataset) guide. - - - -Below, we explain both in more detail. - -### Provide the dataset as a folder - -If you provide your own dataset as a folder, the script expects the following directory structure: - -```bash -data_dir/xxx.png -data_dir/xxy.png -data_dir/[...]/xxz.png -``` - -Pass the path to the folder containing the images to the `--train_data_dir` argument and launch the training: - -```bash -accelerate launch train_unconditional.py \ - --train_data_dir \ - -``` - -Internally, the script uses the [`ImageFolder`](https://huggingface.co/docs/datasets/image_load#imagefolder) to automatically build a dataset from the folder. - -### Upload your data to the Hub - - - -💡 For more details and context about creating and uploading a dataset to the Hub, take a look at the [Image search with 🤗 Datasets](https://huggingface.co/blog/image-search-datasets) post. 
- - - -To upload your dataset to the Hub, you can start by creating one with the [`ImageFolder`](https://huggingface.co/docs/datasets/image_load#imagefolder) feature, which creates an `image` column containing the PIL-encoded images, from 🤗 Datasets: - -```python -from datasets import load_dataset - -# example 1: local folder -dataset = load_dataset("imagefolder", data_dir="path_to_your_folder") - -# example 2: local files (supported formats are tar, gzip, zip, xz, rar, zstd) -dataset = load_dataset("imagefolder", data_files="path_to_zip_file") - -# example 3: remote files (supported formats are tar, gzip, zip, xz, rar, zstd) -dataset = load_dataset( - "imagefolder", - data_files="https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_3367a.zip", -) - -# example 4: providing several splits -dataset = load_dataset( - "imagefolder", data_files={"train": ["path/to/file1", "path/to/file2"], "test": ["path/to/file3", "path/to/file4"]} -) -``` - -Then you can use the [`~datasets.Dataset.push_to_hub`] method to upload it to the Hub: - -```python -# assuming you have ran the huggingface-cli login command in a terminal -dataset.push_to_hub("name_of_your_dataset") - -# if you want to push to a private repo, simply pass private=True: -dataset.push_to_hub("name_of_your_dataset", private=True) -``` - -Now train your model by simply setting the `--dataset_name` argument to the name of your dataset on the Hub. 
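Whichever route you choose, it can help to confirm that your local folder really contains the images you expect before training or uploading. The sketch below only illustrates the directory layout described earlier in this guide. It is not how 🤗 Datasets' `ImageFolder` is implemented, and the extension list and the `list_images` helper are assumptions made for this example.

```python
import tempfile
from pathlib import Path

# Assumed set of extensions for this sketch; adjust it to match your data.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def list_images(data_dir):
    """Recursively collect image files under data_dir, sorted for reproducibility."""
    return sorted(
        path
        for path in Path(data_dir).rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )


# Demo: recreate the layout from the guide in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "nested").mkdir()
    for name in ("xxx.png", "xxy.png", "nested/xxz.png", "notes.txt"):
        (root / name).touch()
    found = [path.relative_to(root).as_posix() for path in list_images(root)]
    print(len(found), "images found:", found)
```

Non-image files such as `notes.txt` are skipped, and images in nested subfolders are picked up.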
\ No newline at end of file diff --git a/diffusers/docs/source/en/tutorials/basic_training.mdx b/diffusers/docs/source/en/tutorials/basic_training.mdx deleted file mode 100644 index 435de38d832f240127d9240208cd91b7fc1e07cf..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/tutorials/basic_training.mdx +++ /dev/null @@ -1,415 +0,0 @@ - - -[[open-in-colab]] - -# Train a diffusion model - -Unconditional image generation is a popular application of diffusion models that generates images that look like those in the dataset used for training. Typically, the best results are obtained from finetuning a pretrained model on a specific dataset. You can find many of these checkpoints on the [Hub](https://huggingface.co/search/full-text?q=unconditional-image-generation&type=model), but if you can't find one you like, you can always train your own! - -This tutorial will teach you how to train a [`UNet2DModel`] from scratch on a subset of the [Smithsonian Butterflies](https://huggingface.co/datasets/huggan/smithsonian_butterflies_subset) dataset to generate your own 🦋 butterflies 🦋. - - - -💡 This training tutorial is based on the [Training with 🧨 Diffusers](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) notebook. For additional details and context about diffusion models like how they work, check out the notebook! - - - -Before you begin, make sure you have 🤗 Datasets installed to load and preprocess image datasets, and 🤗 Accelerate, to simplify training on any number of GPUs. The following command will also install [TensorBoard](https://www.tensorflow.org/tensorboard) to visualize training metrics (you can also use [Weights & Biases](https://docs.wandb.ai/) to track your training). 
- -```bash -!pip install diffusers[training] -``` - -We encourage you to share your model with the community, and in order to do that, you'll need to login to your Hugging Face account (create one [here](https://hf.co/join) if you don't already have one!). You can login from a notebook and enter your token when prompted: - -```py ->>> from huggingface_hub import notebook_login - ->>> notebook_login() -``` - -Or login in from the terminal: - -```bash -huggingface-cli login -``` - -Since the model checkpoints are quite large, install [Git-LFS](https://git-lfs.com/) to version these large files: - -```bash -!sudo apt -qq install git-lfs -!git config --global credential.helper store -``` - -## Training configuration - -For convenience, create a `TrainingConfig` class containing the training hyperparameters (feel free to adjust them): - -```py ->>> from dataclasses import dataclass - - ->>> @dataclass -... class TrainingConfig: -... image_size = 128 # the generated image resolution -... train_batch_size = 16 -... eval_batch_size = 16 # how many images to sample during evaluation -... num_epochs = 50 -... gradient_accumulation_steps = 1 -... learning_rate = 1e-4 -... lr_warmup_steps = 500 -... save_image_epochs = 10 -... save_model_epochs = 30 -... mixed_precision = "fp16" # `no` for float32, `fp16` for automatic mixed precision -... output_dir = "ddpm-butterflies-128" # the model name locally and on the HF Hub - -... push_to_hub = True # whether to upload the saved model to the HF Hub -... hub_private_repo = False -... overwrite_output_dir = True # overwrite the old model when re-running the notebook -... 
seed = 0 - - ->>> config = TrainingConfig() -``` - -## Load the dataset - -You can easily load the [Smithsonian Butterflies](https://huggingface.co/datasets/huggan/smithsonian_butterflies_subset) dataset with the 🤗 Datasets library: - -```py ->>> from datasets import load_dataset - ->>> config.dataset_name = "huggan/smithsonian_butterflies_subset" ->>> dataset = load_dataset(config.dataset_name, split="train") -``` - - - -💡 You can find additional datasets from the [HugGan Community Event](https://huggingface.co/huggan) or you can use your own dataset by creating a local [`ImageFolder`](https://huggingface.co/docs/datasets/image_dataset#imagefolder). Set `config.dataset_name` to the repository id of the dataset if it is from the HugGan Community Event, or `imagefolder` if you're using your own images. - - - -🤗 Datasets uses the [`~datasets.Image`] feature to automatically decode the image data and load it as a [`PIL.Image`](https://pillow.readthedocs.io/en/stable/reference/Image.html) which we can visualize: - -```py ->>> import matplotlib.pyplot as plt - ->>> fig, axs = plt.subplots(1, 4, figsize=(16, 4)) ->>> for i, image in enumerate(dataset[:4]["image"]): -... axs[i].imshow(image) -... axs[i].set_axis_off() ->>> fig.show() -``` - -
- -The images are all different sizes though, so you'll need to preprocess them first: - -* `Resize` changes the image size to the one defined in `config.image_size`. -* `RandomHorizontalFlip` augments the dataset by randomly mirroring the images. -* `Normalize` is important to rescale the pixel values into a [-1, 1] range, which is what the model expects. - -```py ->>> from torchvision import transforms - ->>> preprocess = transforms.Compose( -... [ -... transforms.Resize((config.image_size, config.image_size)), -... transforms.RandomHorizontalFlip(), -... transforms.ToTensor(), -... transforms.Normalize([0.5], [0.5]), -... ] -... ) -``` - -Use 🤗 Datasets' [`~datasets.Dataset.set_transform`] method to apply the `preprocess` function on the fly during training: - -```py ->>> def transform(examples): -... images = [preprocess(image.convert("RGB")) for image in examples["image"]] -... return {"images": images} - - ->>> dataset.set_transform(transform) -``` - -Feel free to visualize the images again to confirm that they've been resized. Now you're ready to wrap the dataset in a [DataLoader](https://pytorch.org/docs/stable/data#torch.utils.data.DataLoader) for training! - -```py ->>> import torch - ->>> train_dataloader = torch.utils.data.DataLoader(dataset, batch_size=config.train_batch_size, shuffle=True) -``` - -## Create a UNet2DModel - -Pretrained models in 🧨 Diffusers are easily created from their model class with the parameters you want. For example, to create a [`UNet2DModel`]: - -```py ->>> from diffusers import UNet2DModel - ->>> model = UNet2DModel( -... sample_size=config.image_size, # the target image resolution -... in_channels=3, # the number of input channels, 3 for RGB images -... out_channels=3, # the number of output channels -... layers_per_block=2, # how many ResNet layers to use per UNet block -... block_out_channels=(128, 128, 256, 256, 512, 512), # the number of output channels for each UNet block -... down_block_types=( -... 
"DownBlock2D", # a regular ResNet downsampling block -... "DownBlock2D", -... "DownBlock2D", -... "DownBlock2D", -... "AttnDownBlock2D", # a ResNet downsampling block with spatial self-attention -... "DownBlock2D", -... ), -... up_block_types=( -... "UpBlock2D", # a regular ResNet upsampling block -... "AttnUpBlock2D", # a ResNet upsampling block with spatial self-attention -... "UpBlock2D", -... "UpBlock2D", -... "UpBlock2D", -... "UpBlock2D", -... ), -... ) -``` - -It is often a good idea to quickly check the sample image shape matches the model output shape: - -```py ->>> sample_image = dataset[0]["images"].unsqueeze(0) ->>> print("Input shape:", sample_image.shape) -Input shape: torch.Size([1, 3, 128, 128]) - ->>> print("Output shape:", model(sample_image, timestep=0).sample.shape) -Output shape: torch.Size([1, 3, 128, 128]) -``` - -Great! Next, you'll need a scheduler to add some noise to the image. - -## Create a scheduler - -The scheduler behaves differently depending on whether you're using the model for training or inference. During inference, the scheduler generates image from the noise. During training, the scheduler takes a model output - or a sample - from a specific point in the diffusion process and applies noise to the image according to a *noise schedule* and an *update rule*. - -Let's take a look at the [`DDPMScheduler`] and use the `add_noise` method to add some random noise to the `sample_image` from before: - -```py ->>> import torch ->>> from PIL import Image ->>> from diffusers import DDPMScheduler - ->>> noise_scheduler = DDPMScheduler(num_train_timesteps=1000) ->>> noise = torch.randn(sample_image.shape) ->>> timesteps = torch.LongTensor([50]) ->>> noisy_image = noise_scheduler.add_noise(sample_image, noise, timesteps) - ->>> Image.fromarray(((noisy_image.permute(0, 2, 3, 1) + 1.0) * 127.5).type(torch.uint8).numpy()[0]) -``` - -
- -
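To make the *noise schedule* more concrete, here is a dependency-free sketch of the closed-form forward process from the DDPM paper, applied to a single pixel value. It assumes a linear beta schedule from 1e-4 to 0.02 over 1000 steps, which we believe matches the `DDPMScheduler` defaults (check the API reference to be sure). The function names are illustrative and are not the library's implementation.

```python
import math
import random


def alphas_cumprod(num_train_timesteps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed values)."""
    products, running = [], 1.0
    for t in range(num_train_timesteps):
        beta = beta_start + (beta_end - beta_start) * t / (num_train_timesteps - 1)
        running *= 1.0 - beta
        products.append(running)
    return products


def add_noise(x0, noise, t, alpha_bar):
    """x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise, for one scalar pixel."""
    return math.sqrt(alpha_bar[t]) * x0 + math.sqrt(1.0 - alpha_bar[t]) * noise


alpha_bar = alphas_cumprod()
rng = random.Random(0)
pixel = 0.5  # a pixel value already normalized to [-1, 1]
eps = rng.gauss(0.0, 1.0)  # standard Gaussian noise
for t in (0, 50, 999):
    # The weight on the clean signal shrinks and the weight on the noise grows with t.
    print(t, math.sqrt(alpha_bar[t]), add_noise(pixel, eps, t, alpha_bar))
```

At timestep 50 (the value used above) most of the original image is still visible, while near the last timestep the sample is almost pure noise.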
- -The training objective of the model is to predict the noise added to the image. The loss at this step can be calculated by: - -```py ->>> import torch.nn.functional as F - ->>> noise_pred = model(noisy_image, timesteps).sample ->>> loss = F.mse_loss(noise_pred, noise) -``` - -## Train the model - -By now, you have most of the pieces to start training the model and all that's left is putting everything together. - -First, you'll need an optimizer and a learning rate scheduler: - -```py ->>> from diffusers.optimization import get_cosine_schedule_with_warmup - ->>> optimizer = torch.optim.AdamW(model.parameters(), lr=config.learning_rate) ->>> lr_scheduler = get_cosine_schedule_with_warmup( -... optimizer=optimizer, -... num_warmup_steps=config.lr_warmup_steps, -... num_training_steps=(len(train_dataloader) * config.num_epochs), -... ) -``` - -Then, you'll need a way to evaluate the model. For evaluation, you can use the [`DDPMPipeline`] to generate a batch of sample images and save it as a grid: - -```py ->>> from diffusers import DDPMPipeline ->>> import math ->>> import os - - ->>> def make_grid(images, rows, cols): -... w, h = images[0].size -... grid = Image.new("RGB", size=(cols * w, rows * h)) -... for i, image in enumerate(images): -... grid.paste(image, box=(i % cols * w, i // cols * h)) -... return grid - - ->>> def evaluate(config, epoch, pipeline): -... # Sample some images from random noise (this is the backward diffusion process). -... # The default pipeline output type is `List[PIL.Image]` -... images = pipeline( -... batch_size=config.eval_batch_size, -... generator=torch.manual_seed(config.seed), -... ).images - -... # Make a grid out of the images -... image_grid = make_grid(images, rows=4, cols=4) - -... # Save the images -... test_dir = os.path.join(config.output_dir, "samples") -... os.makedirs(test_dir, exist_ok=True) -... 
image_grid.save(f"{test_dir}/{epoch:04d}.png") -``` - -Now you can wrap all these components together in a training loop with 🤗 Accelerate for easy TensorBoard logging, gradient accumulation, and mixed precision training. To upload the model to the Hub, write a function to get your repository name and information and then push it to the Hub. - - - -💡 The training loop below may look intimidating and long, but it'll be worth it later when you launch your training in just one line of code! If you can't wait and want to start generating images, feel free to copy and run the code below. You can always come back and examine the training loop more closely later, like when you're waiting for your model to finish training. 🤗 - - - -```py ->>> from accelerate import Accelerator ->>> from huggingface_hub import HfFolder, Repository, whoami ->>> from tqdm.auto import tqdm ->>> from pathlib import Path ->>> import os - - ->>> def get_full_repo_name(model_id: str, organization: str = None, token: str = None): -... if token is None: -... token = HfFolder.get_token() -... if organization is None: -... username = whoami(token)["name"] -... return f"{username}/{model_id}" -... else: -... return f"{organization}/{model_id}" - - ->>> def train_loop(config, model, noise_scheduler, optimizer, train_dataloader, lr_scheduler): -... # Initialize accelerator and tensorboard logging -... accelerator = Accelerator( -... mixed_precision=config.mixed_precision, -... gradient_accumulation_steps=config.gradient_accumulation_steps, -... log_with="tensorboard", -... logging_dir=os.path.join(config.output_dir, "logs"), -... ) -... if accelerator.is_main_process: -... if config.push_to_hub: -... repo_name = get_full_repo_name(Path(config.output_dir).name) -... repo = Repository(config.output_dir, clone_from=repo_name) -... elif config.output_dir is not None: -... os.makedirs(config.output_dir, exist_ok=True) -... accelerator.init_trackers("train_example") - -... # Prepare everything -... 
# There is no specific order to remember, you just need to unpack the -... # objects in the same order you gave them to the prepare method. -... model, optimizer, train_dataloader, lr_scheduler = accelerator.prepare( -... model, optimizer, train_dataloader, lr_scheduler -... ) - -... global_step = 0 - -... # Now you train the model -... for epoch in range(config.num_epochs): -... progress_bar = tqdm(total=len(train_dataloader), disable=not accelerator.is_local_main_process) -... progress_bar.set_description(f"Epoch {epoch}") - -... for step, batch in enumerate(train_dataloader): -... clean_images = batch["images"] -... # Sample noise to add to the images -... noise = torch.randn(clean_images.shape).to(clean_images.device) -... bs = clean_images.shape[0] - -... # Sample a random timestep for each image -... timesteps = torch.randint( -... 0, noise_scheduler.num_train_timesteps, (bs,), device=clean_images.device -... ).long() - -... # Add noise to the clean images according to the noise magnitude at each timestep -... # (this is the forward diffusion process) -... noisy_images = noise_scheduler.add_noise(clean_images, noise, timesteps) - -... with accelerator.accumulate(model): -... # Predict the noise residual -... noise_pred = model(noisy_images, timesteps, return_dict=False)[0] -... loss = F.mse_loss(noise_pred, noise) -... accelerator.backward(loss) - -... accelerator.clip_grad_norm_(model.parameters(), 1.0) -... optimizer.step() -... lr_scheduler.step() -... optimizer.zero_grad() - -... progress_bar.update(1) -... logs = {"loss": loss.detach().item(), "lr": lr_scheduler.get_last_lr()[0], "step": global_step} -... progress_bar.set_postfix(**logs) -... accelerator.log(logs, step=global_step) -... global_step += 1 - -... # After each epoch you optionally sample some demo images with evaluate() and save the model -... if accelerator.is_main_process: -... pipeline = DDPMPipeline(unet=accelerator.unwrap_model(model), scheduler=noise_scheduler) - -... 
if (epoch + 1) % config.save_image_epochs == 0 or epoch == config.num_epochs - 1: -... evaluate(config, epoch, pipeline) - -... if (epoch + 1) % config.save_model_epochs == 0 or epoch == config.num_epochs - 1: -... if config.push_to_hub: -... repo.push_to_hub(commit_message=f"Epoch {epoch}", blocking=True) -... else: -... pipeline.save_pretrained(config.output_dir) -``` - -Phew, that was quite a bit of code! But you're finally ready to launch the training with 🤗 Accelerate's [`~accelerate.notebook_launcher`] function. Pass the function the training loop, all the training arguments, and the number of processes (you can change this value to the number of GPUs available to you) to use for training: - -```py ->>> from accelerate import notebook_launcher - ->>> args = (config, model, noise_scheduler, optimizer, train_dataloader, lr_scheduler) - ->>> notebook_launcher(train_loop, args, num_processes=1) -``` - -Once training is complete, take a look at the final 🦋 images 🦋 generated by your diffusion model! - -```py ->>> import glob - ->>> sample_images = sorted(glob.glob(f"{config.output_dir}/samples/*.png")) ->>> Image.open(sample_images[-1]) -``` - -
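If you want to understand the learning-rate schedule created earlier with `get_cosine_schedule_with_warmup`, its shape can be sketched without any dependencies. This is our reading of the schedule (a linear warmup followed by half a cosine period down to zero), not a copy of the library code. `lr_multiplier` is a hypothetical helper and the total step count is a made-up number.

```python
import math


def lr_multiplier(step, num_warmup_steps, num_training_steps):
    """Factor applied to the base learning rate at a given optimizer step (illustrative)."""
    if step < num_warmup_steps:
        # Linear warmup from 0 to 1.
        return step / max(1, num_warmup_steps)
    # Fraction of the post-warmup phase completed so far, in [0, 1].
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    # Half a cosine period: 1 at the end of warmup, 0 at the final step.
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


base_lr = 1e-4  # config.learning_rate
warmup = 500  # config.lr_warmup_steps
total = 3000  # hypothetical value of len(train_dataloader) * config.num_epochs
for step in (0, 250, 500, 1750, 3000):
    print(step, base_lr * lr_multiplier(step, warmup, total))
```

The learning rate peaks at `config.learning_rate` right after warmup and decays smoothly afterwards, which is why the `lr` value in the progress bar first rises and then falls.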
- -## Next steps - -Unconditional image generation is one example of a task that can be trained. You can explore other tasks and training techniques by visiting the [🧨 Diffusers Training Examples](./training/overview) page. Here are some examples of what you can learn: - -* [Textual Inversion](./training/text_inversion), an algorithm that teaches a model a specific visual concept and integrates it into the generated image. -* [DreamBooth](./training/dreambooth), a technique for generating personalized images of a subject given several input images of the subject. -* [Guide](./training/text2image) to finetuning a Stable Diffusion model on your own dataset. -* [Guide](./training/lora) to using LoRA, a memory-efficient technique for finetuning really large models faster. diff --git a/diffusers/docs/source/en/tutorials/tutorial_overview.mdx b/diffusers/docs/source/en/tutorials/tutorial_overview.mdx deleted file mode 100644 index 0cec9a317ddbef7488204f9e8cd6c7f07aca6b79..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/tutorials/tutorial_overview.mdx +++ /dev/null @@ -1,23 +0,0 @@ - - -# Overview - -Welcome to 🧨 Diffusers! If you're new to diffusion models and generative AI, and want to learn more, then you've come to the right place. These beginner-friendly tutorials are designed to provide a gentle introduction to diffusion models and help you understand the library fundamentals - the core components and how 🧨 Diffusers is meant to be used. - -You'll learn how to use a pipeline for inference to rapidly generate things, and then deconstruct that pipeline to really understand how to use the library as a modular toolbox for building your own diffusion systems. In the next lesson, you'll learn how to train your own diffusion model to generate what you want. - -After completing the tutorials, you'll have gained the necessary skills to start exploring the library on your own and see how to use it for your own projects and applications. 
- -Feel free to join our community on [Discord](https://discord.com/invite/JfAtkvEtRb) or the [forums](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) to connect and collaborate with other users and developers! - -Let's start diffusing! 🧨 \ No newline at end of file diff --git a/diffusers/docs/source/en/using-diffusers/audio.mdx b/diffusers/docs/source/en/using-diffusers/audio.mdx deleted file mode 100644 index e1d669882fc46258f3d198f4ceb0a5e747ce6990..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/using-diffusers/audio.mdx +++ /dev/null @@ -1,16 +0,0 @@ - - -# Using Diffusers for audio - -[`DanceDiffusionPipeline`] and [`AudioDiffusionPipeline`] can be used to generate -audio rapidly! More coming soon! \ No newline at end of file diff --git a/diffusers/docs/source/en/using-diffusers/conditional_image_generation.mdx b/diffusers/docs/source/en/using-diffusers/conditional_image_generation.mdx deleted file mode 100644 index 0b5c02415d87a7c65164c69f73a692a7aa2e33ed..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/using-diffusers/conditional_image_generation.mdx +++ /dev/null @@ -1,60 +0,0 @@ - - -# Conditional image generation - -[[open-in-colab]] - -Conditional image generation allows you to generate images from a text prompt. The text is converted into embeddings which are used to condition the model to generate an image from noise. - -The [`DiffusionPipeline`] is the easiest way to use a pre-trained diffusion system for inference. - -Start by creating an instance of [`DiffusionPipeline`] and specify which pipeline [checkpoint](https://huggingface.co/models?library=diffusers&sort=downloads) you would like to download. 
- -In this guide, you'll use [`DiffusionPipeline`] for text-to-image generation with [Latent Diffusion](https://huggingface.co/CompVis/ldm-text2im-large-256): - -```python ->>> from diffusers import DiffusionPipeline - ->>> generator = DiffusionPipeline.from_pretrained("CompVis/ldm-text2im-large-256") -``` - -The [`DiffusionPipeline`] downloads and caches all modeling, tokenization, and scheduling components. -Because the model consists of roughly 1.4 billion parameters, we strongly recommend running it on a GPU. -You can move the generator object to a GPU, just like you would in PyTorch: - -```python ->>> generator.to("cuda") -``` - -Now you can use the `generator` on your text prompt: - -```python ->>> image = generator("An image of a squirrel in Picasso style").images[0] -``` - -The output is by default wrapped into a [`PIL.Image`](https://pillow.readthedocs.io/en/stable/reference/Image.html?highlight=image#the-image-class) object. - -You can save the image by calling: - -```python ->>> image.save("image_of_squirrel_painting.png") -``` - -Try out the Spaces below, and feel free to play around with the guidance scale parameter to see how it affects the image quality! - - \ No newline at end of file diff --git a/diffusers/docs/source/en/using-diffusers/contribute_pipeline.mdx b/diffusers/docs/source/en/using-diffusers/contribute_pipeline.mdx deleted file mode 100644 index ce3f3e8232529e294ce1308d230b96dc79818cd4..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/using-diffusers/contribute_pipeline.mdx +++ /dev/null @@ -1,169 +0,0 @@ - - -# How to build a community pipeline - -*Note*: this page was built from the GitHub Issue on Community Pipelines [#841](https://github.com/huggingface/diffusers/issues/841). - -Let's make an example! 
-Say you want to define a pipeline that just does a single forward pass to a U-Net and then calls a scheduler only once (Note, this doesn't make any sense from a scientific point of view, but only represents an example of how things work under the hood). - -Cool! So you open your favorite IDE and start creating your pipeline 💻. -First, what model weights and configurations do we need? -We have a U-Net and a scheduler, so our pipeline should take a U-Net and a scheduler as an argument. -Also, as stated above, you'd like to be able to load weights and the scheduler config for Hub and share your code with others, so we'll inherit from `DiffusionPipeline`: - -```python -from diffusers import DiffusionPipeline -import torch - - -class UnetSchedulerOneForwardPipeline(DiffusionPipeline): - def __init__(self, unet, scheduler): - super().__init__() -``` - -Now, we must save the `unet` and `scheduler` in a config file so that you can save your pipeline with `save_pretrained`. -Therefore, make sure you add every component that is save-able to the `register_modules` function: - -```python -from diffusers import DiffusionPipeline -import torch - - -class UnetSchedulerOneForwardPipeline(DiffusionPipeline): - def __init__(self, unet, scheduler): - super().__init__() - - self.register_modules(unet=unet, scheduler=scheduler) -``` - -Cool, the init is done! 🔥 Now, let's go into the forward pass, which we recommend defining as `__call__` . Here you're given all the creative freedom there is. 
For our amazing "one-step" pipeline, we simply create a random image and call the unet once and the scheduler once: - -```python -from diffusers import DiffusionPipeline -import torch - - -class UnetSchedulerOneForwardPipeline(DiffusionPipeline): - def __init__(self, unet, scheduler): - super().__init__() - - self.register_modules(unet=unet, scheduler=scheduler) - - def __call__(self): - image = torch.randn( - (1, self.unet.in_channels, self.unet.sample_size, self.unet.sample_size), - ) - timestep = 1 - - model_output = self.unet(image, timestep).sample - scheduler_output = self.scheduler.step(model_output, timestep, image).prev_sample - - return scheduler_output -``` - -Cool, that's it! 🚀 You can now run this pipeline by passing a `unet` and a `scheduler` to the init: - -```python -from diffusers import DDPMScheduler, UNet2DModel - -scheduler = DDPMScheduler() -unet = UNet2DModel() - -pipeline = UnetSchedulerOneForwardPipeline(unet=unet, scheduler=scheduler) - -output = pipeline() -``` - -But what's even better is that you can load pre-existing weights into the pipeline if they exactly match your pipeline structure. This is, for example, the case for [https://huggingface.co/google/ddpm-cifar10-32](https://huggingface.co/google/ddpm-cifar10-32), so we can do the following: - -```python -pipeline = UnetSchedulerOneForwardPipeline.from_pretrained("google/ddpm-cifar10-32") - -output = pipeline() -``` - -We want to share this amazing pipeline with the community, so we would open a PR to add the following code under `one_step_unet.py` to [https://github.com/huggingface/diffusers/tree/main/examples/community](https://github.com/huggingface/diffusers/tree/main/examples/community).
- -```python -from diffusers import DiffusionPipeline -import torch - - -class UnetSchedulerOneForwardPipeline(DiffusionPipeline): - def __init__(self, unet, scheduler): - super().__init__() - - self.register_modules(unet=unet, scheduler=scheduler) - - def __call__(self): - image = torch.randn( - (1, self.unet.in_channels, self.unet.sample_size, self.unet.sample_size), - ) - timestep = 1 - - model_output = self.unet(image, timestep).sample - scheduler_output = self.scheduler.step(model_output, timestep, image).prev_sample - - return scheduler_output -``` - -Our amazing pipeline got merged here: [#840](https://github.com/huggingface/diffusers/pull/840). -Now everybody that has `diffusers >= 0.4.0` installed can use our pipeline magically 🪄 as follows: - -```python -from diffusers import DiffusionPipeline - -pipe = DiffusionPipeline.from_pretrained("google/ddpm-cifar10-32", custom_pipeline="one_step_unet") -pipe() -``` - -Another way to upload your custom_pipeline, besides sending a PR, is uploading the code that contains it to the Hugging Face Hub, [as exemplified here](https://huggingface.co/docs/diffusers/using-diffusers/custom_pipeline_overview#loading-custom-pipelines-from-the-hub). - -**Try it out now - it works!** - -In general, you will want to create much more sophisticated pipelines, so we recommend looking at existing pipelines here: [https://github.com/huggingface/diffusers/tree/main/examples/community](https://github.com/huggingface/diffusers/tree/main/examples/community). - -IMPORTANT: -You can use whatever package you want in your community pipeline file - as long as the user has it installed, everything will work fine. Make sure you have one and only one pipeline class that inherits from `DiffusionPipeline` as this will be automatically detected. - -## How do community pipelines work? 
-A community pipeline is a class that inherits from [`DiffusionPipeline`] -and that has been added to the `examples/community` [files](https://github.com/huggingface/diffusers/tree/main/examples/community). -The community can load the pipeline code via the `custom_pipeline` argument of `DiffusionPipeline.from_pretrained`. See the docs [here](https://huggingface.co/docs/diffusers/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained.custom_pipeline). - -This means: -The model weights and configs of the pipeline should be loaded from the `pretrained_model_name_or_path` [argument](https://huggingface.co/docs/diffusers/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path), -whereas the code that powers the community pipeline is defined in a file added in [`examples/community`](https://github.com/huggingface/diffusers/tree/main/examples/community). - -Now, it might very well be that only some of your pipeline components' weights can be downloaded from an official repo. -The other components should then be passed directly to init, as is the case for the CLIP guidance notebook [here](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/CLIP_Guided_Stable_diffusion_with_diffusers.ipynb#scrollTo=z9Kglma6hjki). - -The magic behind all of this is that we load the code directly from GitHub. You can check it out in more detail if you follow the functionality defined here: - -```python -# 2.
Load the pipeline class, if using custom module then load it from the hub -# if we load from explicit class, let's use it -if custom_pipeline is not None: - pipeline_class = get_class_from_dynamic_module( - custom_pipeline, module_file=CUSTOM_PIPELINE_FILE_NAME, cache_dir=custom_pipeline - ) -elif cls != DiffusionPipeline: - pipeline_class = cls -else: - diffusers_module = importlib.import_module(cls.__module__.split(".")[0]) - pipeline_class = getattr(diffusers_module, config_dict["_class_name"]) -``` - -This is why a community pipeline merged to GitHub will be directly available to all `diffusers` packages. - diff --git a/diffusers/docs/source/en/using-diffusers/controlling_generation.mdx b/diffusers/docs/source/en/using-diffusers/controlling_generation.mdx deleted file mode 100644 index b1ba17cd2c671c89e8d88acfed013cc69007aac5..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/using-diffusers/controlling_generation.mdx +++ /dev/null @@ -1,167 +0,0 @@ - - -# Controlled generation - -Controlling outputs generated by diffusion models has been long pursued by the community and is now an active research topic. In many popular diffusion models, subtle changes in inputs, both images and text prompts, can drastically change outputs. In an ideal world we want to be able to control how semantics are preserved and changed. - -Most examples of preserving semantics reduce to being able to accurately map a change in input to a change in output. I.e. adding an adjective to a subject in a prompt preserves the entire image, only modifying the changed subject. Or, image variation of a particular subject preserves the subject's pose. - -Additionally, there are qualities of generated images that we would like to influence beyond semantic preservation. I.e. in general, we would like our outputs to be of good quality, adhere to a particular style, or be realistic. 
- -We will document some of the techniques `diffusers` supports to control generation of diffusion models. Much is cutting edge research and can be quite nuanced. If something needs clarifying or you have a suggestion, don't hesitate to open a discussion on the [forum](https://discuss.huggingface.co/) or a [GitHub issue](https://github.com/huggingface/diffusers/issues). - -We provide a high level explanation of how the generation can be controlled as well as a snippet of the technicals. For more in depth explanations on the technicals, the original papers which are linked from the pipelines are always the best resources. - -Depending on the use case, one should choose a technique accordingly. In many cases, these techniques can be combined. For example, one can combine Textual Inversion with SEGA to provide more semantic guidance to the outputs generated using Textual Inversion. - -Unless otherwise mentioned, these are techniques that work with existing models and don't require their own weights. - -1. [Instruct Pix2Pix](#instruct-pix2pix) -2. [Pix2Pix Zero](#pix2pixzero) -3. [Attend and Excite](#attend-and-excite) -4. [Semantic Guidance](#semantic-guidance) -5. [Self-attention Guidance](#self-attention-guidance) -6. [Depth2Image](#depth2image) -7. [MultiDiffusion Panorama](#multidiffusion-panorama) -8. [DreamBooth](#dreambooth) -9. [Textual Inversion](#textual-inversion) -10. [ControlNet](#controlnet) -11. [Prompt Weighting](#prompt-weighting) - -## Instruct Pix2Pix - -[Paper](https://arxiv.org/abs/2211.09800) - -[Instruct Pix2Pix](../api/pipelines/stable_diffusion/pix2pix) is fine-tuned from stable diffusion to support editing input images. It takes as inputs an image and a prompt describing an edit, and it outputs the edited image. -Instruct Pix2Pix has been explicitly trained to work well with [InstructGPT](https://openai.com/blog/instruction-following/)-like prompts. 
- -See [here](../api/pipelines/stable_diffusion/pix2pix) for more information on how to use it. - -## Pix2Pix Zero - -[Paper](https://arxiv.org/abs/2302.03027) - -[Pix2Pix Zero](../api/pipelines/stable_diffusion/pix2pix_zero) allows modifying an image so that one concept or subject is translated to another one while preserving general image semantics. - -The denoising process is guided from one conceptual embedding towards another conceptual embedding. The intermediate latents are optimized during the denoising process to push the attention maps towards reference attention maps. The reference attention maps are from the denoising process of the input image and are used to encourage semantic preservation. - -Pix2Pix Zero can be used both to edit synthetic images as well as real images. -- To edit synthetic images, one first generates an image given a caption. -Next, we generate image captions for the concept that shall be edited and for the new target concept. We can use a model like [Flan-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5) for this purpose. Then, "mean" prompt embeddings for both the source and target concepts are created via the text encoder. Finally, the pix2pix-zero algorithm is used to edit the synthetic image. -- To edit a real image, one first generates an image caption using a model like [BLIP](https://huggingface.co/docs/transformers/model_doc/blip). Then one applies ddim inversion on the prompt and image to generate "inverse" latents. Similar to before, "mean" prompt embeddings for both source and target concepts are created and finally the pix2pix-zero algorithm in combination with the "inverse" latents is used to edit the image. - - - -Pix2Pix Zero is the first model that allows "zero-shot" image editing. This means that the model -can edit an image in less than a minute on a consumer GPU as shown [here](../api/pipelines/stable_diffusion/pix2pix_zero#usage-example). 
- - - -As mentioned above, Pix2Pix Zero includes optimizing the latents (and not any of the UNet, VAE, or the text encoder) to steer the generation toward a specific concept. This means that the overall -pipeline might require more memory than a standard [StableDiffusionPipeline](../api/pipelines/stable_diffusion/text2img). - -See [here](../api/pipelines/stable_diffusion/pix2pix_zero) for more information on how to use it. - -## Attend and Excite - -[Paper](https://arxiv.org/abs/2301.13826) - -[Attend and Excite](../api/pipelines/stable_diffusion/attend_and_excite) allows subjects in the prompt to be faithfully represented in the final image. - -A set of token indices are given as input, corresponding to the subjects in the prompt that need to be present in the image. During denoising, each token index is guaranteed to have a minimum attention threshold for at least one patch of the image. The intermediate latents are iteratively optimized during the denoising process to strengthen the attention of the most neglected subject token until the attention threshold is passed for all subject tokens. - -Like Pix2Pix Zero, Attend and Excite also involves a mini optimization loop (leaving the pre-trained weights untouched) in its pipeline and can require more memory than the usual `StableDiffusionPipeline`. - -See [here](../api/pipelines/stable_diffusion/attend_and_excite) for more information on how to use it. - -## Semantic Guidance (SEGA) - -[Paper](https://arxiv.org/abs/2301.12247) - -SEGA allows applying or removing one or more concepts from an image. The strength of the concept can also be controlled. I.e. the smile concept can be used to incrementally increase or decrease the smile of a portrait. - -Similar to how classifier free guidance provides guidance via empty prompt inputs, SEGA provides guidance on conceptual prompts. Multiple of these conceptual prompts can be applied simultaneously. 
Each conceptual prompt can either add or remove its concept, depending on whether the guidance is applied positively or negatively. - -Unlike Pix2Pix Zero or Attend and Excite, SEGA directly interacts with the diffusion process instead of performing any explicit gradient-based optimization. - -See [here](../api/pipelines/semantic_stable_diffusion) for more information on how to use it. - -## Self-attention Guidance (SAG) - -[Paper](https://arxiv.org/abs/2210.00939) - -[Self-attention Guidance](../api/pipelines/stable_diffusion/self_attention_guidance) improves the general quality of images. - -SAG provides guidance from predictions not conditioned on high-frequency details to fully conditioned images. The high-frequency details are extracted from the UNet self-attention maps. - -See [here](../api/pipelines/stable_diffusion/self_attention_guidance) for more information on how to use it. - -## Depth2Image - -[Project](https://huggingface.co/stabilityai/stable-diffusion-2-depth) - -[Depth2Image](../api/pipelines/stable_diffusion_2#depthtoimage) is fine-tuned from Stable Diffusion to better preserve semantics for text-guided image variation. - -It conditions on a monocular depth estimate of the original image. - -See [here](../api/pipelines/stable_diffusion_2#depthtoimage) for more information on how to use it. - - - -An important distinction between methods like InstructPix2Pix and Pix2Pix Zero is that the former -involves fine-tuning the pre-trained weights while the latter does not. This means that you can -apply Pix2Pix Zero to any of the available Stable Diffusion models. - - - -## MultiDiffusion Panorama - -[Paper](https://arxiv.org/abs/2302.08113) - -MultiDiffusion defines a new generation process over a pre-trained diffusion model. This process binds together multiple diffusion generation methods that can be readily applied to generate high-quality and diverse images. 
Results adhere to user-provided controls, such as desired aspect ratio (e.g., panorama), and spatial guiding signals, ranging from tight segmentation masks to bounding boxes. -[MultiDiffusion Panorama](../api/pipelines/stable_diffusion/panorama) lets you generate high-quality images at arbitrary aspect ratios (e.g., panoramas). - -See [here](../api/pipelines/stable_diffusion/panorama) for more information on how to use it to generate panoramic images. - -## Fine-tuning your own models - -In addition to pre-trained models, Diffusers has training scripts for fine-tuning models on user-provided data. - -### DreamBooth - -[DreamBooth](../training/dreambooth) fine-tunes a model to teach it about a new subject. For example, a few pictures of a person can be used to generate images of that person in different styles. - -See [here](../training/dreambooth) for more information on how to use it. - -### Textual Inversion - -[Textual Inversion](../training/text_inversion) fine-tunes a model to teach it about a new concept. For example, a few pictures of a style of artwork can be used to generate images in that style. - -See [here](../training/text_inversion) for more information on how to use it. - -## ControlNet - -[Paper](https://arxiv.org/abs/2302.05543) - -[ControlNet](../api/pipelines/stable_diffusion/controlnet) is an auxiliary network which adds an extra condition. -There are 8 canonical pre-trained ControlNets trained on different conditionings such as edge detection, scribbles, -depth maps, and semantic segmentations. - -See [here](../api/pipelines/stable_diffusion/controlnet) for more information on how to use it. - -## Prompt Weighting - -Prompt weighting is a simple technique that puts more attention weight on certain parts of the text -input. - -For a more detailed explanation and examples, see [here](../using-diffusers/weighted_prompts). 
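
As a rough illustration of the prompt weighting idea above, one simple way to emphasize parts of the text input is to scale each token's embedding vector by a per-token weight. The sketch below uses plain Python lists instead of real text-encoder outputs, and the helper name `apply_token_weights` is hypothetical; it is not necessarily what any particular `diffusers` pipeline does internally.

```python
def apply_token_weights(token_embeddings, token_weights):
    """Scale each token's embedding vector by that token's weight."""
    if len(token_embeddings) != len(token_weights):
        raise ValueError("need exactly one weight per token embedding")
    return [
        [weight * value for value in embedding]
        for embedding, weight in zip(token_embeddings, token_weights)
    ]


# Toy example: three tokens with 2-dimensional embeddings (made-up numbers).
embeddings = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
weights = [1.0, 1.5, 0.5]  # emphasize the second token, de-emphasize the third
weighted = apply_token_weights(embeddings, weights)
print(weighted)
```

In a real pipeline the weighted embeddings would then be passed on as the prompt embeddings. How (and whether) they are re-normalized afterwards is a design choice this sketch leaves out.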
diff --git a/diffusers/docs/source/en/using-diffusers/custom_pipeline_examples.mdx b/diffusers/docs/source/en/using-diffusers/custom_pipeline_examples.mdx deleted file mode 100644 index 2dfa71f0d33cd4a4faebc05ec35712c39fe340f5..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/using-diffusers/custom_pipeline_examples.mdx +++ /dev/null @@ -1,280 +0,0 @@ - - -# Custom Pipelines - -> **For more information about community pipelines, please have a look at [this issue](https://github.com/huggingface/diffusers/issues/841).** - -**Community** examples consist of both inference and training examples that have been added by the community. -Please have a look at the following table to get an overview of all community examples. Click on the **Code Example** to get a copy-and-paste ready code example that you can try out. -If a community doesn't work as expected, please open an issue and ping the author on it. - -| Example | Description | Code Example | Colab | Author | -|:---------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------:| -| CLIP Guided Stable Diffusion | Doing CLIP guidance for text to image generation with Stable Diffusion | [CLIP Guided Stable 
Diffusion](#clip-guided-stable-diffusion) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/CLIP_Guided_Stable_diffusion_with_diffusers.ipynb) | [Suraj Patil](https://github.com/patil-suraj/) | -| One Step U-Net (Dummy) | Example showcasing of how to use Community Pipelines (see https://github.com/huggingface/diffusers/issues/841) | [One Step U-Net](#one-step-unet) | - | [Patrick von Platen](https://github.com/patrickvonplaten/) | -| Stable Diffusion Interpolation | Interpolate the latent space of Stable Diffusion between different prompts/seeds | [Stable Diffusion Interpolation](#stable-diffusion-interpolation) | - | [Nate Raw](https://github.com/nateraw/) | -| Stable Diffusion Mega | **One** Stable Diffusion Pipeline with all functionalities of [Text2Image](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py), [Image2Image](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py) and [Inpainting](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py) | [Stable Diffusion Mega](#stable-diffusion-mega) | - | [Patrick von Platen](https://github.com/patrickvonplaten/) | -| Long Prompt Weighting Stable Diffusion | **One** Stable Diffusion Pipeline without tokens length limit, and support parsing weighting in prompt. 
| [Long Prompt Weighting Stable Diffusion](#long-prompt-weighting-stable-diffusion) | - | [SkyTNT](https://github.com/SkyTNT) | -| Speech to Image | Using automatic-speech-recognition to transcribe text and Stable Diffusion to generate images | [Speech to Image](#speech-to-image) | - | [Mikail Duzenli](https://github.com/MikailINTech) - -To load a custom pipeline you just need to pass the `custom_pipeline` argument to `DiffusionPipeline`, as one of the files in `diffusers/examples/community`. Feel free to send a PR with your own pipelines, we will merge them quickly. -```py -pipe = DiffusionPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-4", custom_pipeline="filename_in_the_community_folder" -) -``` - -## Example usages - -### CLIP Guided Stable Diffusion - -CLIP guided stable diffusion can help to generate more realistic images -by guiding stable diffusion at every denoising step with an additional CLIP model. - -The following code requires roughly 12GB of GPU RAM. - -```python -from diffusers import DiffusionPipeline -from transformers import CLIPImageProcessor, CLIPModel -import torch - - -feature_extractor = CLIPImageProcessor.from_pretrained("laion/CLIP-ViT-B-32-laion2B-s34B-b79K") -clip_model = CLIPModel.from_pretrained("laion/CLIP-ViT-B-32-laion2B-s34B-b79K", torch_dtype=torch.float16) - - -guided_pipeline = DiffusionPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-4", - custom_pipeline="clip_guided_stable_diffusion", - clip_model=clip_model, - feature_extractor=feature_extractor, - torch_dtype=torch.float16, -) -guided_pipeline.enable_attention_slicing() -guided_pipeline = guided_pipeline.to("cuda") - -prompt = "fantasy book cover, full moon, fantasy forest landscape, golden vector elements, fantasy magic, dark light night, intricate, elegant, sharp focus, illustration, highly detailed, digital painting, concept art, matte, art by WLOP and Artgerm and Albert Bierstadt, masterpiece" - -generator = 
torch.Generator(device="cuda").manual_seed(0) -images = [] -for i in range(4): - image = guided_pipeline( - prompt, - num_inference_steps=50, - guidance_scale=7.5, - clip_guidance_scale=100, - num_cutouts=4, - use_cutouts=False, - generator=generator, - ).images[0] - images.append(image) - -# save images locally -for i, img in enumerate(images): - img.save(f"./clip_guided_sd/image_{i}.png") -``` - -The `images` list contains PIL images that can be saved locally or displayed directly in a Google Colab. -Generated images tend to be of higher quality than those generated natively with Stable Diffusion. For example, the above script generates the following images: - -![clip_guidance](https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/clip_guidance/merged_clip_guidance.jpg). - -### One Step Unet - -The dummy "one-step-unet" can be run as follows: - -```python -from diffusers import DiffusionPipeline - -pipe = DiffusionPipeline.from_pretrained("google/ddpm-cifar10-32", custom_pipeline="one_step_unet") -pipe() -``` - -**Note**: This community pipeline is not useful as a feature, but rather just serves as an example of how community pipelines can be added (see https://github.com/huggingface/diffusers/issues/841). - -### Stable Diffusion Interpolation - -The following code can be run on a GPU with at least 8GB of VRAM and should take approximately 5 minutes. 
- -```python -from diffusers import DiffusionPipeline -import torch - -pipe = DiffusionPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-4", - torch_dtype=torch.float16, - safety_checker=None, # Very important for videos...lots of false positives while interpolating - custom_pipeline="interpolate_stable_diffusion", -).to("cuda") -pipe.enable_attention_slicing() - -frame_filepaths = pipe.walk( - prompts=["a dog", "a cat", "a horse"], - seeds=[42, 1337, 1234], - num_interpolation_steps=16, - output_dir="./dreams", - batch_size=4, - height=512, - width=512, - guidance_scale=8.5, - num_inference_steps=50, -) -``` - -The output of the `walk(...)` function returns a list of images saved under the folder as defined in `output_dir`. You can use these images to create videos of stable diffusion. - -> **Please have a look at https://github.com/nateraw/stable-diffusion-videos for more in-detail information on how to create videos using stable diffusion as well as more feature-complete functionality.** - -### Stable Diffusion Mega - -The Stable Diffusion Mega Pipeline lets you use the main use cases of the stable diffusion pipeline in a single class. 
- -```python -#!/usr/bin/env python3 -from diffusers import DiffusionPipeline -import PIL.Image -import requests -from io import BytesIO -import torch - - -def download_image(url): - response = requests.get(url) - return PIL.Image.open(BytesIO(response.content)).convert("RGB") - - -pipe = DiffusionPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-4", - custom_pipeline="stable_diffusion_mega", - torch_dtype=torch.float16, -) -pipe.to("cuda") -pipe.enable_attention_slicing() - - -### Text-to-Image - -images = pipe.text2img("An astronaut riding a horse").images - -### Image-to-Image - -init_image = download_image( - "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg" -) - -prompt = "A fantasy landscape, trending on artstation" - -images = pipe.img2img(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images - -### Inpainting - -img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png" -mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png" -init_image = download_image(img_url).resize((512, 512)) -mask_image = download_image(mask_url).resize((512, 512)) - -prompt = "a cat sitting on a bench" -images = pipe.inpaint(prompt=prompt, image=init_image, mask_image=mask_image, strength=0.75).images -``` - -As shown above, this single pipeline can run "text-to-image", "image-to-image", and "inpainting". - -### Long Prompt Weighting Stable Diffusion - -This pipeline lets you input prompts without the 77-token length limit. You can increase the weighting of words with "()" or decrease it with "[]". -The pipeline also lets you use the main use cases of the Stable Diffusion pipeline in a single class. 
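
To make the "()" and "[]" syntax above more concrete, here is a minimal sketch of how such a prompt could be split into weighted chunks. It is not the code used by `lpw_stable_diffusion`: the function name `parse_weighted_prompt` is hypothetical, and the factor of 1.1 per bracket level is an assumption made for illustration. An explicit weight such as `(word:1.3)` is read as the multiplier for that group.

```python
import re

ROUND_FACTOR = 1.1       # assumed multiplier for each "(...)" level
SQUARE_FACTOR = 1 / 1.1  # assumed multiplier for each "[...]" level
# Matches "some text:1.3" so that "(some text:1.3)" can carry an explicit weight.
EXPLICIT = re.compile(r"^(.*):\s*(\d+(?:\.\d+)?)\s*$", re.DOTALL)


def parse_weighted_prompt(prompt):
    """Split a prompt into (text, weight) pairs based on () and [] groups."""
    chunks = []  # [text, weight] pairs in prompt order
    stack = []   # (opening bracket, index of the first chunk inside it)
    buffer = ""

    def flush():
        # Move the pending text into the chunk list with a neutral weight.
        nonlocal buffer
        if buffer:
            chunks.append([buffer, 1.0])
        buffer = ""

    for char in prompt:
        if char in "([":
            flush()
            stack.append((char, len(chunks)))
        elif char in ")]" and stack and stack[-1][0] == "(["[")]".index(char)]:
            _, start = stack.pop()
            factor = ROUND_FACTOR if char == ")" else SQUARE_FACTOR
            match = EXPLICIT.match(buffer) if char == ")" else None
            if match:
                buffer, factor = match.group(1), float(match.group(2))
            flush()
            # Every chunk that was opened inside this group gets rescaled.
            for chunk in chunks[start:]:
                chunk[1] *= factor
        else:
            # Ordinary text, or an unmatched closing bracket kept literally.
            buffer += char
    flush()
    return [(text, round(weight, 4)) for text, weight in chunks]


print(parse_weighted_prompt("a (red:1.3) cat with [blurry] background"))
```

A parser like this only produces the weights. Applying them to the text embeddings, and chunking prompts longer than 77 tokens, are separate steps that the pipeline handles itself.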
- -#### pytorch - -```python -from diffusers import DiffusionPipeline -import torch - -pipe = DiffusionPipeline.from_pretrained( - "hakurei/waifu-diffusion", custom_pipeline="lpw_stable_diffusion", torch_dtype=torch.float16 -) -pipe = pipe.to("cuda") - -prompt = "best_quality (1girl:1.3) bow bride brown_hair closed_mouth frilled_bow frilled_hair_tubes frills (full_body:1.3) fox_ear hair_bow hair_tubes happy hood japanese_clothes kimono long_sleeves red_bow smile solo tabi uchikake white_kimono wide_sleeves cherry_blossoms" -neg_prompt = "lowres, bad_anatomy, error_body, error_hair, error_arm, error_hands, bad_hands, error_fingers, bad_fingers, missing_fingers, error_legs, bad_legs, multiple_legs, missing_legs, error_lighting, error_shadow, error_reflection, text, error, extra_digit, fewer_digits, cropped, worst_quality, low_quality, normal_quality, jpeg_artifacts, signature, watermark, username, blurry" - -pipe.text2img(prompt, negative_prompt=neg_prompt, width=512, height=512, max_embeddings_multiples=3).images[0] -``` - -#### onnxruntime - -```python -from diffusers import DiffusionPipeline -import torch - -pipe = DiffusionPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-4", - custom_pipeline="lpw_stable_diffusion_onnx", - revision="onnx", - provider="CUDAExecutionProvider", -) - -prompt = "a photo of an astronaut riding a horse on mars, best quality" -neg_prompt = "lowres, bad anatomy, error body, error hair, error arm, error hands, bad hands, error fingers, bad fingers, missing fingers, error legs, bad legs, multiple legs, missing legs, error lighting, error shadow, error reflection, text, error, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry" - -pipe.text2img(prompt, negative_prompt=neg_prompt, width=512, height=512, max_embeddings_multiples=3).images[0] -``` - -if you see `Token indices sequence length is longer than the specified maximum sequence length for this 
model ( *** > 77 ) . Running this sequence through the model will result in indexing errors`. Do not worry, it is normal. - -### Speech to Image - -The following code can generate an image from an audio sample using pre-trained OpenAI whisper-small and Stable Diffusion. - -```Python -import torch - -import matplotlib.pyplot as plt -from datasets import load_dataset -from diffusers import DiffusionPipeline -from transformers import ( - WhisperForConditionalGeneration, - WhisperProcessor, -) - - -device = "cuda" if torch.cuda.is_available() else "cpu" - -ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation") - -audio_sample = ds[3] - -text = audio_sample["text"].lower() -speech_data = audio_sample["audio"]["array"] - -model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small").to(device) -processor = WhisperProcessor.from_pretrained("openai/whisper-small") - -diffuser_pipeline = DiffusionPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-4", - custom_pipeline="speech_to_image_diffusion", - speech_model=model, - speech_processor=processor, - - torch_dtype=torch.float16, -) - -diffuser_pipeline.enable_attention_slicing() -diffuser_pipeline = diffuser_pipeline.to(device) - -output = diffuser_pipeline(speech_data) -plt.imshow(output.images[0]) -``` -This example produces the following image: - -![image](https://user-images.githubusercontent.com/45072645/196901736-77d9c6fc-63ee-4072-90b0-dc8b903d63e3.png) \ No newline at end of file diff --git a/diffusers/docs/source/en/using-diffusers/custom_pipeline_overview.mdx b/diffusers/docs/source/en/using-diffusers/custom_pipeline_overview.mdx deleted file mode 100644 index 5c342a5a88e9d3b5aede1873a2ef577c2feb81fe..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/using-diffusers/custom_pipeline_overview.mdx +++ /dev/null @@ -1,121 +0,0 @@ - - -# Loading and Adding Custom Pipelines - -Diffusers allows you to conveniently load any custom pipeline 
from the Hugging Face Hub as well as any [official community pipeline](https://github.com/huggingface/diffusers/tree/main/examples/community) -via the [`DiffusionPipeline`] class. - -## Loading custom pipelines from the Hub - -Custom pipelines can be easily loaded from any model repository on the Hub that defines a diffusion pipeline in a `pipeline.py` file. -Let's load a dummy pipeline from [hf-internal-testing/diffusers-dummy-pipeline](https://huggingface.co/hf-internal-testing/diffusers-dummy-pipeline). - -All you need to do is pass the custom pipeline repo id with the `custom_pipeline` argument alongside the repo from where you wish to load the pipeline modules. - -```python -from diffusers import DiffusionPipeline - -pipeline = DiffusionPipeline.from_pretrained( - "google/ddpm-cifar10-32", custom_pipeline="hf-internal-testing/diffusers-dummy-pipeline" -) -``` - -This will load the custom pipeline as defined in the [model repository](https://huggingface.co/hf-internal-testing/diffusers-dummy-pipeline/blob/main/pipeline.py). - - - -By loading a custom pipeline from the Hugging Face Hub, you are trusting that the code you are loading -is safe 🔒. Make sure to check out the code online before loading & running it automatically. - - - -## Loading official community pipelines - -Community pipelines are summarized in the [community examples folder](https://github.com/huggingface/diffusers/tree/main/examples/community). - -Similarly, you need to pass both the *repo id* from where you wish to load the weights as well as the `custom_pipeline` argument. Here the `custom_pipeline` argument should consist simply of the filename of the community pipeline excluding the `.py` suffix, *e.g.* `clip_guided_stable_diffusion`. - -Since community pipelines are often more complex, one can mix loading weights from an official *repo id* -and passing pipeline modules directly. 
- -```python -from diffusers import DiffusionPipeline -from transformers import CLIPImageProcessor, CLIPModel - -clip_model_id = "laion/CLIP-ViT-B-32-laion2B-s34B-b79K" - -feature_extractor = CLIPImageProcessor.from_pretrained(clip_model_id) -clip_model = CLIPModel.from_pretrained(clip_model_id) - -pipeline = DiffusionPipeline.from_pretrained( - "runwayml/stable-diffusion-v1-5", - custom_pipeline="clip_guided_stable_diffusion", - clip_model=clip_model, - feature_extractor=feature_extractor, -) -``` - -## Adding custom pipelines to the Hub - -To add a custom pipeline to the Hub, all you need to do is to define a pipeline class that inherits -from [`DiffusionPipeline`] in a `pipeline.py` file. -Make sure that the whole pipeline is encapsulated within a single class and that the `pipeline.py` file -has only one such class. - -Let's quickly define an example pipeline. - - -```python -import torch -from diffusers import DiffusionPipeline - - -class MyPipeline(DiffusionPipeline): - def __init__(self, unet, scheduler): - super().__init__() - - self.register_modules(unet=unet, scheduler=scheduler) - - @torch.no_grad() - def __call__(self, batch_size: int = 1, num_inference_steps: int = 50): - # Sample gaussian noise to begin loop - image = torch.randn((batch_size, self.unet.in_channels, self.unet.sample_size, self.unet.sample_size)) - - image = image.to(self.device) - - # set step values - self.scheduler.set_timesteps(num_inference_steps) - - for t in self.progress_bar(self.scheduler.timesteps): - # 1. predict noise model_output - model_output = self.unet(image, t).sample - - # 2. 
predict the previous sample x_t-1 from the model output - # do x_t -> x_t-1 - image = self.scheduler.step(model_output, t, image).prev_sample - - image = (image / 2 + 0.5).clamp(0, 1) - image = image.cpu().permute(0, 2, 3, 1).numpy() - - return image -``` - -Now you can upload this short file under the name `pipeline.py` in your preferred [model repository](https://huggingface.co/docs/hub/models-uploading). For Stable Diffusion pipelines, you may also [join the community organisation for shared pipelines](https://huggingface.co/organizations/sd-diffusers-pipelines-library/share/BUPyDUuHcciGTOKaExlqtfFcyCZsVFdrjr) to upload yours. -Finally, we can load the custom pipeline by passing the model repository name, *e.g.* `sd-diffusers-pipelines-library/my_custom_pipeline`, alongside the model repository from which we want to load the `unet` and `scheduler` components. - -```python -my_pipeline = DiffusionPipeline.from_pretrained( - "google/ddpm-cifar10-32", custom_pipeline="patrickvonplaten/my_custom_pipeline" -) -``` diff --git a/diffusers/docs/source/en/using-diffusers/depth2img.mdx b/diffusers/docs/source/en/using-diffusers/depth2img.mdx deleted file mode 100644 index a4141644b006d5ec7cb96f827365a597a7ba02c7..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/using-diffusers/depth2img.mdx +++ /dev/null @@ -1,56 +0,0 @@ - - -# Text-guided depth-to-image generation - -[[open-in-colab]] - -The [`StableDiffusionDepth2ImgPipeline`] lets you pass a text prompt and an initial image to condition the generation of new images. In addition, you can also pass a `depth_map` to preserve the image structure. If no `depth_map` is provided, the pipeline automatically predicts the depth via an integrated [depth-estimation model](https://github.com/isl-org/MiDaS). 
- -Start by creating an instance of the [`StableDiffusionDepth2ImgPipeline`]: - -```python -import torch -import requests -from PIL import Image - -from diffusers import StableDiffusionDepth2ImgPipeline - -pipe = StableDiffusionDepth2ImgPipeline.from_pretrained( - "stabilityai/stable-diffusion-2-depth", - torch_dtype=torch.float16, -).to("cuda") -``` - -Now pass your prompt to the pipeline. You can also pass a `negative_prompt` to prevent certain words from guiding how an image is generated: - -```python -url = "http://images.cocodataset.org/val2017/000000039769.jpg" -init_image = Image.open(requests.get(url, stream=True).raw) -prompt = "two tigers" -n_prompt = "bad, deformed, ugly, bad anatomy" -image = pipe(prompt=prompt, image=init_image, negative_prompt=n_prompt, strength=0.7).images[0] -image -``` - -| Input | Output | -|---------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------| -| | | - -Play around with the Spaces below and see if you notice a difference between generated images with and without a depth map! - - diff --git a/diffusers/docs/source/en/using-diffusers/img2img.mdx b/diffusers/docs/source/en/using-diffusers/img2img.mdx deleted file mode 100644 index 71540fbf5dd9ef203158bf5531e327b27915d5a4..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/using-diffusers/img2img.mdx +++ /dev/null @@ -1,99 +0,0 @@ - - -# Text-guided image-to-image generation - -[[open-in-colab]] - -The [`StableDiffusionImg2ImgPipeline`] lets you pass a text prompt and an initial image to condition the generation of new images. 
- -Before you begin, make sure you have all the necessary libraries installed: - -```bash -!pip install diffusers transformers ftfy accelerate -``` - -Get started by creating a [`StableDiffusionImg2ImgPipeline`] with a pretrained Stable Diffusion model like [`nitrosocke/Ghibli-Diffusion`](https://huggingface.co/nitrosocke/Ghibli-Diffusion). - -```python -import torch -import requests -from PIL import Image -from io import BytesIO -from diffusers import StableDiffusionImg2ImgPipeline - -device = "cuda" -pipe = StableDiffusionImg2ImgPipeline.from_pretrained("nitrosocke/Ghibli-Diffusion", torch_dtype=torch.float16).to( - device -) -``` - -Download and preprocess an initial image so you can pass it to the pipeline: - -```python -url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg" - -response = requests.get(url) -init_image = Image.open(BytesIO(response.content)).convert("RGB") -init_image.thumbnail((768, 768)) -init_image -``` - -
- -
- - - -💡 `strength` is a value between 0.0 and 1.0 that controls the amount of noise added to the input image. Values that approach 1.0 allow for lots of variations but will also produce images that are not semantically consistent with the input. - - - -Define the prompt (for this checkpoint finetuned on Ghibli-style art, you need to prefix the prompt with the `ghibli style` tokens) and run the pipeline: - -```python -prompt = "ghibli style, a fantasy landscape with castles" -generator = torch.Generator(device=device).manual_seed(1024) -image = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5, generator=generator).images[0] -image -``` - -
- -
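To build intuition for `strength`, the sketch below estimates how many denoising steps actually run for a given value. It assumes the pipeline skips the first part of the noise schedule in proportion to `1 - strength`. The helper name `effective_steps` is hypothetical and the arithmetic is a simplification for illustration, not the library's exact code.

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Estimate how many denoising steps run for a given strength (assumed rule)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError(f"strength must be in [0.0, 1.0], got {strength}")
    # Assumption: only the last `strength` fraction of the schedule is executed.
    return min(int(num_inference_steps * strength), num_inference_steps)


for s in (0.25, 0.5, 0.75, 1.0):
    print(f"strength={s}: {effective_steps(50, s)} of 50 steps")
```

Under this assumption, a low `strength` both adds less noise and runs fewer steps, which would explain why the output stays close to the input image.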
- -You can also try experimenting with a different scheduler to see how that affects the output: - -```python -from diffusers import LMSDiscreteScheduler - -lms = LMSDiscreteScheduler.from_config(pipe.scheduler.config) -pipe.scheduler = lms -generator = torch.Generator(device=device).manual_seed(1024) -image = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5, generator=generator).images[0] -image -``` - -
- -
- -Check out the Spaces below, and try generating images with different values for `strength`. You'll notice that using lower values for `strength` produces images that are more similar to the original image. - -Feel free to also switch the scheduler to the [`LMSDiscreteScheduler`] and see how that affects the output. - - diff --git a/diffusers/docs/source/en/using-diffusers/inpaint.mdx b/diffusers/docs/source/en/using-diffusers/inpaint.mdx deleted file mode 100644 index 41a6d4b7e1b26ad857556cab2eb5c057cee5b3d4..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/using-diffusers/inpaint.mdx +++ /dev/null @@ -1,76 +0,0 @@ - - -# Text-guided image-inpainting - -[[open-in-colab]] - -The [`StableDiffusionInpaintPipeline`] allows you to edit specific parts of an image by providing a mask and a text prompt. It uses a version of Stable Diffusion, like [`runwayml/stable-diffusion-inpainting`](https://huggingface.co/runwayml/stable-diffusion-inpainting) specifically trained for inpainting tasks. 
- -Get started by loading an instance of the [`StableDiffusionInpaintPipeline`]: - -```python -import PIL -import requests -import torch -from io import BytesIO - -from diffusers import StableDiffusionInpaintPipeline - -pipeline = StableDiffusionInpaintPipeline.from_pretrained( - "runwayml/stable-diffusion-inpainting", - torch_dtype=torch.float16, -) -pipeline = pipeline.to("cuda") -``` - -Download an image and a mask of a dog which you'll eventually replace: - -```python -def download_image(url): - response = requests.get(url) - return PIL.Image.open(BytesIO(response.content)).convert("RGB") - - -img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png" -mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png" - -init_image = download_image(img_url).resize((512, 512)) -mask_image = download_image(mask_url).resize((512, 512)) -``` - -Now you can create a prompt to replace the masked region with something else: - -```python -prompt = "Face of a yellow cat, high resolution, sitting on a park bench" -# use the `pipeline` instance created above -image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0] -``` - -`image` | `mask_image` | `prompt` | output | -:-------------------------:|:-------------------------:|:-------------------------:|-------------------------:| -drawing | drawing | ***Face of a yellow cat, high resolution, sitting on a park bench*** | drawing | - - - - -A previous experimental implementation of inpainting used a different, lower-quality process. To ensure backwards compatibility, loading a pretrained pipeline that doesn't contain the new model will still apply the old inpainting method. - - - -Check out the Spaces below to try out image inpainting yourself! 
- - diff --git a/diffusers/docs/source/en/using-diffusers/kerascv.mdx b/diffusers/docs/source/en/using-diffusers/kerascv.mdx deleted file mode 100644 index 06981cc8fdd1c5dca658c5f8a6379a020514ae7f..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/using-diffusers/kerascv.mdx +++ /dev/null @@ -1,179 +0,0 @@ - - -# Using KerasCV Stable Diffusion Checkpoints in Diffusers - - - -This is an experimental feature. - - - -[KerasCV](https://github.com/keras-team/keras-cv/) provides APIs for implementing various computer vision workflows. It -also provides the Stable Diffusion [v1 and v2](https://github.com/keras-team/keras-cv/blob/master/keras_cv/models/stable_diffusion) -models. Many practitioners find it easy to fine-tune the Stable Diffusion models shipped by KerasCV. However, as of this writing, KerasCV offers limited support to experiment with Stable Diffusion models for inference and deployment. On the other hand, -Diffusers provides tooling dedicated to this purpose (and more), such as different [noise schedulers](https://huggingface.co/docs/diffusers/using-diffusers/schedulers), [flash attention](https://huggingface.co/docs/diffusers/optimization/xformers), and [other -optimization techniques](https://huggingface.co/docs/diffusers/optimization/fp16). - -How about fine-tuning Stable Diffusion models in KerasCV and exporting them such that they become compatible with Diffusers to combine the -best of both worlds? We have created a [tool](https://huggingface.co/spaces/sayakpaul/convert-kerascv-sd-diffusers) that -lets you do just that! It takes KerasCV Stable Diffusion checkpoints and exports them to Diffusers-compatible checkpoints. -More specifically, it first converts the checkpoints to PyTorch and then wraps them into a -[`StableDiffusionPipeline`](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview) which is ready -for inference. Finally, it pushes the converted checkpoints to a repository on the Hugging Face Hub. 
- -We welcome you to try out the tool [here](https://huggingface.co/spaces/sayakpaul/convert-kerascv-sd-diffusers) -and share feedback via [discussions](https://huggingface.co/spaces/sayakpaul/convert-kerascv-sd-diffusers/discussions/new). - -## Getting Started - -First, you need to obtain the fine-tuned KerasCV Stable Diffusion checkpoints. We provide an -overview of the different ways Stable Diffusion models can be fine-tuned [using `diffusers`](https://huggingface.co/docs/diffusers/training/overview). For the Keras implementation of some of these methods, you can check out these resources: - -* [Teach StableDiffusion new concepts via Textual Inversion](https://keras.io/examples/generative/fine_tune_via_textual_inversion/) -* [Fine-tuning Stable Diffusion](https://keras.io/examples/generative/finetune_stable_diffusion/) -* [DreamBooth](https://keras.io/examples/generative/dreambooth/) -* [Prompt-to-Prompt editing](https://github.com/miguelCalado/prompt-to-prompt-tensorflow) - -Stable Diffusion is comprised of the following models: - -* Text encoder -* UNet -* VAE - -Depending on the fine-tuning task, we may fine-tune one or more of these components (the VAE is almost always left untouched). Here are some common combinations: - -* DreamBooth: UNet and text encoder -* Classical text to image fine-tuning: UNet -* Textual Inversion: Just the newly initialized embeddings in the text encoder - -### Performing the Conversion - -Let's use [this checkpoint](https://huggingface.co/sayakpaul/textual-inversion-kerasio/resolve/main/textual_inversion_kerasio.h5) which was generated -by conducting Textual Inversion with the following "placeholder token": ``. - -On the tool, we supply the following things: - -* Path(s) to download the fine-tuned checkpoint(s) (KerasCV) -* An HF token -* Placeholder token (only applicable for Textual Inversion) - -
- -
- -As soon as you hit "Submit", the conversion process will begin. Once it's complete, you should see the following: - -
- -
- -If you click the [link](https://huggingface.co/sayakpaul/textual-inversion-cat-kerascv_sd_diffusers_pipeline/tree/main), you -should see something like so: - -
- -
- -If you head over to the [model card of the repository](https://huggingface.co/sayakpaul/textual-inversion-cat-kerascv_sd_diffusers_pipeline), the -following should appear: - -
- -
- - - -Note that we're not specifying the UNet weights here since the UNet is not fine-tuned during Textual Inversion. - - - -And that's it! You now have your fine-tuned KerasCV Stable Diffusion model in Diffusers 🧨. - -## Using the Converted Model in Diffusers - -Just beside the model card of the [repository](https://huggingface.co/sayakpaul/textual-inversion-cat-kerascv_sd_diffusers_pipeline), -you'd notice an inference widget to try out the model directly from the UI 🤗 - -
- -
- -On the top right hand side, we provide a "Use in Diffusers" button. If you click the button, you should see the following code-snippet: - -```py -from diffusers import DiffusionPipeline - -pipeline = DiffusionPipeline.from_pretrained("sayakpaul/textual-inversion-cat-kerascv_sd_diffusers_pipeline") -``` - -The model is in standard `diffusers` format. Let's perform inference! - -```py -from diffusers import DiffusionPipeline - -pipeline = DiffusionPipeline.from_pretrained("sayakpaul/textual-inversion-cat-kerascv_sd_diffusers_pipeline") -pipeline.to("cuda") - -placeholder_token = "" -prompt = f"two {placeholder_token} getting married, photorealistic, high quality" -image = pipeline(prompt, num_inference_steps=50).images[0] -``` - -And we get: - -
- -
- -_**Note that if you specified a `placeholder_token` while performing the conversion, the tool will log it accordingly. Refer -to the model card of [this repository](https://huggingface.co/sayakpaul/textual-inversion-cat-kerascv_sd_diffusers_pipeline) -as an example.**_ - -We welcome you to use the tool for various Stable Diffusion fine-tuning scenarios and let us know your feedback! Here are some examples -of Diffusers checkpoints that were obtained using the tool: - -* [sayakpaul/text-unet-dogs-kerascv_sd_diffusers_pipeline](https://huggingface.co/sayakpaul/text-unet-dogs-kerascv_sd_diffusers_pipeline) (DreamBooth with both the text encoder and UNet fine-tuned) -* [sayakpaul/unet-dogs-kerascv_sd_diffusers_pipeline](https://huggingface.co/sayakpaul/unet-dogs-kerascv_sd_diffusers_pipeline) (DreamBooth with only the UNet fine-tuned) - -## Incorporating Diffusers Goodies 🎁 - -Diffusers provides various options that one can leverage to experiment with different inference setups. One particularly -useful option is the use of a different noise scheduler during inference other than what was used during fine-tuning. -Let's try out the [`DPMSolverMultistepScheduler`](https://huggingface.co/docs/diffusers/main/en/api/schedulers/multistep_dpm_solver) -which is different from the one ([`DDPMScheduler`](https://huggingface.co/docs/diffusers/main/en/api/schedulers/ddpm)) used during -fine-tuning. - -You can read more details about this process in [this section](https://huggingface.co/docs/diffusers/using-diffusers/schedulers). 
- -```py -from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler - -pipeline = DiffusionPipeline.from_pretrained("sayakpaul/textual-inversion-cat-kerascv_sd_diffusers_pipeline") -pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config) -pipeline.to("cuda") - -placeholder_token = "" -prompt = f"two {placeholder_token} getting married, photorealistic, high quality" -image = pipeline(prompt, num_inference_steps=50).images[0] -``` - -
- -
- -One can also continue fine-tuning from these Diffusers checkpoints by leveraging some relevant tools from Diffusers. Refer [here](https://huggingface.co/docs/diffusers/training/overview) for -more details. For inference-specific optimizations, refer [here](https://huggingface.co/docs/diffusers/main/en/optimization/fp16). - -## Known Limitations - -* Only Stable Diffusion v1 checkpoints are supported for conversion in this tool. diff --git a/diffusers/docs/source/en/using-diffusers/loading.mdx b/diffusers/docs/source/en/using-diffusers/loading.mdx deleted file mode 100644 index 9a3e09f71a1c74163f9600f08d75ab2b4ba57351..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/using-diffusers/loading.mdx +++ /dev/null @@ -1,657 +0,0 @@ - - -# Loading - -A core premise of the diffusers library is to make diffusion models **as accessible as possible**. -Accessibility is therefore achieved by providing an API to load complete diffusion pipelines as well as individual components with a single line of code. - -In the following we explain in-detail how to easily load: - -- *Complete Diffusion Pipelines* via the [`DiffusionPipeline.from_pretrained`] -- *Diffusion Models* via [`ModelMixin.from_pretrained`] -- *Schedulers* via [`SchedulerMixin.from_pretrained`] - -## Loading pipelines - -The [`DiffusionPipeline`] class is the easiest way to access any diffusion model that is [available on the Hub](https://huggingface.co/models?library=diffusers). Let's look at an example on how to download [Runway's Stable Diffusion model](https://huggingface.co/runwayml/stable-diffusion-v1-5). 
- -```python -from diffusers import DiffusionPipeline - -repo_id = "runwayml/stable-diffusion-v1-5" -pipe = DiffusionPipeline.from_pretrained(repo_id) -``` - -Here [`DiffusionPipeline`] automatically detects the correct pipeline (*i.e.* [`StableDiffusionPipeline`]), downloads and caches all required configuration and weight files (if they are not already cached), and finally returns a pipeline instance, called `pipe`. -The pipeline instance can then be called using [`StableDiffusionPipeline.__call__`] (i.e., `pipe("image of an astronaut riding a horse")`) for text-to-image generation. - -Instead of using the generic [`DiffusionPipeline`] class for loading, you can also load the appropriate pipeline class directly. The code snippet above yields the same instance as when doing: - -```python -from diffusers import StableDiffusionPipeline - -repo_id = "runwayml/stable-diffusion-v1-5" -pipe = StableDiffusionPipeline.from_pretrained(repo_id) -``` - - - -Many checkpoints, such as [CompVis/stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4) and [runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5), can be used for multiple tasks, *e.g.* *text-to-image* or *image-to-image*. -If you want to use those checkpoints for a task that is different from the default one, you have to load them directly from the corresponding task-specific pipeline class: - -```python -from diffusers import StableDiffusionImg2ImgPipeline - -repo_id = "runwayml/stable-diffusion-v1-5" -pipe = StableDiffusionImg2ImgPipeline.from_pretrained(repo_id) -``` - - - - -Diffusion pipelines like `StableDiffusionPipeline` or `StableDiffusionImg2ImgPipeline` consist of multiple components. These components can be parameterized models, such as `"unet"`, `"vae"` and `"text_encoder"`, tokenizers, or schedulers. 
-These components often interact in complex ways with each other when using the pipeline in inference, *e.g.* for [`StableDiffusionPipeline`] the inference call is explained [here](https://huggingface.co/blog/stable_diffusion#how-does-stable-diffusion-work). -The purpose of the [pipeline classes](./api/overview#diffusers-summary) is to wrap the complexity of these diffusion systems and give the user an easy-to-use API while staying flexible for customization, as will be shown later. - - - -### Loading pipelines locally - -If you prefer to have complete control over the pipeline and its corresponding files or, as said before, if you want to use pipelines that require an access request without having to be connected to the Hugging Face Hub, -we recommend loading pipelines locally. - -To load a diffusion pipeline locally, you first need to manually download the whole folder structure to your local disk and then pass a local path to [`DiffusionPipeline.from_pretrained`]. Let's again look at an example for -[Runway's Stable Diffusion model](https://huggingface.co/runwayml/stable-diffusion-v1-5). - -First, you should make use of [`git-lfs`](https://git-lfs.github.com/) to download the whole folder structure that has been uploaded to the [model repository](https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/main): - -``` -git lfs install -git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 -``` - -The command above will create a local folder called `./stable-diffusion-v1-5` on your disk. -Now, all you have to do is pass the local folder path to `from_pretrained`: - -```python -from diffusers import DiffusionPipeline - -repo_id = "./stable-diffusion-v1-5" -stable_diffusion = DiffusionPipeline.from_pretrained(repo_id) -``` - -If `repo_id` is a local path, as is the case here, [`DiffusionPipeline.from_pretrained`] will automatically detect it and therefore not try to download any files from the Hub. 
-While we usually recommend loading weights directly from the Hub to be certain to stay up to date with the newest changes, loading pipelines locally should be preferred if you -want to stay anonymous, build self-contained applications, and so on. - -### Loading customized pipelines - -Advanced users that want to load customized versions of diffusion pipelines can do so by swapping any of the default components, *e.g.* the scheduler, with compatible alternatives. -A classical use case of this functionality is to swap the scheduler. [Stable Diffusion v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) uses the [`PNDMScheduler`] by default, which is generally not the most performant scheduler. Since the release -of Stable Diffusion, multiple improved schedulers have been published. To use those, the user has to manually load their preferred scheduler and pass it into [`DiffusionPipeline.from_pretrained`]. - -*E.g.* to use [`EulerDiscreteScheduler`] or [`DPMSolverMultistepScheduler`] to have a better quality vs. generation speed trade-off for inference, one could load them as follows: - -```python -from diffusers import DiffusionPipeline, EulerDiscreteScheduler, DPMSolverMultistepScheduler - -repo_id = "runwayml/stable-diffusion-v1-5" - -scheduler = EulerDiscreteScheduler.from_pretrained(repo_id, subfolder="scheduler") -# or -# scheduler = DPMSolverMultistepScheduler.from_pretrained(repo_id, subfolder="scheduler") - -stable_diffusion = DiffusionPipeline.from_pretrained(repo_id, scheduler=scheduler) -``` - -Three things are worth paying attention to here. 
-- First, the scheduler is loaded with [`SchedulerMixin.from_pretrained`] -- Second, the scheduler is loaded with a function argument, called `subfolder="scheduler"` as the configuration of stable diffusion's scheduling is defined in a [subfolder of the official pipeline repository](https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/main/scheduler) -- Third, the scheduler instance can simply be passed with the `scheduler` keyword argument to [`DiffusionPipeline.from_pretrained`]. This works because the [`StableDiffusionPipeline`] defines its scheduler with the `scheduler` attribute. It's not possible to use a different name, such as `sampler=scheduler` since `sampler` is not a defined keyword for [`StableDiffusionPipeline.__init__`] - -Not only the scheduler components can be customized for diffusion pipelines; in theory, all components of a pipeline can be customized. In practice, however, it often only makes sense to switch out a component that has **compatible** alternatives to what the pipeline expects. -Many scheduler classes are compatible with each other as can be seen [here](https://github.com/huggingface/diffusers/blob/0dd8c6b4dbab4069de9ed1cafb53cbd495873879/src/diffusers/schedulers/scheduling_ddim.py#L112). This is not always the case for other components, such as the `"unet"`. - -One special case that can also be customized is the `"safety_checker"` of stable diffusion. 
If you believe the safety checker isn't useful for your use case, you can simply disable it by passing `None`: - -```python -from diffusers import DiffusionPipeline - -repo_id = "runwayml/stable-diffusion-v1-5" -stable_diffusion = DiffusionPipeline.from_pretrained(repo_id, safety_checker=None) -``` - -Another common use case is to reuse the same components in multiple pipelines, *e.g.* the weights and configurations of [`"runwayml/stable-diffusion-v1-5"`](https://huggingface.co/runwayml/stable-diffusion-v1-5) can be used for both [`StableDiffusionPipeline`] and [`StableDiffusionImg2ImgPipeline`] and we might not want to -load the exact same weights into RAM twice. In this case, customizing all the input instances would help us -to only load the weights into RAM once: - -```python -from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline - -model_id = "runwayml/stable-diffusion-v1-5" -stable_diffusion_txt2img = StableDiffusionPipeline.from_pretrained(model_id) - -components = stable_diffusion_txt2img.components - -# weights are not reloaded into RAM -stable_diffusion_img2img = StableDiffusionImg2ImgPipeline(**components) -``` - -Note how the above code snippet makes use of [`DiffusionPipeline.components`]. - -### Loading variants - -Diffusion Pipeline checkpoints can offer variants of the "main" diffusion pipeline checkpoint. -Such checkpoint variants are usually variations of the checkpoint that have advantages for specific use-cases and that are so similar to the "main" checkpoint that they **should not** be put in a new checkpoint. -A variation of a checkpoint has to have **exactly** the same serialization format and **exactly** the same model structure, including all weights having the same tensor shapes. - -Examples of variations are different floating point types and non-EMA weights. For example, "fp16", "bf16", and "non_ema" are common variations. 
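The structural requirement above can be illustrated with a small check. This is a minimal sketch that compares plain dictionaries mapping weight names to tensor shapes. The helper `is_compatible_variant` and the example shapes are hypothetical and only meant to show what "same model structure" means, not how `diffusers` validates checkpoints.

```python
def is_compatible_variant(main_shapes, variant_shapes):
    """Return True if both checkpoints list the same weights with the same shapes."""
    # The two checkpoints must contain exactly the same weight names...
    if main_shapes.keys() != variant_shapes.keys():
        return False
    # ...and every weight must have an identical tensor shape.
    return all(
        tuple(main_shapes[name]) == tuple(variant_shapes[name])
        for name in main_shapes
    )


# Illustrative shapes only (assumed for the example).
main = {"conv_in.weight": (320, 4, 3, 3), "conv_in.bias": (320,)}
fp16 = {"conv_in.weight": (320, 4, 3, 3), "conv_in.bias": (320,)}
other = {"conv_in.weight": (320, 9, 3, 3), "conv_in.bias": (320,)}

print(is_compatible_variant(main, fp16))   # same names and shapes
print(is_compatible_variant(main, other))  # different input channels
```

A lower-precision copy passes this check because only the dtype differs, while a model with a different number of input channels does not and would belong in its own repository.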
- -#### Let's first talk about what is **not** a checkpoint variant - -Checkpoint variants do **not** include different serialization formats (such as [safetensors](https://huggingface.co/docs/diffusers/main/en/using-diffusers/using_safetensors)) as weights in different serialization formats are -identical to the weights of the "main" checkpoint, just stored in a different file format. - -Also, variants do not correspond to different model structures, *e.g.* [stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) is not a variant of [stable-diffusion-2-0](https://huggingface.co/stabilityai/stable-diffusion-2) since the model structure is different (Stable Diffusion 1-5 uses a different `CLIPTextModel` compared to Stable Diffusion 2.0). - -Pipeline checkpoints that are identical in model structure, but have been trained on different datasets or with vastly different training setups and thus correspond to different official releases (such as [Stable Diffusion v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4) and [Stable Diffusion v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5)) should probably be stored in individual repositories instead of as variations of each other. - -#### So what are checkpoint variants then? - -Checkpoint variants usually consist of the checkpoint stored in a "*low-precision, low-storage*" dtype so that less bandwidth is required to download them, or of *non-exponential-averaged* weights that should be used when continuing fine-tuning from the checkpoint. -Both use cases have clear advantages when their weights are considered variants: they share the same serialization format as the reference weights, and they correspond to a specialization of the "main" checkpoint which does not warrant a new model repository. 
-A checkpoint stored in [torch's half-precision / float16 format](https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision/) requires only half the bandwidth and storage when downloading the checkpoint, -**but** cannot be used when continuing training or when running the checkpoint on CPU. -Similarly, the *non-exponential-averaged* (or non-EMA) version of the checkpoint should be used when continuing fine-tuning of the model checkpoint, **but** should not be used when using the checkpoint for inference. - -#### How to save and load variants - -Saving a diffusion pipeline as a variant can be done by providing [`DiffusionPipeline.save_pretrained`] with the `variant` argument. -The `variant` extends the weight name by the provided variation, by changing the default weight name from `diffusion_pytorch_model.bin` to `diffusion_pytorch_model.{variant}.bin` or from `diffusion_pytorch_model.safetensors` to `diffusion_pytorch_model.{variant}.safetensors`. By doing so, one creates a variant of the pipeline checkpoint that can be loaded **instead** of the "main" pipeline checkpoint. - -Let's have a look at how we could create a float16 variant of a pipeline. First, we load -the "main" variant of a checkpoint (stored in `float32` precision) in half precision, using `torch_dtype=torch.float16`. - -```py -from diffusers import DiffusionPipeline -import torch - -pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16) -``` - -Now all model components of the pipeline are stored in half-precision dtype. 
We can now save the -pipeline under a `"fp16"` variant as follows: - -```py -pipe.save_pretrained("./stable-diffusion-v1-5", variant="fp16") -``` - -If we don't save into an existing `stable-diffusion-v1-5` folder, the new folder would look as follows: - -``` -stable-diffusion-v1-5 -├── feature_extractor -│   └── preprocessor_config.json -├── model_index.json -├── safety_checker -│   ├── config.json -│   └── pytorch_model.fp16.bin -├── scheduler -│   └── scheduler_config.json -├── text_encoder -│   ├── config.json -│   └── pytorch_model.fp16.bin -├── tokenizer -│   ├── merges.txt -│   ├── special_tokens_map.json -│   ├── tokenizer_config.json -│   └── vocab.json -├── unet -│   ├── config.json -│   └── diffusion_pytorch_model.fp16.bin -└── vae - ├── config.json - └── diffusion_pytorch_model.fp16.bin -``` - -As one can see, all model files now have a `.fp16.bin` extension instead of just `.bin`. -The variant now has to be loaded by also passing `variant="fp16"` to [`DiffusionPipeline.from_pretrained`], e.g.: - - -```py -DiffusionPipeline.from_pretrained("./stable-diffusion-v1-5", variant="fp16", torch_dtype=torch.float16) -``` - -works just fine, while: - -```py -DiffusionPipeline.from_pretrained("./stable-diffusion-v1-5", torch_dtype=torch.float16) -``` - -throws an Exception: -``` -OSError: Error no file named diffusion_pytorch_model.bin found in directory ./stable-diffusion-v1-5/vae since we **only** stored the model -``` - -This is expected as we don't have any "non-variant" checkpoint files saved locally. -However, the whole idea of pipeline variants is that they can co-exist with the "main" variant, -so one would typically also save the "main" variant in the same folder. 
Let's do this: - -```py -pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5") -pipe.save_pretrained("./stable-diffusion-v1-5") -``` - -and upload the pipeline to the Hub under [diffusers/stable-diffusion-variants](https://huggingface.co/diffusers/stable-diffusion-variants). -The file structure [on the Hub](https://huggingface.co/diffusers/stable-diffusion-variants/tree/main) now looks as follows: - -``` -├── feature_extractor -│   └── preprocessor_config.json -├── model_index.json -├── safety_checker -│   ├── config.json -│   ├── pytorch_model.bin -│   └── pytorch_model.fp16.bin -├── scheduler -│   └── scheduler_config.json -├── text_encoder -│   ├── config.json -│   ├── pytorch_model.bin -│   └── pytorch_model.fp16.bin -├── tokenizer -│   ├── merges.txt -│   ├── special_tokens_map.json -│   ├── tokenizer_config.json -│   └── vocab.json -├── unet -│   ├── config.json -│   ├── diffusion_pytorch_model.bin -│   ├── diffusion_pytorch_model.fp16.bin -└── vae - ├── config.json - ├── diffusion_pytorch_model.bin - └── diffusion_pytorch_model.fp16.bin -``` - -We can now download both the "main" and the "fp16" variant from the Hub. Both: - -```py -pipe = DiffusionPipeline.from_pretrained("diffusers/stable-diffusion-variants") -``` - -and - -```py -pipe = DiffusionPipeline.from_pretrained("diffusers/stable-diffusion-variants", variant="fp16") -``` - -work. - - - -Note that Diffusers never downloads more checkpoints than needed. E.g. when downloading -the "main" variant, none of the "fp16.bin" files are downloaded and cached. -Only when the user specifies `variant="fp16"` are those files downloaded and cached. - - - -Finally, there are cases where only some of the checkpoint files of the pipeline are of a certain -variation. E.g. it's usually only the UNet checkpoint that has both an *exponential-mean-averaged* (EMA) and a *non-exponential-mean-averaged* (non-EMA) version. All other model components, e.g. 
the text encoder, safety checker or variational auto-encoder usually don't have such a variation. -In such a case, one would upload just the UNet's checkpoint file with a `non_ema` version format (as done [here](https://huggingface.co/diffusers/stable-diffusion-variants/blob/main/unet/diffusion_pytorch_model.non_ema.bin)) and upon calling: - -```python -pipe = DiffusionPipeline.from_pretrained("diffusers/stable-diffusion-variants", variant="non_ema") -``` - -the pipeline will use the "non_ema" checkpoint variant for each component where it is available; otherwise it'll load the -"main" variation. In the above example, `variant="non_ema"` would therefore download the following file structure: - -``` -├── feature_extractor -│   └── preprocessor_config.json -├── model_index.json -├── safety_checker -│   ├── config.json -│   ├── pytorch_model.bin -├── scheduler -│   └── scheduler_config.json -├── text_encoder -│   ├── config.json -│   ├── pytorch_model.bin -├── tokenizer -│   ├── merges.txt -│   ├── special_tokens_map.json -│   ├── tokenizer_config.json -│   └── vocab.json -├── unet -│   ├── config.json -│   └── diffusion_pytorch_model.non_ema.bin -└── vae - ├── config.json - ├── diffusion_pytorch_model.bin -``` - -In a nutshell, using `variant="{variant}"` will download all files that match the `{variant}`, and if such a file variant is not present for a model component, it will download the "main" variant instead. If neither a "main" nor a `{variant}` variant is available, an error will be thrown. - -### How does loading work? - -As a class method, [`DiffusionPipeline.from_pretrained`] is responsible for two things: -- Download the latest version of the folder structure required to run the `repo_id` with `diffusers` and cache it. If the latest folder structure is available in the local cache, [`DiffusionPipeline.from_pretrained`] will simply reuse the cache and **not** re-download the files. 
-- Load the cached weights into the _correct_ pipeline class – one of the [officially supported pipeline classes](./api/overview#diffusers-summary) - and return an instance of the class. The _correct_ pipeline class is thereby retrieved from the `model_index.json` file. - -The underlying folder structure of diffusion pipelines corresponds 1-to-1 to their corresponding class instances, *e.g.* [`StableDiffusionPipeline`] for [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5). -This can be better understood by looking at an example. Let's load a pipeline class instance `pipe` and print it: - -```python -from diffusers import DiffusionPipeline - -repo_id = "runwayml/stable-diffusion-v1-5" -pipe = DiffusionPipeline.from_pretrained(repo_id) -print(pipe) -``` - -*Output*: -``` -StableDiffusionPipeline { - "feature_extractor": [ - "transformers", - "CLIPImageProcessor" - ], - "safety_checker": [ - "stable_diffusion", - "StableDiffusionSafetyChecker" - ], - "scheduler": [ - "diffusers", - "PNDMScheduler" - ], - "text_encoder": [ - "transformers", - "CLIPTextModel" - ], - "tokenizer": [ - "transformers", - "CLIPTokenizer" - ], - "unet": [ - "diffusers", - "UNet2DConditionModel" - ], - "vae": [ - "diffusers", - "AutoencoderKL" - ] -} -``` - -First, we see that the official pipeline is the [`StableDiffusionPipeline`], and second we see that the `StableDiffusionPipeline` consists of 7 components: -- `"feature_extractor"` of class `CLIPImageProcessor` as defined [in `transformers`](https://huggingface.co/docs/transformers/main/en/model_doc/clip#transformers.CLIPImageProcessor). -- `"safety_checker"` as defined [here](https://github.com/huggingface/diffusers/blob/e55687e1e15407f60f32242027b7bb8170e58266/src/diffusers/pipelines/stable_diffusion/safety_checker.py#L32). -- `"scheduler"` of class [`PNDMScheduler`]. 
-- `"text_encoder"` of class `CLIPTextModel` as defined [in `transformers`](https://huggingface.co/docs/transformers/main/en/model_doc/clip#transformers.CLIPTextModel). -- `"tokenizer"` of class `CLIPTokenizer` as defined [in `transformers`](https://huggingface.co/docs/transformers/main/en/model_doc/clip#transformers.CLIPTokenizer). -- `"unet"` of class [`UNet2DConditionModel`]. -- `"vae"` of class [`AutoencoderKL`]. - -Let's now compare the pipeline instance to the folder structure of the model repository `runwayml/stable-diffusion-v1-5`. Looking at the folder structure of [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/main) on the Hub and excluding model and saving format variants, we can see it matches 1-to-1 the printed out instance of `StableDiffusionPipeline` above: - -``` -. -├── feature_extractor -│   └── preprocessor_config.json -├── model_index.json -├── safety_checker -│   ├── config.json -│   └── pytorch_model.bin -├── scheduler -│   └── scheduler_config.json -├── text_encoder -│   ├── config.json -│   └── pytorch_model.bin -├── tokenizer -│   ├── merges.txt -│   ├── special_tokens_map.json -│   ├── tokenizer_config.json -│   └── vocab.json -├── unet -│   ├── config.json -│   ├── diffusion_pytorch_model.bin -└── vae - ├── config.json - ├── diffusion_pytorch_model.bin -``` - -Each attribute of the instance of `StableDiffusionPipeline` has its configuration and possibly weights defined in a subfolder that is called **exactly** like the class attribute (`"feature_extractor"`, `"safety_checker"`, `"scheduler"`, `"text_encoder"`, `"tokenizer"`, `"unet"`, `"vae"`). 
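The 1-to-1 correspondence described above can be checked mechanically. The following is a minimal sketch that needs neither `diffusers` nor a download: the `components` dictionary is copied by hand from the printed pipeline, and `repo_entries` is copied by hand from the top level of the folder tree, so both are illustrative inputs rather than values queried from the Hub.

```python
# Component names and their (library, class) pairs, copied from the printed pipeline above.
components = {
    "feature_extractor": ("transformers", "CLIPImageProcessor"),
    "safety_checker": ("stable_diffusion", "StableDiffusionSafetyChecker"),
    "scheduler": ("diffusers", "PNDMScheduler"),
    "text_encoder": ("transformers", "CLIPTextModel"),
    "tokenizer": ("transformers", "CLIPTokenizer"),
    "unet": ("diffusers", "UNet2DConditionModel"),
    "vae": ("diffusers", "AutoencoderKL"),
}

# Top-level entries of the repository, copied from the folder tree above.
repo_entries = [
    "feature_extractor",
    "model_index.json",
    "safety_checker",
    "scheduler",
    "text_encoder",
    "tokenizer",
    "unet",
    "vae",
]

# Everything except model_index.json is a component subfolder.
subfolders = {entry for entry in repo_entries if entry != "model_index.json"}

# Components without a subfolder, and subfolders without a component.
missing_subfolders = set(components) - subfolders
unexpected_subfolders = subfolders - set(components)

print("missing:", sorted(missing_subfolders))
print("unexpected:", sorted(unexpected_subfolders))
```

Both sets come out empty for this repository, which is what "matches 1-to-1" means here: every pipeline attribute has a subfolder of exactly the same name, and no subfolder is left over.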
Importantly, every pipeline expects a `model_index.json` file that tells the `DiffusionPipeline` both: -- which pipeline class should be loaded, and -- what sub-classes from which library are stored in which subfolders - -In the case of `runwayml/stable-diffusion-v1-5` the `model_index.json` is therefore defined as follows: - -``` -{ - "_class_name": "StableDiffusionPipeline", - "_diffusers_version": "0.6.0", - "feature_extractor": [ - "transformers", - "CLIPImageProcessor" - ], - "safety_checker": [ - "stable_diffusion", - "StableDiffusionSafetyChecker" - ], - "scheduler": [ - "diffusers", - "PNDMScheduler" - ], - "text_encoder": [ - "transformers", - "CLIPTextModel" - ], - "tokenizer": [ - "transformers", - "CLIPTokenizer" - ], - "unet": [ - "diffusers", - "UNet2DConditionModel" - ], - "vae": [ - "diffusers", - "AutoencoderKL" - ] -} -``` - -- `_class_name` tells `DiffusionPipeline` which pipeline class should be loaded. -- `_diffusers_version` can be useful to know under which `diffusers` version this model was created. 
-- Every component of the pipeline is then defined under the form: -``` -"name" : [ - "library", - "class" -] -``` - - The `"name"` field corresponds both to the name of the subfolder in which the configuration and weights are stored as well as the attribute name of the pipeline class (as can be seen [here](https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/main/bert) and [here](https://github.com/huggingface/diffusers/blob/cd502b25cf0debac6f98d27a6638ef95208d1ea2/src/diffusers/pipelines/latent_diffusion/pipeline_latent_diffusion.py#L42)) - - The `"library"` field corresponds to the name of the library, *e.g.* `diffusers` or `transformers` from which the `"class"` should be loaded - - The `"class"` field corresponds to the name of the class, *e.g.* [`CLIPTokenizer`](https://huggingface.co/docs/transformers/main/en/model_doc/clip#transformers.CLIPTokenizer) or [`UNet2DConditionModel`] - - - -## Loading models - -Models as defined under [src/diffusers/models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) can be loaded via the [`ModelMixin.from_pretrained`] function. The API is very similar to [`DiffusionPipeline.from_pretrained`] and works in the same way: -- Download the latest version of the model weights and configuration with `diffusers` and cache them. If the latest files are available in the local cache, [`ModelMixin.from_pretrained`] will simply reuse the cache and **not** re-download the files. -- Load the cached weights into the _defined_ model class - one of [the existing model classes](./api/models) - and return an instance of the class. - -In contrast to [`DiffusionPipeline.from_pretrained`], models rely on fewer files that usually don't require a folder structure, but just a `diffusion_pytorch_model.bin` and `config.json` file.
- -Let's look at an example: - -```python -from diffusers import UNet2DConditionModel - -repo_id = "runwayml/stable-diffusion-v1-5" -model = UNet2DConditionModel.from_pretrained(repo_id, subfolder="unet") -``` - -Note how we have to define the `subfolder="unet"` argument to tell [`ModelMixin.from_pretrained`] that the model weights are located in a [subfolder of the repository](https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/main/unet). - -As explained in [Loading customized pipelines](./using-diffusers/loading#loading-customized-pipelines), one can pass a loaded model to a diffusion pipeline, via [`DiffusionPipeline.from_pretrained`]: - -```python -from diffusers import DiffusionPipeline - -repo_id = "runwayml/stable-diffusion-v1-5" -pipe = DiffusionPipeline.from_pretrained(repo_id, unet=model) -``` - -If the model files can be found directly at the root level, which is usually only the case for some very simple diffusion models, such as [`google/ddpm-cifar10-32`](https://huggingface.co/google/ddpm-cifar10-32), we don't -need to pass a `subfolder` argument: - -```python -from diffusers import UNet2DModel - -repo_id = "google/ddpm-cifar10-32" -model = UNet2DModel.from_pretrained(repo_id) -``` - -As motivated in [How to save and load variants?](#how-to-save-and-load-variants), models can load and -save variants. To load a model variant, one should pass the `variant` function argument to [`ModelMixin.from_pretrained`]. Analogously, to save a model variant, one should pass the `variant` function argument to [`ModelMixin.save_pretrained`]: - -```python -from diffusers import UNet2DConditionModel - -model = UNet2DConditionModel.from_pretrained( - "diffusers/stable-diffusion-variants", subfolder="unet", variant="non_ema" -) -model.save_pretrained("./local-unet", variant="non_ema") -``` - -## Loading schedulers - -Schedulers rely on [`SchedulerMixin.from_pretrained`].
Schedulers are **not parameterized** or **trained**, but instead purely defined by a configuration file. -For consistency, we use the same method name as we do for models or pipelines, but no weights are loaded in this case. - -In contrast to pipelines or models, loading schedulers does not consume any significant amount of memory and the same configuration file can often be used for a variety of different schedulers. -For example, all of: - -- [`DDPMScheduler`] -- [`DDIMScheduler`] -- [`PNDMScheduler`] -- [`LMSDiscreteScheduler`] -- [`EulerDiscreteScheduler`] -- [`EulerAncestralDiscreteScheduler`] -- [`DPMSolverMultistepScheduler`] - -are compatible with [`StableDiffusionPipeline`] and therefore the same scheduler configuration file can be loaded in any of those classes: - -```python -from diffusers import StableDiffusionPipeline -from diffusers import ( - DDPMScheduler, - DDIMScheduler, - PNDMScheduler, - LMSDiscreteScheduler, - EulerDiscreteScheduler, - EulerAncestralDiscreteScheduler, - DPMSolverMultistepScheduler, -) - -repo_id = "runwayml/stable-diffusion-v1-5" - -ddpm = DDPMScheduler.from_pretrained(repo_id, subfolder="scheduler") -ddim = DDIMScheduler.from_pretrained(repo_id, subfolder="scheduler") -pndm = PNDMScheduler.from_pretrained(repo_id, subfolder="scheduler") -lms = LMSDiscreteScheduler.from_pretrained(repo_id, subfolder="scheduler") -euler_anc = EulerAncestralDiscreteScheduler.from_pretrained(repo_id, subfolder="scheduler") -euler = EulerDiscreteScheduler.from_pretrained(repo_id, subfolder="scheduler") -dpm = DPMSolverMultistepScheduler.from_pretrained(repo_id, subfolder="scheduler") - -# replace `dpm` with any of `ddpm`, `ddim`, `pndm`, `lms`, `euler_anc`, `euler` -pipeline = StableDiffusionPipeline.from_pretrained(repo_id, scheduler=dpm) -``` diff --git a/diffusers/docs/source/en/using-diffusers/loading_overview.mdx b/diffusers/docs/source/en/using-diffusers/loading_overview.mdx deleted file mode 100644 index
df870505219bb7faa10f809fb788705ec5a99f28..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/using-diffusers/loading_overview.mdx +++ /dev/null @@ -1,17 +0,0 @@ - - -# Overview - -🧨 Diffusers offers many pipelines, models, and schedulers for generative tasks. To make loading these components as simple as possible, we provide a single and unified method - `from_pretrained()` - that loads any of these components from either the Hugging Face [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) or your local machine. Whenever you load a pipeline or model, the latest files are automatically downloaded and cached so you can quickly reuse them next time without redownloading the files. - -This section will show you everything you need to know about loading pipelines, how to load different components in a pipeline, how to load checkpoint variants, and how to load community pipelines. You'll also learn how to load schedulers and compare the speed and quality trade-offs of using different schedulers. Finally, you'll see how to convert and load KerasCV checkpoints so you can use them in PyTorch with 🧨 Diffusers. \ No newline at end of file diff --git a/diffusers/docs/source/en/using-diffusers/other-modalities.mdx b/diffusers/docs/source/en/using-diffusers/other-modalities.mdx deleted file mode 100644 index ec879c49b1060c7ade1a0eb7e82de87c95d1b957..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/using-diffusers/other-modalities.mdx +++ /dev/null @@ -1,21 +0,0 @@ - - -# Using Diffusers with other modalities - -Diffusers is in the process of expanding to modalities other than images. - -Example type | Colab | Pipeline | -:-------------------------:|:-------------------------:|:-------------------------:| -[Molecule conformation](https://www.nature.com/subjects/molecular-conformation#:~:text=Definition,to%20changes%20in%20their%20environment.) 
generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/geodiff_molecule_conformation.ipynb) | ❌ - -More coming soon! \ No newline at end of file diff --git a/diffusers/docs/source/en/using-diffusers/pipeline_overview.mdx b/diffusers/docs/source/en/using-diffusers/pipeline_overview.mdx deleted file mode 100644 index ca98fc3f4b63fa344f232690a0536028d668c875..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/using-diffusers/pipeline_overview.mdx +++ /dev/null @@ -1,17 +0,0 @@ - - -# Overview - -A pipeline is an end-to-end class that provides a quick and easy way to use a diffusion system for inference by bundling independently trained models and schedulers together. Certain combinations of models and schedulers define specific pipeline types, like [`StableDiffusionPipeline`] or [`StableDiffusionControlNetPipeline`], with specific capabilities. All pipeline types inherit from the base [`DiffusionPipeline`] class; pass it any checkpoint, and it'll automatically detect the pipeline type and load the necessary components. - -This section introduces you to some of the tasks supported by our pipelines such as unconditional image generation and different techniques and variations of text-to-image generation. You'll also learn how to gain more control over the generation process by setting a seed for reproducibility and weighting prompts to adjust the influence certain words in the prompt has over the output. Finally, you'll see how you can create a community pipeline for a custom task like generating images from speech. 
\ No newline at end of file diff --git a/diffusers/docs/source/en/using-diffusers/reproducibility.mdx b/diffusers/docs/source/en/using-diffusers/reproducibility.mdx deleted file mode 100644 index 35191c13928992c7be9723660c4367ba72156761..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/using-diffusers/reproducibility.mdx +++ /dev/null @@ -1,151 +0,0 @@ - - -# Create reproducible pipelines - -Reproducibility is important for testing, replicating results, and can even be used to [improve image quality](reusing_seeds). However, the randomness in diffusion models is a desired property because it allows the pipeline to generate different images every time it is run. While you can't expect to get the exact same results across platforms, you can expect results to be reproducible across releases and platforms within a certain tolerance range. Even then, tolerance varies depending on the diffusion pipeline and checkpoint. - -This is why it's important to understand how to control sources of randomness in diffusion models. - - - -💡 We strongly recommend reading PyTorch's [statement about reproducibility](https://pytorch.org/docs/stable/notes/randomness.html): - -> Completely reproducible results are not guaranteed across PyTorch releases, individual commits, or different platforms. Furthermore, results may not be reproducible between CPU and GPU executions, even when using identical seeds. - - - -## Inference - -During inference, pipelines rely heavily on random sampling operations which include creating the -Gaussian noise tensors to denoise and adding noise to the scheduling step. 
- -Take a look at the tensor values in the [`DDIMPipeline`] after two inference steps: - -```python -from diffusers import DDIMPipeline -import numpy as np - -model_id = "google/ddpm-cifar10-32" - -# load model and scheduler -ddim = DDIMPipeline.from_pretrained(model_id) - -# run pipeline for just two steps and return numpy tensor -image = ddim(num_inference_steps=2, output_type="np").images -print(np.abs(image).sum()) -``` - -Running the code above prints one value, but if you run it again you get a different value. What is going on here? - -Every time the pipeline is run, [`torch.randn`](https://pytorch.org/docs/stable/generated/torch.randn.html) uses a different random seed to create Gaussian noise which is denoised stepwise. This leads to a different result each time it is run, which is great for diffusion pipelines since it generates a different random image each time. - -But if you need to reliably generate the same image, that'll depend on whether you're running the pipeline on a CPU or GPU. - -### CPU - -To generate reproducible results on a CPU, you'll need to use a PyTorch [`Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) and set a seed: - -```python -import torch -from diffusers import DDIMPipeline -import numpy as np - -model_id = "google/ddpm-cifar10-32" - -# load model and scheduler -ddim = DDIMPipeline.from_pretrained(model_id) - -# create a generator for reproducibility -generator = torch.Generator(device="cpu").manual_seed(0) - -# run pipeline for just two steps and return numpy tensor -image = ddim(num_inference_steps=2, output_type="np", generator=generator).images -print(np.abs(image).sum()) -``` - -Now when you run the code above, it always prints a value of `1491.1711` no matter what because the `Generator` object with the seed is passed to all the random functions of the pipeline. - -If you run this code example on your specific hardware and PyTorch version, you should get a similar, if not the same, result.
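To make the idea of "one seeded generator object passed to every random function" concrete without downloading a model, here is a small analogy in plain Python. It is only a sketch: `random.Random` stands in for `torch.Generator`, and `sample_noise` and `toy_pipeline` are hypothetical helpers invented for this illustration, not part of 🧨 Diffusers or PyTorch.

```python
import random


def sample_noise(size, generator):
    """Draw `size` Gaussian values from an explicit random state.

    Stand-in for a call such as torch.randn(..., generator=generator).
    """
    return [generator.gauss(0.0, 1.0) for _ in range(size)]


def toy_pipeline(generator, steps=2, size=4):
    """Hypothetical 'pipeline': every random draw uses the same generator object."""
    sample = sample_noise(size, generator)  # initial noise
    for _ in range(steps):
        noise = sample_noise(size, generator)  # noise added at each step
        sample = [0.5 * s + 0.1 * n for s, n in zip(sample, noise)]
    return sample


# Two runs, each with a fresh generator seeded with 0, give identical results.
first = toy_pipeline(random.Random(0))
second = toy_pipeline(random.Random(0))

# Reusing one generator object across two runs gives different results,
# because the generator is a random *state* that advances with every draw.
shared = random.Random(0)
run_a = toy_pipeline(shared)
run_b = toy_pipeline(shared)

print(first == second)
print(run_a == run_b)
```

The second comparison shows why a fresh `Generator` with the same seed is created whenever the same output is wanted again: reusing an already-consumed generator continues the random sequence instead of restarting it.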
- - - -💡 It might be a bit unintuitive at first to pass `Generator` objects to the pipeline instead of -just integer values representing the seed, but this is the recommended design when dealing with -probabilistic models in PyTorch as `Generator`'s are *random states* that can be -passed to multiple pipelines in a sequence. - - - -### GPU - -Writing a reproducible pipeline on a GPU is a bit trickier, and full reproducibility across different hardware is not guaranteed because matrix multiplication - which diffusion pipelines require a lot of - is less deterministic on a GPU than a CPU. For example, if you run the same code example above on a GPU: - -```python -import torch -from diffusers import DDIMPipeline -import numpy as np - -model_id = "google/ddpm-cifar10-32" - -# load model and scheduler -ddim = DDIMPipeline.from_pretrained(model_id) -ddim.to("cuda") - -# create a generator for reproducibility -generator = torch.Generator(device="cuda").manual_seed(0) - -# run pipeline for just two steps and return numpy tensor -image = ddim(num_inference_steps=2, output_type="np", generator=generator).images -print(np.abs(image).sum()) -``` - -The result is not the same even though you're using an identical seed because the GPU uses a different random number generator than the CPU. - -To circumvent this problem, 🧨 Diffusers has a [`randn_tensor`](#diffusers.utils.randn_tensor) function for creating random noise on the CPU, and then moving the tensor to a GPU if necessary. The `randn_tensor` function is used everywhere inside the pipeline, allowing the user to **always** pass a CPU `Generator` even if the pipeline is run on a GPU. - -You'll see the results are much closer now! - -```python -import torch -from diffusers import DDIMPipeline -import numpy as np - -model_id = "google/ddpm-cifar10-32" - -# load model and scheduler -ddim = DDIMPipeline.from_pretrained(model_id) -ddim.to("cuda") - -# create a generator for reproducibility; notice you don't place it on the GPU! 
-generator = torch.manual_seed(0) - -# run pipeline for just two steps and return numpy tensor -image = ddim(num_inference_steps=2, output_type="np", generator=generator).images -print(np.abs(image).sum()) -``` - - - -💡 If reproducibility is important, we recommend always passing a CPU generator. -The performance loss is often negligible, and you'll generate much more similar -values than if the pipeline had been run on a GPU. - - - -Finally, more complex pipelines such as [`UnCLIPPipeline`] are often extremely -susceptible to precision error propagation. Don't expect similar results across -different GPU hardware or PyTorch versions. In this case, you'll need to run -exactly the same hardware and PyTorch version for full reproducibility. - -## randn_tensor -[[autodoc]] diffusers.utils.randn_tensor diff --git a/diffusers/docs/source/en/using-diffusers/reusing_seeds.mdx b/diffusers/docs/source/en/using-diffusers/reusing_seeds.mdx deleted file mode 100644 index eea0fd7e3e9d562ff56fdf6c4e5170dbeeb81c8a..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/using-diffusers/reusing_seeds.mdx +++ /dev/null @@ -1,63 +0,0 @@ - - -# Improve image quality with deterministic generation - -A common way to improve the quality of generated images is *deterministic batch generation*: generate a batch of images and select one image to improve with a more detailed prompt in a second round of inference. The key is to pass a list of [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html#generator)'s to the pipeline for batched image generation, and tie each `Generator` to a seed so you can reuse it for an image.
- -Let's use [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) as an example, and generate several versions of the following prompt: - -```py -prompt = "Labrador in the style of Vermeer" -``` - -Instantiate a pipeline with [`DiffusionPipeline.from_pretrained`] and place it on a GPU (if available): - -```python ->>> import torch ->>> from diffusers import DiffusionPipeline - ->>> pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16) ->>> pipe = pipe.to("cuda") -``` - -Now, define four different `Generator`'s and assign each `Generator` a seed (`0` to `3`) so you can reuse a `Generator` later for a specific image: - -```python ->>> import torch - ->>> generator = [torch.Generator(device="cuda").manual_seed(i) for i in range(4)] -``` - -Generate the images and have a look: - -```python ->>> images = pipe(prompt, generator=generator, num_images_per_prompt=4).images ->>> images -``` - -![img](https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/reusabe_seeds.jpg) - -In this example, you'll improve upon the first image - but in reality, you can use any image you want (even the image with double sets of eyes!). The first image used the `Generator` with seed `0`, so you'll reuse that `Generator` for the second round of inference. To improve the quality of the image, add some additional text to the prompt: - -```python -prompt = [prompt + t for t in [", highly realistic", ", artsy", ", trending", ", colorful"]] -generator = [torch.Generator(device="cuda").manual_seed(0) for i in range(4)] -``` - -Create four generators with seed `0`, and generate another batch of images, all of which should look like the first image from the previous round!
- -```python ->>> images = pipe(prompt, generator=generator).images ->>> images -``` - -![img](https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/reusabe_seeds_2.jpg) diff --git a/diffusers/docs/source/en/using-diffusers/rl.mdx b/diffusers/docs/source/en/using-diffusers/rl.mdx deleted file mode 100644 index 0cbf46b2a36729c9348f6c4ea7d5f8549712b40d..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/using-diffusers/rl.mdx +++ /dev/null @@ -1,25 +0,0 @@ - - -# Using Diffusers for reinforcement learning - -Support for one RL model and related pipelines is included in the `experimental` source of diffusers. -More models and examples coming soon! - -# Diffuser Value-guided Planning - -You can run the model from [*Planning with Diffusion for Flexible Behavior Synthesis*](https://arxiv.org/abs/2205.09991) with Diffusers. -The script is located in the [RL Examples](https://github.com/huggingface/diffusers/tree/main/examples/rl) folder. - -Or, run this example in Colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/reinforcement_learning_with_diffusers.ipynb) - -[[autodoc]] diffusers.experimental.ValueGuidedRLPipeline \ No newline at end of file diff --git a/diffusers/docs/source/en/using-diffusers/schedulers.mdx b/diffusers/docs/source/en/using-diffusers/schedulers.mdx deleted file mode 100644 index e17d826c7dab12b5d58511b6c9d552d978dd1b9c..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/using-diffusers/schedulers.mdx +++ /dev/null @@ -1,314 +0,0 @@ - - -# Schedulers - -Diffusion pipelines are inherently a collection of diffusion models and schedulers that are partly independent from each other. This means that one is able to switch out parts of the pipeline to better customize -a pipeline to one's use case. The best example of this is the [Schedulers](../api/schedulers/overview.mdx). 
- -Whereas diffusion models usually simply define the forward pass from noise to a less noisy sample, -schedulers define the whole denoising process, *i.e.*: -- How many denoising steps? -- Stochastic or deterministic? -- What algorithm to use to find the denoised sample - -They can be quite complex and often define a trade-off between **denoising speed** and **denoising quality**. -It is extremely difficult to measure quantitatively which scheduler works best for a given diffusion pipeline, so it is often recommended to simply try out which works best. - -The following paragraphs show how to do so with the 🧨 Diffusers library. - -## Load pipeline - -Let's start by loading the stable diffusion pipeline. -Remember that you have to be a registered user on the 🤗 Hugging Face Hub, and have "click-accepted" the [license](https://huggingface.co/runwayml/stable-diffusion-v1-5) in order to use stable diffusion. - -```python -from huggingface_hub import login -from diffusers import DiffusionPipeline -import torch - -# first we need to login with our access token -login() - -# Now we can download the pipeline -pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16) -``` - -Next, we move it to GPU: - -```python -pipeline.to("cuda") -``` - -## Access the scheduler - -The scheduler is always one of the components of the pipeline and is usually called `"scheduler"`. -So it can be accessed via the `"scheduler"` property. - -```python -pipeline.scheduler -``` - -**Output**: -``` -PNDMScheduler { - "_class_name": "PNDMScheduler", - "_diffusers_version": "0.8.0.dev0", - "beta_end": 0.012, - "beta_schedule": "scaled_linear", - "beta_start": 0.00085, - "clip_sample": false, - "num_train_timesteps": 1000, - "set_alpha_to_one": false, - "skip_prk_steps": true, - "steps_offset": 1, - "trained_betas": null -} -``` - -We can see that the scheduler is of type [`PNDMScheduler`]. 
-Cool, now let's compare this scheduler's performance to that of other schedulers. -First we define a prompt on which we will test all the different schedulers: - -```python -prompt = "A photograph of an astronaut riding a horse on Mars, high resolution, high definition." -``` - -Next, we create a generator with a fixed seed, so that the images generated with different schedulers are comparable, and run the pipeline: - -```python -generator = torch.Generator(device="cuda").manual_seed(8) -image = pipeline(prompt, generator=generator).images[0] -image -``` - -

-
- -
-

- - -## Changing the scheduler - -Now we show how easy it is to change the scheduler of a pipeline. Every scheduler has a property [`SchedulerMixin.compatibles`] -which defines all compatible schedulers. You can take a look at all available, compatible schedulers for the Stable Diffusion pipeline as follows. - -```python -pipeline.scheduler.compatibles -``` - -**Output**: -``` -[diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler, - diffusers.schedulers.scheduling_ddim.DDIMScheduler, - diffusers.schedulers.scheduling_dpmsolver_multistep.DPMSolverMultistepScheduler, - diffusers.schedulers.scheduling_euler_discrete.EulerDiscreteScheduler, - diffusers.schedulers.scheduling_pndm.PNDMScheduler, - diffusers.schedulers.scheduling_ddpm.DDPMScheduler, - diffusers.schedulers.scheduling_euler_ancestral_discrete.EulerAncestralDiscreteScheduler] -``` - -Cool, lots of schedulers to look at. Feel free to have a look at their respective class definitions: - -- [`LMSDiscreteScheduler`], -- [`DDIMScheduler`], -- [`DPMSolverMultistepScheduler`], -- [`EulerDiscreteScheduler`], -- [`PNDMScheduler`], -- [`DDPMScheduler`], -- [`EulerAncestralDiscreteScheduler`]. - -We will now compare the input prompt with all other schedulers. To change the scheduler of the pipeline you can make use of the -convenient [`ConfigMixin.config`] property in combination with the [`ConfigMixin.from_config`] function. 
- -```python -pipeline.scheduler.config -``` - -returns a dictionary of the configuration of the scheduler: - -**Output**: -``` -FrozenDict([('num_train_timesteps', 1000), - ('beta_start', 0.00085), - ('beta_end', 0.012), - ('beta_schedule', 'scaled_linear'), - ('trained_betas', None), - ('skip_prk_steps', True), - ('set_alpha_to_one', False), - ('steps_offset', 1), - ('_class_name', 'PNDMScheduler'), - ('_diffusers_version', '0.8.0.dev0'), - ('clip_sample', False)]) -``` - -This configuration can then be used to instantiate a scheduler -of a different class that is compatible with the pipeline. Here, -we change the scheduler to the [`DDIMScheduler`]. - -```python -from diffusers import DDIMScheduler - -pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config) -``` - -Cool, now we can run the pipeline again to compare the generation quality. - -```python -generator = torch.Generator(device="cuda").manual_seed(8) -image = pipeline(prompt, generator=generator).images[0] -image -``` - -
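Why can a configuration saved for one scheduler class be used to build another? The sketch below illustrates one plausible mechanism with toy classes: the target class keeps only the configuration keys that its constructor accepts and ignores the rest. This is an assumption made for illustration; `ToyPNDMScheduler`, `ToyDDIMScheduler`, and the `from_config` helper are hypothetical and are not the `diffusers` implementation.

```python
import inspect


class ToyPNDMScheduler:
    """Hypothetical scheduler with a PNDM-specific option (`skip_prk_steps`)."""

    def __init__(self, num_train_timesteps=1000, beta_start=0.0001, beta_end=0.02, skip_prk_steps=False):
        self.num_train_timesteps = num_train_timesteps
        self.beta_start = beta_start
        self.beta_end = beta_end
        self.skip_prk_steps = skip_prk_steps


class ToyDDIMScheduler:
    """Hypothetical scheduler with a DDIM-specific option (`clip_sample`)."""

    def __init__(self, num_train_timesteps=1000, beta_start=0.0001, beta_end=0.02, clip_sample=True):
        self.num_train_timesteps = num_train_timesteps
        self.beta_start = beta_start
        self.beta_end = beta_end
        self.clip_sample = clip_sample


def from_config(cls, config):
    """Build `cls` from a config dict, keeping only keys its constructor accepts."""
    accepted = set(inspect.signature(cls.__init__).parameters) - {"self"}
    init_kwargs = {key: value for key, value in config.items() if key in accepted}
    return cls(**init_kwargs)


# A subset of the PNDM configuration printed above.
pndm_config = {
    "num_train_timesteps": 1000,
    "beta_start": 0.00085,
    "beta_end": 0.012,
    "skip_prk_steps": True,
    "clip_sample": False,
    "_class_name": "PNDMScheduler",
}

ddim = from_config(ToyDDIMScheduler, pndm_config)
print(type(ddim).__name__, ddim.beta_start, ddim.beta_end, ddim.clip_sample)
```

The shared noise-schedule values (`beta_start`, `beta_end`, `num_train_timesteps`) carry over to the new class, while PNDM-only keys such as `skip_prk_steps` are dropped. This is the property that makes swapping schedulers a one-line change in the real library.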

-
- -
-

- -If you are a JAX/Flax user, please check [this section](#changing-the-scheduler-in-flax) instead. - -## Compare schedulers - -So far we have tried running the stable diffusion pipeline with two schedulers: [`PNDMScheduler`] and [`DDIMScheduler`]. -A number of better schedulers have been released that can be run with far fewer steps. Let's compare them here: - -[`LMSDiscreteScheduler`] usually leads to better results: - -```python -from diffusers import LMSDiscreteScheduler - -pipeline.scheduler = LMSDiscreteScheduler.from_config(pipeline.scheduler.config) - -generator = torch.Generator(device="cuda").manual_seed(8) -image = pipeline(prompt, generator=generator).images[0] -image -``` - -

-
- -
-

- - -[`EulerDiscreteScheduler`] and [`EulerAncestralDiscreteScheduler`] can generate high quality results with as few as 30 steps. - -```python -from diffusers import EulerDiscreteScheduler - -pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config) - -generator = torch.Generator(device="cuda").manual_seed(8) -image = pipeline(prompt, generator=generator, num_inference_steps=30).images[0] -image -``` - -

-
- -
-

- - -and: - -```python -from diffusers import EulerAncestralDiscreteScheduler - -pipeline.scheduler = EulerAncestralDiscreteScheduler.from_config(pipeline.scheduler.config) - -generator = torch.Generator(device="cuda").manual_seed(8) -image = pipeline(prompt, generator=generator, num_inference_steps=30).images[0] -image -``` - -

-
- -
-

- - -At the time of writing this doc, [`DPMSolverMultistepScheduler`] gives arguably the best speed/quality trade-off and can be run with as few -as 20 steps. - -```python -from diffusers import DPMSolverMultistepScheduler - -pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config) - -generator = torch.Generator(device="cuda").manual_seed(8) -image = pipeline(prompt, generator=generator, num_inference_steps=20).images[0] -image -``` - -

-
- -
-

- -As you can see, most images look very similar and are arguably of very similar quality. Which scheduler to choose often depends on the specific use case. A good approach is always to run multiple different -schedulers to compare results. - -## Changing the Scheduler in Flax - -If you are a JAX/Flax user, you can also change the default pipeline scheduler. This is a complete example of how to run inference using the Flax Stable Diffusion pipeline and the super-fast [DPM-Solver++ scheduler](../api/schedulers/multistep_dpm_solver): - -```python -import jax -import numpy as np -from flax.jax_utils import replicate -from flax.training.common_utils import shard - -from diffusers import FlaxStableDiffusionPipeline, FlaxDPMSolverMultistepScheduler - -model_id = "runwayml/stable-diffusion-v1-5" -scheduler, scheduler_state = FlaxDPMSolverMultistepScheduler.from_pretrained( - model_id, - subfolder="scheduler" -) -pipeline, params = FlaxStableDiffusionPipeline.from_pretrained( - model_id, - scheduler=scheduler, - revision="bf16", - dtype=jax.numpy.bfloat16, -) -params["scheduler"] = scheduler_state - -# Generate 1 image per parallel device (8 on TPUv2-8 or TPUv3-8) -prompt = "a photo of an astronaut riding a horse on mars" -num_samples = jax.device_count() -prompt_ids = pipeline.prepare_inputs([prompt] * num_samples) - -prng_seed = jax.random.PRNGKey(0) -num_inference_steps = 25 - -# shard inputs and rng -params = replicate(params) -prng_seed = jax.random.split(prng_seed, jax.device_count()) -prompt_ids = shard(prompt_ids) - -images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).images -images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:]))) -``` - - - -The following Flax schedulers are _not yet compatible_ with the Flax Stable Diffusion Pipeline: - -- `FlaxLMSDiscreteScheduler` -- `FlaxDDPMScheduler` - - diff --git a/diffusers/docs/source/en/using-diffusers/stable_diffusion_jax_how_to.mdx
b/diffusers/docs/source/en/using-diffusers/stable_diffusion_jax_how_to.mdx deleted file mode 100644 index e0332fdc6496cd0193320617afc9e8a55b78cc73..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/using-diffusers/stable_diffusion_jax_how_to.mdx +++ /dev/null @@ -1,250 +0,0 @@ -# 🧨 Stable Diffusion in JAX / Flax ! - -[[open-in-colab]] - -🤗 Hugging Face [Diffusers](https://github.com/huggingface/diffusers) supports Flax since version `0.5.1`! This allows for super fast inference on Google TPUs, such as those available in Colab, Kaggle or Google Cloud Platform. - -This notebook shows how to run inference using JAX / Flax. If you want more details about how Stable Diffusion works or want to run it on a GPU, please refer to [this notebook](https://huggingface.co/docs/diffusers/stable_diffusion). - -First, make sure you are using a TPU backend. If you are running this notebook in Colab, select `Runtime` in the menu above, then select the option "Change runtime type" and then select `TPU` under the `Hardware accelerator` setting. - -Note that JAX is not exclusive to TPUs, but it shines on that hardware because each TPU server has 8 TPU accelerators working in parallel. - -## Setup - -First make sure diffusers is installed. - -```bash -!pip install jax==0.3.25 jaxlib==0.3.25 flax transformers ftfy -!pip install diffusers -``` - -```python -import jax.tools.colab_tpu - -jax.tools.colab_tpu.setup_tpu() -import jax -``` - -```python -num_devices = jax.device_count() -device_type = jax.devices()[0].device_kind - -print(f"Found {num_devices} JAX devices of type {device_type}.") -assert ( - "TPU" in device_type -), "Available device is not a TPU, please select TPU from Edit > Notebook settings > Hardware accelerator" -``` - -```python out -Found 8 JAX devices of type Cloud TPU. -``` - -Then we import all the dependencies.
- -```python -import numpy as np -import jax -import jax.numpy as jnp - -from pathlib import Path -from jax import pmap -from flax.jax_utils import replicate -from flax.training.common_utils import shard -from PIL import Image - -from huggingface_hub import notebook_login -from diffusers import FlaxStableDiffusionPipeline -``` - -## Model Loading - -TPU devices support `bfloat16`, an efficient half-float type. We'll use it for our tests, but you can also use `float32` to use full precision instead. - -```python -dtype = jnp.bfloat16 -``` - -Flax is a functional framework, so models are stateless and parameters are stored outside them. Loading the pre-trained Flax pipeline will return both the pipeline itself and the model weights (or parameters). We are using a `bf16` version of the weights, which leads to type warnings that you can safely ignore. - -```python -pipeline, params = FlaxStableDiffusionPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-4", - revision="bf16", - dtype=dtype, -) -``` - -## Inference - -Since TPUs usually have 8 devices working in parallel, we'll replicate our prompt as many times as devices we have. Then we'll perform inference on the 8 devices at once, each responsible for generating one image. Thus, we'll get 8 images in the same amount of time it takes for one chip to generate a single one. - -After replicating the prompt, we obtain the tokenized text ids by invoking the `prepare_inputs` function of the pipeline. The length of the tokenized text is set to 77 tokens, as required by the configuration of the underlying CLIP Text model. 
- -```python -prompt = "A cinematic film still of Morgan Freeman starring as Jimi Hendrix, portrait, 40mm lens, shallow depth of field, close up, split lighting, cinematic" -prompt = [prompt] * jax.device_count() -prompt_ids = pipeline.prepare_inputs(prompt) -prompt_ids.shape -``` - -```python out -(8, 77) -``` - -### Replication and parallelization - -Model parameters and inputs have to be replicated across the 8 parallel devices we have. The parameters dictionary is replicated using `flax.jax_utils.replicate`, which traverses the dictionary and changes the shape of the weights so they are repeated 8 times. Arrays are replicated using `shard`. - -```python -p_params = replicate(params) -``` - -```python -prompt_ids = shard(prompt_ids) -prompt_ids.shape -``` - -```python out -(8, 1, 77) -``` - -That shape means that each one of the `8` devices will receive as an input a `jnp` array with shape `(1, 77)`. `1` is therefore the batch size per device. In TPUs with sufficient memory, it could be larger than `1` if we wanted to generate multiple images (per chip) at once. - -We are almost ready to generate images! We just need to create a random number generator to pass to the generation function. This is the standard procedure in Flax, which is very serious and opinionated about random numbers – all functions that deal with random numbers are expected to receive a generator. This ensures reproducibility, even when we are training across multiple distributed devices. - -The helper function below uses a seed to initialize a random number generator. As long as we use the same seed, we'll get the exact same results. Feel free to use different seeds when exploring results later in the notebook. - -```python -def create_key(seed=0): - return jax.random.PRNGKey(seed) -``` - -We obtain a rng and then "split" it 8 times so each device receives a different generator. Therefore, each device will create a different image, and the full process is reproducible. 
- -```python -rng = create_key(0) -rng = jax.random.split(rng, jax.device_count()) -``` - -JAX code can be compiled to an efficient representation that runs very fast. However, we need to ensure that all inputs have the same shape in subsequent calls; otherwise, JAX will have to recompile the code, and we wouldn't be able to take advantage of the optimized speed. - -The Flax pipeline can compile the code for us if we pass `jit = True` as an argument. It will also ensure that the model runs in parallel on the 8 available devices. - -The first time we run the following cell it will take a long time to compile, but subsequent calls (even with different inputs) will be much faster. For example, it took more than a minute to compile on a TPU v2-8 when I tested, but then it takes about **`7s`** for future inference runs. - -``` -%%time -images = pipeline(prompt_ids, p_params, rng, jit=True)[0] -``` - -```python out -CPU times: user 56.2 s, sys: 42.5 s, total: 1min 38s -Wall time: 1min 29s -``` - -The returned array has shape `(8, 1, 512, 512, 3)`. We reshape it to get rid of the second dimension and obtain 8 images of `512 × 512 × 3` and then convert them to PIL. - -```python -images = images.reshape((images.shape[0] * images.shape[1],) + images.shape[-3:]) -images = pipeline.numpy_to_pil(images) -``` - -### Visualization - -Let's create a helper function to display images in a grid. - -```python -def image_grid(imgs, rows, cols): - w, h = imgs[0].size - grid = Image.new("RGB", size=(cols * w, rows * h)) - for i, img in enumerate(imgs): - grid.paste(img, box=(i % cols * w, i // cols * h)) - return grid -``` - -```python -image_grid(images, 2, 4) -``` - -![img](https://huggingface.co/datasets/YiYiXu/test-doc-assets/resolve/main/stable_diffusion_jax_how_to_cell_38_output_0.jpeg) - - -## Using different prompts - -We don't have to replicate the _same_ prompt in all the devices. 
We can do whatever we want: generate 2 prompts 4 times each, or even generate 8 different prompts at once. Let's do that! - -First, we'll define a list of 8 different prompts, one per device: - -```python -prompts = [ - "Labrador in the style of Hokusai", - "Painting of a squirrel skating in New York", - "HAL-9000 in the style of Van Gogh", - "Times Square under water, with fish and a dolphin swimming around", - "Ancient Roman fresco showing a man working on his laptop", - "Close-up photograph of young black woman against urban background, high quality, bokeh", - "Armchair in the shape of an avocado", - "Clown astronaut in space, with Earth in the background", -] -``` - -```python -prompt_ids = pipeline.prepare_inputs(prompts) -prompt_ids = shard(prompt_ids) - -images = pipeline(prompt_ids, p_params, rng, jit=True).images -images = images.reshape((images.shape[0] * images.shape[1],) + images.shape[-3:]) -images = pipeline.numpy_to_pil(images) - -image_grid(images, 2, 4) -``` - -![img](https://huggingface.co/datasets/YiYiXu/test-doc-assets/resolve/main/stable_diffusion_jax_how_to_cell_43_output_0.jpeg) - - -## How does parallelization work? - -We said before that the `diffusers` Flax pipeline automatically compiles the model and runs it in parallel on all available devices. We'll now briefly look inside that process to show how it works. - -JAX parallelization can be done in multiple ways. The easiest one revolves around using the `jax.pmap` function to achieve single-program, multiple-data (SPMD) parallelization. It means we'll run several copies of the same code, each on different data inputs. More sophisticated approaches are possible. We invite you to go over the [JAX documentation](https://jax.readthedocs.io/en/latest/index.html) and the [`pjit` pages](https://jax.readthedocs.io/en/latest/jax-101/08-pjit.html?highlight=pjit) to explore this topic if you are interested! 
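
To make the SPMD idea concrete, here is a minimal plain-Python simulation. It deliberately does not use JAX: `split_into_shards`, `simulated_pmap` and `generate` are hypothetical helpers invented for this sketch, and the "devices" are just iterations of a loop. It only illustrates the data flow, in which one program is applied independently to each device's slice of the inputs.

```python
# Conceptual simulation of single-program, multiple-data (SPMD) execution.
# This is NOT how JAX runs code on accelerators; it only mimics the data flow.


def split_into_shards(batch, num_devices):
    """Split a flat batch into `num_devices` equally sized shards."""
    if len(batch) % num_devices != 0:
        raise ValueError("batch size must be divisible by the number of devices")
    per_device = len(batch) // num_devices
    return [batch[i * per_device:(i + 1) * per_device] for i in range(num_devices)]


def simulated_pmap(fn):
    """Return a function that applies `fn` to every shard independently."""

    def mapped(shards):
        # A real backend would run these calls in parallel, one per device.
        return [fn(device_inputs) for device_inputs in shards]

    return mapped


def generate(prompts):
    """Hypothetical per-device program: it only ever sees its own shard."""
    return [f"image for: {p}" for p in prompts]


prompts = [f"prompt {i}" for i in range(8)]
shards = split_into_shards(prompts, num_devices=8)  # 8 shards of size 1
outputs = simulated_pmap(generate)(shards)
print(len(outputs), len(outputs[0]))
```

Note that `generate` is written without any knowledge of how many devices exist. It only deals with its per-device batch, which mirrors the way the pipeline code can ignore the parallel setup.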
- -`jax.pmap` does two things for us: -- Compiles (or `jit`s) the code, as if we had invoked `jax.jit()`. This does not happen when we call `pmap`, but the first time the pmapped function is invoked. -- Ensures the compiled code runs in parallel on all the available devices. - -To show how it works we `pmap` the `_generate` method of the pipeline, which is the private method that generates images. Please note that this method may be renamed or removed in future releases of `diffusers`. - -```python -p_generate = pmap(pipeline._generate) -``` - -After we use `pmap`, the prepared function `p_generate` will conceptually do the following: -* Invoke a copy of the underlying function `pipeline._generate` in each device. -* Send each device a different portion of the input arguments. That's what sharding is used for. In our case, `prompt_ids` has shape `(8, 1, 77)`. This array will be split in `8` and each copy of `_generate` will receive an input with shape `(1, 77)`. - -We can code `_generate` completely ignoring the fact that it will be invoked in parallel. We just care about our batch size (`1` in this example) and the dimensions that make sense for our code, and don't have to change anything to make it work in parallel. - -The same way as when we used the pipeline call, the first time we run the following cell it will take a while, but then it will be much faster. - -``` -%%time -images = p_generate(prompt_ids, p_params, rng) -images = images.block_until_ready() -images.shape -``` - -```python out -CPU times: user 1min 15s, sys: 18.2 s, total: 1min 34s -Wall time: 1min 15s -``` - -```python -images.shape -``` - -```python out -(8, 1, 512, 512, 3) -``` - -We use `block_until_ready()` to correctly measure inference time, because JAX uses asynchronous dispatch and returns control to the Python loop as soon as it can. 
You don't need to use that in your code; blocking will occur automatically when you want to use the result of a computation that has not yet been materialized. \ No newline at end of file diff --git a/diffusers/docs/source/en/using-diffusers/unconditional_image_generation.mdx b/diffusers/docs/source/en/using-diffusers/unconditional_image_generation.mdx deleted file mode 100644 index c0888f94c6c135e429feb42d2026962d3a257f5f..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/using-diffusers/unconditional_image_generation.mdx +++ /dev/null @@ -1,69 +0,0 @@ - - -# Unconditional image generation - -[[open-in-colab]] - -Unconditional image generation is a relatively straightforward task. The model only generates images - without any additional context like text or an image - resembling the training data it was trained on. - -The [`DiffusionPipeline`] is the easiest way to use a pre-trained diffusion system for inference. - -Start by creating an instance of [`DiffusionPipeline`] and specify which pipeline checkpoint you would like to download. -You can use any of the 🧨 Diffusers [checkpoints](https://huggingface.co/models?library=diffusers&sort=downloads) from the Hub (the checkpoint you'll use generates images of butterflies). - - - -💡 Want to train your own unconditional image generation model? Take a look at the training [guide](training/unconditional_training) to learn how to generate your own images. - - - -In this guide, you'll use [`DiffusionPipeline`] for unconditional image generation with [DDPM](https://arxiv.org/abs/2006.11239): - -```python ->>> from diffusers import DiffusionPipeline - ->>> generator = DiffusionPipeline.from_pretrained("anton-l/ddpm-butterflies-128") -``` - -The [`DiffusionPipeline`] downloads and caches all modeling, tokenization, and scheduling components. -Because the model consists of roughly 1.4 billion parameters, we strongly recommend running it on a GPU. 
-You can move the generator object to a GPU, just like you would in PyTorch: - -```python ->>> generator.to("cuda") -``` - -Now you can use the `generator` to generate an image: - -```python ->>> image = generator().images[0] -``` - -The output is by default wrapped into a [`PIL.Image`](https://pillow.readthedocs.io/en/stable/reference/Image.html?highlight=image#the-image-class) object. - -You can save the image by calling: - -```python ->>> image.save("generated_image.png") -``` - -Try out the Spaces below, and feel free to play around with the inference steps parameter to see how it affects the image quality! - - - - diff --git a/diffusers/docs/source/en/using-diffusers/using_safetensors b/diffusers/docs/source/en/using-diffusers/using_safetensors deleted file mode 100644 index b6b165dabc728b885d8f7f097af808d8a2270b2c..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/using-diffusers/using_safetensors +++ /dev/null @@ -1,19 +0,0 @@ -# What is safetensors? - -[safetensors](https://github.com/huggingface/safetensors) is a different format -from the classic `.bin` format, which is produced by PyTorch and relies on pickle. - -Pickle is notoriously unsafe: a malicious file can execute arbitrary code when it is loaded. -The hub itself tries to prevent issues from it, but it's not a silver bullet. - -The first and foremost goal of `safetensors` is to make loading machine learning models *safe* -in the sense that no takeover of your computer can be done. - -# Why use safetensors? - -**Safety** can be one reason, if you're attempting to use a little-known model and -you're not sure about the source of the file. - -A secondary reason is **the speed of loading**. Safetensors can load models much faster -than regular pickle files. If you spend a lot of time switching models, this can be -a huge time saver. 
diff --git a/diffusers/docs/source/en/using-diffusers/using_safetensors.mdx b/diffusers/docs/source/en/using-diffusers/using_safetensors.mdx deleted file mode 100644 index b522f3236fbb43ea19b088adede40c9677fb274a..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/using-diffusers/using_safetensors.mdx +++ /dev/null @@ -1,87 +0,0 @@ -# What is safetensors? - -[safetensors](https://github.com/huggingface/safetensors) is a different format -from the classic `.bin` format, which is produced by PyTorch and relies on pickle. It contains the -exact same data, which is just the model weights (or tensors). - -Pickle is notoriously unsafe: a malicious file can execute arbitrary code when it is loaded. -The hub itself tries to prevent issues from it, but it's not a silver bullet. - -The first and foremost goal of `safetensors` is to make loading machine learning models *safe* -in the sense that no takeover of your computer can be done. - -Hence the name. - -# Why use safetensors? - -**Safety** can be one reason, if you're attempting to use a little-known model and -you're not sure about the source of the file. - -A secondary reason is **the speed of loading**. Safetensors can load models much faster -than regular pickle files. If you spend a lot of time switching models, this can be -a huge time saver. - -Numbers measured on an AMD EPYC 7742 64-Core Processor: -``` -from diffusers import StableDiffusionPipeline - -pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1") - -# Loaded in safetensors 0:00:02.033658 -# Loaded in Pytorch 0:00:02.663379 -``` - -This is the entire loading time. The time spent loading just the weights (500MB) is: - -``` -Safetensors: 3.4873ms -PyTorch: 172.7537ms -``` - -Performance in general is a tricky business, and there are a few things to understand: - -- If you're using the model for the first time from the hub, you will have to download the weights. 
- That's extremely likely to be much slower than any loading method, so you will not see any difference. -- If you're loading the model for the first time (let's say after a reboot) then your machine will have to - actually read the disk. It's likely to be as slow in both cases. Again the speed difference may not be as visible (this depends on hardware and the actual model). -- The best performance benefit is when the model was already loaded previously on your computer and you're switching from one model to another. Your OS tries really hard not to read from disk, since this is slow, so it keeps the files around in RAM, which makes loading them again much faster. Since safetensors does zero-copy loading of the tensors, reloading will be faster than with PyTorch, which has at least one extra copy to do. - -# How to use safetensors? - -If you have `safetensors` installed, and all the weights are available in `safetensors` format, \ -then by default it will use that instead of the PyTorch weights. - -If you are really paranoid about this, the ultimate weapon would be disabling `torch.load`: -```python -import torch - - -def _raise(): - raise RuntimeError("I don't want to use pickle") - - -torch.load = lambda *args, **kwargs: _raise() -``` - -# I want to use model X but it doesn't have safetensors weights. - -Just go to this [space](https://huggingface.co/spaces/diffusers/convert). -This will create a new PR with the weights, let's say `refs/pr/22`. - -This space will download the pickled version, convert it, and upload it on the hub as a PR. -If anything bad is contained in the file, it's the Hugging Face Hub that will get issues, not your own computer. -And we're equipped to deal with it. 
- -Then, to use the model even before the branch gets accepted by the original author, you can do: - -```python -from diffusers import DiffusionPipeline - -pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1", revision="refs/pr/22") -``` - -or you can test it directly online with this [space](https://huggingface.co/spaces/diffusers/check_pr). - -And that's it! - -Is anything unclear, do you have concerns, or did you find a bug? [Open an issue](https://github.com/huggingface/diffusers/issues/new/choose) diff --git a/diffusers/docs/source/en/using-diffusers/weighted_prompts.mdx b/diffusers/docs/source/en/using-diffusers/weighted_prompts.mdx deleted file mode 100644 index c1316dc9f47d867e7f500e6f882977bcbadf97cb..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/using-diffusers/weighted_prompts.mdx +++ /dev/null @@ -1,98 +0,0 @@ - - -# Weighting prompts - -Text-guided diffusion models generate images based on a given text prompt. The text prompt -can include multiple concepts that the model should generate and it's often desirable to weight -certain parts of the prompt more or less. - -Diffusion models work by conditioning the cross attention layers of the diffusion model with contextualized text embeddings (see the [Stable Diffusion Guide for more information](../stable-diffusion)). -Thus a simple way to emphasize (or de-emphasize) certain parts of the prompt is by increasing or reducing the scale of the text embedding vector that corresponds to the relevant part of the prompt. -This is called "prompt-weighting" and is a feature in high demand from the community (see issue [here](https://github.com/huggingface/diffusers/issues/2431)). 
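
As a rough illustration of the scaling idea described above, the following plain-Python sketch multiplies each token's embedding vector by a per-token weight. Everything in it is made up for illustration: the tokens, the two-dimensional vectors, the weights, and the hypothetical helper `weight_embeddings`. Real pipelines work on tensors produced by the text encoder, and dedicated libraries may apply weights in more refined ways.

```python
def weight_embeddings(embeddings, weights):
    """Scale each token's embedding vector by the matching weight."""
    if len(embeddings) != len(weights):
        raise ValueError("need exactly one weight per token embedding")
    return [[w * x for x in vector] for vector, w in zip(embeddings, weights)]


# Toy data: one 2-dimensional "embedding" per token (illustrative values only).
tokens = ["a", "red", "cat", "ball"]
embeddings = [[0.5, -1.0], [1.0, 2.0], [-2.0, 0.5], [4.0, -0.5]]

# Emphasize the last token ("ball") and leave the others unchanged.
weights = [1.0, 1.0, 1.0, 1.5]

weighted = weight_embeddings(embeddings, weights)
print(weighted[-1])  # [6.0, -0.75]
```

A weight above `1.0` emphasizes a token, a weight below `1.0` de-emphasizes it, and a weight of `1.0` leaves its embedding untouched.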
- -## How to do prompt-weighting in Diffusers - -We believe the role of `diffusers` is to be a toolbox that provides essential features that enable other projects, such as [InvokeAI](https://github.com/invoke-ai/InvokeAI) or [diffuzers](https://github.com/abhishekkrthakur/diffuzers), to build powerful UIs. In order to support arbitrary methods to manipulate prompts, `diffusers` exposes a [`prompt_embeds`](https://huggingface.co/docs/diffusers/v0.14.0/en/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline.__call__.prompt_embeds) function argument to many pipelines such as [`StableDiffusionPipeline`], which lets you pass the "prompt-weighted"/scaled text embeddings directly to the pipeline. - -The [compel library](https://github.com/damian0815/compel) provides an easy way to emphasize or de-emphasize portions of the prompt for you. We strongly recommend it instead of preparing the embeddings yourself. - -Let's look at a simple example. Imagine you want to generate an image of `"a red cat playing with a ball"` as -follows: - -```py -import torch -from diffusers import StableDiffusionPipeline, UniPCMultistepScheduler - -pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4") -pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config) - -prompt = "a red cat playing with a ball" - -generator = torch.Generator(device="cpu").manual_seed(33) - -image = pipe(prompt, generator=generator, num_inference_steps=20).images[0] -image -``` - -This gives you: - -![img](https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/compel/forest_0.png) - -As you can see, there is no "ball" in the image. Let's emphasize this part! 
- -For this we should install the `compel` library: - -``` -pip install compel -``` - -and then create a `Compel` object: - -```py -from compel import Compel - -compel_proc = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder) -``` - -Now we emphasize the part "ball" with the `"++"` syntax: - -```py -prompt = "a red cat playing with a ball++" -``` - -and instead of passing this to the pipeline directly, we have to process it using `compel_proc`: - -```py -prompt_embeds = compel_proc(prompt) -``` - -Now we can pass `prompt_embeds` directly to the pipeline: - -```py -generator = torch.Generator(device="cpu").manual_seed(33) - -image = pipe(prompt_embeds=prompt_embeds, generator=generator, num_inference_steps=20).images[0] -image -``` - -We now get the following image which has a "ball"! - -![img](https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/compel/forest_1.png) - -Similarly, we de-emphasize parts of the sentence by using the `--` suffix for words. Feel free to give it -a try! - -If your favorite pipeline does not have a `prompt_embeds` input, please make sure to open an issue, the -diffusers team tries to be as responsive as possible. - -Also, please check out the documentation of the [compel](https://github.com/damian0815/compel) library for -more information. diff --git a/diffusers/docs/source/en/using-diffusers/write_own_pipeline.mdx b/diffusers/docs/source/en/using-diffusers/write_own_pipeline.mdx deleted file mode 100644 index 3c993ed53a2ab3f15fb2053d42be60573d6e4b42..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/en/using-diffusers/write_own_pipeline.mdx +++ /dev/null @@ -1,290 +0,0 @@ - - -# Understanding pipelines, models and schedulers - -[[open-in-colab]] - -🧨 Diffusers is designed to be a user-friendly and flexible toolbox for building diffusion systems tailored to your use-case. At the core of the toolbox are models and schedulers. 
While the [`DiffusionPipeline`] bundles these components together for convenience, you can also unbundle the pipeline and use the models and schedulers separately to create new diffusion systems. - -In this tutorial, you'll learn how to use models and schedulers to assemble a diffusion system for inference, starting with a basic pipeline and then progressing to the Stable Diffusion pipeline. - -## Deconstruct a basic pipeline - -A pipeline is a quick and easy way to run a model for inference, requiring no more than four lines of code to generate an image: - -```py ->>> from diffusers import DDPMPipeline - ->>> ddpm = DDPMPipeline.from_pretrained("google/ddpm-cat-256").to("cuda") ->>> image = ddpm(num_inference_steps=25).images[0] ->>> image -``` - -
- Image of cat created from DDPMPipeline -
- -That was super easy, but how did the pipeline do that? Let's breakdown the pipeline and take a look at what's happening under the hood. - -In the example above, the pipeline contains a UNet model and a DDPM scheduler. The pipeline denoises an image by taking random noise the size of the desired output and passing it through the model several times. At each timestep, the model predicts the *noise residual* and the scheduler uses it to predict a less noisy image. The pipeline repeats this process until it reaches the end of the specified number of inference steps. - -To recreate the pipeline with the model and scheduler separately, let's write our own denoising process. - -1. Load the model and scheduler: - - ```py - >>> from diffusers import DDPMScheduler, UNet2DModel - - >>> scheduler = DDPMScheduler.from_pretrained("google/ddpm-cat-256") - >>> model = UNet2DModel.from_pretrained("google/ddpm-cat-256").to("cuda") - ``` - -2. Set the number of timesteps to run the denoising process for: - - ```py - >>> scheduler.set_timesteps(50) - ``` - -3. Setting the scheduler timesteps creates a tensor with evenly spaced elements in it, 50 in this example. Each element corresponds to a timestep at which the model denoises an image. When you create the denoising loop later, you'll iterate over this tensor to denoise an image: - - ```py - >>> scheduler.timesteps - tensor([980, 960, 940, 920, 900, 880, 860, 840, 820, 800, 780, 760, 740, 720, - 700, 680, 660, 640, 620, 600, 580, 560, 540, 520, 500, 480, 460, 440, - 420, 400, 380, 360, 340, 320, 300, 280, 260, 240, 220, 200, 180, 160, - 140, 120, 100, 80, 60, 40, 20, 0]) - ``` - -4. Create some random noise with the same shape as the desired output: - - ```py - >>> import torch - - >>> sample_size = model.config.sample_size - >>> noise = torch.randn((1, 3, sample_size, sample_size)).to("cuda") - ``` - -4. Now write a loop to iterate over the timesteps. 
At each timestep, the model does a [`UNet2DModel.forward`] pass and returns the noisy residual. The scheduler's [`~DDPMScheduler.step`] method takes the noisy residual, timestep, and input and predicts the image at the previous timestep. This output becomes the next input to the model in the denoising loop, and it'll repeat until it reaches the end of the `timesteps` array. - - ```py - >>> input = noise - - >>> for t in scheduler.timesteps: - ... with torch.no_grad(): - ... noisy_residual = model(input, t).sample - ... previous_noisy_sample = scheduler.step(noisy_residual, t, input).prev_sample - ... input = previous_noisy_sample - ``` - - This is the entire denoising process, and you can use this same pattern to write any diffusion system. - -5. The last step is to convert the denoised output into an image: - - ```py - >>> from PIL import Image - >>> import numpy as np - - >>> image = (input / 2 + 0.5).clamp(0, 1) - >>> image = image.cpu().permute(0, 2, 3, 1).numpy()[0] - >>> image = Image.fromarray((image * 255).round().astype("uint8")) - >>> image - ``` - -In the next section, you'll put your skills to the test and break down the more complex Stable Diffusion pipeline. The steps are more or less the same. You'll initialize the necessary components, and set the number of timesteps to create a `timestep` array. The `timestep` array is used in the denoising loop, and for each element in this array, the model predicts a less noisy image. The denoising loop iterates over the timesteps, and at each timestep, it outputs a noisy residual and the scheduler uses it to predict a less noisy image at the previous timestep. This process is repeated until you reach the end of the `timestep` array. - -Let's try it out! - -## Deconstruct the Stable Diffusion pipeline - -Stable Diffusion is a text-to-image *latent diffusion* model. 
It is called a latent diffusion model because it works with a lower-dimensional representation of the image instead of the actual pixel space, which makes it more memory efficient. The encoder compresses the image into a smaller representation, and a decoder converts the compressed representation back into an image. For text-to-image models, you'll need a tokenizer and an encoder to generate text embeddings. From the previous example, you already know you need a UNet model and a scheduler. - -As you can see, this is already more complex than the DDPM pipeline which only contains a UNet model. The Stable Diffusion model has three separate pretrained models. - - - -💡 Read the [How does Stable Diffusion work?](https://huggingface.co/blog/stable_diffusion#how-does-stable-diffusion-work) blog for more details about how the VAE, UNet, and text encoder models work. - - - -Now that you know what you need for the Stable Diffusion pipeline, load all these components with the [`~ModelMixin.from_pretrained`] method. 
You can find them in the pretrained [`CompVis/stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4) checkpoint, and each component is stored in a separate subfolder: - -```py ->>> from PIL import Image ->>> import torch ->>> from transformers import CLIPTextModel, CLIPTokenizer ->>> from diffusers import AutoencoderKL, UNet2DConditionModel, PNDMScheduler - ->>> vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae") ->>> tokenizer = CLIPTokenizer.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="tokenizer") ->>> text_encoder = CLIPTextModel.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="text_encoder") ->>> unet = UNet2DConditionModel.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="unet") -``` - -Instead of the default [`PNDMScheduler`], exchange it for the [`UniPCMultistepScheduler`] to see how easy it is to plug a different scheduler in: - -```py ->>> from diffusers import UniPCMultistepScheduler - ->>> scheduler = UniPCMultistepScheduler.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="scheduler") -``` - -To speed up inference, move the models to a GPU since, unlike the scheduler, they have trainable weights: - -```py ->>> torch_device = "cuda" ->>> vae.to(torch_device) ->>> text_encoder.to(torch_device) ->>> unet.to(torch_device) -``` - -### Create text embeddings - -The next step is to tokenize the text to generate embeddings. The text is used to condition the UNet model and steer the diffusion process towards something that resembles the input prompt. - - - -💡 The `guidance_scale` parameter determines how much weight should be given to the prompt when generating an image. - - - -Feel free to choose any prompt you like if you want to generate something else! 
- -```py ->>> prompt = ["a photograph of an astronaut riding a horse"] ->>> height = 512 # default height of Stable Diffusion ->>> width = 512 # default width of Stable Diffusion ->>> num_inference_steps = 25 # Number of denoising steps ->>> guidance_scale = 7.5 # Scale for classifier-free guidance ->>> generator = torch.manual_seed(0) # Seed generator to create the initial latent noise ->>> batch_size = len(prompt) -``` - -Tokenize the text and generate the embeddings from the prompt: - -```py ->>> text_input = tokenizer( -... prompt, padding="max_length", max_length=tokenizer.model_max_length, truncation=True, return_tensors="pt" -... ) - ->>> with torch.no_grad(): -... text_embeddings = text_encoder(text_input.input_ids.to(torch_device))[0] -``` - -You'll also need to generate the *unconditional text embeddings* which are the embeddings for the padding token. These need to have the same shape (`batch_size` and `seq_length`) as the conditional `text_embeddings`: - -```py ->>> max_length = text_input.input_ids.shape[-1] ->>> uncond_input = tokenizer([""] * batch_size, padding="max_length", max_length=max_length, return_tensors="pt") ->>> uncond_embeddings = text_encoder(uncond_input.input_ids.to(torch_device))[0] -``` - -Let's concatenate the conditional and unconditional embeddings into a batch to avoid doing two forward passes: - -```py ->>> text_embeddings = torch.cat([uncond_embeddings, text_embeddings]) -``` - -### Create random noise - -Next, generate some initial random noise as a starting point for the diffusion process. This is the latent representation of the image, and it'll be gradually denoised. At this point, the `latent` image is smaller than the final image size, but that's okay because the model will transform it into the final 512x512 image dimensions later. - - - -💡 The height and width are divided by 8 because the `vae` model has 3 down-sampling layers. 
You can check by running the following: - -```py -2 ** (len(vae.config.block_out_channels) - 1) == 8 -``` - - - -```py ->>> latents = torch.randn( -... (batch_size, unet.in_channels, height // 8, width // 8), -... generator=generator, -... ) ->>> latents = latents.to(torch_device) -``` - -### Denoise the image - -Start by scaling the input with the initial noise distribution, *sigma*, the noise scale value, which is required for improved schedulers like [`UniPCMultistepScheduler`]: - -```py ->>> latents = latents * scheduler.init_noise_sigma -``` - -The last step is to create the denoising loop that'll progressively transform the pure noise in `latents` to an image described by your prompt. Remember, the denoising loop needs to do three things: - -1. Set the scheduler's timesteps to use during denoising. -2. Iterate over the timesteps. -3. At each timestep, call the UNet model to predict the noise residual and pass it to the scheduler to compute the previous noisy sample. - -```py ->>> from tqdm.auto import tqdm - ->>> scheduler.set_timesteps(num_inference_steps) - ->>> for t in tqdm(scheduler.timesteps): -... # expand the latents if we are doing classifier-free guidance to avoid doing two forward passes. -... latent_model_input = torch.cat([latents] * 2) - -... latent_model_input = scheduler.scale_model_input(latent_model_input, timestep=t) - -... # predict the noise residual -... with torch.no_grad(): -... noise_pred = unet(latent_model_input, t, encoder_hidden_states=text_embeddings).sample - -... # perform guidance -... noise_pred_uncond, noise_pred_text = noise_pred.chunk(2) -... noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond) - -... # compute the previous noisy sample x_t -> x_t-1 -... 
latents = scheduler.step(noise_pred, t, latents).prev_sample -``` - -### Decode the image - -The final step is to use the `vae` to decode the latent representation into an image and get the decoded output with `sample`: - -```py -# scale and decode the image latents with vae -latents = 1 / 0.18215 * latents -with torch.no_grad(): - image = vae.decode(latents).sample -``` - -Lastly, convert the image to a `PIL.Image` to see your generated image! - -```py ->>> image = (image / 2 + 0.5).clamp(0, 1) ->>> image = image.detach().cpu().permute(0, 2, 3, 1).numpy() ->>> images = (image * 255).round().astype("uint8") ->>> pil_images = [Image.fromarray(image) for image in images] ->>> pil_images[0] -``` - -
- -
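The post-processing arithmetic in the last code block can be checked without loading any model. The sketch below applies the same mapping to a single value in plain Python. `to_uint8` is a hypothetical helper name used only for illustration; the walkthrough itself applies these steps to a whole tensor at once.

```python
def to_uint8(value):
    """Map one decoded value from roughly [-1, 1] to an integer pixel in [0, 255]."""
    scaled = value / 2 + 0.5              # shift [-1, 1] to [0, 1]
    clamped = min(max(scaled, 0.0), 1.0)  # same effect as .clamp(0, 1)
    return int(round(clamped * 255))      # same effect as (image * 255).round()


# Out-of-range inputs such as 3.0 are clamped before scaling.
print([to_uint8(v) for v in (-1.0, -0.5, 0.5, 1.0, 3.0)])
```

Clamping before the cast means the conversion to `uint8` never sees a value outside the 0 to 255 range.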
- -## Next steps - -From basic to complex pipelines, you've seen that all you really need to write your own diffusion system is a denoising loop. The loop should set the scheduler's timesteps, iterate over them, and alternate between calling the UNet model to predict the noise residual and passing it to the scheduler to compute the previous noisy sample. - -This is really what 🧨 Diffusers is designed for: to make it intuitive and easy to write your own diffusion system using models and schedulers. - -For your next steps, feel free to: - -* Learn how to [build and contribute a pipeline](using-diffusers/#contribute_pipeline) to 🧨 Diffusers. We can't wait to see what you'll come up with! -* Explore [existing pipelines](./api/pipelines/overview) in the library, and see if you can deconstruct and build a pipeline from scratch using the models and schedulers separately. \ No newline at end of file diff --git a/diffusers/docs/source/ko/_toctree.yml b/diffusers/docs/source/ko/_toctree.yml deleted file mode 100644 index a1c0c690eb94c5963bf1c4d6fd374ea19339a316..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/ko/_toctree.yml +++ /dev/null @@ -1,193 +0,0 @@ -- sections: - - local: index - title: "🧨 Diffusers" - - local: quicktour - title: "Quicktour" - - local: installation - title: "Installation" - title: "Get started" -- sections: - - sections: - - local: in_translation - title: "Loading Pipelines, Models, and Schedulers" - - local: in_translation - title: "Using different Schedulers" - - local: in_translation - title: "Configuring Pipelines, Models, and Schedulers" - - local: in_translation - title: "Loading and Adding Custom Pipelines" - title: "Loading & Hub (translation pending)" - - sections: - - local: in_translation - title: "Unconditional Image Generation" - - local: in_translation - title: "Text-to-Image Generation" - - local: in_translation - title: "Text-Guided Image-to-Image" - - local: in_translation - title: "Text-Guided Image-Inpainting" - - local: in_translation - title: 
"Text-Guided Depth-to-Image" - - local: in_translation - title: "Reusing seeds for deterministic generation" - - local: in_translation - title: "Community Pipelines" - - local: in_translation - title: "How to contribute a Pipeline" - title: "Pipelines for Inference (translation pending)" - - sections: - - local: in_translation - title: "Reinforcement Learning" - - local: in_translation - title: "Audio" - - local: in_translation - title: "Other Modalities" - title: "Taking Diffusers Beyond Images" - title: "Using Diffusers (translation pending)" -- sections: - - local: in_translation - title: "Memory and Speed" - - local: in_translation - title: "xFormers" - - local: in_translation - title: "ONNX" - - local: in_translation - title: "OpenVINO" - - local: in_translation - title: "MPS" - - local: in_translation - title: "Habana Gaudi" - title: "Optimization/Special Hardware (translation pending)" -- sections: - - local: in_translation - title: "Overview" - - local: in_translation - title: "Unconditional Image Generation" - - local: in_translation - title: "Textual Inversion" - - local: in_translation - title: "DreamBooth" - - local: in_translation - title: "Text-to-image fine-tuning" - title: "Training (translation pending)" -- sections: - - local: in_translation - title: "Stable Diffusion" - - local: in_translation - title: "Philosophy" - - local: in_translation - title: "How to contribute?"
- title: "Conceptual Guides (translation pending)" -- sections: - - sections: - - local: in_translation - title: "Models" - - local: in_translation - title: "Diffusion Pipeline" - - local: in_translation - title: "Logging" - - local: in_translation - title: "Configuration" - - local: in_translation - title: "Outputs" - title: "Main Classes" - - - sections: - - local: in_translation - title: "Overview" - - local: in_translation - title: "AltDiffusion" - - local: in_translation - title: "Cycle Diffusion" - - local: in_translation - title: "DDIM" - - local: in_translation - title: "DDPM" - - local: in_translation - title: "Latent Diffusion" - - local: in_translation - title: "Unconditional Latent Diffusion" - - local: in_translation - title: "PaintByExample" - - local: in_translation - title: "PNDM" - - local: in_translation - title: "Score SDE VE" - - sections: - - local: in_translation - title: "Overview" - - local: in_translation - title: "Text-to-Image" - - local: in_translation - title: "Image-to-Image" - - local: in_translation - title: "Inpaint" - - local: in_translation - title: "Depth-to-Image" - - local: in_translation - title: "Image-Variation" - - local: in_translation - title: "Super-Resolution" - title: "Stable Diffusion" - - local: in_translation - title: "Stable Diffusion 2" - - local: in_translation - title: "Safe Stable Diffusion" - - local: in_translation - title: "Stochastic Karras VE" - - local: in_translation - title: "Dance Diffusion" - - local: in_translation - title: "UnCLIP" - - local: in_translation - title: "Versatile Diffusion" - - local: in_translation - title: "VQ Diffusion" - - local: in_translation - title: "RePaint" - - local: in_translation - title: "Audio Diffusion" - title: "Pipelines (translation pending)" - - sections: - - local: in_translation - title: "Overview" - - local: in_translation - title: "DDIM" - - local: in_translation - title: "DDPM" - - local: in_translation - title: "Singlestep DPM-Solver" - - local: in_translation - title: "Multistep DPM-Solver" - - local: 
in_translation - title: "Heun Scheduler" - - local: in_translation - title: "DPM Discrete Scheduler" - - local: in_translation - title: "DPM Discrete Scheduler with ancestral sampling" - - local: in_translation - title: "Stochastic Karras VE" - - local: in_translation - title: "Linear Multistep" - - local: in_translation - title: "PNDM" - - local: in_translation - title: "VE-SDE" - - local: in_translation - title: "IPNDM" - - local: in_translation - title: "VP-SDE" - - local: in_translation - title: "Euler scheduler" - - local: in_translation - title: "Euler Ancestral Scheduler" - - local: in_translation - title: "VQDiffusionScheduler" - - local: in_translation - title: "RePaint Scheduler" - title: "Schedulers (translation pending)" - - sections: - - local: in_translation - title: "RL Planning" - title: "Experimental Features" - title: "API (translation pending)" diff --git a/diffusers/docs/source/ko/in_translation.mdx b/diffusers/docs/source/ko/in_translation.mdx deleted file mode 100644 index 518be0c03b7c8cf0e8e9b2b083f08ccbb62bfad6..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/ko/in_translation.mdx +++ /dev/null @@ -1,16 +0,0 @@ - - -# Translation in progress - -We are working hard on the translation. Please bear with us a little longer. -Thank you! \ No newline at end of file diff --git a/diffusers/docs/source/ko/index.mdx b/diffusers/docs/source/ko/index.mdx deleted file mode 100644 index d01dff5c5e005248c95f17995161acf83ecbe08d..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/ko/index.mdx +++ /dev/null @@ -1,63 +0,0 @@ - - -

-
- -
-

- -# 🧨 Diffusers - -🤗 Diffusers provides pretrained vision and audio diffusion models, and serves as a modular toolbox for inference and training. - -More precisely, 🤗 Diffusers offers: - -- State-of-the-art diffusion pipelines that can be run in inference with just a few lines of code (see [**Using Diffusers**](./using-diffusers/conditional_image_generation)). See [**Pipelines**](#pipelines) for an overview of all supported pipelines and their corresponding papers. -- Various noise schedulers that can be used interchangeably to trade off speed against quality in inference. For more information, see [**Schedulers**](./api/schedulers/overview). -- Multiple types of models, such as UNet, that can be used as building blocks in an end-to-end diffusion system. For more details, see [**Models**](./api/models). -- Training examples that show how to train the most popular diffusion model tasks. For more information, see [**Training**](./training/overview). - -## 🧨 Diffusers pipelines - -The following table summarizes all officially supported pipelines, their corresponding papers, and, where available, a Colab notebook to try them out directly. - -| Pipeline | Paper | Tasks | Colab -|---|---|:---:|:---:| -| [alt_diffusion](./api/pipelines/alt_diffusion) | [**AltDiffusion**](https://arxiv.org/abs/2211.06679) | Image-to-Image Text-Guided Generation | -| [audio_diffusion](./api/pipelines/audio_diffusion) | [**Audio Diffusion**](https://github.com/teticio/audio-diffusion.git) | Unconditional Audio Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/audio_diffusion_pipeline.ipynb) -| [cycle_diffusion](./api/pipelines/cycle_diffusion) | [**Cycle Diffusion**](https://arxiv.org/abs/2210.05559) | Image-to-Image Text-Guided Generation | -| [dance_diffusion](./api/pipelines/dance_diffusion) | [**Dance Diffusion**](https://github.com/williamberman/diffusers.git) | Unconditional Audio Generation | -| [ddpm](./api/pipelines/ddpm) | [**Denoising Diffusion Probabilistic Models**](https://arxiv.org/abs/2006.11239) | Unconditional Image Generation | -| [ddim](./api/pipelines/ddim) | [**Denoising Diffusion Implicit Models**](https://arxiv.org/abs/2010.02502) | Unconditional Image Generation | -| [latent_diffusion](./api/pipelines/latent_diffusion) | 
[**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752)| Text-to-Image Generation | -| [latent_diffusion](./api/pipelines/latent_diffusion) | [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752)| Super Resolution Image-to-Image | -| [latent_diffusion_uncond](./api/pipelines/latent_diffusion_uncond) | [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752) | Unconditional Image Generation | -| [paint_by_example](./api/pipelines/paint_by_example) | [**Paint by Example: Exemplar-based Image Editing with Diffusion Models**](https://arxiv.org/abs/2211.13227) | Image-Guided Image Inpainting | -| [pndm](./api/pipelines/pndm) | [**Pseudo Numerical Methods for Diffusion Models on Manifolds**](https://arxiv.org/abs/2202.09778) | Unconditional Image Generation | -| [score_sde_ve](./api/pipelines/score_sde_ve) | [**Score-Based Generative Modeling through Stochastic Differential Equations**](https://openreview.net/forum?id=PxTIG12RRHS) | Unconditional Image Generation | -| [score_sde_vp](./api/pipelines/score_sde_vp) | [**Score-Based Generative Modeling through Stochastic Differential Equations**](https://openreview.net/forum?id=PxTIG12RRHS) | Unconditional Image Generation | -| [stable_diffusion](./api/pipelines/stable_diffusion/text2img) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Text-to-Image Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) -| [stable_diffusion](./api/pipelines/stable_diffusion/img2img) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Image-to-Image Text-Guided Generation | [![Open In 
Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb) -| [stable_diffusion](./api/pipelines/stable_diffusion/inpaint) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Text-Guided Image Inpainting | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/in_painting_with_stable_diffusion_using_diffusers.ipynb) -| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-to-Image Generation | -| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-Guided Image Inpainting | -| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-Guided Super Resolution Image-to-Image | -| [stable_diffusion_safe](./api/pipelines/stable_diffusion_safe) | [**Safe Stable Diffusion**](https://arxiv.org/abs/2211.05105) | Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ml-research/safe-latent-diffusion/blob/main/examples/Safe%20Latent%20Diffusion.ipynb) -| [stochastic_karras_ve](./api/pipelines/stochastic_karras_ve) | [**Elucidating the Design Space of Diffusion-Based Generative Models**](https://arxiv.org/abs/2206.00364) | Unconditional Image Generation | -| [unclip](./api/pipelines/unclip) | [Hierarchical Text-Conditional Image Generation with CLIP Latents](https://arxiv.org/abs/2204.06125) | Text-to-Image Generation | -| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion 
Model](https://arxiv.org/abs/2211.08332) | Text-to-Image Generation | -| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Image Variations Generation | -| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Dual Image and Text Guided Generation | -| [vq_diffusion](./api/pipelines/vq_diffusion) | [Vector Quantized Diffusion Model for Text-to-Image Synthesis](https://arxiv.org/abs/2111.14822) | Text-to-Image Generation | - -**Note**: Pipelines are simple examples of how to use the diffusion systems as described in the corresponding papers. diff --git a/diffusers/docs/source/ko/installation.mdx b/diffusers/docs/source/ko/installation.mdx deleted file mode 100644 index a10f9f8d1b52c0281433356f03f81039d4356f91..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/ko/installation.mdx +++ /dev/null @@ -1,142 +0,0 @@ - - -# Installation - -Install 🤗 Diffusers for whichever deep learning library you are working with. - -🤗 Diffusers is tested on Python 3.7+, PyTorch 1.7.0+, and Flax. Follow the installation instructions below for the deep learning library you are using: - -- [PyTorch installation instructions](https://pytorch.org/get-started/locally/) -- [Flax installation instructions](https://flax.readthedocs.io/en/latest/) - -## Install with pip - -You should install 🤗 Diffusers in a [virtual environment](https://docs.python.org/3/library/venv.html). -If you're unfamiliar with Python virtual environments, take a look at this [guide to installing with pip in a virtual environment](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/). -A virtual environment makes it easier to manage different projects and avoid compatibility issues between dependencies. - -Start by creating a virtual environment in your project directory: - -```bash -python -m venv .env -``` - -Then activate the virtual environment: - -```bash -source .env/bin/activate -``` - -Now you're ready to install 🤗 Diffusers with the following command: - -**For PyTorch** - -```bash -pip install diffusers["torch"] -``` - -**For Flax** - -```bash -pip install diffusers["flax"] -``` - -## Install from source - -Before installing `diffusers` from source, make sure you have `torch` and `accelerate` installed.
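One quick way to confirm that `torch` and `accelerate` are present before installing from source is to ask Python whether it can locate both packages. This is only an illustrative sketch: the helper name `missing_packages` is hypothetical and not part of 🤗 Diffusers, and it relies solely on the standard-library `importlib.util.find_spec`.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The two prerequisites named in the installation instructions.
    missing = missing_packages(["torch", "accelerate"])
    if missing:
        print("Install these first:", ", ".join(missing))
    else:
        print("torch and accelerate are both available.")
```

`find_spec` only locates a package without importing it, so the check stays fast even for a large library such as `torch`.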
- -For installing `torch`, refer to the [torch docs](https://pytorch.org/get-started/locally/#start-locally). - -Install `accelerate` as follows: - -```bash -pip install accelerate -``` - -Install 🤗 Diffusers from source with the following command: - -```bash -pip install git+https://github.com/huggingface/diffusers -``` - -This command installs the bleeding-edge `main` version rather than the latest `stable` version. -The `main` version is useful for staying up to date with the latest developments, -for instance when a bug has been fixed since the last official release but a new release hasn't been rolled out yet. -However, this means the `main` version may not always be stable. -We strive to keep the `main` version operational, and most issues are usually resolved within a few hours or a day. -If you run into a problem, please open an [Issue](https://github.com/huggingface/diffusers/issues) so we can fix it even sooner! - - -## Editable install - -You will need an editable install if you'd like to: - -* Use the `main` version of the source code. -* Contribute to 🤗 Diffusers and need to test changes in the code. - -Clone the repository and install 🤗 Diffusers with the following commands: - -```bash -git clone https://github.com/huggingface/diffusers.git -cd diffusers -``` - -**For PyTorch** - -```bash -pip install -e ".[torch]" -``` - -**For Flax** - -```bash -pip install -e ".[flax]" -``` - -These commands link the folder you cloned the repository to with your Python library paths. -Python will now look inside the cloned folder in addition to the normal library paths. -For example, if your Python packages are installed in `~/anaconda3/envs/main/lib/python3.7/site-packages/`, Python will also search the cloned folder `~/diffusers/`. - - - -You must keep the `diffusers` folder if you want to keep using the library. - - - -Now you can easily update your clone to the latest version of 🤗 Diffusers with the following command: - -```bash -cd ~/diffusers/ -git pull -``` - -Your Python environment will then find the `main` version of 🤗 Diffusers on the next run. - -## Notice on telemetry logging - -Our library gathers telemetry information during `from_pretrained()` requests. -This data includes the version of Diffusers and PyTorch/Flax, the requested model or pipeline class, and the path to a pretrained checkpoint if it is hosted on the Hub. -This usage data helps us debug issues and prioritize new features. -Telemetry is only sent when loading models and pipelines from the HuggingFace Hub, and is not collected during local usage. - -We understand that not everyone wants to share additional information, and we respect your privacy, so you can disable telemetry collection by setting the `DISABLE_TELEMETRY` environment variable from your terminal.
- -On Linux/macOS: -```bash -export DISABLE_TELEMETRY=YES -``` - -On Windows: -```bash -set DISABLE_TELEMETRY=YES -``` \ No newline at end of file diff --git a/diffusers/docs/source/ko/quicktour.mdx b/diffusers/docs/source/ko/quicktour.mdx deleted file mode 100644 index e0676ce2a9ca169322c79c17c4cfd224b6163f43..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/ko/quicktour.mdx +++ /dev/null @@ -1,123 +0,0 @@ - - -# Quicktour - -Get up and running with 🧨 Diffusers quickly! -Whether you're a developer or an everyday user, this quicktour will help you get started and show you how to use [`DiffusionPipeline`] for inference. - -Before you begin, make sure you have all the necessary libraries installed: - -```bash -pip install --upgrade diffusers accelerate transformers -``` - -- [`accelerate`](https://huggingface.co/docs/accelerate/index) speeds up model loading for inference and training. -- [`transformers`](https://huggingface.co/docs/transformers/index) is required to run the most popular diffusion models, such as [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview). - -## DiffusionPipeline - -The [`DiffusionPipeline`] is the easiest way to use a pretrained diffusion system for inference. You can use the [`DiffusionPipeline`] out of the box for many tasks across different modalities.
See the table below for the supported tasks: - -| **Task** | **Description** | **Pipeline** -|------------------------------|--------------------------------------------------------------------------------------------------------------|-----------------| -| Unconditional Image Generation | generate an image from Gaussian noise | [unconditional_image_generation](./using-diffusers/unconditional_image_generation) | -| Text-Guided Image Generation | generate an image given a text prompt | [conditional_image_generation](./using-diffusers/conditional_image_generation) | -| Text-Guided Image-to-Image Translation | adapt an image guided by a text prompt | [img2img](./using-diffusers/img2img) | -| Text-Guided Image-Inpainting | fill the masked part of an image given the image, the mask, and a text prompt | [inpaint](./using-diffusers/inpaint) | -| Text-Guided Depth-to-Image Translation | adapt parts of an image guided by a text prompt while preserving structure via depth estimation | [depth2image](./using-diffusers/depth2image) | - -For more information on how diffusion pipelines work for the different tasks, see [**Using Diffusers**](./using-diffusers/overview). - -As an example, start by creating an instance of [`DiffusionPipeline`] and specify which pipeline checkpoint you would like to download. -You can use the [`DiffusionPipeline`] for any [Diffusers' checkpoint](https://huggingface.co/models?library=diffusers&sort=downloads). -In this guide, though, you'll use [`DiffusionPipeline`] for text-to-image generation with [Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion). - -Before running a [Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion)-based model, read the [license](https://huggingface.co/spaces/CompVis/stable-diffusion-license) carefully. -This is due to the model's improved image generation capabilities and the potentially harmful content that could be produced with it. Go to the Stable Diffusion model of your choice (*e.g.* [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5)) and read the license. - -You can load the model as follows: - -```python ->>> from diffusers import DiffusionPipeline - ->>> pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5") -``` - -The [`DiffusionPipeline`] downloads and caches all modeling, tokenization, and scheduling components. -Because the model consists of roughly 1.4 billion parameters, we recommend running it on a GPU. -You can move the pipeline object to a GPU, just like you would in PyTorch.
- -```python ->>> pipeline.to("cuda") -``` - -Now you can use the `pipeline`: - -```python ->>> image = pipeline("An image of a squirrel in Picasso style").images[0] -``` - -The output is by default wrapped into a [PIL Image object](https://pillow.readthedocs.io/en/stable/reference/Image.html?highlight=image#the-image-class). - -You can save the image by calling: - -```python ->>> image.save("image_of_squirrel_painting.png") -``` - -**Note**: You can also use the pipeline locally by downloading the weights via: - -```bash -git lfs install -git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 -``` - -Then load the saved weights into the pipeline: - -```python ->>> pipeline = DiffusionPipeline.from_pretrained("./stable-diffusion-v1-5") -``` - -Running the pipeline is then identical to the code above, since it's the same model architecture. - -```python ->>> pipeline.to("cuda") ->>> image = pipeline("An image of a squirrel in Picasso style").images[0] ->>> image.save("image_of_squirrel_painting.png") -``` - -Diffusion systems can be used with multiple different [schedulers](./api/schedulers/overview), each with their own pros and cons. By default, Stable Diffusion runs with `PNDMScheduler`, but it's very simple to use a different scheduler. *E.g.* if you would like to use the [`EulerDiscreteScheduler`] scheduler instead, you could use it as follows: - -```python ->>> from diffusers import EulerDiscreteScheduler, StableDiffusionPipeline - ->>> pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5") - ->>> # change scheduler to Euler ->>> pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config) -``` - -For more details on how to change between schedulers, refer to the [Using Schedulers](./using-diffusers/schedulers) guide. - -[Stability AI's](https://stability.ai/) Stable Diffusion model is an impressive image generation model that can do much more than just generate images from text. We have dedicated a whole documentation page to Stable Diffusion: [link](./conceptual/stable_diffusion).
- -If you want to know how to optimize Stable Diffusion to run on less memory, with higher inference speed, on specific hardware such as a Mac, or with ONNX Runtime, take a look at the optimization pages: - -- [Optimized PyTorch on GPU](./optimization/fp16) -- [Mac OS with PyTorch](./optimization/mps) -- [ONNX](./optimization/onnx) -- [OpenVINO](./optimization/open_vino) - -If you want to fine-tune or train your diffusion model, take a look at the [**training section**](./training/overview). - -Finally, please be considerate when distributing generated images publicly 🤗. \ No newline at end of file diff --git a/diffusers/docs/source/zh/_toctree.yml b/diffusers/docs/source/zh/_toctree.yml deleted file mode 100644 index 2d67d9c4a025104a13e0de3851e53a690ac86fc5..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/zh/_toctree.yml +++ /dev/null @@ -1,238 +0,0 @@ -- sections: - - local: index - title: 🧨 Diffusers - - local: quicktour - title: Quicktour - - local: stable_diffusion - title: Stable Diffusion - - local: installation - title: Installation - title: Get started -- sections: - - local: tutorials/basic_training - title: Train a diffusion model - title: Tutorials -- sections: - - sections: - - local: using-diffusers/loading - title: Loading Pipelines, Models, and Schedulers - - local: using-diffusers/schedulers - title: Using different Schedulers - - local: using-diffusers/configuration - title: Configuring Pipelines, Models, and Schedulers - - local: using-diffusers/custom_pipeline_overview - title: Loading and Adding Custom Pipelines - - local: using-diffusers/kerascv - title: Using KerasCV Stable Diffusion Checkpoints in Diffusers - title: Loading & Hub - - sections: - - local: using-diffusers/unconditional_image_generation - title: Unconditional Image Generation - - local: using-diffusers/conditional_image_generation - title: Text-to-Image Generation - - local: using-diffusers/img2img - title: Text-Guided Image-to-Image - - local: using-diffusers/inpaint - title: Text-Guided Image-Inpainting - - local: using-diffusers/depth2img - title: Text-Guided Depth-to-Image - - local: using-diffusers/controlling_generation - title: Controlling 
generation - - local: using-diffusers/reusing_seeds - title: Reusing seeds for deterministic generation - - local: using-diffusers/reproducibility - title: Reproducibility - - local: using-diffusers/custom_pipeline_examples - title: Community Pipelines - - local: using-diffusers/contribute_pipeline - title: How to contribute a Pipeline - - local: using-diffusers/using_safetensors - title: Using safetensors - title: Pipelines for Inference - - sections: - - local: using-diffusers/rl - title: Reinforcement Learning - - local: using-diffusers/audio - title: Audio - - local: using-diffusers/other-modalities - title: Other Modalities - title: Taking Diffusers Beyond Images - title: Using Diffusers -- sections: - - local: optimization/fp16 - title: Memory and Speed - - local: optimization/torch2.0 - title: Torch2.0 support - - local: optimization/xformers - title: xFormers - - local: optimization/onnx - title: ONNX - - local: optimization/open_vino - title: OpenVINO - - local: optimization/mps - title: MPS - - local: optimization/habana - title: Habana Gaudi - title: Optimization/Special Hardware -- sections: - - local: training/overview - title: Overview - - local: training/unconditional_training - title: Unconditional Image Generation - - local: training/text_inversion - title: Textual Inversion - - local: training/dreambooth - title: DreamBooth - - local: training/text2image - title: Text-to-image - - local: training/lora - title: Low-Rank Adaptation of Large Language Models (LoRA) - title: Training -- sections: - - local: conceptual/philosophy - title: Philosophy - - local: conceptual/contribution - title: How to contribute? 
- - local: conceptual/ethical_guidelines - title: Diffusers' Ethical Guidelines - title: Conceptual Guides -- sections: - - sections: - - local: api/models - title: Models - - local: api/diffusion_pipeline - title: Diffusion Pipeline - - local: api/logging - title: Logging - - local: api/configuration - title: Configuration - - local: api/outputs - title: Outputs - - local: api/loaders - title: Loaders - title: Main Classes - - sections: - - local: api/pipelines/overview - title: Overview - - local: api/pipelines/alt_diffusion - title: AltDiffusion - - local: api/pipelines/audio_diffusion - title: Audio Diffusion - - local: api/pipelines/cycle_diffusion - title: Cycle Diffusion - - local: api/pipelines/dance_diffusion - title: Dance Diffusion - - local: api/pipelines/ddim - title: DDIM - - local: api/pipelines/ddpm - title: DDPM - - local: api/pipelines/dit - title: DiT - - local: api/pipelines/latent_diffusion - title: Latent Diffusion - - local: api/pipelines/paint_by_example - title: PaintByExample - - local: api/pipelines/pndm - title: PNDM - - local: api/pipelines/repaint - title: RePaint - - local: api/pipelines/stable_diffusion_safe - title: Safe Stable Diffusion - - local: api/pipelines/score_sde_ve - title: Score SDE VE - - local: api/pipelines/semantic_stable_diffusion - title: Semantic Guidance - - sections: - - local: api/pipelines/stable_diffusion/overview - title: Overview - - local: api/pipelines/stable_diffusion/text2img - title: Text-to-Image - - local: api/pipelines/stable_diffusion/img2img - title: Image-to-Image - - local: api/pipelines/stable_diffusion/inpaint - title: Inpaint - - local: api/pipelines/stable_diffusion/depth2img - title: Depth-to-Image - - local: api/pipelines/stable_diffusion/image_variation - title: Image-Variation - - local: api/pipelines/stable_diffusion/upscale - title: Super-Resolution - - local: api/pipelines/stable_diffusion/latent_upscale - title: Stable-Diffusion-Latent-Upscaler - - local: 
api/pipelines/stable_diffusion/pix2pix - title: InstructPix2Pix - - local: api/pipelines/stable_diffusion/attend_and_excite - title: Attend and Excite - - local: api/pipelines/stable_diffusion/pix2pix_zero - title: Pix2Pix Zero - - local: api/pipelines/stable_diffusion/self_attention_guidance - title: Self-Attention Guidance - - local: api/pipelines/stable_diffusion/panorama - title: MultiDiffusion Panorama - - local: api/pipelines/stable_diffusion/controlnet - title: Text-to-Image Generation with ControlNet Conditioning - title: Stable Diffusion - - local: api/pipelines/stable_diffusion_2 - title: Stable Diffusion 2 - - local: api/pipelines/stable_unclip - title: Stable unCLIP - - local: api/pipelines/stochastic_karras_ve - title: Stochastic Karras VE - - local: api/pipelines/unclip - title: UnCLIP - - local: api/pipelines/latent_diffusion_uncond - title: Unconditional Latent Diffusion - - local: api/pipelines/versatile_diffusion - title: Versatile Diffusion - - local: api/pipelines/vq_diffusion - title: VQ Diffusion - title: Pipelines - - sections: - - local: api/schedulers/overview - title: Overview - - local: api/schedulers/ddim - title: DDIM - - local: api/schedulers/ddim_inverse - title: DDIMInverse - - local: api/schedulers/ddpm - title: DDPM - - local: api/schedulers/deis - title: DEIS - - local: api/schedulers/dpm_discrete - title: DPM Discrete Scheduler - - local: api/schedulers/dpm_discrete_ancestral - title: DPM Discrete Scheduler with ancestral sampling - - local: api/schedulers/euler_ancestral - title: Euler Ancestral Scheduler - - local: api/schedulers/euler - title: Euler scheduler - - local: api/schedulers/heun - title: Heun Scheduler - - local: api/schedulers/ipndm - title: IPNDM - - local: api/schedulers/lms_discrete - title: Linear Multistep - - local: api/schedulers/multistep_dpm_solver - title: Multistep DPM-Solver - - local: api/schedulers/pndm - title: PNDM - - local: api/schedulers/repaint - title: RePaint Scheduler - - local: 
api/schedulers/singlestep_dpm_solver - title: Singlestep DPM-Solver - - local: api/schedulers/stochastic_karras_ve - title: Stochastic Karras VE - - local: api/schedulers/unipc - title: UniPCMultistepScheduler - - local: api/schedulers/score_sde_ve - title: VE-SDE - - local: api/schedulers/score_sde_vp - title: VP-SDE - - local: api/schedulers/vq_diffusion - title: VQDiffusionScheduler - title: Schedulers - - sections: - - local: api/experimental/rl - title: RL Planning - title: Experimental Features - title: API diff --git a/diffusers/docs/source/zh/index.mdx b/diffusers/docs/source/zh/index.mdx deleted file mode 100644 index 4f952c5db79ccfa120fb23e11303a9a878a887f5..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/zh/index.mdx +++ /dev/null @@ -1,78 +0,0 @@ - - -

-
- -
-

- -# 🧨 Diffusers - -🤗 Diffusers provides pretrained vision and audio diffusion models, and serves as a modular toolbox for inference and training. - -More precisely, 🤗 Diffusers offers: - -- State-of-the-art diffusion pipelines that can be run in inference with just a few lines of code (see [**Using Diffusers**](./using-diffusers/conditional_image_generation)), or see [**Pipelines**](#pipelines) for an overview of all supported pipelines and their corresponding papers. -- Various noise schedulers that can be used interchangeably to trade off speed against quality in inference. For more information, see [**Schedulers**](./api/schedulers/overview). -- Multiple types of models, such as U-Net, that can be used as building blocks in an end-to-end diffusion system. For more details, see [**Models**](./api/models). -- Training examples that show how to train the most popular diffusion model tasks. For more information, see [**Training**](./training/overview). - - -## 🧨 Diffusers pipelines - -The following table summarizes all officially supported pipelines and their corresponding papers; some also come with a Colab notebook you can try directly. - - -| Pipeline | Paper | Tasks | Colab -|---|---|:---:|:---:| -| [alt_diffusion](./api/pipelines/alt_diffusion) | [**AltDiffusion**](https://arxiv.org/abs/2211.06679) | Image-to-Image Text-Guided Generation | -| [audio_diffusion](./api/pipelines/audio_diffusion) | [**Audio Diffusion**](https://github.com/teticio/audio-diffusion.git) | Unconditional Audio Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/audio_diffusion_pipeline.ipynb) -| [controlnet](./api/pipelines/stable_diffusion/controlnet) | [**ControlNet with Stable Diffusion**](https://arxiv.org/abs/2302.05543) | Image-to-Image Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/controlnet.ipynb) -| [cycle_diffusion](./api/pipelines/cycle_diffusion) | [**Cycle Diffusion**](https://arxiv.org/abs/2210.05559) | Image-to-Image Text-Guided Generation | -| [dance_diffusion](./api/pipelines/dance_diffusion) | [**Dance Diffusion**](https://github.com/williamberman/diffusers.git) | Unconditional Audio Generation | -| [ddpm](./api/pipelines/ddpm) | [**Denoising Diffusion Probabilistic Models**](https://arxiv.org/abs/2006.11239) | Unconditional Image Generation | -| [ddim](./api/pipelines/ddim) | [**Denoising 
Diffusion Implicit Models**](https://arxiv.org/abs/2010.02502) | Unconditional Image Generation | -| [latent_diffusion](./api/pipelines/latent_diffusion) | [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752)| Text-to-Image Generation | -| [latent_diffusion](./api/pipelines/latent_diffusion) | [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752)| Super Resolution Image-to-Image | -| [latent_diffusion_uncond](./api/pipelines/latent_diffusion_uncond) | [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752) | Unconditional Image Generation | -| [paint_by_example](./api/pipelines/paint_by_example) | [**Paint by Example: Exemplar-based Image Editing with Diffusion Models**](https://arxiv.org/abs/2211.13227) | Image-Guided Image Inpainting | -| [pndm](./api/pipelines/pndm) | [**Pseudo Numerical Methods for Diffusion Models on Manifolds**](https://arxiv.org/abs/2202.09778) | Unconditional Image Generation | -| [score_sde_ve](./api/pipelines/score_sde_ve) | [**Score-Based Generative Modeling through Stochastic Differential Equations**](https://openreview.net/forum?id=PxTIG12RRHS) | Unconditional Image Generation | -| [score_sde_vp](./api/pipelines/score_sde_vp) | [**Score-Based Generative Modeling through Stochastic Differential Equations**](https://openreview.net/forum?id=PxTIG12RRHS) | Unconditional Image Generation | -| [semantic_stable_diffusion](./api/pipelines/semantic_stable_diffusion) | [**Semantic Guidance**](https://arxiv.org/abs/2301.12247) | Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ml-research/semantic-image-editing/blob/main/examples/SemanticGuidance.ipynb) -| [stable_diffusion_text2img](./api/pipelines/stable_diffusion/text2img) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | 
Text-to-Image Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) -| [stable_diffusion_img2img](./api/pipelines/stable_diffusion/img2img) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Image-to-Image Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb) -| [stable_diffusion_inpaint](./api/pipelines/stable_diffusion/inpaint) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Text-Guided Image Inpainting | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/in_painting_with_stable_diffusion_using_diffusers.ipynb) -| [stable_diffusion_panorama](./api/pipelines/stable_diffusion/panorama) | [**MultiDiffusion**](https://multidiffusion.github.io/) | Text-to-Panorama Generation | -| [stable_diffusion_pix2pix](./api/pipelines/stable_diffusion/pix2pix) | [**InstructPix2Pix**](https://github.com/timothybrooks/instruct-pix2pix) | Text-Guided Image Editing| -| [stable_diffusion_pix2pix_zero](./api/pipelines/stable_diffusion/pix2pix_zero) | [**Zero-shot Image-to-Image Translation**](https://pix2pixzero.github.io/) | Text-Guided Image Editing | -| [stable_diffusion_attend_and_excite](./api/pipelines/stable_diffusion/attend_and_excite) | [**Attend and Excite for Stable Diffusion**](https://attendandexcite.github.io/Attend-and-Excite/) | Text-to-Image Generation | -| [stable_diffusion_self_attention_guidance](./api/pipelines/stable_diffusion/self_attention_guidance) | [**Self-Attention Guidance**](https://ku-cvlab.github.io/Self-Attention-Guidance) | Text-to-Image Generation | -| 
[stable_diffusion_image_variation](./stable_diffusion/image_variation) | [**Stable Diffusion Image Variations**](https://github.com/LambdaLabsML/lambda-diffusers#stable-diffusion-image-variations) | Image-to-Image Generation | -| [stable_diffusion_latent_upscale](./stable_diffusion/latent_upscale) | [**Stable Diffusion Latent Upscaler**](https://twitter.com/StabilityAI/status/1590531958815064065) | Text-Guided Super Resolution Image-to-Image | -| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-to-Image Generation | -| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-Guided Image Inpainting | -| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [**Depth-Conditional Stable Diffusion**](https://github.com/Stability-AI/stablediffusion#depth-conditional-stable-diffusion) | Depth-to-Image Generation | -| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-Guided Super Resolution Image-to-Image | -| [stable_diffusion_safe](./api/pipelines/stable_diffusion_safe) | [**Safe Stable Diffusion**](https://arxiv.org/abs/2211.05105) | Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ml-research/safe-latent-diffusion/blob/main/examples/Safe%20Latent%20Diffusion.ipynb) -| [stable_unclip](./stable_unclip) | **Stable unCLIP** | Text-to-Image Generation | -| [stable_unclip](./stable_unclip) | **Stable unCLIP** | Image-to-Image Text-Guided Generation | -| [stochastic_karras_ve](./api/pipelines/stochastic_karras_ve) | [**Elucidating the Design Space of Diffusion-Based Generative Models**](https://arxiv.org/abs/2206.00364) | Unconditional Image Generation | -| [unclip](./api/pipelines/unclip) | 
[Hierarchical Text-Conditional Image Generation with CLIP Latents](https://arxiv.org/abs/2204.06125) | Text-to-Image Generation | -| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Text-to-Image Generation | -| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Image Variations Generation | -| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Dual Image and Text Guided Generation | -| [vq_diffusion](./api/pipelines/vq_diffusion) | [Vector Quantized Diffusion Model for Text-to-Image Synthesis](https://arxiv.org/abs/2111.14822) | Text-to-Image Generation | - - -**注意**: 管道是如何使用相应论文中提出的扩散模型的简单示例。 \ No newline at end of file diff --git a/diffusers/docs/source/zh/installation.mdx b/diffusers/docs/source/zh/installation.mdx deleted file mode 100644 index cda91df8a6cd6fa99dd8710adca08ed08844dafb..0000000000000000000000000000000000000000 --- a/diffusers/docs/source/zh/installation.mdx +++ /dev/null @@ -1,147 +0,0 @@ - - -# 安装 - -安装🤗 Diffusers 到你正在使用的任何深度学习框架中。 - -🤗 Diffusers已在Python 3.7+、PyTorch 1.7.0+和Flax上进行了测试。按照下面的安装说明,针对你正在使用的深度学习框架进行安装: - -- [PyTorch](https://pytorch.org/get-started/locally/) installation instructions. -- [Flax](https://flax.readthedocs.io/en/latest/) installation instructions. - -## 使用pip安装 - -你需要在[虚拟环境](https://docs.python.org/3/library/venv.html)中安装🤗 Diffusers 。 - -如果你对 Python 虚拟环境不熟悉,可以看看这个[教程](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/). 
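-
-A virtual environment makes it easier to manage different projects and avoids compatibility issues between dependencies.
-
-Start by creating a virtual environment in your project directory:
-
-```bash
-python -m venv .env
-```
-
-Activate the virtual environment:
-
-```bash
-source .env/bin/activate
-```
-
-Now you are ready to install 🤗 Diffusers with the following command:
-
-**PyTorch**
-
-```bash
-pip install diffusers["torch"]
-```
-
-**Flax**
-
-```bash
-pip install diffusers["flax"]
-```
-
-## Install from source
-
-Before installing `diffusers` from source, make sure you have `torch` and `accelerate` installed.
-
-For `torch` installation, refer to the `torch` [docs](https://pytorch.org/get-started/locally/#start-locally).
-
-To install `accelerate`:
-
-```bash
-pip install accelerate
-```
-
-Install 🤗 Diffusers from source with the following command:
-
-```bash
-pip install git+https://github.com/huggingface/diffusers
-```
-
-This command installs the latest `main` version rather than the most recent `stable` release.
-The `main` version keeps up with the latest developments. For instance, a bug found in the last official release may already be fixed on `main` before a new release is rolled out.
-However, this also means the `main` version is not always stable.
-
-We strive to keep the `main` version operational, and most issues are resolved within a few hours or a day.
-
-If you run into a problem, please open an [Issue](https://github.com/huggingface/diffusers/issues) so we can fix it even sooner.
-
-## Editable install
-
-You will need an editable install if you want to do either of the following:
-
-* Use the `main` version of the source code.
-* Contribute to 🤗 Diffusers and test your changes to the code.
-
-Clone the repository and install 🤗 Diffusers with the following commands:
-
-```bash
-git clone https://github.com/huggingface/diffusers.git
-cd diffusers
-```
-
-**PyTorch**
-
-```
-pip install -e ".[torch]"
-```
-
-**Flax**
-
-```
-pip install -e ".[flax]"
-```
-
-These commands link the folder you cloned the repository to and your Python library paths.
-Python will now look inside the folder you cloned to, in addition to the normal library paths.
-For example, if your Python packages are typically installed in `~/anaconda3/envs/main/lib/python3.7/site-packages/`, Python will also search the folder you cloned to: `~/diffusers/`.
-
-
-
-You must keep the `diffusers` folder if you want to keep using the library.
-
-
-
-
-Now you can easily update your clone to the latest version of 🤗 Diffusers with the following command:
-
-```bash
-cd ~/diffusers/
-git pull
-```
-
-Your Python environment will find the `main` version of 🤗 Diffusers on the next run.
-
-## Notice on telemetry logging
-
-Our library gathers telemetry information during `from_pretrained()` requests. This data includes the versions of Diffusers and PyTorch/Flax, the requested model or pipeline, and the path to the pretrained checkpoint if it is hosted on the Hub.
-
-This usage data helps us debug issues and prioritize new features.
-Telemetry is only sent when loading models and pipelines from the Hugging Face Hub, and is not collected during local usage.
-
-We understand that not everyone wants to share this information, and we respect your privacy,
-so you can disable telemetry collection by setting the `DISABLE_TELEMETRY` environment variable from your terminal:
-
-
-On Linux/macOS:
-```bash
-export DISABLE_TELEMETRY=YES
-```
-
-On Windows:
-```bash
-set DISABLE_TELEMETRY=YES
-```
\ No newline at end of file
diff --git a/diffusers/docs/source/zh/quicktour.mdx b/diffusers/docs/source/zh/quicktour.mdx
deleted file mode 100644
index 68ab56c55a85a53c6b444d7831a059f7bed745f4..0000000000000000000000000000000000000000
--- a/diffusers/docs/source/zh/quicktour.mdx
+++ /dev/null
@@ -1,331 +0,0 @@
-
-
-[[open-in-colab]]
-
-# Quicktour
-
-Diffusion models are trained to denoise random Gaussian noise step by step to generate a sample of interest, such as an image or audio.
-
-The progress of diffusion models has sparked tremendous interest in generative AI, and you have probably already seen diffusion-generated images online. 🧨 Diffusers is a library that aims to make diffusion models easier for everyone to get started with.
-
-Whether you are a developer or an everyday user, this quicktour will introduce you to 🧨 Diffusers and help you get up and generating quickly!
-
-There are three main components of the 🧨 Diffusers library:
-
-* The [`DiffusionPipeline`] is a high-level end-to-end class designed to rapidly generate samples from pretrained diffusion models for inference.
-* Popular pretrained [model](./api/models) architectures and modules that can be used as building blocks for creating diffusion systems.
-* Many different [schedulers](./api/schedulers/overview): algorithms that control how noise is added for training, and how to generate denoised images during inference.
-
-The quicktour shows you how to use the [`DiffusionPipeline`] for inference, and then walks you through combining a model and a scheduler to replicate what happens inside the [`DiffusionPipeline`].
-
-
-
-The quicktour is a simplified version of the introductory 🧨 Diffusers [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb) to help you get started quickly. If you want to learn more about the goals and design philosophy of 🧨 Diffusers, or additional details about its core API, check out the [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb).
-
-
-
-Before you begin, make sure you have all the necessary libraries installed:
-
-```bash
-pip install --upgrade diffusers accelerate transformers
-```
-
-- [🤗 Accelerate](https://huggingface.co/docs/accelerate/index) speeds up model loading for inference and training.
-- [🤗 Transformers](https://huggingface.co/docs/transformers/index) is required to run the most popular diffusion models, such as [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview).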
-
-## DiffusionPipeline
-
-The [`DiffusionPipeline`] is the easiest way to use a pretrained diffusion system for inference. It is an end-to-end system containing the model and the scheduler. You can use the [`DiffusionPipeline`] out of the box for many tasks. The table below lists some of the supported tasks; for the complete list, check out the [🧨 Diffusers Summary](./api/pipelines/overview#diffusers-summary) table.
-
-| **Task** | **Description** | **Pipeline** |
-|------------------------------|--------------------------------------------------------------------------------------------------------------|-----------------|
-| Unconditional Image Generation | generate an image from Gaussian noise | [unconditional_image_generation](./using-diffusers/unconditional_image_generation) |
-| Text-Guided Image Generation | generate an image given a text prompt | [conditional_image_generation](./using-diffusers/conditional_image_generation) |
-| Text-Guided Image-to-Image Translation | adapt an image guided by a text prompt | [img2img](./using-diffusers/img2img) |
-| Text-Guided Image-Inpainting | fill the masked part of an image given the image, the mask, and a text prompt | [inpaint](./using-diffusers/inpaint) |
-| Text-Guided Depth-to-Image Translation | adapt parts of an image guided by a text prompt while preserving structure via depth estimation | [depth2img](./using-diffusers/depth2img) |
-
-Start by creating an instance of a [`DiffusionPipeline`] and specify which pipeline checkpoint you would like to download.
-You can use the [`DiffusionPipeline`] for any [checkpoint](https://huggingface.co/models?library=diffusers&sort=downloads) stored on the Hugging Face Hub.
-In this quicktour, you'll load the [`stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) checkpoint for text-to-image generation.
-
-
-
-For [Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion) models, please read the [license](https://huggingface.co/spaces/CompVis/stable-diffusion-license) carefully before running the model. 🧨 Diffusers implements a [`safety_checker`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/safety_checker.py) to prevent offensive or harmful content, but the model's improved image generation capabilities can still produce potentially harmful content.
-
-
-
-Load the model with the [`~DiffusionPipeline.from_pretrained`] method:
-
-```python
->>> from diffusers import DiffusionPipeline
-
->>> pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
-```
-The [`DiffusionPipeline`] downloads and caches all modeling, tokenization, and scheduling components. You can see that the Stable Diffusion pipeline is composed of the [`UNet2DConditionModel`] and the [`PNDMScheduler`], among other components:
-
-```py
->>> pipeline
-StableDiffusionPipeline {
-  "_class_name": "StableDiffusionPipeline",
-  "_diffusers_version": "0.13.1",
-  ...,
-  "scheduler": [
-    "diffusers",
-    "PNDMScheduler"
-  ],
-  ...,
-  "unet": [
-    "diffusers",
-    "UNet2DConditionModel"
-  ],
-  "vae": [
-    "diffusers",
-    "AutoencoderKL"
-  ]
-}
-```
-
-We strongly recommend running the pipeline on a GPU because the model consists of roughly 1.4 billion parameters.
-
-You can move the pipeline object to a GPU, just like you would in PyTorch:
-
-```python
->>> pipeline.to("cuda")
-```
-
-Now you can pass a text prompt to the `pipeline` to generate an image, and then access the denoised image. By default, the image output is wrapped in a [`PIL.Image`](https://pillow.readthedocs.io/en/stable/reference/Image.html?highlight=image#the-image-class) object.
-
-```python
->>> image = pipeline("An image of a squirrel in Picasso style").images[0]
->>> image
-```
-
- -
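-
-
-Save the image by calling `save`:
-
-```python
->>> image.save("image_of_squirrel_painting.png")
-```
-
-### Local pipeline
-
-You can also use the pipeline locally. The only difference is that you need to download the weights first:
-
-```
-git lfs install
-git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
-```
-
-Then load the downloaded weights into the pipeline:
-
-```python
->>> pipeline = DiffusionPipeline.from_pretrained("./stable-diffusion-v1-5")
-```
-
-Now you can run the pipeline as you did in the previous section.
-
-### Swapping schedulers
-
-Different schedulers come with different trade-offs between denoising speed and quality. The best way to find out which one works best for you is to try them out. One of the main features of 🧨 Diffusers is that it lets you easily switch between schedulers. For example, to replace the default [`PNDMScheduler`] with the [`EulerDiscreteScheduler`], load it with the [`~diffusers.ConfigMixin.from_config`] method:
-
-```py
->>> from diffusers import EulerDiscreteScheduler
-
->>> pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
->>> pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)
-```
-
-
-Try generating an image with the new scheduler and see if you notice a difference.
-
-In the next section, you'll take a closer look at the components that make up the [`DiffusionPipeline`], the model and the scheduler, and learn how to use them to generate an image of a cat.
-
-## Models
-
-Most models take a noisy sample and, at each timestep, predict the *noise residual*, which is the difference between a less noisy image and the input image (other models learn to predict the previous sample directly, or the velocity, also known as [`v-prediction`](https://github.com/huggingface/diffusers/blob/5e5ce13e2f89ac45a0066cb3f369462a3cf1d9ef/src/diffusers/schedulers/scheduling_ddim.py#L110)). You can mix and match models to create other diffusion systems.
-
-Models are initiated with the [`~ModelMixin.from_pretrained`] method, which also caches the model weights locally so that loading is faster the next time. For the quicktour, you'll load the [`UNet2DModel`], a basic unconditional image generation model with a checkpoint trained on cat images:
-
-
-```py
->>> from diffusers import UNet2DModel
-
->>> repo_id = "google/ddpm-cat-256"
->>> model = UNet2DModel.from_pretrained(repo_id)
-```
-
-To access the model parameters, call `model.config`:
-
-```py
->>> model.config
-```
-
-The model configuration is a 🧊 frozen 🧊 dictionary, which means its parameters cannot be changed after the model is created. This is intentional: it ensures that the parameters used to define the model architecture at the start stay the same, while other parameters can still be adjusted during inference.
-
-Some of the most important parameters are:
-
-* `sample_size`: the height and width dimension of the input sample.
-* `in_channels`: the number of input channels of the input sample.
-* `down_block_types` and `up_block_types`: the types of down- and upsampling blocks used to create the U-Net architecture.
-* `block_out_channels`: the number of output channels of the downsampling blocks; also used in reverse order for the number of input channels of the upsampling blocks.
-* `layers_per_block`: the number of ResNet blocks present in each U-Net block.
-
-To use the model for inference, create a tensor of random Gaussian noise in the shape of an image. It should have a `batch` axis because the model can receive multiple random noises, a `channel` axis corresponding to the number of input channels, and a `sample_size` axis for the height and width of the image:
-
-
-```py
->>> import torch
-
->>> torch.manual_seed(0)
-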
->>> noisy_sample = torch.randn(1, model.config.in_channels, model.config.sample_size, model.config.sample_size)
->>> noisy_sample.shape
-torch.Size([1, 3, 256, 256])
-```
-
-For inference, pass the noisy image and a `timestep` to the model. The `timestep` indicates how noisy the input image is, with more noise at the beginning and less at the end. This helps the model determine its position in the diffusion process, whether it is closer to the start or the end. Use the `sample` attribute to get the model output:
-
-
-```py
->>> with torch.no_grad():
-...     noisy_residual = model(sample=noisy_sample, timestep=2).sample
-```
-
-To generate actual samples, though, you need a scheduler to guide the denoising process. In the next section, you'll learn how to couple a model with a scheduler.
-
-## Schedulers
-
-Given the model output (in this case, the `noisy_residual`), a scheduler manages going from a noisy sample to a less noisy sample.
-
-
-
-
-
-🧨 Diffusers is a toolbox for building diffusion systems. While the predefined [`DiffusionPipeline`] is a convenient way to get started quickly, you can also choose your own model and scheduler components separately to build a custom diffusion system.
-
-
-
-For the quicktour, you'll instantiate the [`DDPMScheduler`] with its [`~diffusers.ConfigMixin.from_config`] method:
-
-```py
->>> from diffusers import DDPMScheduler
-
->>> scheduler = DDPMScheduler.from_config(repo_id)
->>> scheduler
-DDPMScheduler {
-  "_class_name": "DDPMScheduler",
-  "_diffusers_version": "0.13.1",
-  "beta_end": 0.02,
-  "beta_schedule": "linear",
-  "beta_start": 0.0001,
-  "clip_sample": true,
-  "clip_sample_range": 1.0,
-  "num_train_timesteps": 1000,
-  "prediction_type": "epsilon",
-  "trained_betas": null,
-  "variance_type": "fixed_small"
-}
-```
-
-
-
-
-💡 Notice how the scheduler is instantiated from a configuration. Unlike a model, a scheduler has no trainable weights and is parameter-free.
-
-
-
-Some of the most important parameters are:
-
-* `num_train_timesteps`: the length of the denoising process, or in other words, the number of timesteps required to process random Gaussian noise into a data sample.
-* `beta_schedule`: the type of noise schedule to use for inference and training.
-* `beta_start` and `beta_end`: the start and end noise values of the noise schedule.
-
-To predict a slightly less noisy image, pass the model output, the `timestep`, and the current `sample` to the scheduler's [`~diffusers.DDPMScheduler.step`] method:
-
-
-```py
->>> less_noisy_sample = scheduler.step(model_output=noisy_residual, timestep=2, sample=noisy_sample).prev_sample
->>> less_noisy_sample.shape
-```
-
-The `less_noisy_sample` can be passed to the next `timestep`, where it becomes even less noisy. Now let's put all the steps together and visualize the entire denoising process.
-
-First, create a function that postprocesses the denoised image and displays it as a `PIL.Image`:
-
-```py
->>> import PIL.Image
->>> import numpy as np
-
-
->>> def display_sample(sample, i):
-...     image_processed = sample.cpu().permute(0, 2, 3, 1)
-...     image_processed = (image_processed + 1.0) * 127.5
-...     image_processed = image_processed.numpy().astype(np.uint8)
-
-...     image_pil = PIL.Image.fromarray(image_processed[0])
-...     display(f"Image at step {i}")
-...     display(image_pil)
-```
-
-To speed up the denoising process, move the input and the model to a GPU:
-
-```py
->>> model.to("cuda")
->>> noisy_sample = noisy_sample.to("cuda")
-```
-
-Now create a denoising loop that predicts the residual of the less noisy sample and uses the scheduler to compute the less noisy sample:
-
-```py
->>> import tqdm
-
->>> sample = noisy_sample
-
->>> for i, t in enumerate(tqdm.tqdm(scheduler.timesteps)):
-...     # 1. predict noise residual
-...     with torch.no_grad():
-...         residual = model(sample, t).sample
-
-...     # 2. compute less noisy image and set x_t -> x_t-1
-...     sample = scheduler.step(residual, t, sample).prev_sample
-
-...     # 3. optionally look at image
-...     if (i + 1) % 50 == 0:
-...         display_sample(sample, i + 1)
-```
-
-Look! A cat has been generated from nothing but noise! 😻
-
-
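To make the scheduler parameters above more concrete, here is a minimal sketch in plain Python of what a `linear` `beta_schedule` could look like for the `beta_start`, `beta_end`, and `num_train_timesteps` values shown in the printed config. The helper names `linear_beta_schedule` and `cumulative_alphas` are hypothetical and not part of 🧨 Diffusers. The sketch assumes the betas are evenly spaced and that the running product of `1 - beta` tracks how much of the original signal survives at each timestep.

```python
def linear_beta_schedule(num_train_timesteps=1000, beta_start=0.0001, beta_end=0.02):
    """Return evenly spaced noise variances (betas) from beta_start to beta_end.

    Hypothetical helper for illustration; defaults mirror the config printed above.
    """
    step = (beta_end - beta_start) / (num_train_timesteps - 1)
    return [beta_start + i * step for i in range(num_train_timesteps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta): the share of signal left at each timestep."""
    alphas_cumprod = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alphas_cumprod.append(running)
    return alphas_cumprod


betas = linear_beta_schedule()
alphas_cumprod = cumulative_alphas(betas)

# Early timesteps keep almost all of the signal; the last one keeps almost none.
print(f"signal kept after the first timestep: {alphas_cumprod[0]:.4f}")
print(f"signal kept after the last timestep:  {alphas_cumprod[-1]:.6f}")
```

Under these assumptions the retained signal shrinks monotonically toward zero, which is why a sample at the final timestep is close to pure Gaussian noise and why the denoising loop walks the timesteps in reverse.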
- -
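The postprocessing step inside `display_sample` rescales model outputs from roughly `[-1, 1]` to 8-bit pixel values. The following is a small sketch of that arithmetic on plain Python floats. The helper `to_uint8` is hypothetical, and the clamping it performs is an added safeguard that the original function does not include.

```python
def to_uint8(value):
    """Map a value in [-1.0, 1.0] to an 8-bit pixel value in [0, 255].

    Hypothetical helper for illustration only.
    """
    scaled = (value + 1.0) * 127.5
    # Clamp first (an assumption added here) so out-of-range values cannot wrap around,
    # then truncate toward zero as an integer cast would.
    return int(min(max(scaled, 0.0), 255.0))


pixels = [to_uint8(v) for v in (-1.0, 0.0, 1.0)]
print(pixels)  # [0, 127, 255]
```

The quicktour relies on the scheduler's `clip_sample` setting to keep samples in range, so the explicit clamp here is only a defensive choice for standalone use.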
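-
-## Next steps
-
-Hopefully you generated some cool images with 🧨 Diffusers in this quicktour! For your next steps, you can:
-
-* Train or finetune a model to generate your own images in the [training](./tutorials/basic_training) tutorial.
-* See examples of official and community [training or finetuning scripts](https://github.com/huggingface/diffusers/tree/main/examples#-diffusers-examples) for a variety of use cases.
-* Learn more about loading, accessing, changing, and comparing schedulers in the [Using different Schedulers](./using-diffusers/schedulers) guide.
-* Explore prompt engineering, speed and memory optimizations, and tips for generating higher-quality images in the [Stable Diffusion](./stable_diffusion) guide.
-* Dive deeper into speeding up 🧨 Diffusers with the [optimized PyTorch on a GPU](./optimization/fp16) guide, and with the guides for running [Stable Diffusion on Apple Silicon (M1/M2)](./optimization/mps) and [ONNX Runtime](./optimization/onnx).
\ No newline at end of file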