Diffusers documentation

You are viewing v0.9.0 version. A newer version v0.32.2 is available.

Schedulers

Diffusers contains multiple pre-built schedule functions for the diffusion process.

What is a scheduler?

The schedule functions, denoted Schedulers in the library, take in the output of a trained model, a sample which the diffusion process is iterating on, and a timestep, and return a denoised sample. That’s why schedulers may also be called Samplers in other diffusion model implementations.

  • Schedulers define the methodology for iteratively adding noise to an image or for updating a sample based on model outputs.
    • For training, the different ways of adding noise represent the algorithmic processes used to train a diffusion model on noised images.
    • For inference, the scheduler defines how to update a sample based on an output from a pretrained model.
  • Schedulers are often defined by a noise schedule and an update rule for solving the underlying differential equation.

Discrete versus continuous schedulers

All schedulers take in a timestep to predict the updated version of the sample being diffused. The timestep dictates where in the diffusion process the step is: data is generated by iterating forward in time, and inference is executed by propagating backwards through the timesteps. Depending on the algorithm, timesteps are either discrete (accepting int inputs), as in the DDPMScheduler or PNDMScheduler, or continuous (accepting float inputs), as in the score-based schedulers ScoreSdeVeScheduler or ScoreSdeVpScheduler.
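As a rough illustration of the two kinds of timestep grids, the sketch below builds one of each in plain Python. The helper names discrete_timesteps and continuous_timesteps are hypothetical and not part of the Diffusers API, and the even spacing is an assumption chosen for clarity rather than the rule any particular scheduler uses.

```python
def discrete_timesteps(num_train_timesteps, num_inference_steps):
    """Evenly strided integer timesteps, ordered from most to least noisy."""
    stride = num_train_timesteps // num_inference_steps
    return [i * stride for i in reversed(range(num_inference_steps))]


def continuous_timesteps(num_inference_steps, t_max=1.0, t_min=1e-3):
    """Evenly spaced float timesteps from t_max down to t_min."""
    if num_inference_steps == 1:
        return [t_max]
    span = t_max - t_min
    return [t_max - span * i / (num_inference_steps - 1) for i in range(num_inference_steps)]


ints = discrete_timesteps(1000, 4)   # integer indices into a training schedule
floats = continuous_timesteps(4)     # real-valued times on a continuous axis
```

In both cases the list is ordered for inference, so iterating over it walks backwards through the diffusion process.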

Designing Re-usable schedulers

The core design principle behind the schedule functions is to be model, system, and framework independent. This allows for rapid experimentation and cleaner abstractions in the code, where the model prediction is separated from the sample update. To this end, the design of schedulers is such that:

  • Schedulers can be used interchangeably between diffusion models in inference to find the preferred trade-off between speed and generation quality.
  • Schedulers are currently implemented in PyTorch by default, but are designed to be framework independent (partial Jax support currently exists).

API

The core API for any new scheduler must follow a limited structure.

  • Schedulers should provide one or more def step(...) functions that should be called to update the generated sample iteratively.
  • Schedulers should provide a set_timesteps(...) method that configures the parameters of a schedule function for a specific inference task.
  • Schedulers should be framework-specific.
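The requirements above can be sketched as a toy scheduler in plain Python. ToyScheduler, ToyOutput, and toy_model are hypothetical names used only for illustration, and the update rule is a placeholder, not an algorithm shipped with Diffusers.

```python
from dataclasses import dataclass


@dataclass
class ToyOutput:
    prev_sample: float  # sample to feed back into the model at the next iteration


class ToyScheduler:
    """Hypothetical scheduler exposing set_timesteps(...) and step(...)."""

    def __init__(self, num_train_timesteps=1000):
        self.num_train_timesteps = num_train_timesteps
        self.timesteps = []

    def set_timesteps(self, num_inference_steps):
        # Pick evenly strided integer timesteps, most noisy first.
        stride = self.num_train_timesteps // num_inference_steps
        self.timesteps = [i * stride for i in reversed(range(num_inference_steps))]

    def step(self, model_output, timestep, sample):
        # Placeholder update rule: subtract a fixed fraction of the model output.
        # A real scheduler would apply its own noise schedule here.
        step_size = 1.0 / len(self.timesteps)
        return ToyOutput(prev_sample=sample - step_size * model_output)


def toy_model(sample, timestep):
    # Stand-in for a trained network: returns the sample itself as the "noise".
    return sample


scheduler = ToyScheduler()
scheduler.set_timesteps(10)
sample = 1.0
for t in scheduler.timesteps:
    model_output = toy_model(sample, t)
    sample = scheduler.step(model_output, t, sample).prev_sample
```

The loop shows the intended separation of concerns: the model only predicts, and the scheduler only updates the sample.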

The base class SchedulerMixin implements low level utilities used by multiple schedulers.

SchedulerMixin

class diffusers.SchedulerMixin

( )

Mixin containing common functions for the schedulers.

Class attributes:

  • _compatibles (List[str]) — A list of classes that are compatible with the parent class, so that from_config can be used from a class different than the one used to save the config (should be overridden by parent class).

from_pretrained

( pretrained_model_name_or_path: typing.Dict[str, typing.Any] = None subfolder: typing.Optional[str] = None return_unused_kwargs = False **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike, optional) — Can be either:

    • A string, the model id of a model repo on huggingface.co. Valid model ids should have an organization name, like google/ddpm-celebahq-256.
    • A path to a directory containing the scheduler configurations saved using save_pretrained(), e.g., ./my_model_directory/.
  • subfolder (str, optional) — In case the relevant files are located inside a subfolder of the model repo (either remote in huggingface.co or downloaded locally), you can specify the folder name here.
  • return_unused_kwargs (bool, optional, defaults to False) — Whether kwargs that are not consumed by the Python class should be returned or not.
  • cache_dir (Union[str, os.PathLike], optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
  • resume_download (bool, optional, defaults to False) — Whether or not to delete incompletely received files. Will attempt to resume the download if such a file exists.
  • proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
  • use_auth_token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running transformers-cli login (stored in ~/.huggingface).
  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.

Instantiate a Scheduler class from a pre-defined JSON configuration file inside a directory or Hub repo.

It is required to be logged in (huggingface-cli login) when you want to use private or gated models.

Activate the special “offline-mode” to use this method in a firewalled environment.

save_pretrained

( save_directory: typing.Union[str, os.PathLike] push_to_hub: bool = False **kwargs )

Parameters

  • save_directory (str or os.PathLike) — Directory where the configuration JSON file will be saved (will be created if it does not exist).

Save a scheduler configuration object to the directory save_directory, so that it can be re-loaded using the from_pretrained() class method.

SchedulerOutput

The class `SchedulerOutput` contains the outputs from any scheduler's `step(...)` call.

class diffusers.schedulers.scheduling_utils.SchedulerOutput

( prev_sample: FloatTensor )

Parameters

  • prev_sample (torch.FloatTensor of shape (batch_size, num_channels, height, width) for images) — Computed sample (x_{t-1}) of previous timestep. prev_sample should be used as next model input in the denoising loop.

Base class for the scheduler’s step function output.

Implemented Schedulers

Denoising diffusion implicit models (DDIM)

Original paper can be found here.

class diffusers.DDIMScheduler

( num_train_timesteps: int = 1000 beta_start: float = 0.0001 beta_end: float = 0.02 beta_schedule: str = 'linear' trained_betas: typing.Union[numpy.ndarray, typing.List[float], NoneType] = None clip_sample: bool = True set_alpha_to_one: bool = True steps_offset: int = 0 prediction_type: str = 'epsilon' **kwargs )

Parameters

  • num_train_timesteps (int) — number of diffusion steps used to train the model.
  • beta_start (float) — the starting beta value of inference.
  • beta_end (float) — the final beta value.
  • beta_schedule (str) — the beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from linear, scaled_linear, or squaredcos_cap_v2.
  • trained_betas (np.ndarray, optional) — option to pass an array of betas directly to the constructor to bypass beta_start, beta_end etc.
  • clip_sample (bool, default True) — option to clip predicted sample between -1 and 1 for numerical stability.
  • set_alpha_to_one (bool, default True) — each diffusion step uses the value of alphas product at that step and at the previous one. For the final step there is no previous alpha. When this option is True the previous alpha product is fixed to 1, otherwise it uses the value of alpha at step 0.
  • steps_offset (int, default 0) — an offset added to the inference steps. You can use a combination of offset=1 and set_alpha_to_one=False, to make the last step use step 0 for the previous alpha product, as done in stable diffusion.
  • prediction_type (str, default epsilon) — indicates whether the model predicts the noise (epsilon), or the samples. One of epsilon, sample. v-prediction is not supported for this scheduler.
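To make the beta-related parameters more concrete, here is a minimal sketch in plain Python of how a beta range can be turned into a sequence of betas and cumulative alpha products, and of the final-step fallback that set_alpha_to_one describes. The helper names are hypothetical, the scaled_linear rule (linear in the square root of beta) is stated as an assumption about the usual convention, and squaredcos_cap_v2 is left out.

```python
import math


def make_betas(num_train_timesteps=1000, beta_start=0.0001, beta_end=0.02, beta_schedule="linear"):
    """Return a list of betas for the 'linear' or 'scaled_linear' schedule."""
    n = num_train_timesteps

    def linspace(a, b):
        return [a + (b - a) * i / (n - 1) for i in range(n)]

    if beta_schedule == "linear":
        return linspace(beta_start, beta_end)
    if beta_schedule == "scaled_linear":
        # Assumption: interpolate linearly in sqrt(beta), then square.
        return [b * b for b in linspace(math.sqrt(beta_start), math.sqrt(beta_end))]
    raise ValueError(f"unsupported beta_schedule: {beta_schedule}")


def cumulative_alphas(betas):
    """Running product of alpha_t = 1 - beta_t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def previous_alpha_product(alphas_cumprod, prev_timestep, set_alpha_to_one=True):
    """Alpha product of the previous step, with the final-step fallback."""
    if prev_timestep >= 0:
        return alphas_cumprod[prev_timestep]
    # No previous step exists: use 1, or fall back to the value at step 0.
    return 1.0 if set_alpha_to_one else alphas_cumprod[0]


betas = make_betas()
alphas_cumprod = cumulative_alphas(betas)
```

Passing trained_betas would simply replace the output of make_betas in this sketch.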

Denoising diffusion implicit models is a scheduler that extends the denoising procedure introduced in denoising diffusion probabilistic models (DDPMs) with non-Markovian guidance.

ConfigMixin takes care of storing all config attributes that are passed in the scheduler’s __init__ function, such as num_train_timesteps. They can be accessed via scheduler.config.num_train_timesteps. SchedulerMixin provides general loading and saving functionality via the SchedulerMixin.save_pretrained() and from_pretrained() functions.

For more details, see the original paper: https://arxiv.org/abs/2010.02502

scale_model_input

( sample: FloatTensor timestep: typing.Optional[int] = None ) torch.FloatTensor

Parameters

  • sample (torch.FloatTensor) — input sample
  • timestep (int, optional) — current timestep

Returns

torch.FloatTensor

scaled input sample

Ensures interchangeability with schedulers that need to scale the denoising model input depending on the current timestep.

set_timesteps

( num_inference_steps: int device: typing.Union[str, torch.device] = None )

Parameters

  • num_inference_steps (int) — the number of diffusion steps used when generating samples with a pre-trained model.

Sets the discrete timesteps used for the diffusion chain. Supporting function to be run before inference.

step

( model_output: FloatTensor timestep: int sample: FloatTensor eta: float = 0.0 use_clipped_model_output: bool = False generator = None variance_noise: typing.Optional[torch.FloatTensor] = None return_dict: bool = True ) ~schedulers.scheduling_utils.DDIMSchedulerOutput or tuple

Parameters

  • model_output (torch.FloatTensor) — direct output from learned diffusion model.
  • timestep (int) — current discrete timestep in the diffusion chain.
  • sample (torch.FloatTensor) — current instance of sample being created by diffusion process.
  • eta (float) — weight of noise for added noise in diffusion step.
  • use_clipped_model_output (bool) — if True, compute “corrected” model_output from the clipped predicted original sample. Necessary because predicted original sample is clipped to [-1, 1] when self.config.clip_sample is True. If no clipping has happened, “corrected” model_output would coincide with the one provided as input and use_clipped_model_output will have no effect.
  • generator — random number generator.
  • variance_noise (torch.FloatTensor) — instead of generating noise for the variance using generator, we can directly provide the noise for the variance itself. This is useful for methods such as CycleDiffusion. (https://arxiv.org/abs/2210.05559)
  • return_dict (bool) — option for returning tuple rather than DDIMSchedulerOutput class

Returns

~schedulers.scheduling_utils.DDIMSchedulerOutput or tuple

~schedulers.scheduling_utils.DDIMSchedulerOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.

Predict the sample at the previous timestep by reversing the SDE. Core function to propagate the diffusion process from the learned model outputs (most often the predicted noise).
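A minimal scalar sketch of the deterministic case (eta = 0) may help. It assumes an epsilon-predicting model and follows the update rule from the DDIM paper; the function name ddim_step and its arguments are hypothetical and simplified relative to the scheduler's actual method, which also handles clipping and added noise.

```python
import math


def ddim_step(model_output, sample, alpha_prod_t, alpha_prod_t_prev):
    """One deterministic DDIM update (eta = 0) for an epsilon-predicting model."""
    # Estimate the clean sample x_0 from the noisy sample and the predicted noise.
    pred_original = (sample - math.sqrt(1.0 - alpha_prod_t) * model_output) / math.sqrt(alpha_prod_t)
    # Re-noise the estimate to the noise level of the previous timestep.
    direction = math.sqrt(1.0 - alpha_prod_t_prev) * model_output
    return math.sqrt(alpha_prod_t_prev) * pred_original + direction


# If the model predicts the true noise, stepping to alpha_prod = 1 recovers x_0.
x0, noise, alpha_t = 0.5, -1.2, 0.3
noisy = math.sqrt(alpha_t) * x0 + math.sqrt(1.0 - alpha_t) * noise
recovered = ddim_step(noise, noisy, alpha_t, alpha_prod_t_prev=1.0)
```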

Denoising diffusion probabilistic models (DDPM)

Original paper can be found here.

class diffusers.DDPMScheduler

( num_train_timesteps: int = 1000 beta_start: float = 0.0001 beta_end: float = 0.02 beta_schedule: str = 'linear' trained_betas: typing.Union[numpy.ndarray, typing.List[float], NoneType] = None variance_type: str = 'fixed_small' clip_sample: bool = True prediction_type: str = 'epsilon' **kwargs )

Parameters

  • num_train_timesteps (int) — number of diffusion steps used to train the model.
  • beta_start (float) — the starting beta value of inference.
  • beta_end (float) — the final beta value.
  • beta_schedule (str) — the beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from linear, scaled_linear, or squaredcos_cap_v2.
  • trained_betas (np.ndarray, optional) — option to pass an array of betas directly to the constructor to bypass beta_start, beta_end etc.
  • variance_type (str) — options to clip the variance used when adding noise to the denoised sample. Choose from fixed_small, fixed_small_log, fixed_large, fixed_large_log, learned or learned_range.
  • clip_sample (bool, default True) — option to clip predicted sample between -1 and 1 for numerical stability.
  • prediction_type (str, default epsilon) — indicates whether the model predicts the noise (epsilon), or the samples. One of epsilon, sample. v-prediction is not supported for this scheduler.

Denoising diffusion probabilistic models (DDPMs) explore the connections between denoising score matching and Langevin dynamics sampling.

ConfigMixin takes care of storing all config attributes that are passed in the scheduler’s __init__ function, such as num_train_timesteps. They can be accessed via scheduler.config.num_train_timesteps. SchedulerMixin provides general loading and saving functionality via the SchedulerMixin.save_pretrained() and from_pretrained() functions.

For more details, see the original paper: https://arxiv.org/abs/2006.11239

scale_model_input

( sample: FloatTensor timestep: typing.Optional[int] = None ) torch.FloatTensor

Parameters

  • sample (torch.FloatTensor) — input sample
  • timestep (int, optional) — current timestep

Returns

torch.FloatTensor

scaled input sample

Ensures interchangeability with schedulers that need to scale the denoising model input depending on the current timestep.

set_timesteps

( num_inference_steps: int device: typing.Union[str, torch.device] = None )

Parameters

  • num_inference_steps (int) — the number of diffusion steps used when generating samples with a pre-trained model.

Sets the discrete timesteps used for the diffusion chain. Supporting function to be run before inference.

step

( model_output: FloatTensor timestep: int sample: FloatTensor generator = None return_dict: bool = True **kwargs ) ~schedulers.scheduling_utils.DDPMSchedulerOutput or tuple

Parameters

  • model_output (torch.FloatTensor) — direct output from learned diffusion model.
  • timestep (int) — current discrete timestep in the diffusion chain.
  • sample (torch.FloatTensor) — current instance of sample being created by diffusion process.
  • generator — random number generator.
  • return_dict (bool) — option for returning tuple rather than DDPMSchedulerOutput class

Returns

~schedulers.scheduling_utils.DDPMSchedulerOutput or tuple

~schedulers.scheduling_utils.DDPMSchedulerOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.

Predict the sample at the previous timestep by reversing the SDE. Core function to propagate the diffusion process from the learned model outputs (most often the predicted noise).
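The following scalar sketch shows one ancestral sampling step in the style of the DDPM paper, assuming an epsilon-predicting model and the posterior variance that the fixed_small option refers to. The function name ddpm_step is hypothetical, and the noise is passed in explicitly (instead of drawn from a generator) so the example stays deterministic.

```python
import math


def ddpm_step(model_output, sample, beta_t, alpha_prod_t, alpha_prod_t_prev, noise=0.0, clip_sample=True):
    """One DDPM ancestral step for an epsilon-predicting model."""
    # Estimate the clean sample x_0, optionally clipped to [-1, 1].
    pred_original = (sample - math.sqrt(1.0 - alpha_prod_t) * model_output) / math.sqrt(alpha_prod_t)
    if clip_sample:
        pred_original = max(-1.0, min(1.0, pred_original))
    # Posterior mean: a weighted blend of the x_0 estimate and the current sample.
    coeff_original = math.sqrt(alpha_prod_t_prev) * beta_t / (1.0 - alpha_prod_t)
    coeff_current = math.sqrt(1.0 - beta_t) * (1.0 - alpha_prod_t_prev) / (1.0 - alpha_prod_t)
    mean = coeff_original * pred_original + coeff_current * sample
    # Posterior variance; it vanishes when alpha_prod_t_prev == 1 (the last step).
    variance = (1.0 - alpha_prod_t_prev) / (1.0 - alpha_prod_t) * beta_t
    return mean + math.sqrt(variance) * noise


# Last step of the chain: with the true noise as model output, x_0 is recovered.
beta_0 = 0.0001
alpha_prod_0 = 1.0 - beta_0
x0, eps = 0.5, 0.8
noisy = math.sqrt(alpha_prod_0) * x0 + math.sqrt(1.0 - alpha_prod_0) * eps
final = ddpm_step(eps, noisy, beta_0, alpha_prod_0, alpha_prod_t_prev=1.0, noise=1.0)
```

Unlike the deterministic DDIM update, every step except the last injects fresh noise scaled by the posterior variance.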

Multistep DPM-Solver

Original paper can be found here and the improved version. The original implementation can be found here.

class diffusers.DPMSolverMultistepScheduler

( num_train_timesteps: int = 1000 beta_start: float = 0.0001 beta_end: float = 0.02 beta_schedule: str = 'linear' trained_betas: typing.Union[numpy.ndarray, typing.List[float], NoneType] = None solver_order: int = 2 prediction_type: str = 'epsilon' thresholding: bool = False dynamic_thresholding_ratio: float = 0.995 sample_max_value: float = 1.0 algorithm_type: str = 'dpmsolver++' solver_type: str = 'midpoint' lower_order_final: bool = True **kwargs )

Parameters

  • num_train_timesteps (int) — number of diffusion steps used to train the model.
  • beta_start (float) — the starting beta value of inference.
  • beta_end (float) — the final beta value.
  • beta_schedule (str) — the beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from linear, scaled_linear, or squaredcos_cap_v2.
  • trained_betas (np.ndarray, optional) — option to pass an array of betas directly to the constructor to bypass beta_start, beta_end etc.
  • solver_order (int, default 2) — the order of DPM-Solver; can be 1 or 2 or 3. We recommend to use solver_order=2 for guided sampling, and solver_order=3 for unconditional sampling.
  • prediction_type (str, default epsilon) — indicates whether the model predicts the noise (epsilon), or the data / x0. One of epsilon, sample, or v-prediction.
  • thresholding (bool, default False) — whether to use the “dynamic thresholding” method (introduced by Imagen, https://arxiv.org/abs/2205.11487). For pixel-space diffusion models, you can set both algorithm_type=dpmsolver++ and thresholding=True to use the dynamic thresholding. Note that the thresholding method is unsuitable for latent-space diffusion models (such as stable-diffusion).
  • dynamic_thresholding_ratio (float, default 0.995) — the ratio for the dynamic thresholding method. Default is 0.995, the same as Imagen (https://arxiv.org/abs/2205.11487).
  • sample_max_value (float, default 1.0) — the threshold value for dynamic thresholding. Valid only when thresholding=True and algorithm_type="dpmsolver++".
  • algorithm_type (str, default dpmsolver++) — the algorithm type for the solver. Either dpmsolver or dpmsolver++. The dpmsolver type implements the algorithms in https://arxiv.org/abs/2206.00927, and the dpmsolver++ type implements the algorithms in https://arxiv.org/abs/2211.01095. We recommend to use dpmsolver++ with solver_order=2 for guided sampling (e.g. stable-diffusion).
  • solver_type (str, default midpoint) — the solver type for the second-order solver. Either midpoint or heun. The solver type slightly affects the sample quality, especially for small number of steps. We empirically find that midpoint solvers are slightly better, so we recommend to use the midpoint type.
  • lower_order_final (bool, default True) — whether to use lower-order solvers in the final steps. Only valid for < 15 inference steps. We empirically find this trick can stabilize the sampling of DPM-Solver for steps < 15, especially for steps <= 10.

DPM-Solver (and the improved version DPM-Solver++) is a fast, dedicated high-order solver for diffusion ODEs with a convergence order guarantee. Empirically, sampling by DPM-Solver with only 20 steps can generate high-quality samples, and it can generate quite good samples even in only 10 steps.

For more details, see the original paper: https://arxiv.org/abs/2206.00927 and https://arxiv.org/abs/2211.01095

Currently, we support the multistep DPM-Solver for both noise prediction models and data prediction models. We recommend to use solver_order=2 for guided sampling, and solver_order=3 for unconditional sampling.

We also support the “dynamic thresholding” method in Imagen (https://arxiv.org/abs/2205.11487). For pixel-space diffusion models, you can set both algorithm_type="dpmsolver++" and thresholding=True to use the dynamic thresholding. Note that the thresholding method is unsuitable for latent-space diffusion models (such as stable-diffusion).
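As an illustration of what dynamic thresholding does to a predicted sample, here is a small sketch over a flat list of pixel values. The function name dynamic_threshold is hypothetical, and the quantile is computed with simple linear interpolation, which is an assumption and may differ in detail from the tensor routine the scheduler uses.

```python
def dynamic_threshold(values, ratio=0.995, sample_max_value=1.0):
    """Clip values to the `ratio` quantile of their magnitudes, then rescale."""
    magnitudes = sorted(abs(v) for v in values)
    # Linearly interpolated quantile of the absolute values.
    position = ratio * (len(magnitudes) - 1)
    lower = int(position)
    upper = min(lower + 1, len(magnitudes) - 1)
    quantile = magnitudes[lower] + (position - lower) * (magnitudes[upper] - magnitudes[lower])
    # Never shrink the clipping range below sample_max_value.
    threshold = max(quantile, sample_max_value)
    return [max(-threshold, min(threshold, v)) / threshold for v in values]


inside = dynamic_threshold([-0.5, 0.25, 0.75])                    # already in range
saturated = dynamic_threshold([-4.0, 1.0, 2.0, 4.0], ratio=0.5)   # outliers pulled in
```

Values already inside [-1, 1] pass through unchanged, while saturated predictions are clipped at the quantile and rescaled back into [-1, 1].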

ConfigMixin takes care of storing all config attributes that are passed in the scheduler’s __init__ function, such as num_train_timesteps. They can be accessed via scheduler.config.num_train_timesteps. SchedulerMixin provides general loading and saving functionality via the SchedulerMixin.save_pretrained() and from_pretrained() functions.

convert_model_output

( model_output: FloatTensor timestep: int sample: FloatTensor ) torch.FloatTensor

Parameters

  • model_output (torch.FloatTensor) — direct output from learned diffusion model.
  • timestep (int) — current discrete timestep in the diffusion chain.
  • sample (torch.FloatTensor) — current instance of sample being created by diffusion process.

Returns

torch.FloatTensor

the converted model output.

Convert the model output to the corresponding type that the algorithm (DPM-Solver / DPM-Solver++) needs.

DPM-Solver is designed to discretize an integral of the noise prediction model, and DPM-Solver++ is designed to discretize an integral of the data prediction model. So we need to first convert the model output to the corresponding type to match the algorithm.

Note that the algorithm type and the model type are decoupled. That is to say, we can use either DPM-Solver or DPM-Solver++ for both noise prediction models and data prediction models.
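The conversion between the two prediction types follows from the forward relation sample = alpha_t * x_0 + sigma_t * epsilon. The scalar sketch below assumes alpha_t and sigma_t are derived from a cumulative alpha product; the function names are hypothetical.

```python
import math


def noise_to_data(model_output, sample, alpha_prod_t):
    """Convert a noise (epsilon) prediction into a data (x_0) prediction."""
    alpha_t, sigma_t = math.sqrt(alpha_prod_t), math.sqrt(1.0 - alpha_prod_t)
    return (sample - sigma_t * model_output) / alpha_t


def data_to_noise(model_output, sample, alpha_prod_t):
    """Convert a data (x_0) prediction into a noise (epsilon) prediction."""
    alpha_t, sigma_t = math.sqrt(alpha_prod_t), math.sqrt(1.0 - alpha_prod_t)
    return (sample - alpha_t * model_output) / sigma_t


alpha_prod = 0.6
sample, epsilon = 0.9, -0.4
x0 = noise_to_data(epsilon, sample, alpha_prod)
round_trip = data_to_noise(x0, sample, alpha_prod)
```

Because the two conversions are inverses of each other, either solver variant can consume either kind of model output.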

dpm_solver_first_order_update

( model_output: FloatTensor timestep: int prev_timestep: int sample: FloatTensor ) torch.FloatTensor

Parameters

  • model_output (torch.FloatTensor) — direct output from learned diffusion model.
  • timestep (int) — current discrete timestep in the diffusion chain.
  • prev_timestep (int) — previous discrete timestep in the diffusion chain.
  • sample (torch.FloatTensor) — current instance of sample being created by diffusion process.

Returns

torch.FloatTensor

the sample tensor at the previous timestep.

One step for the first-order DPM-Solver (equivalent to DDIM).

See https://arxiv.org/abs/2206.00927 for the detailed derivation.
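The equivalence with DDIM can be checked numerically. The sketch below writes the first-order DPM-Solver++ update (data-prediction form) next to the deterministic DDIM update and compares them on scalars. Function and variable names are hypothetical; s denotes the current noise level and t the target one.

```python
import math


def dpm_solver_pp_first_order(data_pred, sample, alpha_s, sigma_s, alpha_t, sigma_t):
    """First-order DPM-Solver++ update from noise level s to noise level t."""
    lambda_s = math.log(alpha_s / sigma_s)  # half log-SNR at the current step
    lambda_t = math.log(alpha_t / sigma_t)  # half log-SNR at the target step
    h = lambda_t - lambda_s
    return (sigma_t / sigma_s) * sample - alpha_t * (math.exp(-h) - 1.0) * data_pred


def ddim_from_data(data_pred, sample, alpha_s, sigma_s, alpha_t, sigma_t):
    """Deterministic DDIM update written in terms of the data prediction."""
    noise_pred = (sample - alpha_s * data_pred) / sigma_s
    return alpha_t * data_pred + sigma_t * noise_pred


alpha_s, alpha_t = math.sqrt(0.3), math.sqrt(0.7)
sigma_s, sigma_t = math.sqrt(0.7), math.sqrt(0.3)
args = (0.2, 1.1, alpha_s, sigma_s, alpha_t, sigma_t)
difference = abs(dpm_solver_pp_first_order(*args) - ddim_from_data(*args))
```

The two results agree up to floating-point error, since exp(-h) equals (alpha_s * sigma_t) / (sigma_s * alpha_t).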

multistep_dpm_solver_second_order_update

( model_output_list: typing.List[torch.FloatTensor] timestep_list: typing.List[int] prev_timestep: int sample: FloatTensor ) torch.FloatTensor

Parameters

  • model_output_list (List[torch.FloatTensor]) — direct outputs from learned diffusion model at current and latter timesteps.
  • timestep_list (List[int]) — current and latter discrete timesteps in the diffusion chain.
  • prev_timestep (int) — previous discrete timestep in the diffusion chain.
  • sample (torch.FloatTensor) — current instance of sample being created by diffusion process.

Returns

torch.FloatTensor

the sample tensor at the previous timestep.

One step for the second-order multistep DPM-Solver.
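A scalar sketch of the second-order multistep update is given below, assuming the midpoint variant of DPM-Solver++ with data predictions. All names are hypothetical: m0 and m1 stand for the converted model outputs at the current and the preceding solver step, and the lambda values are half log-SNRs.

```python
import math


def half_log_snr(alpha_prod):
    """lambda = log(alpha / sigma) for a cumulative alpha product."""
    return 0.5 * math.log(alpha_prod / (1.0 - alpha_prod))


def dpm_solver_pp_second_order(m0, m1, sample, lambda_s1, lambda_s0, lambda_t, alpha_t, sigma_s0, sigma_t):
    """Second-order multistep DPM-Solver++ update (midpoint variant)."""
    h = lambda_t - lambda_s0      # step about to be taken
    h_0 = lambda_s0 - lambda_s1   # step taken previously
    r0 = h_0 / h
    d0 = m0
    d1 = (m0 - m1) / r0           # finite-difference estimate of how the prediction changes
    decay = math.exp(-h) - 1.0
    return (sigma_t / sigma_s0) * sample - alpha_t * decay * d0 - 0.5 * alpha_t * decay * d1


lam_s1, lam_s0, lam_t = (half_log_snr(a) for a in (0.2, 0.4, 0.6))
alpha_t, sigma_s0, sigma_t = math.sqrt(0.6), math.sqrt(0.6), math.sqrt(0.4)
constant = dpm_solver_pp_second_order(0.3, 0.3, 1.0, lam_s1, lam_s0, lam_t, alpha_t, sigma_s0, sigma_t)
changing = dpm_solver_pp_second_order(0.3, 0.1, 1.0, lam_s1, lam_s0, lam_t, alpha_t, sigma_s0, sigma_t)
first_order = (sigma_t / sigma_s0) * 1.0 - alpha_t * (math.exp(-(lam_t - lam_s0)) - 1.0) * 0.3
```

When the two predictions coincide, the correction term vanishes and the update reduces to the first-order one.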

multistep_dpm_solver_third_order_update

( model_output_list: typing.List[torch.FloatTensor] timestep_list: typing.List[int] prev_timestep: int sample: FloatTensor ) torch.FloatTensor

Parameters

  • model_output_list (List[torch.FloatTensor]) — direct outputs from learned diffusion model at current and latter timesteps.
  • timestep_list (List[int]) — current and latter discrete timesteps in the diffusion chain.
  • prev_timestep (int) — previous discrete timestep in the diffusion chain.
  • sample (torch.FloatTensor) — current instance of sample being created by diffusion process.

Returns

torch.FloatTensor

the sample tensor at the previous timestep.

One step for the third-order multistep DPM-Solver.

scale_model_input

( sample: FloatTensor *args **kwargs ) torch.FloatTensor

Parameters

  • sample (torch.FloatTensor) — input sample

Returns

torch.FloatTensor

scaled input sample

Ensures interchangeability with schedulers that need to scale the denoising model input depending on the current timestep.

set_timesteps

( num_inference_steps: int device: typing.Union[str, torch.device] = None )

Parameters

  • num_inference_steps (int) — the number of diffusion steps used when generating samples with a pre-trained model.
  • device (str or torch.device, optional) — the device to which the timesteps should be moved to. If None, the timesteps are not moved.

Sets the timesteps used for the diffusion chain. Supporting function to be run before inference.

step

( model_output: FloatTensor timestep: int sample: FloatTensor return_dict: bool = True ) ~scheduling_utils.SchedulerOutput or tuple

Parameters

  • model_output (torch.FloatTensor) — direct output from learned diffusion model.
  • timestep (int) — current discrete timestep in the diffusion chain.
  • sample (torch.FloatTensor) — current instance of sample being created by diffusion process.
  • return_dict (bool) — option for returning tuple rather than SchedulerOutput class

Returns

~scheduling_utils.SchedulerOutput or tuple

~scheduling_utils.SchedulerOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.

Step function propagating the sample with the multistep DPM-Solver.

Variance exploding, stochastic sampling from Karras et al.

Original paper can be found here.

class diffusers.KarrasVeScheduler

( sigma_min: float = 0.02 sigma_max: float = 100 s_noise: float = 1.007 s_churn: float = 80 s_min: float = 0.05 s_max: float = 50 )

Parameters

  • sigma_min (float) — minimum noise magnitude
  • sigma_max (float) — maximum noise magnitude
  • s_noise (float) — the amount of additional noise to counteract loss of detail during sampling. A reasonable range is [1.000, 1.011].
  • s_churn (float) — the parameter controlling the overall amount of stochasticity. A reasonable range is [0, 100].
  • s_min (float) — the start value of the sigma range where we add noise (enable stochasticity). A reasonable range is [0, 10].
  • s_max (float) — the end value of the sigma range where we add noise. A reasonable range is [0.2, 80].

Stochastic sampling from Karras et al. [1] tailored to the Variance-Exploding (VE) models [2]. Use Algorithm 2 and the VE column of Table 1 from [1] for reference.

[1] Karras, Tero, et al. “Elucidating the Design Space of Diffusion-Based Generative Models.” https://arxiv.org/abs/2206.00364

[2] Song, Yang, et al. “Score-based generative modeling through stochastic differential equations.” https://arxiv.org/abs/2011.13456

ConfigMixin takes care of storing all config attributes that are passed in the scheduler’s __init__ function, such as num_train_timesteps. They can be accessed via scheduler.config.num_train_timesteps. SchedulerMixin provides general loading and saving functionality via the SchedulerMixin.save_pretrained() and from_pretrained() functions.

For more details on the parameters, see Appendix E of the original paper, “Elucidating the Design Space of Diffusion-Based Generative Models” (https://arxiv.org/abs/2206.00364). The grid search values used to find the optimal {s_noise, s_churn, s_min, s_max} for a specific model are described in Table 5 of the paper.

add_noise_to_input

( sample: FloatTensor sigma: float generator: typing.Optional[torch._C.Generator] = None )

Explicit Langevin-like “churn” step of adding noise to the sample according to a factor gamma_i ≥ 0 to reach a higher noise level sigma_hat = sigma_i + gamma_i*sigma_i.
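The churn step can be sketched on a scalar as follows. The rule for gamma (s_churn divided by the number of steps, capped at sqrt(2) - 1, and active only for sigma inside [s_min, s_max]) is taken from Algorithm 2 of Karras et al. and is an assumption about what this method does internally; the function name churn and the rng argument are hypothetical.

```python
import math
import random


def churn(sample, sigma, num_inference_steps, s_churn=80.0, s_min=0.05, s_max=50.0, s_noise=1.007, rng=None):
    """Raise the noise level of `sample` from sigma to sigma_hat = sigma + gamma * sigma."""
    rng = rng or random.Random(0)
    # Stochasticity is enabled only inside [s_min, s_max]; gamma is capped at sqrt(2) - 1.
    if s_min <= sigma <= s_max:
        gamma = min(s_churn / num_inference_steps, math.sqrt(2.0) - 1.0)
    else:
        gamma = 0.0
    sigma_hat = sigma + gamma * sigma
    eps = s_noise * rng.gauss(0.0, 1.0)
    # Noise scaled to lift the variance from sigma^2 towards sigma_hat^2
    # (s_noise slightly inflates it).
    sample_hat = sample + math.sqrt(sigma_hat**2 - sigma**2) * eps
    return sample_hat, sigma_hat


unchanged, sigma_same = churn(0.3, sigma=80.0, num_inference_steps=50)  # outside [s_min, s_max]
noised, sigma_up = churn(0.3, sigma=10.0, num_inference_steps=50)       # inside the range
```

Outside the [s_min, s_max] window gamma is zero, so both the sample and the noise level are returned unchanged.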

TODO Args:

scale_model_input

( sample: FloatTensor timestep: typing.Optional[int] = None ) torch.FloatTensor

Parameters

  • sample (torch.FloatTensor) — input sample
  • timestep (int, optional) — current timestep

Returns

torch.FloatTensor

scaled input sample

Ensures interchangeability with schedulers that need to scale the denoising model input depending on the current timestep.

set_timesteps

( num_inference_steps: int device: typing.Union[str, torch.device] = None )

Parameters

  • num_inference_steps (int) — the number of diffusion steps used when generating samples with a pre-trained model.

Sets the continuous timesteps used for the diffusion chain. Supporting function to be run before inference.

step

( model_output: FloatTensor sigma_hat: float sigma_prev: float sample_hat: FloatTensor return_dict: bool = True ) KarrasVeOutput or tuple

Parameters

  • model_output (torch.FloatTensor) — direct output from learned diffusion model.
  • sigma_hat (float) — TODO
  • sigma_prev (float) — TODO
  • sample_hat (torch.FloatTensor) — TODO
  • return_dict (bool) — option for returning tuple rather than KarrasVeOutput class

    KarrasVeOutput — updated sample in the diffusion chain and derivative (TODO double check).

Returns

KarrasVeOutput or tuple

KarrasVeOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.

Predict the sample at the previous timestep by reversing the SDE. Core function to propagate the diffusion process from the learned model outputs (most often the predicted noise).

step_correct

( model_output: FloatTensor sigma_hat: float sigma_prev: float sample_hat: FloatTensor sample_prev: FloatTensor derivative: FloatTensor return_dict: bool = True ) prev_sample (TODO)

Parameters

  • model_output (torch.FloatTensor) — direct output from learned diffusion model.
  • sigma_hat (float) — TODO
  • sigma_prev (float) — TODO
  • sample_hat (torch.FloatTensor) — TODO
  • sample_prev (torch.FloatTensor) — TODO
  • derivative (torch.FloatTensor) — TODO
  • return_dict (bool) — option for returning tuple rather than KarrasVeOutput class

Returns

prev_sample (TODO)

updated sample in the diffusion chain. derivative (TODO): TODO

Correct the predicted sample based on the output model_output of the network. TODO complete description

Linear multistep scheduler for discrete beta schedules

Original implementation can be found here.

class diffusers.LMSDiscreteScheduler

( num_train_timesteps: int = 1000 beta_start: float = 0.0001 beta_end: float = 0.02 beta_schedule: str = 'linear' trained_betas: typing.Union[numpy.ndarray, typing.List[float], NoneType] = None )

Parameters

  • num_train_timesteps (int) — number of diffusion steps used to train the model.
  • beta_start (float) — the starting beta value of inference.
  • beta_end (float) — the final beta value.
  • beta_schedule (str) — the beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from linear or scaled_linear.
  • trained_betas (np.ndarray, optional) — option to pass an array of betas directly to the constructor to bypass beta_start, beta_end etc.

Linear Multistep Scheduler for discrete beta schedules. Based on the original k-diffusion implementation by Katherine Crowson: https://github.com/crowsonkb/k-diffusion/blob/481677d114f6ea445aa009cf5bd7a9cdee909e47/k_diffusion/sampling.py#L181

ConfigMixin takes care of storing all config attributes that are passed in the scheduler’s __init__ function, such as num_train_timesteps. They can be accessed via scheduler.config.num_train_timesteps. SchedulerMixin provides general loading and saving functionality via the SchedulerMixin.save_pretrained() and from_pretrained() functions.

get_lms_coefficient

( order t current_order )

Parameters

  • order (TODO) —
  • t (TODO) —
  • current_order (TODO) —

Compute a linear multistep coefficient.
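In the k-diffusion sampler this scheduler is based on, a linear multistep coefficient is the integral of a Lagrange basis polynomial over one sigma interval. The sketch below reproduces that idea in plain Python. Taking the sigma list as an explicit argument and integrating with Simpson's rule are assumptions made to keep the example self-contained, and the sigma values are illustrative.

```python
def lms_coefficient(sigmas, order, t, current_order, num_intervals=200):
    """Integrate one Lagrange basis polynomial over [sigmas[t], sigmas[t + 1]]."""

    def basis(tau):
        prod = 1.0
        for k in range(order):
            if k == current_order:
                continue
            prod *= (tau - sigmas[t - k]) / (sigmas[t - current_order] - sigmas[t - k])
        return prod

    # Composite Simpson's rule (num_intervals must be even).
    a, b = sigmas[t], sigmas[t + 1]
    width = (b - a) / num_intervals
    total = basis(a) + basis(b)
    for i in range(1, num_intervals):
        total += (4 if i % 2 else 2) * basis(a + i * width)
    return total * width / 3.0


sigmas = [14.6, 9.7, 6.2, 3.9, 2.4, 1.5, 0.0]  # illustrative decreasing noise levels
coefficients = [lms_coefficient(sigmas, 4, 3, j) for j in range(4)]
first_order = lms_coefficient(sigmas, 1, 3, 0)
```

Because the Lagrange basis polynomials sum to one, the coefficients for a given step always add up to sigmas[t + 1] - sigmas[t], which is also the single coefficient of the first-order (Euler) case.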

scale_model_input

( sample: FloatTensor timestep: typing.Union[float, torch.FloatTensor] ) torch.FloatTensor

Parameters

  • sample (torch.FloatTensor) — input sample
  • timestep (float or torch.FloatTensor) — the current timestep in the diffusion chain

Returns

torch.FloatTensor

scaled input sample

Scales the denoising model input by dividing it by (sigma**2 + 1) ** 0.5 to match the K-LMS algorithm.
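A scalar sketch of this scaling follows, assuming that it is a division by (sigma**2 + 1) ** 0.5 as in the k-diffusion sampler, and that sigma is derived from the cumulative alpha product as sqrt((1 - alpha_prod) / alpha_prod). The function name sigma_from_alpha_product is hypothetical.

```python
import math


def sigma_from_alpha_product(alpha_prod_t):
    """Noise level sigma_t implied by a cumulative alpha product."""
    return math.sqrt((1.0 - alpha_prod_t) / alpha_prod_t)


def scale_model_input(sample, sigma):
    """Divide the sample by sqrt(sigma^2 + 1) before passing it to the model."""
    return sample / math.sqrt(sigma**2 + 1.0)


alpha_prod = 0.25
sigma = sigma_from_alpha_product(alpha_prod)
scaled = scale_model_input(2.0, sigma)
```

Under these assumptions the scaling factor equals sqrt(alpha_prod), which maps a sample expressed in the sigma parameterization back to the variance-preserving scale the model was trained on.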

set_timesteps

( num_inference_steps: int device: typing.Union[str, torch.device] = None )

Parameters

  • num_inference_steps (int) — the number of diffusion steps used when generating samples with a pre-trained model.
  • device (str or torch.device, optional) — the device to which the timesteps should be moved to. If None, the timesteps are not moved.

Sets the timesteps used for the diffusion chain. Supporting function to be run before inference.

step

( model_output: FloatTensor timestep: typing.Union[float, torch.FloatTensor] sample: FloatTensor order: int = 4 return_dict: bool = True ) ~schedulers.scheduling_utils.LMSDiscreteSchedulerOutput or tuple

Parameters

  • model_output (torch.FloatTensor) — direct output from learned diffusion model.
  • timestep (float) — current timestep in the diffusion chain.
  • sample (torch.FloatTensor) — current instance of sample being created by diffusion process.
  • order (int) — coefficient for multi-step inference.
  • return_dict (bool) — option for returning tuple rather than LMSDiscreteSchedulerOutput class

Returns

LMSDiscreteSchedulerOutput or tuple

LMSDiscreteSchedulerOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.

Predict the sample at the previous timestep by reversing the SDE. Core function to propagate the diffusion process from the learned model outputs (most often the predicted noise).
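The three methods documented above are typically combined in a denoising loop. The sketch below shows one way to arrange the calls; the `timesteps` attribute, the `model` callable, and the `ToyScheduler` stand-in are assumptions made so the example runs without the library.

```python
def run_denoising(scheduler, model, sample, num_inference_steps):
    """Call set_timesteps once, then scale_model_input and step at every timestep."""
    scheduler.set_timesteps(num_inference_steps)
    for t in scheduler.timesteps:
        model_input = scheduler.scale_model_input(sample, t)
        model_output = model(model_input, t)
        # With return_dict=False the first tuple element is the updated sample.
        sample = scheduler.step(model_output, t, sample, return_dict=False)[0]
    return sample


class ToyScheduler:
    """Hypothetical stand-in exposing the same three methods; not a real scheduler."""

    def set_timesteps(self, num_inference_steps):
        self.timesteps = list(range(num_inference_steps - 1, -1, -1))

    def scale_model_input(self, sample, timestep):
        return sample

    def step(self, model_output, timestep, sample, return_dict=True):
        # The toy always returns a tuple, regardless of return_dict.
        return (sample - model_output,)


# The toy "model" predicts half of its input, so the sample halves at each step.
result = run_denoising(ToyScheduler(), lambda x, t: 0.5 * x, sample=8.0, num_inference_steps=3)
```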

Pseudo numerical methods for diffusion models (PNDM)

Original implementation can be found here.

class diffusers.PNDMScheduler

( num_train_timesteps: int = 1000 beta_start: float = 0.0001 beta_end: float = 0.02 beta_schedule: str = 'linear' trained_betas: typing.Union[numpy.ndarray, typing.List[float], NoneType] = None skip_prk_steps: bool = False set_alpha_to_one: bool = False steps_offset: int = 0 )

Parameters

  • num_train_timesteps (int) — number of diffusion steps used to train the model.
  • beta_start (float) — the starting beta value of inference.
  • beta_end (float) — the final beta value.
  • beta_schedule (str) — the beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from linear, scaled_linear, or squaredcos_cap_v2.
  • trained_betas (np.ndarray, optional) — option to pass an array of betas directly to the constructor to bypass beta_start, beta_end etc.
  • skip_prk_steps (bool) — allows the scheduler to skip the Runge-Kutta steps that are defined in the original paper as being required before plms steps; defaults to False.
  • set_alpha_to_one (bool, default False) — each diffusion step uses the value of alphas product at that step and at the previous one. For the final step there is no previous alpha. When this option is True the previous alpha product is fixed to 1, otherwise it uses the value of alpha at step 0.
  • steps_offset (int, default 0) — an offset added to the inference steps. You can use a combination of steps_offset=1 and set_alpha_to_one=False to make the last step use step 0 for the previous alpha product, as done in Stable Diffusion.
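The set_alpha_to_one behaviour described above can be sketched as a small lookup. The helper name `previous_alpha_product` and the hand-built `alphas_cumprod` list are illustrative assumptions; only the rule itself (1 versus the alpha product at step 0) comes from the parameter description.

```python
def previous_alpha_product(alphas_cumprod, prev_timestep, set_alpha_to_one):
    """Return the alpha product for the previous step, with a fallback for the final step."""
    # For the final step there is no previous alpha product, so a fixed value is used.
    final_alpha_cumprod = 1.0 if set_alpha_to_one else alphas_cumprod[0]
    if prev_timestep >= 0:
        return alphas_cumprod[prev_timestep]
    return final_alpha_cumprod


# Cumulative alpha products for a linear beta schedule with the default values.
betas = [0.0001 + i * (0.02 - 0.0001) / 999 for i in range(1000)]
alphas_cumprod = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alphas_cumprod.append(running)
```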

Pseudo numerical methods for diffusion models (PNDM) proposes using more advanced ODE integration techniques, namely the Runge-Kutta method and a linear multi-step method.

ConfigMixin takes care of storing all config attributes that are passed in the scheduler’s __init__ function, such as num_train_timesteps. They can be accessed via scheduler.config.num_train_timesteps. SchedulerMixin provides general loading and saving functionality via the SchedulerMixin.save_pretrained() and from_pretrained() functions.

For more details, see the original paper: https://arxiv.org/abs/2202.09778

scale_model_input

( sample: FloatTensor *args **kwargs ) torch.FloatTensor

Parameters

  • sample (torch.FloatTensor) — input sample

Returns

torch.FloatTensor

scaled input sample

Ensures interchangeability with schedulers that need to scale the denoising model input depending on the current timestep.

set_timesteps

( num_inference_steps: int device: typing.Union[str, torch.device] = None )

Parameters

  • num_inference_steps (int) — the number of diffusion steps used when generating samples with a pre-trained model.

Sets the discrete timesteps used for the diffusion chain. Supporting function to be run before inference.

step

( model_output: FloatTensor timestep: int sample: FloatTensor return_dict: bool = True ) SchedulerOutput or tuple

Parameters

  • model_output (torch.FloatTensor) — direct output from learned diffusion model.
  • timestep (int) — current discrete timestep in the diffusion chain.
  • sample (torch.FloatTensor) — current instance of sample being created by diffusion process.
  • return_dict (bool) — option for returning tuple rather than SchedulerOutput class

Returns

SchedulerOutput or tuple

SchedulerOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.

Predict the sample at the previous timestep by reversing the SDE. Core function to propagate the diffusion process from the learned model outputs (most often the predicted noise).

This function calls step_prk() or step_plms() depending on the internal variable counter.

step_plms

( model_output: FloatTensor timestep: int sample: FloatTensor return_dict: bool = True ) SchedulerOutput or tuple

Parameters

  • model_output (torch.FloatTensor) — direct output from learned diffusion model.
  • timestep (int) — current discrete timestep in the diffusion chain.
  • sample (torch.FloatTensor) — current instance of sample being created by diffusion process.
  • return_dict (bool) — option for returning tuple rather than SchedulerOutput class

Returns

SchedulerOutput or tuple

SchedulerOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.

Step function propagating the sample with the linear multi-step method. Each step needs a single forward pass and reuses model outputs from several earlier timesteps to approximate the solution.
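A linear multi-step update combines the newest model output with stored earlier ones. The sketch below uses the classical Adams-Bashforth weights, which is one common choice for such methods; the function name `plms_combine` and the list-based history are assumptions for illustration only.

```python
def plms_combine(history):
    """Combine up to four model outputs; history[-1] is the newest one."""
    e = history[::-1]  # newest first
    if len(e) == 1:
        return e[0]
    if len(e) == 2:
        return (3 * e[0] - e[1]) / 2
    if len(e) == 3:
        return (23 * e[0] - 16 * e[1] + 5 * e[2]) / 12
    # Fourth-order Adams-Bashforth weights; entries older than four steps are ignored.
    return (55 * e[0] - 59 * e[1] + 37 * e[2] - 9 * e[3]) / 24


combined = plms_combine([1.0, 2.0])  # second-order case: (3 * 2.0 - 1.0) / 2
```

The weights in every branch sum to one, so a constant history is returned unchanged.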

step_prk

( model_output: FloatTensor timestep: int sample: FloatTensor return_dict: bool = True ) SchedulerOutput or tuple

Parameters

  • model_output (torch.FloatTensor) — direct output from learned diffusion model.
  • timestep (int) — current discrete timestep in the diffusion chain.
  • sample (torch.FloatTensor) — current instance of sample being created by diffusion process.
  • return_dict (bool) — option for returning tuple rather than SchedulerOutput class

Returns

SchedulerOutput or tuple

SchedulerOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.

Step function propagating the sample with the Runge-Kutta method. RK takes 4 forward passes to approximate the solution to the differential equation.
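For orientation, here is the classical fourth-order Runge-Kutta step for a generic ODE dy/dt = f(t, y), which shows where the four evaluations come from. This is the textbook method, not the pseudo-numerical variant used by the scheduler, and `rk4_step` is a name chosen for this example.

```python
import math


def rk4_step(f, t, y, h):
    """Advance y by one step of size h using four evaluations of f."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h * k1 / 2)
    k3 = f(t + h / 2, y + h * k2 / 2)
    k4 = f(t + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6


# dy/dt = y with y(0) = 1 has the exact solution exp(t).
y_next = rk4_step(lambda t, y: y, t=0.0, y=1.0, h=0.1)
```

In a diffusion sampler each evaluation of f corresponds to one forward pass of the model, which is why the Runge-Kutta steps are more expensive than the multi-step ones.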

Variance exploding stochastic differential equation (VE-SDE) scheduler

Original paper can be found here.

class diffusers.ScoreSdeVeScheduler

( num_train_timesteps: int = 2000 snr: float = 0.15 sigma_min: float = 0.01 sigma_max: float = 1348.0 sampling_eps: float = 1e-05 correct_steps: int = 1 )

Parameters

  • num_train_timesteps (int) — number of diffusion steps used to train the model.
  • snr (float) — coefficient weighting the step from the model_output sample (from the network) to the random noise.
  • sigma_min (float) — initial noise scale for sigma sequence in sampling procedure. The minimum sigma should mirror the distribution of the data.
  • sigma_max (float) — maximum value used for the range of continuous timesteps passed into the model.
  • sampling_eps (float) — the end value of sampling, where timesteps decrease progressively from 1 to epsilon.
  • correct_steps (int) — number of correction steps performed on a produced sample.

The variance exploding stochastic differential equation (SDE) scheduler.

For more information, see the original paper: https://arxiv.org/abs/2011.13456

ConfigMixin takes care of storing all config attributes that are passed in the scheduler’s __init__ function, such as num_train_timesteps. They can be accessed via scheduler.config.num_train_timesteps. SchedulerMixin provides general loading and saving functionality via the SchedulerMixin.save_pretrained() and from_pretrained() functions.

scale_model_input

( sample: FloatTensor timestep: typing.Optional[int] = None ) torch.FloatTensor

Parameters

  • sample (torch.FloatTensor) — input sample
  • timestep (int, optional) — current timestep

Returns

torch.FloatTensor

scaled input sample

Ensures interchangeability with schedulers that need to scale the denoising model input depending on the current timestep.

set_sigmas

( num_inference_steps: int sigma_min: float = None sigma_max: float = None sampling_eps: float = None )

Parameters

  • num_inference_steps (int) — the number of diffusion steps used when generating samples with a pre-trained model.
  • sigma_min (float, optional) — initial noise scale value (overrides value given at Scheduler instantiation).
  • sigma_max (float, optional) — final noise scale value (overrides value given at Scheduler instantiation).
  • sampling_eps (float, optional) — final timestep value (overrides value given at Scheduler instantiation).

Sets the noise scales used for the diffusion chain. Supporting function to be run before inference.

The sigmas control the weight of the drift and diffusion components of sample update.
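A minimal sketch of how such noise scales can be derived, assuming the geometric interpolation sigma(t) = sigma_min * (sigma_max / sigma_min) ** t from the VE formulation of the Score SDE paper, with continuous timesteps running from 1 down to sampling_eps. The function name `ve_sigmas` and the pure-Python lists are assumptions for this example.

```python
def ve_sigmas(num_inference_steps, sigma_min=0.01, sigma_max=1348.0, sampling_eps=1e-5):
    """Return continuous timesteps in [sampling_eps, 1] and the matching noise scales."""
    if num_inference_steps < 2:
        raise ValueError("this sketch needs at least two inference steps")
    # Evenly spaced timesteps from 1 down to sampling_eps.
    delta = (sampling_eps - 1.0) / (num_inference_steps - 1)
    timesteps = [1.0 + i * delta for i in range(num_inference_steps)]
    # Geometric interpolation between sigma_min (t = 0) and sigma_max (t = 1).
    sigmas = [sigma_min * (sigma_max / sigma_min) ** t for t in timesteps]
    return timesteps, sigmas


timesteps, sigmas = ve_sigmas(5)
```

The first sigma equals sigma_max and the sequence decreases towards sigma_min as the timestep approaches sampling_eps.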

set_timesteps

( num_inference_steps: int sampling_eps: float = None device: typing.Union[str, torch.device] = None )

Parameters

  • num_inference_steps (int) — the number of diffusion steps used when generating samples with a pre-trained model.
  • sampling_eps (float, optional) — final timestep value (overrides value given at Scheduler instantiation).

Sets the continuous timesteps used for the diffusion chain. Supporting function to be run before inference.

step_correct

( model_output: FloatTensor sample: FloatTensor generator: typing.Optional[torch._C.Generator] = None return_dict: bool = True ) SdeVeOutput or tuple

Parameters

  • model_output (torch.FloatTensor) — direct output from learned diffusion model.
  • sample (torch.FloatTensor) — current instance of sample being created by diffusion process.
  • generator (torch.Generator, optional) — random number generator.
  • return_dict (bool) — option for returning tuple rather than SdeVeOutput class

Returns

SdeVeOutput or tuple

SdeVeOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.

Correct the predicted sample based on the output model_output of the network. This is often run repeatedly after making the prediction for the previous timestep.
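One way to picture the correction is as a Langevin-style update whose step size is set by the snr coefficient and the relative norms of the noise and the model output. The sketch below follows that idea under the assumption that the model output is a score estimate; the helper names and the list-based vectors are illustrative, and the noise is passed in explicitly so the result is reproducible.

```python
import math


def l2_norm(values):
    return math.sqrt(sum(v * v for v in values))


def langevin_correct(sample, model_output, noise, snr=0.15):
    """Apply one corrector update: sample + step_size * score + sqrt(2 * step_size) * noise."""
    step_size = 2 * (snr * l2_norm(noise) / l2_norm(model_output)) ** 2
    scale = math.sqrt(2 * step_size)
    return [x + step_size * s + scale * z for x, s, z in zip(sample, model_output, noise)]


# Norms are 1 and 2, so with snr = 0.5 the step size is 2 and the noise scale is 2.
corrected = langevin_correct([0.0, 0.0], [1.0, 0.0], [0.0, 2.0], snr=0.5)
```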

step_pred

( model_output: FloatTensor timestep: int sample: FloatTensor generator: typing.Optional[torch._C.Generator] = None return_dict: bool = True ) SdeVeOutput or tuple

Parameters

  • model_output (torch.FloatTensor) — direct output from learned diffusion model.
  • timestep (int) — current discrete timestep in the diffusion chain.
  • sample (torch.FloatTensor) — current instance of sample being created by diffusion process.
  • generator (torch.Generator, optional) — random number generator.
  • return_dict (bool) — option for returning tuple rather than SdeVeOutput class

Returns

SdeVeOutput or tuple

SdeVeOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.

Predict the sample at the previous timestep by reversing the SDE. Core function to propagate the diffusion process from the learned model outputs (most often the predicted noise).

Improved pseudo numerical methods for diffusion models (iPNDM)

Original implementation can be found here.

class diffusers.IPNDMScheduler

( num_train_timesteps: int = 1000 trained_betas: typing.Union[numpy.ndarray, typing.List[float], NoneType] = None )

Parameters

  • num_train_timesteps (int) — number of diffusion steps used to train the model.

Improved Pseudo numerical methods for diffusion models (iPNDM), ported from @crowsonkb’s k-diffusion library.

ConfigMixin takes care of storing all config attributes that are passed in the scheduler’s __init__ function, such as num_train_timesteps. They can be accessed via scheduler.config.num_train_timesteps. SchedulerMixin provides general loading and saving functionality via the SchedulerMixin.save_pretrained() and from_pretrained() functions.

For more details, see the original paper: https://arxiv.org/abs/2202.09778

scale_model_input

( sample: FloatTensor *args **kwargs ) torch.FloatTensor

Parameters

  • sample (torch.FloatTensor) — input sample

Returns

torch.FloatTensor

scaled input sample

Ensures interchangeability with schedulers that need to scale the denoising model input depending on the current timestep.

set_timesteps

( num_inference_steps: int device: typing.Union[str, torch.device] = None )

Parameters

  • num_inference_steps (int) — the number of diffusion steps used when generating samples with a pre-trained model.

Sets the discrete timesteps used for the diffusion chain. Supporting function to be run before inference.

step

( model_output: FloatTensor timestep: int sample: FloatTensor return_dict: bool = True ) SchedulerOutput or tuple

Parameters

  • model_output (torch.FloatTensor) — direct output from learned diffusion model.
  • timestep (int) — current discrete timestep in the diffusion chain.
  • sample (torch.FloatTensor) — current instance of sample being created by diffusion process.
  • return_dict (bool) — option for returning tuple rather than SchedulerOutput class

Returns

SchedulerOutput or tuple

SchedulerOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.

Step function propagating the sample with the linear multi-step method. Each step needs a single forward pass and reuses model outputs from several earlier timesteps to approximate the solution.

Variance preserving stochastic differential equation (VP-SDE) scheduler

Original paper can be found here.

Score SDE-VP is under construction.

class diffusers.schedulers.ScoreSdeVpScheduler

( num_train_timesteps = 2000 beta_min = 0.1 beta_max = 20 sampling_eps = 0.001 )

The variance preserving stochastic differential equation (SDE) scheduler.

ConfigMixin takes care of storing all config attributes that are passed in the scheduler’s __init__ function, such as num_train_timesteps. They can be accessed via scheduler.config.num_train_timesteps. SchedulerMixin provides general loading and saving functionality via the SchedulerMixin.save_pretrained() and from_pretrained() functions.

For more information, see the original paper: https://arxiv.org/abs/2011.13456

UNDER CONSTRUCTION

Euler scheduler

Euler scheduler (Algorithm 2) from the paper Elucidating the Design Space of Diffusion-Based Generative Models by Karras et al. (2022). Based on the original k-diffusion implementation by Katherine Crowson. A fast scheduler that often generates good outputs in 20-30 steps.

class diffusers.EulerDiscreteScheduler

( num_train_timesteps: int = 1000 beta_start: float = 0.0001 beta_end: float = 0.02 beta_schedule: str = 'linear' trained_betas: typing.Union[numpy.ndarray, typing.List[float], NoneType] = None prediction_type: str = 'epsilon' )

Parameters

  • num_train_timesteps (int) — number of diffusion steps used to train the model.
  • beta_start (float) — the starting beta value of inference.
  • beta_end (float) — the final beta value.
  • beta_schedule (str) — the beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from linear or scaled_linear.
  • trained_betas (np.ndarray, optional) — option to pass an array of betas directly to the constructor to bypass beta_start, beta_end etc.
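The two beta_schedule options named above can be sketched as follows, assuming that linear means evenly spaced betas and scaled_linear means evenly spaced square roots that are then squared. The function name `make_betas` and the pure-Python lists are assumptions for illustration.

```python
def make_betas(num_train_timesteps=1000, beta_start=0.0001, beta_end=0.02, beta_schedule="linear"):
    """Build a list of betas for the 'linear' or 'scaled_linear' schedule."""

    def linspace(start, end, n):
        return [start + i * (end - start) / (n - 1) for i in range(n)]

    if beta_schedule == "linear":
        return linspace(beta_start, beta_end, num_train_timesteps)
    if beta_schedule == "scaled_linear":
        # Interpolate in sqrt(beta) space, then square.
        return [b**2 for b in linspace(beta_start**0.5, beta_end**0.5, num_train_timesteps)]
    raise ValueError(f"unsupported beta_schedule: {beta_schedule}")


linear_betas = make_betas(beta_schedule="linear")
scaled_betas = make_betas(beta_schedule="scaled_linear")
```

Both schedules share the same endpoints, but the scaled_linear betas are smaller in between, so noise is added more gently in the middle of the chain.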

Euler scheduler (Algorithm 2) from Karras et al. (2022): https://arxiv.org/abs/2206.00364. Based on the original k-diffusion implementation by Katherine Crowson: https://github.com/crowsonkb/k-diffusion/blob/481677d114f6ea445aa009cf5bd7a9cdee909e47/k_diffusion/sampling.py#L51

ConfigMixin takes care of storing all config attributes that are passed in the scheduler’s __init__ function, such as num_train_timesteps. They can be accessed via scheduler.config.num_train_timesteps. SchedulerMixin provides general loading and saving functionality via the SchedulerMixin.save_pretrained() and from_pretrained() functions.

scale_model_input

( sample: FloatTensor timestep: typing.Union[float, torch.FloatTensor] ) torch.FloatTensor

Parameters

  • sample (torch.FloatTensor) — input sample
  • timestep (float or torch.FloatTensor) — the current timestep in the diffusion chain

Returns

torch.FloatTensor

scaled input sample

Scales the denoising model input by dividing it by (sigma**2 + 1) ** 0.5 to match the Euler algorithm.

set_timesteps

( num_inference_steps: int device: typing.Union[str, torch.device] = None )

Parameters

  • num_inference_steps (int) — the number of diffusion steps used when generating samples with a pre-trained model.
  • device (str or torch.device, optional) — the device to which the timesteps should be moved to. If None, the timesteps are not moved.

Sets the timesteps used for the diffusion chain. Supporting function to be run before inference.

step

( model_output: FloatTensor timestep: typing.Union[float, torch.FloatTensor] sample: FloatTensor s_churn: float = 0.0 s_tmin: float = 0.0 s_tmax: float = inf s_noise: float = 1.0 generator: typing.Optional[torch._C.Generator] = None return_dict: bool = True ) EulerDiscreteSchedulerOutput or tuple

Parameters

  • model_output (torch.FloatTensor) — direct output from learned diffusion model.
  • timestep (float) — current timestep in the diffusion chain.
  • sample (torch.FloatTensor) — current instance of sample being created by diffusion process.
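  • s_churn (float) — amount of stochasticity ("churn") added during sampling, following Karras et al. (2022); the default 0.0 disables it.
  • s_tmin (float) — lower bound of the noise-level range in which churn is applied.
  • s_tmax (float) — upper bound of the noise-level range in which churn is applied.
  • s_noise (float) — scaling factor for the noise that is added when churn is applied.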
  • generator (torch.Generator, optional) — Random number generator.
  • return_dict (bool) — option for returning tuple rather than EulerDiscreteSchedulerOutput class

Returns

EulerDiscreteSchedulerOutput or tuple

EulerDiscreteSchedulerOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.

Predict the sample at the previous timestep by reversing the SDE. Core function to propagate the diffusion process from the learned model outputs (most often the predicted noise).
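A minimal sketch of a deterministic Euler update in sigma space, assuming the model predicts the noise (epsilon) and that no churn is applied (s_churn = 0). The function name `euler_step`, the explicit `sigma` and `sigma_next` arguments, and the list-based sample are assumptions for this example.

```python
def euler_step(sample, model_output, sigma, sigma_next):
    """Move `sample` from noise level `sigma` to `sigma_next` with one Euler step."""
    # Estimate of the fully denoised sample under epsilon prediction.
    pred_original = [x - sigma * eps for x, eps in zip(sample, model_output)]
    # Derivative of the sample with respect to sigma.
    derivative = [(x - x0) / sigma for x, x0 in zip(sample, pred_original)]
    dt = sigma_next - sigma
    return [x + d * dt for x, d in zip(sample, derivative)]


prev_sample = euler_step([10.0], [1.0], sigma=10.0, sigma_next=5.0)
```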

Euler Ancestral scheduler

Ancestral sampling with Euler method steps. Based on the original k-diffusion implementation (https://github.com/crowsonkb/k-diffusion/blob/481677d114f6ea445aa009cf5bd7a9cdee909e47/k_diffusion/sampling.py#L72) by Katherine Crowson. A fast scheduler that often generates good outputs in 20-30 steps.

class diffusers.EulerAncestralDiscreteScheduler

( num_train_timesteps: int = 1000 beta_start: float = 0.0001 beta_end: float = 0.02 beta_schedule: str = 'linear' trained_betas: typing.Union[numpy.ndarray, typing.List[float], NoneType] = None )

Parameters

  • num_train_timesteps (int) — number of diffusion steps used to train the model.
  • beta_start (float) — the starting beta value of inference.
  • beta_end (float) — the final beta value.
  • beta_schedule (str) — the beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from linear or scaled_linear.
  • trained_betas (np.ndarray, optional) — option to pass an array of betas directly to the constructor to bypass beta_start, beta_end etc.

Ancestral sampling with Euler method steps. Based on the original k-diffusion implementation by Katherine Crowson: https://github.com/crowsonkb/k-diffusion/blob/481677d114f6ea445aa009cf5bd7a9cdee909e47/k_diffusion/sampling.py#L72

ConfigMixin takes care of storing all config attributes that are passed in the scheduler’s __init__ function, such as num_train_timesteps. They can be accessed via scheduler.config.num_train_timesteps. SchedulerMixin provides general loading and saving functionality via the SchedulerMixin.save_pretrained() and from_pretrained() functions.

scale_model_input

( sample: FloatTensor timestep: typing.Union[float, torch.FloatTensor] ) torch.FloatTensor

Parameters

  • sample (torch.FloatTensor) — input sample
  • timestep (float or torch.FloatTensor) — the current timestep in the diffusion chain

Returns

torch.FloatTensor

scaled input sample

Scales the denoising model input by dividing it by (sigma**2 + 1) ** 0.5 to match the Euler algorithm.

set_timesteps

( num_inference_steps: int device: typing.Union[str, torch.device] = None )

Parameters

  • num_inference_steps (int) — the number of diffusion steps used when generating samples with a pre-trained model.
  • device (str or torch.device, optional) — the device to which the timesteps should be moved to. If None, the timesteps are not moved.

Sets the timesteps used for the diffusion chain. Supporting function to be run before inference.

step

( model_output: FloatTensor timestep: typing.Union[float, torch.FloatTensor] sample: FloatTensor generator: typing.Optional[torch._C.Generator] = None return_dict: bool = True ) EulerAncestralDiscreteSchedulerOutput or tuple

Parameters

  • model_output (torch.FloatTensor) — direct output from learned diffusion model.
  • timestep (float) — current timestep in the diffusion chain.
  • sample (torch.FloatTensor) — current instance of sample being created by diffusion process.
  • generator (torch.Generator, optional) — Random number generator.
  • return_dict (bool) — option for returning tuple rather than EulerAncestralDiscreteSchedulerOutput class

Returns

EulerAncestralDiscreteSchedulerOutput or tuple

EulerAncestralDiscreteSchedulerOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.

Predict the sample at the previous timestep by reversing the SDE. Core function to propagate the diffusion process from the learned model outputs (most often the predicted noise).
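Ancestral sampling differs from the plain Euler step in that it steps down to a slightly lower noise level and then adds fresh noise back. The sketch below assumes epsilon prediction and the sigma_up / sigma_down split used in k-diffusion-style ancestral samplers; all function names and the list-based sample are illustrative.

```python
import math
import random


def ancestral_sigmas(sigma_from, sigma_to):
    """Split the target noise level into a re-injected part and a deterministic part."""
    sigma_up = math.sqrt(sigma_to**2 * (sigma_from**2 - sigma_to**2) / sigma_from**2)
    sigma_down = math.sqrt(sigma_to**2 - sigma_up**2)
    return sigma_up, sigma_down


def euler_ancestral_step(sample, model_output, sigma_from, sigma_to, rng):
    """Euler step down to sigma_down, followed by fresh Gaussian noise scaled by sigma_up."""
    sigma_up, sigma_down = ancestral_sigmas(sigma_from, sigma_to)
    dt = sigma_down - sigma_from
    # Under epsilon prediction the derivative equals the model output.
    return [x + eps * dt + rng.gauss(0.0, 1.0) * sigma_up for x, eps in zip(sample, model_output)]


sigma_up, sigma_down = ancestral_sigmas(2.0, 1.0)
final = euler_ancestral_step([4.0], [1.0], sigma_from=2.0, sigma_to=0.0, rng=random.Random(0))
```

The two parts satisfy sigma_up**2 + sigma_down**2 == sigma_to**2, and no noise is added on a step that ends at sigma_to = 0.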

VQDiffusionScheduler

Original paper can be found here

class diffusers.VQDiffusionScheduler

( num_vec_classes: int num_train_timesteps: int = 100 alpha_cum_start: float = 0.99999 alpha_cum_end: float = 9e-06 gamma_cum_start: float = 9e-06 gamma_cum_end: float = 0.99999 )

Parameters

  • num_vec_classes (int) — The number of classes of the vector embeddings of the latent pixels. Includes the class for the masked latent pixel.
  • num_train_timesteps (int) — Number of diffusion steps used to train the model.
  • alpha_cum_start (float) — The starting cumulative alpha value.
  • alpha_cum_end (float) — The ending cumulative alpha value.
  • gamma_cum_start (float) — The starting cumulative gamma value.
  • gamma_cum_end (float) — The ending cumulative gamma value.

The VQ-diffusion transformer outputs predicted probabilities of the initial unnoised image.

The VQ-diffusion scheduler converts the transformer’s output into a sample for the unnoised image at the previous diffusion timestep.

ConfigMixin takes care of storing all config attributes that are passed in the scheduler’s __init__ function, such as num_train_timesteps. They can be accessed via scheduler.config.num_train_timesteps. SchedulerMixin provides general loading and saving functionality via the SchedulerMixin.save_pretrained() and from_pretrained() functions.

For more details, see the original paper: https://arxiv.org/abs/2111.14822

log_Q_t_transitioning_to_known_class

( t: torch.int32 x_t: LongTensor log_onehot_x_t: FloatTensor cumulative: bool ) torch.FloatTensor of shape (batch size, num classes - 1, num latent pixels)

Parameters

  • t (torch.Long) — The timestep that determines which transition matrix is used.
  • x_t (torch.LongTensor of shape (batch size, num latent pixels)) — The classes of each latent pixel at time t.
  • log_onehot_x_t (torch.FloatTensor of shape (batch size, num classes, num latent pixels)) — The log one-hot vectors of x_t
  • cumulative (bool) — If cumulative is False, we use the single step transition matrix t-1->t. If cumulative is True, we use the cumulative transition matrix 0->t.

Returns

torch.FloatTensor of shape (batch size, num classes - 1, num latent pixels)

Each column of the returned matrix is a row of log probabilities of the complete probability transition matrix.

When cumulative, returns self.num_classes - 1 rows because the initial latent pixel cannot be masked.

Where:

  • q_n is the probability distribution for the forward process of the nth latent pixel.
  • C_0 is a class of a latent pixel embedding
  • C_k is the class of the masked latent pixel

non-cumulative result (omitting logarithms):

\[
\begin{pmatrix}
q_0(x_t \mid x_{t-1} = C_0) & \cdots & q_n(x_t \mid x_{t-1} = C_0) \\
\vdots & \ddots & \vdots \\
q_0(x_t \mid x_{t-1} = C_k) & \cdots & q_n(x_t \mid x_{t-1} = C_k)
\end{pmatrix}
\]

cumulative result (omitting logarithms):

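\[
\begin{pmatrix}
q_0^{\mathrm{cumulative}}(x_t \mid x_0 = C_0) & \cdots & q_n^{\mathrm{cumulative}}(x_t \mid x_0 = C_0) \\
\vdots & \ddots & \vdots \\
q_0^{\mathrm{cumulative}}(x_t \mid x_0 = C_{k-1}) & \cdots & q_n^{\mathrm{cumulative}}(x_t \mid x_0 = C_{k-1})
\end{pmatrix}
\]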

Returns the log probabilities of the rows from the (cumulative or non-cumulative) transition matrix for each latent pixel in x_t.

See equation (7) for the complete non-cumulative transition matrix. The complete cumulative transition matrix is the same structure except the parameters (alpha, beta, gamma) are the cumulative analogs.

q_posterior

( log_p_x_0 x_t t ) torch.FloatTensor of shape (batch size, num classes, num latent pixels)

Parameters

  • t (torch.Long) — The timestep that determines which transition matrix is used.

Returns

torch.FloatTensor of shape (batch size, num classes, num latent pixels)

The log probabilities for the predicted classes of the image at timestep t-1. I.e. Equation (11).

Calculates the log probabilities for the predicted classes of the image at timestep t-1. I.e. Equation (11).

Instead of directly computing equation (11), we use Equation (5) to restate Equation (11) in terms of only forward probabilities.

Equation (11) stated in terms of forward probabilities via Equation (5):

Where:

  • the sum is over x_0 = {C_0, …, C_{k-1}} (the classes for x_0)

\[
p(x_{t-1} \mid x_t) = \sum_{x_0} \frac{q(x_t \mid x_{t-1}) \, q(x_{t-1} \mid x_0) \, p(x_0)}{q(x_t \mid x_0)}
\]
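The sum above can be evaluated directly for a single latent pixel with a handful of classes. The sketch below works with plain probabilities rather than log probabilities and ignores the mask class; the function name `posterior` and the nested-list matrices are assumptions made to keep the example small.

```python
def posterior(q_step, q_cum_prev, p_x0, x_t):
    """Return p(x_{t-1} = b | x_t) for every class b of one latent pixel.

    q_step[a][b]     = q(x_t = a | x_{t-1} = b)
    q_cum_prev[b][c] = q(x_{t-1} = b | x_0 = c)
    p_x0[c]          = predicted probability that x_0 = c
    """
    num_classes = len(p_x0)
    # q(x_t | x_0 = c), obtained by marginalising over x_{t-1}.
    q_cum_t = [
        sum(q_step[x_t][b] * q_cum_prev[b][c] for b in range(num_classes))
        for c in range(num_classes)
    ]
    return [
        sum(q_step[x_t][b] * q_cum_prev[b][c] * p_x0[c] / q_cum_t[c] for c in range(num_classes))
        for b in range(num_classes)
    ]


# Hypothetical two-class transition matrices, chosen only for illustration.
post = posterior(
    q_step=[[0.9, 0.1], [0.1, 0.9]],
    q_cum_prev=[[0.8, 0.2], [0.2, 0.8]],
    p_x0=[0.7, 0.3],
    x_t=0,
)
```

Since q(x_t | x_0) is consistent with the two forward factors, the resulting values sum to one over the classes of x_{t-1}.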

set_timesteps

( num_inference_steps: int device: typing.Union[str, torch.device] = None )

Parameters

  • num_inference_steps (int) — the number of diffusion steps used when generating samples with a pre-trained model.
  • device (str or torch.device) — device to place the timesteps and the diffusion process parameters (alpha, beta, gamma) on.

Sets the discrete timesteps used for the diffusion chain. Supporting function to be run before inference.

step

( model_output: FloatTensor timestep: torch.int64 sample: LongTensor generator: typing.Optional[torch._C.Generator] = None return_dict: bool = True ) VQDiffusionSchedulerOutput or tuple

Parameters

  • t (torch.long) — The timestep that determines which transition matrices are used.
  • x_t (torch.LongTensor of shape (batch size, num latent pixels)) — The classes of each latent pixel at time t.
  • generator (torch.Generator or None) — RNG for the noise applied to p(x_{t-1} | x_t) before it is sampled from.

  • return_dict (bool) — option for returning tuple rather than VQDiffusionSchedulerOutput class

Returns

VQDiffusionSchedulerOutput or tuple

VQDiffusionSchedulerOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.

Predict the sample at the previous timestep via the reverse transition distribution, i.e. Equation (11). See the docstring for self.q_posterior for more detail on how Equation (11) is computed.

RePaint scheduler

DDPM-based inpainting scheduler for unsupervised inpainting with extreme masks. Intended for use with RePaintPipeline. Based on the paper RePaint: Inpainting using Denoising Diffusion Probabilistic Models and the original implementation by Andreas Lugmayr et al.: https://github.com/andreas128/RePaint

class diffusers.RePaintScheduler

( num_train_timesteps: int = 1000 beta_start: float = 0.0001 beta_end: float = 0.02 beta_schedule: str = 'linear' eta: float = 0.0 trained_betas: typing.Optional[numpy.ndarray] = None clip_sample: bool = True )

Parameters

  • num_train_timesteps (int) — number of diffusion steps used to train the model.
  • beta_start (float) — the starting beta value of inference.
  • beta_end (float) — the final beta value.
  • beta_schedule (str) — the beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from linear, scaled_linear, or squaredcos_cap_v2.
  • eta (float) — the weight of the noise added in a diffusion step. Its value is between 0.0 and 1.0, where 0.0 corresponds to the DDIM scheduler and 1.0 to the DDPM scheduler.
  • trained_betas (np.ndarray, optional) — option to pass an array of betas directly to the constructor to bypass beta_start, beta_end etc.
  • variance_type (str) — options to clip the variance used when adding noise to the denoised sample. Choose from fixed_small, fixed_small_log, fixed_large, fixed_large_log, learned or learned_range.
  • clip_sample (bool, default True) — option to clip predicted sample between -1 and 1 for numerical stability.

RePaint is a schedule for DDPM inpainting inside a given mask.

ConfigMixin takes care of storing all config attributes that are passed in the scheduler’s __init__ function, such as num_train_timesteps. They can be accessed via scheduler.config.num_train_timesteps. SchedulerMixin provides general loading and saving functionality via the SchedulerMixin.save_pretrained() and from_pretrained() functions.

For more details, see the original paper: https://arxiv.org/pdf/2201.09865.pdf

scale_model_input

( sample: FloatTensor timestep: typing.Optional[int] = None ) torch.FloatTensor

Parameters

  • sample (torch.FloatTensor) — input sample
  • timestep (int, optional) — current timestep

Returns

torch.FloatTensor

scaled input sample

Ensures interchangeability with schedulers that need to scale the denoising model input depending on the current timestep.

step

( model_output: FloatTensor timestep: int sample: FloatTensor original_image: FloatTensor mask: FloatTensor generator: typing.Optional[torch._C.Generator] = None return_dict: bool = True ) RePaintSchedulerOutput or tuple

Parameters

  • model_output (torch.FloatTensor) — direct output from learned diffusion model.
  • timestep (int) — current discrete timestep in the diffusion chain.
  • sample (torch.FloatTensor) — current instance of sample being created by diffusion process.
  • original_image (torch.FloatTensor) — the original image to inpaint on.
  • mask (torch.FloatTensor) — the mask where 0.0 values define which part of the original image to inpaint (change).
  • generator (torch.Generator, optional) — random number generator.
  • return_dict (bool) — option for returning tuple rather than RePaintSchedulerOutput class

Returns

RePaintSchedulerOutput or tuple

RePaintSchedulerOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.

Predict the sample at the previous timestep by reversing the SDE. Core function to propagate the diffusion process from the learned model outputs (most often the predicted noise).
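The role of the mask in an inpainting step can be sketched as a per-element blend: where the mask is 1.0 the known content (derived from the original image) is kept, and where it is 0.0 the model's denoised content is used. The function name `blend_with_mask` and the flat lists are assumptions for illustration; the real step also handles noising the known region and the variance terms.

```python
def blend_with_mask(known_part, unknown_part, mask):
    """Keep `known_part` where mask is 1.0 and `unknown_part` where mask is 0.0."""
    return [m * k + (1.0 - m) * u for k, u, m in zip(known_part, unknown_part, mask)]


# First element is preserved from the known image, second element is inpainted.
blended = blend_with_mask([5.0, 5.0], [9.0, 9.0], [1.0, 0.0])
```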