Helpful Utilities
Below are a variety of utility functions that 🤗 Accelerate provides, broken down by use-case.
Constants
Constants used throughout 🤗 Accelerate for reference
The following are constants used when utilizing Accelerator.save_state()
utils.MODEL_NAME
: "pytorch_model"
utils.OPTIMIZER_NAME
: "optimizer"
utils.RNG_STATE_NAME
: "random_states"
utils.SCALER_NAME
: "scaler.pt
utils.SCHEDULER_NAME
: "scheduler
The following are constants used when utilizing Accelerator.save_model()
utils.WEIGHTS_NAME
: "pytorch_model.bin"
utils.SAFE_WEIGHTS_NAME
: "model.safetensors"
utils.WEIGHTS_INDEX_NAME
: "pytorch_model.bin.index.json"
utils.SAFE_WEIGHTS_INDEX_NAME
: "model.safetensors.index.json"
Data Classes
These are basic dataclasses used throughout 🤗 Accelerate and they can be passed in as parameters.
class accelerate.DistributedType
< source >( value names = None module = None qualname = None type = None start = 1 )
Represents a type of distributed environment.
Values:
- NO — Not a distributed environment, just a single process.
- MULTI_CPU — Distributed on multiple CPU nodes.
- MULTI_GPU — Distributed on multiple GPUs.
- MULTI_NPU — Distributed on multiple NPUs.
- MULTI_XPU — Distributed on multiple XPUs.
- DEEPSPEED — Using DeepSpeed.
- TPU — Distributed on TPUs.
class accelerate.utils.DynamoBackend
< source >( value names = None module = None qualname = None type = None start = 1 )
Represents a dynamo backend (see https://github.com/pytorch/torchdynamo).
Values:
- NO — Do not use torch dynamo.
- EAGER — Uses PyTorch to run the extracted GraphModule. This is quite useful in debugging TorchDynamo issues.
- AOT_EAGER — Uses AotAutograd with no compiler, i.e, just using PyTorch eager for the AotAutograd’s extracted forward and backward graphs. This is useful for debugging, and unlikely to give speedups.
- INDUCTOR — Uses TorchInductor backend with AotAutograd and cudagraphs by leveraging codegened Triton kernels. Read more
- NVFUSER — nvFuser with TorchScript. Read more
- AOT_NVFUSER — nvFuser with AotAutograd. Read more
- AOT_CUDAGRAPHS — cudagraphs with AotAutograd. Read more
- OFI — Uses Torchscript optimize_for_inference. Inference only. Read more
- FX2TRT — Uses Nvidia TensorRT for inference optimizations. Inference only. Read more
- ONNXRT — Uses ONNXRT for inference on CPU/GPU. Inference only. Read more
- IPEX — Uses IPEX for inference on CPU. Inference only. Read more.
class accelerate.utils.LoggerType
< source >( value names = None module = None qualname = None type = None start = 1 )
Represents a type of supported experiment tracker
Values:
- ALL — all available trackers in the environment that are supported
- TENSORBOARD — TensorBoard as an experiment tracker
- WANDB — wandb as an experiment tracker
- COMETML — comet_ml as an experiment tracker
class accelerate.utils.PrecisionType
< source >( value names = None module = None qualname = None type = None start = 1 )
Represents a type of precision used on floating point values
Values:
- NO — using full precision (FP32)
- FP16 — using half precision
- BF16 — using brain floating point precision
class accelerate.utils.ProjectConfiguration
< source >( project_dir: str = None logging_dir: str = None automatic_checkpoint_naming: bool = False total_limit: int = None iteration: int = 0 )
Configuration for the Accelerator object based on inner-project needs.
Sets self.project_dir
and self.logging_dir
to the appropriate values.
Plugins
These are plugins that can be passed to the Accelerator object. While they are defined elsewhere in the documentation, for convience all of them are available to see here:
class accelerate.DeepSpeedPlugin
< source >( hf_ds_config: typing.Any = None gradient_accumulation_steps: int = None gradient_clipping: float = None zero_stage: int = None is_train_batch_min: str = True offload_optimizer_device: bool = None offload_param_device: bool = None offload_optimizer_nvme_path: str = None offload_param_nvme_path: str = None zero3_init_flag: bool = None zero3_save_16bit_model: bool = None )
This plugin is used to integrate DeepSpeed.
deepspeed_config_process
< source >( prefix = '' mismatches = None config = None must_match = True **kwargs )
Process the DeepSpeed config with the values from the kwargs.
class accelerate.FullyShardedDataParallelPlugin
< source >( sharding_strategy: typing.Any = None backward_prefetch: typing.Any = None mixed_precision_policy: typing.Any = None auto_wrap_policy: typing.Optional[typing.Callable] = None cpu_offload: typing.Any = None ignored_modules: typing.Optional[typing.Iterable[torch.nn.modules.module.Module]] = None state_dict_type: typing.Any = None state_dict_config: typing.Any = None optim_state_dict_config: typing.Any = None limit_all_gathers: bool = False use_orig_params: bool = False param_init_fn: typing.Optional[typing.Callable[[torch.nn.modules.module.Module]], NoneType] = None sync_module_states: bool = True forward_prefetch: bool = False )
This plugin is used to enable fully sharded data parallelism.
get_module_class_from_name
< source >( module name )
Gets a class from a module by its name.
class accelerate.utils.GradientAccumulationPlugin
< source >( num_steps: int = None adjust_scheduler: bool = True sync_with_dataloader: bool = True )
A plugin to configure gradient accumulation behavior.
class accelerate.utils.MegatronLMPlugin
< source >( tp_degree: int = None pp_degree: int = None num_micro_batches: int = None gradient_clipping: float = None sequence_parallelism: bool = None recompute_activation: bool = None use_distributed_optimizer: bool = None pipeline_model_parallel_split_rank: int = None num_layers_per_virtual_pipeline_stage: int = None is_train_batch_min: str = True train_iters: int = None train_samples: int = None weight_decay_incr_style: str = 'constant' start_weight_decay: float = None end_weight_decay: float = None lr_decay_style: str = 'linear' lr_decay_iters: int = None lr_decay_samples: int = None lr_warmup_iters: int = None lr_warmup_samples: int = None lr_warmup_fraction: float = None min_lr: float = 0 consumed_samples: typing.List[int] = None no_wd_decay_cond: typing.Optional[typing.Callable] = None scale_lr_cond: typing.Optional[typing.Callable] = None lr_mult: float = 1.0 megatron_dataset_flag: bool = False seq_length: int = None encoder_seq_length: int = None decoder_seq_length: int = None tensorboard_dir: str = None set_all_logging_options: bool = False eval_iters: int = 100 eval_interval: int = 1000 return_logits: bool = False custom_train_step_class: typing.Optional[typing.Any] = None custom_train_step_kwargs: typing.Union[typing.Dict[str, typing.Any], NoneType] = None custom_model_provider_function: typing.Optional[typing.Callable] = None custom_prepare_model_function: typing.Optional[typing.Callable] = None other_megatron_args: typing.Union[typing.Dict[str, typing.Any], NoneType] = None )
Plugin for Megatron-LM to enable tensor, pipeline, sequence and data parallelism. Also to enable selective activation recomputation and optimized fused kernels.
class accelerate.utils.TorchDynamoPlugin
< source >( backend: DynamoBackend = None mode: str = None fullgraph: bool = None dynamic: bool = None options: typing.Any = None disable: bool = False )
This plugin is used to compile a model with PyTorch 2.0
Data Manipulation and Operations
These include data operations that mimic the same torch
ops but can be used on distributed processes.
accelerate.utils.broadcast
< source >( tensor from_process: int = 0 )
Recursively broadcast tensor in a nested list/tuple/dictionary of tensors to all devices.
accelerate.utils.concatenate
< source >( data dim = 0 )
Recursively concatenate the tensors in a nested list/tuple/dictionary of lists of tensors with the same shape.
accelerate.utils.gather
< source >( tensor )
Recursively gather tensor in a nested list/tuple/dictionary of tensors from all devices.
accelerate.utils.pad_across_processes
< source >( tensor dim = 0 pad_index = 0 pad_first = False )
Parameters
-
tensor (nested list/tuple/dictionary of
torch.Tensor
) — The data to gather. -
dim (
int
, optional, defaults to 0) — The dimension on which to pad. -
pad_index (
int
, optional, defaults to 0) — The value with which to pad. -
pad_first (
bool
, optional, defaults toFalse
) — Whether to pad at the beginning or the end.
Recursively pad the tensors in a nested list/tuple/dictionary of tensors from all devices to the same size so they can safely be gathered.
accelerate.utils.reduce
< source >( tensor reduction = 'mean' )
Recursively reduce the tensors in a nested list/tuple/dictionary of lists of tensors across all processes by the mean of a given operation.
accelerate.utils.send_to_device
< source >( tensor device non_blocking = False skip_keys = None )
Recursively sends the elements in a nested list/tuple/dictionary of tensors to a given device.
Environment Checks
These functionalities check the state of the current working environment including information about the operating system itself, what it can support, and if particular dependencies are installed.
Checks if bf16 is supported, optionally ignoring the TPU
Checks if torch_npu
is installed and potentially if a NPU is in the environment
accelerate.utils.is_torch_version
< source >( operation: str version: str )
Compares the current PyTorch version to a given reference with an operation.
Checks if torch_xla
is installed and potentially if a TPU is in the environment
Environment Manipulation
A context manager that will add each keyword argument passed to os.environ
and remove them when exiting.
Will convert the values in kwargs
to strings and upper-case all the keys.
A context manager that will cache origin os.environ
and replace it with a empty dictionary in this context.
When this context exits, the cached os.environ
will be back.
accelerate.commands.config.default.write_basic_config
< source >( mixed_precision = 'no' save_location: str = '/github/home/.cache/huggingface/accelerate/default_config.yaml' use_xpu: bool = False )
Parameters
-
mixed_precision (
str
, optional, defaults to “no”) — Mixed Precision to use. Should be one of “no”, “fp16”, or “bf16” -
save_location (
str
, optional, defaults todefault_json_config_file
) — Optional custom save location. Should be passed to--config_file
when usingaccelerate launch
. Default location is inside the huggingface cache folder (~/.cache/huggingface
) but can be overriden by setting theHF_HOME
environmental variable, followed byaccelerate/default_config.yaml
. -
use_xpu (
bool
, optional, defaults toFalse
) — Whether to use XPU if available.
Creates and saves a basic cluster config to be used on a local machine with potentially multiple GPUs. Will also set CPU if it is a CPU-only machine.
When setting up 🤗 Accelerate for the first time, rather than running accelerate config
[~utils.write_basic_config] can be used as an alternative for quick configuration.
Memory
accelerate.utils.get_max_memory
< source >( max_memory: typing.Union[typing.Dict[typing.Union[int, str], typing.Union[int, str]], NoneType] = None )
Get the maximum memory available if nothing is passed, converts string to int otherwise.
accelerate.find_executable_batch_size
< source >( function: callable = None starting_batch_size: int = 128 )
A basic decorator that will try to execute function
. If it fails from exceptions related to out-of-memory or
CUDNN, the batch size is cut in half and passed to function
function
must take in a batch_size
parameter as its first argument.
Modeling
These utilities relate to interacting with PyTorch models
accelerate.utils.extract_model_from_parallel
< source >(
model
keep_fp32_wrapper: bool = True
)
→
torch.nn.Module
Extract a model from its distributed containers.
accelerate.utils.get_max_layer_size
< source >(
modules: typing.List[typing.Tuple[str, torch.nn.modules.module.Module]]
module_sizes: typing.Dict[str, int]
no_split_module_classes: typing.List[str]
)
→
Tuple[int, List[str]]
Parameters
-
modules (
List[Tuple[str, torch.nn.Module]]
) — The list of named modules where we want to determine the maximum layer size. -
module_sizes (
Dict[str, int]
) — A dictionary mapping each layer name to its size (as generated bycompute_module_sizes
). -
no_split_module_classes (
List[str]
) — A list of class names for layers we don’t want to be split.
Returns
Tuple[int, List[str]]
The maximum size of a layer with the list of layer names realizing that maximum size.
Utility function that will scan a list of named modules and return the maximum size used by one full layer. The definition of a layer being:
- a module with no direct children (just parameters and buffers)
- a module whose class name is in the list
no_split_module_classes
accelerate.utils.offload_state_dict
< source >( save_dir: typing.Union[str, os.PathLike] state_dict: typing.Dict[str, torch.Tensor] )
Offload a state dict in a given folder.
Parallel
These include general utilities that should be used when working in parallel.
accelerate.utils.extract_model_from_parallel
< source >(
model
keep_fp32_wrapper: bool = True
)
→
torch.nn.Module
Extract a model from its distributed containers.
Save the data to disk. Use in place of torch.save()
.
Introduces a blocking point in the script, making sure all processes have reached this point before continuing.
Make sure all processes will reach this instruction otherwise one of your processes will hang forever.
Random
These utilities relate to setting and synchronizing of all the random states.
accelerate.utils.set_seed
< source >( seed: int device_specific: bool = False )
Helper function for reproducible behavior to set the seed in random
, numpy
, torch
.
accelerate.utils.synchronize_rng_state
< source >( rng_type: typing.Optional[accelerate.utils.dataclasses.RNGType] = None generator: typing.Optional[torch._C.Generator] = None )
accelerate.synchronize_rng_states
< source >( rng_types: typing.List[typing.Union[str, accelerate.utils.dataclasses.RNGType]] generator: typing.Optional[torch._C.Generator] = None )
PyTorch XLA
These include utilities that are useful while using PyTorch with XLA.
accelerate.utils.install_xla
< source >( upgrade: bool = False )
Helper function to install appropriate xla wheels based on the torch
version in Google Colaboratory.
Loading model weights
These include utilities that are useful to load checkpoints.
accelerate.load_checkpoint_in_model
< source >( model: Module checkpoint: typing.Union[str, os.PathLike] device_map: typing.Union[typing.Dict[str, typing.Union[int, str, torch.device]], NoneType] = None offload_folder: typing.Union[str, os.PathLike, NoneType] = None dtype: typing.Union[str, torch.dtype, NoneType] = None offload_state_dict: bool = False offload_buffers: bool = False keep_in_fp32_modules: typing.List[str] = None offload_8bit_bnb: bool = False )
Parameters
-
model (
torch.nn.Module
) — The model in which we want to load a checkpoint. -
checkpoint (
str
oros.PathLike
) — The folder checkpoint to load. It can be:- a path to a file containing a whole model state dict
- a path to a
.json
file containing the index to a sharded checkpoint - a path to a folder containing a unique
.index.json
file and the shards of a checkpoint. - a path to a folder containing a unique pytorch_model.bin or a model.safetensors file.
-
device_map (
Dict[str, Union[int, str, torch.device]]
, optional) — A map that specifies where each submodule should go. It doesn’t need to be refined to each parameter/buffer name, once a given module name is inside, every submodule of it will be sent to the same device. -
offload_folder (
str
oros.PathLike
, optional) — If thedevice_map
contains any value"disk"
, the folder where we will offload weights. -
dtype (
str
ortorch.dtype
, optional) — If provided, the weights will be converted to that type when loaded. -
offload_state_dict (
bool
, optional, defaults toFalse
) — IfTrue
, will temporarily offload the CPU state dict on the hard drive to avoid getting out of CPU RAM if the weight of the CPU state dict + the biggest shard does not fit. -
offload_buffers (
bool
, optional, defaults toFalse
) — Whether or not to include the buffers in the weights offloaded to disk. -
keep_in_fp32_modules(
List[str]
, optional) — A list of the modules that we keep intorch.float32
dtype. -
offload_8bit_bnb (
bool
, optional) — Whether or not to enable offload of 8-bit modules on cpu/disk.
Loads a (potentially sharded) checkpoint inside a model, potentially sending weights to a given device as they are loaded.
Once loaded across devices, you still need to call dispatch_model() on your model to make it able to run. To group the checkpoint loading and dispatch in one single call, use load_checkpoint_and_dispatch().
Quantization
These include utilities that are useful to quantize model.
accelerate.utils.load_and_quantize_model
< source >(
model: Module
bnb_quantization_config: BnbQuantizationConfig
weights_location: typing.Union[str, os.PathLike] = None
device_map: typing.Union[typing.Dict[str, typing.Union[int, str, torch.device]], NoneType] = None
no_split_module_classes: typing.Optional[typing.List[str]] = None
max_memory: typing.Union[typing.Dict[typing.Union[int, str], typing.Union[int, str]], NoneType] = None
offload_folder: typing.Union[str, os.PathLike, NoneType] = None
offload_state_dict: bool = False
)
→
torch.nn.Module
Parameters
-
model (
torch.nn.Module
) — Input model. The model can be already loaded or on the meta device -
bnb_quantization_config (
BnbQuantizationConfig
) — The bitsandbytes quantization parameters -
weights_location (
str
oros.PathLike
) — The folder weights_location to load. It can be:- a path to a file containing a whole model state dict
- a path to a
.json
file containing the index to a sharded checkpoint - a path to a folder containing a unique
.index.json
file and the shards of a checkpoint. - a path to a folder containing a unique pytorch_model.bin file.
-
device_map (
Dict[str, Union[int, str, torch.device]]
, optional) — A map that specifies where each submodule should go. It doesn’t need to be refined to each parameter/buffer name, once a given module name is inside, every submodule of it will be sent to the same device. -
no_split_module_classes (
List[str]
, optional) — A list of layer class names that should never be split across device (for instance any layer that has a residual connection). -
max_memory (
Dict
, optional) — A dictionary device identifier to maximum memory. Will default to the maximum memory available if unset. -
offload_folder (
str
oros.PathLike
, optional) — If thedevice_map
contains any value"disk"
, the folder where we will offload weights. -
offload_state_dict (
bool
, optional, defaults toFalse
) — IfTrue
, will temporarily offload the CPU state dict on the hard drive to avoid getting out of CPU RAM if the weight of the CPU state dict + the biggest shard does not fit.
Returns
torch.nn.Module
The quantized model
This function will quantize the input model with the associated config passed in bnb_quantization_config
. If the
model is in the meta device, we will load and dispatch the weights according to the device_map
passed. If the
model is already loaded, we will quantize the model and put the model on the GPU,
class accelerate.utils.BnbQuantizationConfig
< source >( load_in_8bit: bool = False llm_int8_threshold: float = 6.0 load_in_4bit: bool = False bnb_4bit_quant_type: str = 'fp4' bnb_4bit_use_double_quant: bool = False bnb_4bit_compute_dtype: bool = 'fp16' torch_dtype: dtype = None skip_modules: typing.List[str] = None keep_in_fp32_modules: typing.List[str] = None )
A plugin to enable BitsAndBytes 4bit and 8bit quantization