Transformers documentation

Callbacks

You are viewing v4.36.1 version. A newer version v4.48.0 is available.

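Callbacks are objects that customize the behavior of the training loop in the PyTorch Trainer (this feature is not yet implemented in TensorFlow). They can inspect the training loop state (for progress reporting, logging to TensorBoard or other ML platforms, and so on) and take decisions (such as early stopping).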

Callbacks are "read only" pieces of code: apart from the TrainerControl object they return, they cannot change anything in the training loop. For customizations that require changes to the training loop, subclass Trainer and override the methods you need (see trainer for examples).

By default, TrainingArguments.report_to is set to "all", so a Trainer uses the callback of every supported integration whose package is installed.

If a package is installed but you don't want to use the accompanying integration, you can change TrainingArguments.report_to to a list of only the integrations you want to use (e.g. ["azure_ml", "wandb"]).

The main class that implements callbacks is TrainerCallback. It receives the TrainingArguments used to instantiate the Trainer, can access that Trainer's internal state via TrainerState, and can take some actions on the training loop via TrainerControl.

Available Callbacks

Here is the list of the TrainerCallback classes available in the library:

class transformers.integrations.CometCallback

( )

A TrainerCallback that sends the logs to Comet ML.

setup

( args state model )

Setup the optional Comet.ml integration.

Environment:

  • COMET_MODE (str, optional, defaults to ONLINE): Whether to create an online, offline experiment or disable Comet logging. Can be OFFLINE, ONLINE, or DISABLED.
  • COMET_PROJECT_NAME (str, optional): Comet project name for experiments.
  • COMET_OFFLINE_DIRECTORY (str, optional): Folder to use for saving offline experiments when COMET_MODE is OFFLINE.
  • COMET_LOG_ASSETS (str, optional, defaults to TRUE): Whether or not to log training assets (tf event logs, checkpoints, etc), to Comet. Can be TRUE, or FALSE.

For a number of configurable items in the environment, see here.
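As a minimal sketch of how these variables might be used, the snippet below sets them from Python before training starts. The project name and offline directory are placeholder values invented for illustration; only the variable names and their allowed values come from the list above.

```python
import os

# Placeholder values for illustration; only the variable names come from the list above.
os.environ["COMET_MODE"] = "OFFLINE"                        # ONLINE, OFFLINE, or DISABLED
os.environ["COMET_PROJECT_NAME"] = "my-demo-project"        # hypothetical project name
os.environ["COMET_OFFLINE_DIRECTORY"] = "./comet_offline"   # hypothetical folder
os.environ["COMET_LOG_ASSETS"] = "FALSE"                    # TRUE or FALSE

# Collect what was set so it can be inspected before launching a run.
comet_settings = {k: v for k, v in os.environ.items() if k.startswith("COMET_")}
print(comet_settings)
```

Setting the variables in the shell before launching the script has the same effect and keeps the configuration out of the code.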

class transformers.DefaultFlowCallback

( )

A TrainerCallback that handles the default flow of the training loop for logs, evaluation and checkpoints.

class transformers.PrinterCallback

( )

A bare TrainerCallback that just prints the logs.

class transformers.ProgressCallback

( )

A TrainerCallback that displays the progress of training or evaluation.

class transformers.EarlyStoppingCallback

( early_stopping_patience: int = 1 early_stopping_threshold: typing.Optional[float] = 0.0 )

Parameters

  • early_stopping_patience (int) — Use with metric_for_best_model to stop training when the specified metric worsens for early_stopping_patience evaluation calls.
  • early_stopping_threshold (float, optional) — Use with TrainingArguments metric_for_best_model and early_stopping_patience to denote how much the specified metric must improve to satisfy early stopping conditions.

A TrainerCallback that handles early stopping.

This callback depends on the load_best_model_at_end functionality of TrainingArguments to set best_metric in TrainerState. Note that if the TrainingArguments argument save_steps differs from eval_steps, early stopping will not occur until the next save step.
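The interplay of patience and threshold can be illustrated with a small standalone sketch. This is a simplified model written for this page, not the library's implementation: the function name `first_stopping_evaluation` is hypothetical, and the sketch tracks the best value itself, whereas the callback reads best_metric from TrainerState.

```python
def first_stopping_evaluation(metrics, patience=1, threshold=0.0, greater_is_better=False):
    """Return the 1-based index of the evaluation that would trigger a stop, or None."""
    best = None
    evals_without_improvement = 0
    for index, value in enumerate(metrics, start=1):
        # An evaluation counts as an improvement only if it beats the best value
        # in the right direction AND by more than `threshold`.
        improved = best is None or (
            (value > best if greater_is_better else value < best)
            and abs(value - best) > threshold
        )
        if improved:
            best = value
            evals_without_improvement = 0
        else:
            evals_without_improvement += 1
            if evals_without_improvement >= patience:
                return index
    return None


eval_losses = [0.90, 0.70, 0.72, 0.71, 0.69]  # made-up values for illustration
print(first_stopping_evaluation(eval_losses, patience=2))                  # 4
print(first_stopping_evaluation(eval_losses, patience=3))                  # None
print(first_stopping_evaluation(eval_losses, patience=3, threshold=0.05))  # 5
```

With a threshold of 0.05, the final drop from 0.70 to 0.69 is too small to reset the counter, so the third non-improving evaluation triggers the stop.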

class transformers.integrations.TensorBoardCallback

( tb_writer = None )

Parameters

  • tb_writer (SummaryWriter, optional) — The writer to use. Will instantiate one if not set.

A TrainerCallback that sends the logs to TensorBoard.

class transformers.integrations.WandbCallback

( )

A TrainerCallback that logs metrics, media, and model checkpoints to Weights & Biases.

setup

( args state model **kwargs )

Setup the optional Weights & Biases (wandb) integration.

One can subclass and override this method to customize the setup if needed. Find more information here. You can also override the following environment variables:

Environment:

  • WANDB_LOG_MODEL (str, optional, defaults to "false"): Whether to log model and checkpoints during training. Can be "end", "checkpoint" or "false". If set to "end", the model will be uploaded at the end of training. If set to "checkpoint", the checkpoint will be uploaded every args.save_steps. If set to "false", the model will not be uploaded. Use along with the load_best_model_at_end training argument to upload the best model.

    Deprecated in 5.0

    Setting WANDB_LOG_MODEL as bool will be deprecated in version 5 of 🤗 Transformers.

  • WANDB_WATCH (str, optional, defaults to "false"): Can be "gradients", "all", "parameters", or "false". Set to "all" to log gradients and parameters.

  • WANDB_PROJECT (str, optional, defaults to "huggingface"): Set this to a custom string to store results in a different project.

  • WANDB_DISABLED (bool, optional, defaults to False): Whether to disable wandb entirely. Set WANDB_DISABLED=true to disable.

class transformers.integrations.MLflowCallback

( )

A TrainerCallback that sends the logs to MLflow. Can be disabled by setting environment variable DISABLE_MLFLOW_INTEGRATION = TRUE.

setup

( args state model )

Setup the optional MLflow integration.

Environment:

  • HF_MLFLOW_LOG_ARTIFACTS (str, optional): Whether to use MLflow .log_artifact() facility to log artifacts. This only makes sense if logging to a remote server, e.g. s3 or GCS. If set to True or 1, will copy each saved checkpoint on each save in TrainingArguments’s output_dir to the local or remote artifact storage. Using it without a remote storage will just copy the files to your artifact location.
  • MLFLOW_EXPERIMENT_NAME (str, optional, defaults to None): The MLflow experiment_name under which to launch the run. Defaults to None, which points to the Default experiment in MLflow. Otherwise, it is the case-sensitive name of the experiment to activate. If an experiment with this name does not exist, a new experiment with this name is created.
  • MLFLOW_TAGS (str, optional): A string dump of a dictionary of key/value pair to be added to the MLflow run as tags. Example: os.environ['MLFLOW_TAGS']='{"release.candidate": "RC1", "release.version": "2.2.0"}'.
  • MLFLOW_NESTED_RUN (str, optional): Whether to use MLflow nested runs. If set to True or 1, will create a nested run inside the current run.
  • MLFLOW_RUN_ID (str, optional): Allows reattaching to an existing run, which can be useful when resuming training from a checkpoint. When the MLFLOW_RUN_ID environment variable is set, start_run attempts to resume a run with the specified run ID and other parameters are ignored.
  • MLFLOW_FLATTEN_PARAMS (str, optional, defaults to False): Whether to flatten the parameters dictionary before logging.
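Because MLFLOW_TAGS expects a string dump of a dictionary, building it with `json.dumps` avoids quoting mistakes. The sketch below reuses the tag values from the example above; the experiment name is a placeholder invented for illustration.

```python
import json
import os

# Tags taken from the MLFLOW_TAGS example above.
tags = {"release.candidate": "RC1", "release.version": "2.2.0"}

# json.dumps produces the string form that the environment variable expects.
os.environ["MLFLOW_TAGS"] = json.dumps(tags)
os.environ["MLFLOW_EXPERIMENT_NAME"] = "my-experiment"  # hypothetical experiment name

# Round-trip check: the stored string decodes back to the original dictionary.
decoded_tags = json.loads(os.environ["MLFLOW_TAGS"])
print(decoded_tags)
```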

class transformers.integrations.AzureMLCallback

( azureml_run = None )

A TrainerCallback that sends the logs to AzureML.

class transformers.integrations.CodeCarbonCallback

( )

A TrainerCallback that tracks the CO2 emission of training.

class transformers.integrations.NeptuneCallback

( api_token: typing.Optional[str] = None project: typing.Optional[str] = None name: typing.Optional[str] = None base_namespace: str = 'finetuning' run = None log_parameters: bool = True log_checkpoints: typing.Optional[str] = None **neptune_run_kwargs )

Parameters

  • api_token (str, optional) — Neptune API token obtained upon registration. You can leave this argument out if you have saved your token to the NEPTUNE_API_TOKEN environment variable (strongly recommended). See full setup instructions in the docs.
  • project (str, optional) — Name of an existing Neptune project, in the form “workspace-name/project-name”. You can find and copy the name in Neptune from the project settings -> Properties. If None (default), the value of the NEPTUNE_PROJECT environment variable is used.
  • name (str, optional) — Custom name for the run.
  • base_namespace (str, optional, defaults to “finetuning”) — In the Neptune run, the root namespace that will contain all of the metadata logged by the callback.
  • log_parameters (bool, optional, defaults to True) — If True, logs all Trainer arguments and model parameters provided by the Trainer.
  • log_checkpoints (str, optional) — If “same”, uploads checkpoints whenever they are saved by the Trainer. If “last”, uploads only the most recently saved checkpoint. If “best”, uploads the best checkpoint (among the ones saved by the Trainer). If None, does not upload checkpoints.
  • run (Run, optional) — Pass a Neptune run object if you want to continue logging to an existing run. Read more about resuming runs in the docs.
  • **neptune_run_kwargs (optional) — Additional keyword arguments to be passed directly to the neptune.init_run() function when a new run is created.

A TrainerCallback that sends the logs to Neptune.

For instructions and examples, see the Transformers integration guide in the Neptune documentation.

class transformers.integrations.ClearMLCallback

( )

A TrainerCallback that sends the logs to ClearML.

Environment:

  • CLEARML_PROJECT (str, optional, defaults to HuggingFace Transformers): ClearML project name.
  • CLEARML_TASK (str, optional, defaults to Trainer): ClearML task name.
  • CLEARML_LOG_MODEL (bool, optional, defaults to False): Whether to log models as artifacts during training.

class transformers.integrations.DagsHubCallback

( )

A TrainerCallback that logs to DagsHub. Extends MLflowCallback.

setup

( *args **kwargs )

Set up the DagsHub logging integration.

Environment:

  • HF_DAGSHUB_LOG_ARTIFACTS (str, optional): Whether to save the data and model artifacts for the experiment. Defaults to False.

class transformers.integrations.FlyteCallback

( save_log_history: bool = True sync_checkpoints: bool = True )

Parameters

  • save_log_history (bool, optional, defaults to True) — When set to True, the training logs are saved as a Flyte Deck.
  • sync_checkpoints (bool, optional, defaults to True) — When set to True, checkpoints are synced with Flyte and can be used to resume training in the case of an interruption.

A TrainerCallback that sends the logs to Flyte. NOTE: This callback only works within a Flyte task.

Example:

# Note: This example skips over some setup steps for brevity.
from flytekit import current_context, task
from transformers import Trainer
from transformers.integrations import FlyteCallback


@task
def train_hf_transformer():
    cp = current_context().checkpoint
    trainer = Trainer(..., callbacks=[FlyteCallback()])
    output = trainer.train(resume_from_checkpoint=cp.restore())

class transformers.integrations.DVCLiveCallback

( live: typing.Optional[typing.Any] = None log_model: typing.Union[typing.Literal['all'], bool, NoneType] = None **kwargs )

Parameters

  • live (dvclive.Live, optional, defaults to None) — Optional Live instance. If None, a new instance will be created using **kwargs.
  • log_model (Union[Literal[“all”], bool], optional, defaults to None) — Whether to use dvclive.Live.log_artifact() to log checkpoints created by Trainer. If set to True, the final checkpoint is logged at the end of training. If set to "all", the entire TrainingArguments’s output_dir is logged at each checkpoint.

A TrainerCallback that sends the logs to DVCLive.

Use the environment variables below in setup to configure the integration. To customize this callback beyond those environment variables, see here.

setup

( args state model )

Setup the optional DVCLive integration. To customize this callback beyond the environment variables below, see here.

Environment:

  • HF_DVCLIVE_LOG_MODEL (str, optional): Whether to use dvclive.Live.log_artifact() to log checkpoints created by Trainer. If set to True or 1, the final checkpoint is logged at the end of training. If set to all, the entire TrainingArguments’s output_dir is logged at each checkpoint.

TrainerCallback

class transformers.TrainerCallback

( )

Parameters

  • args (TrainingArguments) — The training arguments used to instantiate the Trainer.
  • state (TrainerState) — The current state of the Trainer.
  • control (TrainerControl) — The object that is returned to the Trainer and can be used to make some decisions.
  • model (PreTrainedModel or torch.nn.Module) — The model being trained.
  • tokenizer (PreTrainedTokenizer) — The tokenizer used for encoding the data.
  • optimizer (torch.optim.Optimizer) — The optimizer used for the training steps.
  • lr_scheduler (torch.optim.lr_scheduler.LambdaLR) — The scheduler used for setting the learning rate.
  • train_dataloader (torch.utils.data.DataLoader, optional) — The current dataloader used for training.
  • eval_dataloader (torch.utils.data.DataLoader, optional) — The current dataloader used for evaluation.
  • metrics (Dict[str, float]) — The metrics computed by the last evaluation phase.

    Those are only accessible in the event on_evaluate.

  • logs (Dict[str, float]) — The values to log.

    Those are only accessible in the event on_log.

A class for objects that inspect the state of the training loop at certain events and take decisions. At each of those events, the arguments listed above are available.

The control object is the only one that can be changed by the callback, in which case the event that changes it should return the modified version.

The arguments args, state and control are positional for all events; all the others are grouped in kwargs. You can unpack the ones you need in the signature of the event that uses them. As an example, see the code of the simple PrinterCallback.

Example:

class PrinterCallback(TrainerCallback):
    def on_log(self, args, state, control, logs=None, **kwargs):
        _ = logs.pop("total_flos", None)
        if state.is_local_process_zero:
            print(logs)
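As a further sketch of unpacking event-specific keyword arguments, the hypothetical `MetricHistoryCallback` below records training losses from `on_log` and evaluation metrics from `on_evaluate`. It assumes the training logs carry a "loss" key. The hooks are driven by hand with stand-in objects so the snippet runs on its own, and the `try`/`except` fallback class exists only so that it also runs where transformers is not installed.

```python
from types import SimpleNamespace

try:
    from transformers import TrainerCallback
except ImportError:  # stand-in so the sketch still runs without transformers
    class TrainerCallback:
        pass


class MetricHistoryCallback(TrainerCallback):
    """Hypothetical callback that records logged losses and evaluation metrics."""

    def __init__(self):
        self.losses = []
        self.eval_metrics = []

    def on_log(self, args, state, control, logs=None, **kwargs):
        # `logs` is only passed for the on_log event.
        if logs is not None and "loss" in logs:
            self.losses.append((state.global_step, logs["loss"]))

    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        # `metrics` is only passed for the on_evaluate event.
        if metrics is not None:
            self.eval_metrics.append(dict(metrics))


# Drive the hooks by hand with stand-in objects; a real Trainer passes its own.
history = MetricHistoryCallback()
state = SimpleNamespace(global_step=10)
history.on_log(None, state, None, logs={"loss": 0.5, "learning_rate": 5e-5})
history.on_evaluate(None, state, None, metrics={"eval_loss": 0.4})
print(history.losses)        # [(10, 0.5)]
print(history.eval_metrics)  # [{'eval_loss': 0.4}]
```

Neither hook returns anything, which is fine for a callback that only observes: returning the control object is only needed when the callback changes it.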

on_epoch_begin

( args: TrainingArguments state: TrainerState control: TrainerControl **kwargs )

Event called at the beginning of an epoch.

on_epoch_end

( args: TrainingArguments state: TrainerState control: TrainerControl **kwargs )

Event called at the end of an epoch.

on_evaluate

( args: TrainingArguments state: TrainerState control: TrainerControl **kwargs )

Event called after an evaluation phase.

on_init_end

( args: TrainingArguments state: TrainerState control: TrainerControl **kwargs )

Event called at the end of the initialization of the Trainer.

on_log

( args: TrainingArguments state: TrainerState control: TrainerControl **kwargs )

Event called after logging the last logs.

on_predict

( args: TrainingArguments state: TrainerState control: TrainerControl metrics **kwargs )

Event called after a successful prediction.

on_prediction_step

( args: TrainingArguments state: TrainerState control: TrainerControl **kwargs )

Event called after a prediction step.

on_save

( args: TrainingArguments state: TrainerState control: TrainerControl **kwargs )

Event called after a checkpoint save.

on_step_begin

( args: TrainingArguments state: TrainerState control: TrainerControl **kwargs )

Event called at the beginning of a training step. If using gradient accumulation, one training step might take several inputs.

on_step_end

( args: TrainingArguments state: TrainerState control: TrainerControl **kwargs )

Event called at the end of a training step. If using gradient accumulation, one training step might take several inputs.

on_substep_end

( args: TrainingArguments state: TrainerState control: TrainerControl **kwargs )

Event called at the end of a substep during gradient accumulation.

on_train_begin

( args: TrainingArguments state: TrainerState control: TrainerControl **kwargs )

Event called at the beginning of training.

on_train_end

( args: TrainingArguments state: TrainerState control: TrainerControl **kwargs )

Event called at the end of training.

Here is an example of how to register a custom callback with the PyTorch Trainer:

from transformers import Trainer, TrainerCallback


class MyCallback(TrainerCallback):
    "A callback that prints a message at the beginning of training"

    def on_train_begin(self, args, state, control, **kwargs):
        print("Starting training")


trainer = Trainer(
    model,
    args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    callbacks=[MyCallback],  # We can either pass the callback class this way or an instance of it (MyCallback())
)

Another way to register a callback is to call trainer.add_callback() as follows:

trainer = Trainer(...)
trainer.add_callback(MyCallback)
# Alternatively, we can pass an instance of the callback class
trainer.add_callback(MyCallback())

TrainerState

class transformers.TrainerState

( epoch: typing.Optional[float] = None global_step: int = 0 max_steps: int = 0 logging_steps: int = 500 eval_steps: int = 500 save_steps: int = 500 train_batch_size: int = None num_train_epochs: int = 0 num_input_tokens_seen: int = 0 total_flos: float = 0 log_history: typing.List[typing.Dict[str, float]] = None best_metric: typing.Optional[float] = None best_model_checkpoint: typing.Optional[str] = None is_local_process_zero: bool = True is_world_process_zero: bool = True is_hyper_param_search: bool = False trial_name: str = None trial_params: typing.Dict[str, typing.Union[str, float, int, bool]] = None )

Parameters

  • epoch (float, optional) — Only set during training, will represent the epoch the training is at (the decimal part being the percentage of the current epoch completed).
  • global_step (int, optional, defaults to 0) — During training, represents the number of update steps completed.
  • max_steps (int, optional, defaults to 0) — The number of update steps to do during the current training.
  • logging_steps (int, optional, defaults to 500) — Log every X update steps.
  • eval_steps (int, optional) — Run an evaluation every X steps.
  • save_steps (int, optional, defaults to 500) — Save a checkpoint every X update steps.
  • train_batch_size (int, optional) — The batch size for the training dataloader. Only needed when auto_find_batch_size has been used.
  • num_input_tokens_seen (int, optional, defaults to 0) — The number of tokens seen during training (number of input tokens, not the number of prediction tokens).
  • total_flos (float, optional, defaults to 0) — The total number of floating-point operations done by the model since the beginning of training (stored as floats to avoid overflow).
  • log_history (List[Dict[str, float]], optional) — The list of logs done since the beginning of training.
  • best_metric (float, optional) — When tracking the best model, the value of the best metric encountered so far.
  • best_model_checkpoint (str, optional) — When tracking the best model, the value of the name of the checkpoint for the best model encountered so far.
  • is_local_process_zero (bool, optional, defaults to True) — Whether or not this process is the local (e.g., on one machine if training in a distributed fashion on several machines) main process.
  • is_world_process_zero (bool, optional, defaults to True) — Whether or not this process is the global main process (when training in a distributed fashion on several machines, this is only going to be True for one process).
  • is_hyper_param_search (bool, optional, defaults to False) — Whether we are in the process of a hyper parameter search using Trainer.hyperparameter_search. This will impact the way data will be logged in TensorBoard.

A class containing the Trainer inner state that will be saved along with the model and optimizer when checkpointing and passed to the TrainerCallback.

Throughout this class, one step is to be understood as one update step. When using gradient accumulation, one update step may require several forward and backward passes: if you use gradient_accumulation_steps=n, then one update step requires going through n batches.
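The relationship between update steps and batches can be worked through with a little arithmetic. The helper names below are hypothetical and the numbers are made up for illustration.

```python
def batches_processed(global_step: int, gradient_accumulation_steps: int) -> int:
    """Batches consumed per process after `global_step` update steps."""
    return global_step * gradient_accumulation_steps


def samples_per_update(per_device_batch_size: int,
                       gradient_accumulation_steps: int,
                       num_devices: int = 1) -> int:
    """Samples that contribute to a single update step across all devices."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices


# With gradient_accumulation_steps=4, 100 update steps consume 400 batches.
print(batches_processed(global_step=100, gradient_accumulation_steps=4))  # 400

# Batch size 8 per device, 4 accumulation steps, 2 devices: 64 samples per update.
print(samples_per_update(8, 4, num_devices=2))  # 64
```

This is why values such as logging_steps and save_steps, which count update steps, correspond to more batches as gradient_accumulation_steps grows.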

load_from_json

( json_path: str )

Create an instance from the content of json_path.

save_to_json

( json_path: str )

Save the content of this instance in JSON format inside json_path.

TrainerControl

class transformers.TrainerControl

( should_training_stop: bool = False should_epoch_stop: bool = False should_save: bool = False should_evaluate: bool = False should_log: bool = False )

Parameters

  • should_training_stop (bool, optional, defaults to False) — Whether or not the training should be interrupted.

    If True, this variable will not be set back to False. The training will just stop.

  • should_epoch_stop (bool, optional, defaults to False) — Whether or not the current epoch should be interrupted.

    If True, this variable will be set back to False at the beginning of the next epoch.

  • should_save (bool, optional, defaults to False) — Whether or not the model should be saved at this step.

    If True, this variable will be set back to False at the beginning of the next step.

  • should_evaluate (bool, optional, defaults to False) — Whether or not the model should be evaluated at this step.

    If True, this variable will be set back to False at the beginning of the next step.

  • should_log (bool, optional, defaults to False) — Whether or not the logs should be reported at this step.

    If True, this variable will be set back to False at the beginning of the next step.

A class that handles the Trainer control flow. This class is used by the TrainerCallback to activate some switches in the training loop.
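A callback flips these switches by modifying the control object it receives and returning it. The sketch below is a hypothetical `StepBudgetCallback` that requests a checkpoint and a stop once a step budget is reached. The control object here is a plain stand-in with the same flag names as TrainerControl, and the loop only imitates a Trainer calling `on_step_end`. The fallback base class exists only so that the snippet also runs where transformers is not installed.

```python
from types import SimpleNamespace

try:
    from transformers import TrainerCallback
except ImportError:  # stand-in so the sketch still runs without transformers
    class TrainerCallback:
        pass


class StepBudgetCallback(TrainerCallback):
    """Hypothetical callback: save and stop once `max_updates` update steps are done."""

    def __init__(self, max_updates):
        self.max_updates = max_updates

    def on_step_end(self, args, state, control, **kwargs):
        if state.global_step >= self.max_updates:
            control.should_save = True           # request a checkpoint at this step
            control.should_training_stop = True  # request the end of training
        return control  # return the modified control object


# Stand-in control object with the same flag names as TrainerControl.
control = SimpleNamespace(
    should_training_stop=False,
    should_epoch_stop=False,
    should_save=False,
    should_evaluate=False,
    should_log=False,
)

callback = StepBudgetCallback(max_updates=3)
steps_run = 0
for step in range(1, 6):
    steps_run = step
    control = callback.on_step_end(None, SimpleNamespace(global_step=step), control)
    if control.should_training_stop:
        break

print(steps_run, control.should_save, control.should_training_stop)  # 3 True True
```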