Callbacks
SyncRefModelCallback
class trl.SyncRefModelCallback(ref_model: typing.Union[transformers.modeling_utils.PreTrainedModel, torch.nn.modules.module.Module], accelerator: typing.Optional[accelerate.accelerator.Accelerator])
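The signature above does not describe the callback's behavior. As we understand it, it periodically blends the policy's weights into the reference model (the TR-DPO-style reference update that DPOTrainer turns on with `sync_ref_model`, controlled by `ref_model_mixup_alpha` and `ref_model_sync_steps` in DPOConfig). The sketch below is a plain-Python illustration of that update rule, θ_ref ← α·θ_policy + (1 − α)·θ_ref, not TRL's code. Parameters are lists of floats instead of tensors, and the helper names `sync_reference` and `maybe_sync` are hypothetical.

```python
from typing import List


def sync_reference(policy: List[float], reference: List[float], alpha: float) -> List[float]:
    """Return new reference weights: alpha * policy + (1 - alpha) * reference.

    Plain floats stand in for tensors (a simplification for illustration).
    """
    if len(policy) != len(reference):
        raise ValueError("policy and reference must have the same number of parameters")
    return [alpha * p + (1.0 - alpha) * r for p, r in zip(policy, reference)]


def maybe_sync(
    step: int, sync_steps: int, policy: List[float], reference: List[float], alpha: float
) -> List[float]:
    """Hypothetical helper: apply the blend only every `sync_steps` steps."""
    if step % sync_steps == 0:
        return sync_reference(policy, reference, alpha)
    return reference


# With alpha = 0.5 the reference moves halfway toward the policy.
print(sync_reference([1.0, 2.0], [0.0, 0.0], 0.5))  # [0.5, 1.0]
```

An alpha of 1.0 would copy the policy outright, while smaller values keep the reference closer to its previous weights.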
RichProgressCallback
A TrainerCallback that displays the progress of training or evaluation using Rich.
WinRateCallback
class trl.WinRateCallback(judge: BasePairwiseJudge, trainer: Trainer, generation_config: typing.Optional[transformers.generation.configuration_utils.GenerationConfig] = None, num_prompts: typing.Optional[int] = None, shuffle_order: bool = True, use_soft_judge: bool = False)
Parameters
- judge (BasePairwiseJudge): The judge to use for comparing completions.
- trainer (Trainer): Trainer to which the callback will be attached. The trainer’s evaluation dataset must include a "prompt" column containing the prompts for generating completions. If the Trainer has a reference model (via the ref_model attribute), the callback uses this reference model to generate the reference completions; otherwise, it defaults to using the initial model.
- generation_config (GenerationConfig, optional): The generation config to use for generating completions.
- num_prompts (int or None, optional, defaults to None): The number of prompts to generate completions for. If not provided, defaults to the number of examples in the evaluation dataset.
- shuffle_order (bool, optional, defaults to True): Whether to shuffle the order of the completions before judging.
- use_soft_judge (bool, optional, defaults to False): Whether to use a soft judge that returns a win probability between 0 and 1 for the first completion vs the second.
A TrainerCallback that computes the win rate of a model against a reference.
It generates completions using prompts from the evaluation dataset and compares the trained model’s outputs against a reference. The reference is either the initial version of the model (before training) or the reference model, if one is available in the trainer. At each evaluation, a judge determines how often the trained model’s completions win against the reference. The win rate is then logged in the trainer’s logs under the key "eval_win_rate".
LogCompletionsCallback
class trl.LogCompletionsCallback(trainer: Trainer, generation_config: typing.Optional[transformers.generation.configuration_utils.GenerationConfig] = None, num_prompts: typing.Optional[int] = None, freq: typing.Optional[int] = None)
Parameters
- trainer (Trainer): Trainer to which the callback will be attached. The trainer’s evaluation dataset must include a "prompt" column containing the prompts for generating completions.
- generation_config (GenerationConfig, optional): The generation config to use for generating completions.
- num_prompts (int or None, optional): The number of prompts to generate completions for. If not provided, defaults to the number of examples in the evaluation dataset.
- freq (int or None, optional): The frequency at which to log completions. If not provided, defaults to the trainer’s eval_steps.
A TrainerCallback that logs completions to Weights & Biases and/or Comet.
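The defaults for num_prompts and freq can be sketched as two small helpers. Both are hypothetical illustrations, not TRL's API. They assume that num_prompts keeps the first N prompts of the evaluation dataset and that a step is logged when it is a multiple of freq, with freq falling back to eval_steps when it is None. The `last_logged_step` argument is an assumed bookkeeping value that keeps the same step from being logged twice.

```python
from typing import List, Optional, Sequence


def select_prompts(prompts: Sequence[str], num_prompts: Optional[int]) -> List[str]:
    """Keep the first `num_prompts` prompts, or all of them when it is None (assumed)."""
    if num_prompts is None:
        return list(prompts)
    return list(prompts[:num_prompts])


def should_log(
    global_step: int, freq: Optional[int], eval_steps: int, last_logged_step: int
) -> bool:
    """Decide whether completions are logged at `global_step`."""
    if global_step == last_logged_step:
        return False  # already logged at this step
    interval = freq if freq is not None else eval_steps  # documented default
    return global_step % interval == 0


print(should_log(10, None, 5, -1))  # True: 10 is a multiple of eval_steps
print(should_log(7, None, 5, -1))   # False
```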
MergeModelCallback
class trl.MergeModelCallback(merge_config: typing.Optional[MergeConfig] = None, merge_at_every_checkpoint: bool = False, push_to_hub: bool = False)
Parameters
- merge_config (MergeConfig, optional, defaults to None): Configuration used for the merging process. If not provided, the default MergeConfig is used.
- merge_at_every_checkpoint (bool, optional, defaults to False): Whether to merge the model at every checkpoint.
- push_to_hub (bool, optional, defaults to False): Whether to push the merged model to the Hub after merging.
A TrainerCallback that merges the policy model (the model being trained) with another model based on a merge configuration.
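How the two models are combined depends on the MergeConfig, and the actual merge is carried out by TRL's merge utilities rather than by code like this. As a rough illustration of the simplest case, a linear (weighted-average) merge, the sketch below averages two models' parameters name by name. Plain float lists stand in for tensors, and `linear_merge`, `policy_weight`, and `target_weight` are hypothetical names, not MergeConfig fields. The weights are normalized to sum to 1 here, which is an assumption of this sketch.

```python
from typing import Dict, List


def linear_merge(
    policy: Dict[str, List[float]],
    target: Dict[str, List[float]],
    policy_weight: float = 0.5,
    target_weight: float = 0.5,
) -> Dict[str, List[float]]:
    """Weighted average of two models' parameters, matched by parameter name."""
    if policy.keys() != target.keys():
        raise ValueError("both models must have the same parameter names")
    total = policy_weight + target_weight
    wp, wt = policy_weight / total, target_weight / total  # normalize to sum to 1
    return {
        name: [wp * p + wt * t for p, t in zip(policy[name], target[name])]
        for name in policy
    }


print(linear_merge({"w": [1.0, 3.0]}, {"w": [3.0, 1.0]}))  # {'w': [2.0, 2.0]}
```

Other merge methods weight or prune parameters differently, so this example covers only the linear case.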