---
license: llama3.1
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
model-index:
- name: llama3.1_8b_dpo_bwgenerator
  results: []
---


# llama3.1_8b_dpo_bwgenerator

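This model is a PEFT adapter fine-tuned from [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) with Direct Preference Optimization (DPO) using TRL. The training dataset was not recorded in the card metadata.
At the final evaluation (step 13000), it achieves the following results on the evaluation set: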
- Loss: 0.0900
- Rewards/chosen: -9.2458
- Rewards/rejected: -18.5064
- Rewards/accuracies: 0.9799
- Rewards/margins: 9.2605
- Logps/rejected: -295.2113
- Logps/chosen: -177.0069
- Logits/rejected: -1.0648
- Logits/chosen: -1.6755
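
The reward metrics above are related: in TRL's DPO logging, `Rewards/margins` is the mean of the per-example difference between the chosen and rejected rewards, so it should equal `Rewards/chosen` minus `Rewards/rejected` up to rounding. A minimal Python check of that relationship, using the figures listed above (the tolerance is an assumption chosen to absorb the four-decimal rounding of the logged values):

```python
# Final evaluation metrics, copied from the list above.
rewards_chosen = -9.2458
rewards_rejected = -18.5064
reported_margin = 9.2605

# Mean of (chosen - rejected) equals the difference of the two means.
computed_margin = rewards_chosen - rewards_rejected

# Assumed tolerance: each logged value is rounded to 4 decimals.
assert abs(computed_margin - reported_margin) < 2e-4
print(f"computed margin: {computed_margin:.4f}")
```

Both rewards are negative, which means the policy assigns lower log-probability than the reference model to chosen and rejected responses alike. The margin and the accuracy of 0.9799 reflect only how much more the rejected responses are penalized.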

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1
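
To illustrate what `lr_scheduler_type: linear` implies for the learning rate, here is a small pure-Python sketch of a linear warmup-then-decay schedule. It rests on several assumptions: no warmup steps are listed above, so `warmup_steps` defaults to 0, and the total step count is only an estimate derived from the results table (step 1000 is logged at epoch 0.0719). The function name `linear_lr` is hypothetical and is not part of any library.

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 5e-6, warmup_steps: int = 0) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Estimate only: the results table logs step 1000 at epoch 0.0719.
estimated_total_steps = round(1000 / 0.0719)

for step in (0, 5000, 10000, estimated_total_steps):
    print(step, f"{linear_lr(step, estimated_total_steps):.2e}")
```

Under these assumptions the learning rate starts at 5e-06 and reaches zero at the end of the single epoch.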

### Training results

| Training Loss | Epoch  | Step  | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.122         | 0.0719 | 1000  | 0.1055          | -6.4463        | -12.7595         | 0.9689             | 6.3132          | -237.7425      | -149.0121    | -1.0551         | -1.6809       |
| 0.1018        | 0.1438 | 2000  | 0.0928          | -8.3841        | -16.7138         | 0.9760             | 8.3297          | -277.2856      | -168.3895    | -1.0613         | -1.6756       |
| 0.0975        | 0.2157 | 3000  | 0.0914          | -9.0349        | -17.9922         | 0.9773             | 8.9574          | -290.0698      | -174.8974    | -1.0675         | -1.6787       |
| 0.0861        | 0.2876 | 4000  | 0.0911          | -9.1503        | -18.2788         | 0.9786             | 9.1285          | -292.9356      | -176.0516    | -1.0649         | -1.6760       |
| 0.0957        | 0.3595 | 5000  | 0.0904          | -9.2383        | -18.4646         | 0.9786             | 9.2263          | -294.7940      | -176.9318    | -1.0621         | -1.6732       |
| 0.079         | 0.4313 | 6000  | 0.0900          | -9.1569        | -18.3683         | 0.9806             | 9.2114          | -293.8309      | -176.1181    | -1.0645         | -1.6758       |
| 0.0692        | 0.5032 | 7000  | 0.0901          | -9.2211        | -18.4391         | 0.9802             | 9.2179          | -294.5381      | -176.7600    | -1.0652         | -1.6760       |
| 0.0931        | 0.5751 | 8000  | 0.0901          | -9.2306        | -18.4876         | 0.9802             | 9.2570          | -295.0236      | -176.8544    | -1.0630         | -1.6740       |
| 0.0863        | 0.6470 | 9000  | 0.0902          | -9.2159        | -18.4436         | 0.9799             | 9.2277          | -294.5839      | -176.7078    | -1.0635         | -1.6746       |
| 0.0942        | 0.7189 | 10000 | 0.0902          | -9.1872        | -18.4035         | 0.9802             | 9.2163          | -294.1824      | -176.4204    | -1.0647         | -1.6760       |
| 0.0771        | 0.7908 | 11000 | 0.0902          | -9.2250        | -18.4541         | 0.9796             | 9.2290          | -294.6884      | -176.7990    | -1.0629         | -1.6739       |
| 0.0916        | 0.8627 | 12000 | 0.0903          | -9.2340        | -18.4770         | 0.9799             | 9.2430          | -294.9172      | -176.8884    | -1.0633         | -1.6744       |
| 0.0999        | 0.9346 | 13000 | 0.0900          | -9.2458        | -18.5064         | 0.9799             | 9.2605          | -295.2113      | -177.0069    | -1.0648         | -1.6755       |


### Framework versions

- PEFT 0.10.0
- Transformers 4.44.0
- Pytorch 2.3.0+cu121
- Datasets 2.14.7
- Tokenizers 0.19.1
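
With the library versions listed above, loading the adapter on top of the base model might look like the following sketch. The adapter repository id is a hypothetical placeholder (the card does not state where the adapter is hosted), and the helper name `load_model` is an illustration, not part of PEFT or Transformers.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # from the card metadata
ADAPTER_ID = "your-namespace/llama3.1_8b_dpo_bwgenerator"  # hypothetical repo id; replace with the real one


def load_model(adapter_id: str = ADAPTER_ID, base_model_id: str = BASE_MODEL_ID):
    """Load the base model and attach the PEFT adapter (downloads weights on first use)."""
    # Imported here so that defining the helper does not require the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype="auto")
    model = PeftModel.from_pretrained(base, adapter_id)
    return model, tokenizer
```

Calling `model, tokenizer = load_model()` requires access to the base model, which is gated on the Hugging Face Hub and needs an accepted license.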