Llama-2-7b-hf-DPO-LookAhead5_FullEval_TTree1.4_TLoop0.7_TEval0.2_V1.0

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unspecified dataset. It achieves the following results on the evaluation set (a short note on how the reward metrics relate follows the list):

  • Loss: 0.9790
  • Rewards/chosen: -2.8897
  • Rewards/rejected: -2.9459
  • Rewards/accuracies: 0.5
  • Rewards/margins: 0.0562
  • Logps/rejected: -129.8927
  • Logps/chosen: -161.6417
  • Logits/rejected: -1.2043
  • Logits/chosen: -1.1773
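
For context, Rewards/margins is the gap between the chosen and rejected rewards. This relation is standard for DPO evaluation logs and is not stated explicitly in the card; checking it against the final numbers above:

$$
\text{margins} = \text{rewards}_{\text{chosen}} - \text{rewards}_{\text{rejected}} = -2.8897 - (-2.9459) \approx 0.0562
$$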

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 3
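
The sketch below maps these settings onto transformers.TrainingArguments. It is an illustration only: the output directory is a placeholder, and the card does not say exactly how the trainer was invoked.

```python
# Sketch only: the listed hyperparameters expressed as TrainingArguments.
# output_dir is a placeholder; the actual training script is not part of this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="dpo-llama2-7b",        # placeholder path (assumption)
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    gradient_accumulation_steps=2,     # effective train batch size: 2 * 2 = 4
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=3,
    optim="adamw_torch",               # Adam with betas=(0.9, 0.999), eps=1e-8
)
```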

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6899        | 0.3002 | 71   | 0.7199          | 0.1044         | 0.1468           | 0.5                | -0.0424         | -98.9657       | -131.7005    | -0.6723         | -0.6398       |
| 0.5545        | 0.6004 | 142  | 0.6834          | -0.1295        | -0.1555          | 0.6000             | 0.0260          | -101.9890      | -134.0396    | -0.6949         | -0.6636       |
| 0.6881        | 0.9006 | 213  | 0.7185          | -0.1471        | -0.1685          | 0.6000             | 0.0214          | -102.1191      | -134.2157    | -0.7134         | -0.6805       |
| 0.6234        | 1.2008 | 284  | 0.8098          | -0.9930        | -1.0067          | 0.6000             | 0.0137          | -110.5010      | -142.6748    | -0.7955         | -0.7622       |
| 0.2756        | 1.5011 | 355  | 0.7770          | -1.2850        | -1.3168          | 0.6000             | 0.0318          | -113.6021      | -145.5950    | -0.8659         | -0.8358       |
| 0.4006        | 1.8013 | 426  | 0.7082          | -0.8266        | -0.9994          | 0.7000             | 0.1728          | -110.4281      | -141.0111    | -0.8156         | -0.7870       |
| 0.0745        | 2.1015 | 497  | 0.8545          | -1.9092        | -2.0160          | 0.5                | 0.1068          | -120.5937      | -151.8366    | -1.0343         | -1.0061       |
| 0.1066        | 2.4017 | 568  | 0.9854          | -2.7276        | -2.7740          | 0.5                | 0.0463          | -128.1734      | -160.0211    | -1.2086         | -1.1809       |
| 0.0845        | 2.7019 | 639  | 0.9790          | -2.8897        | -2.9459          | 0.5                | 0.0562          | -129.8927      | -161.6417    | -1.2043         | -1.1773       |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.2
  • Pytorch 2.4.1+cu121
  • Datasets 2.21.0
  • Tokenizers 0.19.1
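
Since PEFT is listed among the framework versions, this checkpoint is a PEFT adapter that is loaded on top of the base model rather than a standalone set of weights. A minimal loading sketch, assuming the repository id from the title and access to the gated meta-llama/Llama-2-7b-hf weights:

```python
# Minimal usage sketch (assumptions: the adapter repo id below matches this card,
# and you have been granted access to the gated meta-llama base weights).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"
adapter_id = "LBK95/Llama-2-7b-hf-DPO-LookAhead5_FullEval_TTree1.4_TLoop0.7_TEval0.2_V1.0"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base_model, adapter_id)  # attach the DPO adapter

prompt = "Explain what DPO fine-tuning does in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```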
