OpenELM-1_1B-DPO-full-2

This model is a fine-tuned version of data/OpenELM-1_1B-SFT-2 on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.7945
  • Rewards/chosen: -8.3125
  • Rewards/rejected: -10.4375
  • Rewards/accuracies: 0.7324
  • Rewards/margins: 2.1406
  • Logps/rejected: -1336.0
  • Logps/chosen: -1144.0
  • Logits/rejected: 5.5
  • Logits/chosen: 3.5938
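The Loss, Rewards/* and Logps/* figures appear to be the metrics logged by TRL's DPOTrainer. The card does not say so; this is inferred from the model name and the metric names. The sketch below assumes the standard sigmoid DPO loss and a β of 0.1 (TRL's default; the card does not report β). It shows how these metrics follow from per-example sequence log-probabilities under the trained policy and a frozen reference model, presumably the SFT checkpoint, though the card does not confirm this. The input numbers are hypothetical toy values.

```python
import math

# Assumption: beta (the DPO temperature) is not stated in this card; 0.1 is
# TRL's default and is used here purely for illustration.
BETA = 0.1


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=BETA):
    """Aggregate DPO-style metrics from per-example summed log-probabilities.

    policy_* are log p(response | prompt) under the model being trained;
    ref_* are the same quantities under the frozen reference model.
    """
    n = len(policy_chosen)
    # Implicit rewards: beta-scaled log-ratio between policy and reference.
    r_chosen = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    r_rejected = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - j for c, j in zip(r_chosen, r_rejected)]
    return {
        # Sigmoid DPO loss: -log sigmoid(margin) == softplus(-margin).
        "loss": sum(softplus(-m) for m in margins) / n,
        "rewards/chosen": sum(r_chosen) / n,
        "rewards/rejected": sum(r_rejected) / n,
        "rewards/margins": sum(margins) / n,
        # Fraction of pairs where the chosen response earns the higher reward.
        "rewards/accuracies": sum(c > j for c, j in zip(r_chosen, r_rejected)) / n,
        # Mean policy log-probabilities (what the Logps/* entries report).
        "logps/chosen": sum(policy_chosen) / n,
        "logps/rejected": sum(policy_rejected) / n,
    }


# Hypothetical toy inputs, not taken from this model.
metrics = dpo_metrics(
    policy_chosen=[-100.0, -120.0],
    policy_rejected=[-130.0, -110.0],
    ref_chosen=[-105.0, -118.0],
    ref_rejected=[-125.0, -112.0],
)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In this toy batch, one pair is ranked correctly and one is not, so the accuracy is 0.5. Increasingly negative Logps/* values with a growing margin, as in the results above, mean the model lowers the likelihood of both responses while lowering the rejected one faster.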

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
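The total batch sizes follow from the per-device values, and the learning rate ramps up linearly during warmup before following a cosine decay. The sketch below reproduces that arithmetic. The total step count is an estimate inferred from the training-results table (step 2800 corresponds to epoch 2.9304), not a logged value. The `lr_at` helper is hypothetical: it mirrors the shape of the transformers cosine-with-warmup schedule rather than calling the library.

```python
import math

# Values copied from the hyperparameter list above.
TRAIN_BATCH_SIZE = 8   # per device
EVAL_BATCH_SIZE = 16   # per device
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 2
PEAK_LR = 5e-05
WARMUP_RATIO = 0.1
NUM_EPOCHS = 3

# Each optimizer step sees 8 examples x 4 GPUs x 2 accumulation steps.
total_train_batch_size = TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS
# Evaluation does not accumulate gradients, so only the device count multiplies in.
total_eval_batch_size = EVAL_BATCH_SIZE * NUM_DEVICES

# Assumption: steps per epoch estimated from "step 2800 = epoch 2.9304" in the results table.
steps_per_epoch = 2800 / 2.9304
total_steps = int(steps_per_epoch * NUM_EPOCHS)
# Warmup length as a fraction of total steps, rounded up.
warmup_steps = math.ceil(WARMUP_RATIO * total_steps)


def lr_at(step):
    """Hypothetical helper: linear warmup to PEAK_LR, then half-cosine decay to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(total_train_batch_size, total_eval_batch_size)  # 64 64
print(total_steps, warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.2e}")
```

Under this estimate, warmup covers roughly the first 0.3 epochs, and the learning rate peaks at 5e-05 before decaying toward zero by the end of the third epoch.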

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6007 | 0.1047 | 100 | 0.6140 | -1.2344 | -1.5781 | 0.6562 | 0.3418 | -444.0 | -438.0 | -8.5 | -8.8125 |
| 0.591 | 0.2093 | 200 | 0.6025 | -1.9297 | -2.4688 | 0.6895 | 0.5312 | -532.0 | -508.0 | -6.9375 | -7.5312 |
| 0.6351 | 0.3140 | 300 | 0.5962 | -2.2344 | -2.6875 | 0.6875 | 0.4512 | -556.0 | -540.0 | -4.9062 | -5.7812 |
| 0.6031 | 0.4186 | 400 | 0.5900 | -1.7109 | -2.2812 | 0.6875 | 0.5625 | -512.0 | -486.0 | -6.25 | -7.2188 |
| 0.5813 | 0.5233 | 500 | 0.5824 | -2.25 | -2.8125 | 0.7051 | 0.5547 | -568.0 | -540.0 | -3.6406 | -4.8125 |
| 0.5376 | 0.6279 | 600 | 0.5624 | -2.625 | -3.3281 | 0.7012 | 0.7109 | -620.0 | -576.0 | 2.4219 | 0.9258 |
| 0.5582 | 0.7326 | 700 | 0.5655 | -3.2812 | -4.0938 | 0.7051 | 0.8008 | -696.0 | -644.0 | -0.3281 | -1.7891 |
| 0.5437 | 0.8373 | 800 | 0.5704 | -2.8281 | -3.4375 | 0.6992 | 0.6172 | -632.0 | -596.0 | -1.6719 | -3.1719 |
| 0.567 | 0.9419 | 900 | 0.5633 | -3.1406 | -3.9062 | 0.7227 | 0.7539 | -676.0 | -628.0 | -1.0781 | -2.4219 |
| 0.223 | 1.0466 | 1000 | 0.5835 | -4.1562 | -5.25 | 0.7461 | 1.0859 | -812.0 | -732.0 | 3.375 | 1.7734 |
| 0.1774 | 1.1512 | 1100 | 0.6000 | -4.8438 | -5.9688 | 0.7227 | 1.1328 | -884.0 | -800.0 | 2.8906 | 0.9844 |
| 0.1868 | 1.2559 | 1200 | 0.5954 | -4.9062 | -6.0625 | 0.7188 | 1.1484 | -892.0 | -804.0 | 3.5 | 1.9609 |
| 0.1871 | 1.3605 | 1300 | 0.6086 | -5.3438 | -6.5 | 0.7324 | 1.1562 | -932.0 | -848.0 | 3.1719 | 1.3281 |
| 0.1651 | 1.4652 | 1400 | 0.5995 | -5.375 | -6.4688 | 0.7090 | 1.0938 | -932.0 | -852.0 | 2.9375 | 1.0625 |
| 0.1557 | 1.5699 | 1500 | 0.6073 | -5.3125 | -6.5938 | 0.7012 | 1.2656 | -944.0 | -848.0 | 1.9219 | -0.1582 |
| 0.2145 | 1.6745 | 1600 | 0.6256 | -5.1875 | -6.4688 | 0.7031 | 1.2656 | -932.0 | -832.0 | 3.0469 | 0.9570 |
| 0.1666 | 1.7792 | 1700 | 0.6223 | -5.5312 | -6.8438 | 0.7246 | 1.3047 | -972.0 | -868.0 | 3.8906 | 1.7969 |
| 0.164 | 1.8838 | 1800 | 0.6084 | -4.6875 | -5.9375 | 0.7383 | 1.2266 | -880.0 | -784.0 | 2.6562 | 0.5117 |
| 0.1552 | 1.9885 | 1900 | 0.6211 | -5.4375 | -6.7812 | 0.7363 | 1.3359 | -964.0 | -856.0 | 2.5469 | 0.4004 |
| 0.0204 | 2.0931 | 2000 | 0.6830 | -6.4062 | -8.0 | 0.7383 | 1.6328 | -1088.0 | -952.0 | 4.1562 | 2.1719 |
| 0.0205 | 2.1978 | 2100 | 0.8096 | -9.0 | -11.125 | 0.7168 | 2.1094 | -1400.0 | -1216.0 | 5.4375 | 3.5469 |
| 0.0228 | 2.3025 | 2200 | 0.8077 | -8.625 | -10.8125 | 0.7305 | 2.1562 | -1368.0 | -1176.0 | 5.25 | 3.3281 |
| 0.0148 | 2.4071 | 2300 | 0.7832 | -8.1875 | -10.1875 | 0.7227 | 2.0469 | -1304.0 | -1128.0 | 5.25 | 3.3906 |
| 0.0202 | 2.5118 | 2400 | 0.7835 | -8.1875 | -10.25 | 0.7344 | 2.0781 | -1312.0 | -1136.0 | 5.3125 | 3.375 |
| 0.01 | 2.6164 | 2500 | 0.7940 | -8.1875 | -10.3125 | 0.7363 | 2.1094 | -1320.0 | -1136.0 | 5.4688 | 3.5312 |
| 0.0153 | 2.7211 | 2600 | 0.8036 | -8.5625 | -10.75 | 0.7324 | 2.1719 | -1360.0 | -1168.0 | 5.625 | 3.75 |
| 0.0205 | 2.8257 | 2700 | 0.7961 | -8.375 | -10.5 | 0.7344 | 2.1562 | -1336.0 | -1152.0 | 5.5312 | 3.6406 |
| 0.0184 | 2.9304 | 2800 | 0.7947 | -8.3125 | -10.5 | 0.7324 | 2.1562 | -1336.0 | -1144.0 | 5.5 | 3.5938 |
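Two quick ways to read the training-results table are to find where validation loss bottoms out and to compare per-epoch averages. The sketch below copies five columns from the table (no new data). It shows that validation loss is lowest at step 600 and that reward accuracy peaks at step 1000. Across epochs, training loss drops sharply at each epoch boundary while average validation loss rises, a pattern commonly read as overfitting to the preference pairs on repeated passes.

```python
# (epoch, step, training loss, validation loss, rewards/accuracies), copied from the table above.
ROWS = [
    (0.1047, 100, 0.6007, 0.6140, 0.6562),
    (0.2093, 200, 0.591, 0.6025, 0.6895),
    (0.3140, 300, 0.6351, 0.5962, 0.6875),
    (0.4186, 400, 0.6031, 0.5900, 0.6875),
    (0.5233, 500, 0.5813, 0.5824, 0.7051),
    (0.6279, 600, 0.5376, 0.5624, 0.7012),
    (0.7326, 700, 0.5582, 0.5655, 0.7051),
    (0.8373, 800, 0.5437, 0.5704, 0.6992),
    (0.9419, 900, 0.567, 0.5633, 0.7227),
    (1.0466, 1000, 0.223, 0.5835, 0.7461),
    (1.1512, 1100, 0.1774, 0.6000, 0.7227),
    (1.2559, 1200, 0.1868, 0.5954, 0.7188),
    (1.3605, 1300, 0.1871, 0.6086, 0.7324),
    (1.4652, 1400, 0.1651, 0.5995, 0.7090),
    (1.5699, 1500, 0.1557, 0.6073, 0.7012),
    (1.6745, 1600, 0.2145, 0.6256, 0.7031),
    (1.7792, 1700, 0.1666, 0.6223, 0.7246),
    (1.8838, 1800, 0.164, 0.6084, 0.7383),
    (1.9885, 1900, 0.1552, 0.6211, 0.7363),
    (2.0931, 2000, 0.0204, 0.6830, 0.7383),
    (2.1978, 2100, 0.0205, 0.8096, 0.7168),
    (2.3025, 2200, 0.0228, 0.8077, 0.7305),
    (2.4071, 2300, 0.0148, 0.7832, 0.7227),
    (2.5118, 2400, 0.0202, 0.7835, 0.7344),
    (2.6164, 2500, 0.01, 0.7940, 0.7363),
    (2.7211, 2600, 0.0153, 0.8036, 0.7324),
    (2.8257, 2700, 0.0205, 0.7961, 0.7344),
    (2.9304, 2800, 0.0184, 0.7947, 0.7324),
]

# Checkpoints with the lowest validation loss and the highest reward accuracy.
best_val = min(ROWS, key=lambda r: r[3])
best_acc = max(ROWS, key=lambda r: r[4])
print(f"lowest validation loss {best_val[3]} at step {best_val[1]}")
print(f"highest reward accuracy {best_acc[4]} at step {best_acc[1]}")

# Group rows by whole epoch (0, 1, 2) and average training and validation loss.
by_epoch = {}
for epoch, _step, train_loss, val_loss, _acc in ROWS:
    by_epoch.setdefault(int(epoch), []).append((train_loss, val_loss))

epoch_means = {}
for e, pairs in sorted(by_epoch.items()):
    train_mean = sum(t for t, _ in pairs) / len(pairs)
    val_mean = sum(v for _, v in pairs) / len(pairs)
    epoch_means[e] = (train_mean, val_mean)
    print(f"epoch {e}: mean train loss {train_mean:.4f}, mean val loss {val_mean:.4f}")
```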

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.3.0
  • Datasets 2.21.0
  • Tokenizers 0.19.1
Safetensors

  • Model size: 1.08B params
  • Tensor type: BF16
Inference API (serverless) does not yet support model repos that contain custom code.

Dataset used to train CharlesLi/OpenELM-1_1B-DPO-full-2: HuggingFaceH4/ultrafeedback_binarized