phi-2-ipo-renew1

This model is a fine-tuned version of lole25/phi-2-sft-ultrachat-lora on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 2028.0933
  • Rewards/chosen: -0.1243
  • Rewards/rejected: -0.2158
  • Rewards/accuracies: 0.6900
  • Rewards/margins: 0.0915
  • Logps/rejected: -255.1287
  • Logps/chosen: -269.0499
  • Logits/rejected: 0.5909
  • Logits/chosen: 0.5352
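
As a quick consistency check on the numbers above, the reported margin should equal the chosen reward minus the rejected reward. The sketch below assumes that Rewards/margins is the mean of (chosen - rejected) over the evaluation set, as preference trainers commonly report it; the card itself does not state the definition, and the variable names are illustrative.

```python
# Values copied from the evaluation summary above.
rewards_chosen = -0.1243
rewards_rejected = -0.2158
reported_margin = 0.0915

# Assumption: margin = mean(chosen - rejected), which equals
# mean(chosen) - mean(rejected) when both are averaged over the same pairs.
margin = rewards_chosen - rewards_rejected

# Allow for rounding to four decimals in the reported figures.
margin_matches = abs(margin - reported_margin) < 1e-4

print(f"computed margin: {margin:.4f}, matches report: {margin_matches}")
```

A positive margin means the chosen responses receive a higher implicit reward than the rejected ones on average, even though both rewards are negative here.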

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

The model was trained and evaluated on the HuggingFaceH4/ultrafeedback_binarized dataset. More information needed.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
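
To illustrate how the learning_rate, lr_scheduler_type, and lr_scheduler_warmup_ratio settings above interact, here is a minimal pure-Python sketch of a cosine schedule with linear warmup. The function `lr_at` is a hypothetical helper, not the trainer's own code, and `TOTAL_STEPS = 3800` is an assumption taken from the last step logged in the results table; the true number of optimizer steps may differ slightly.

```python
import math


def lr_at(step, total_steps, base_lr=5e-6, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by cosine decay.

    Simplified sketch: warmup length is rounded to the nearest step,
    which may differ by one step from a given library's rounding rule.
    """
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 3800  # assumption: last logged step in the results table

for s in (0, 190, 380, 2090, 3800):
    print(s, lr_at(s, TOTAL_STEPS))
```

Under these assumptions the rate peaks at 5e-06 around step 380 and decays to zero by the end of training, which fits the way the validation metrics flatten out late in the second epoch.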

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2496.843 | 0.05 | 100 | 2502.2668 | -0.0003 | -0.0002 | 0.5005 | -0.0002 | -233.5649 | -256.6506 | 0.8888 | 0.8318 |
| 2499.2807 | 0.1 | 200 | 2494.8354 | 0.0001 | -0.0005 | 0.5190 | 0.0006 | -233.5995 | -256.6106 | 0.8882 | 0.8310 |
| 2477.7609 | 0.16 | 300 | 2481.5015 | -0.0011 | -0.0031 | 0.5595 | 0.0019 | -233.8548 | -256.7285 | 0.8892 | 0.8319 |
| 2428.4195 | 0.21 | 400 | 2419.1045 | -0.0068 | -0.0156 | 0.6495 | 0.0089 | -235.1127 | -257.2951 | 0.8983 | 0.8404 |
| 2296.8842 | 0.26 | 500 | 2349.4358 | -0.0240 | -0.0419 | 0.6565 | 0.0179 | -237.7379 | -259.0124 | 0.8806 | 0.8214 |
| 2254.5846 | 0.31 | 600 | 2273.4993 | -0.0525 | -0.0829 | 0.6570 | 0.0304 | -241.8383 | -261.8659 | 0.8478 | 0.7868 |
| 2330.7787 | 0.37 | 700 | 2224.3350 | -0.0819 | -0.1221 | 0.6630 | 0.0402 | -245.7631 | -264.8093 | 0.8128 | 0.7517 |
| 2223.6863 | 0.42 | 800 | 2196.0991 | -0.1009 | -0.1487 | 0.6675 | 0.0478 | -248.4222 | -266.7057 | 0.7611 | 0.6992 |
| 2066.7418 | 0.47 | 900 | 2166.0732 | -0.1112 | -0.1658 | 0.6700 | 0.0546 | -250.1319 | -267.7397 | 0.7518 | 0.6917 |
| 2119.2691 | 0.52 | 1000 | 2138.9312 | -0.1215 | -0.1821 | 0.6715 | 0.0606 | -251.7610 | -268.7693 | 0.7213 | 0.6619 |
| 2191.7109 | 0.58 | 1100 | 2121.8115 | -0.1257 | -0.1906 | 0.6695 | 0.0648 | -252.6059 | -269.1910 | 0.7176 | 0.6584 |
| 2308.1883 | 0.63 | 1200 | 2110.3069 | -0.1409 | -0.2123 | 0.6665 | 0.0715 | -254.7812 | -270.7044 | 0.6920 | 0.6330 |
| 1996.7178 | 0.68 | 1300 | 2095.3130 | -0.1314 | -0.2042 | 0.6755 | 0.0728 | -253.9726 | -269.7621 | 0.6722 | 0.6141 |
| 2038.3844 | 0.73 | 1400 | 2085.0852 | -0.1383 | -0.2140 | 0.6800 | 0.0756 | -254.9441 | -270.4488 | 0.6513 | 0.5933 |
| 2094.2182 | 0.79 | 1500 | 2076.3042 | -0.1390 | -0.2166 | 0.6790 | 0.0777 | -255.2133 | -270.5129 | 0.6474 | 0.5898 |
| 2171.3457 | 0.84 | 1600 | 2069.3757 | -0.1374 | -0.2166 | 0.6810 | 0.0792 | -255.2130 | -270.3595 | 0.6392 | 0.5818 |
| 2189.3863 | 0.89 | 1700 | 2062.1995 | -0.1386 | -0.2192 | 0.6780 | 0.0806 | -255.4675 | -270.4739 | 0.6291 | 0.5723 |
| 2292.8938 | 0.94 | 1800 | 2053.1299 | -0.1196 | -0.2005 | 0.6830 | 0.0809 | -253.6025 | -268.5789 | 0.6275 | 0.5703 |
| 2085.5805 | 0.99 | 1900 | 2052.3237 | -0.1086 | -0.1906 | 0.6900 | 0.0821 | -252.6131 | -267.4730 | 0.6319 | 0.5747 |
| 1847.759 | 1.05 | 2000 | 2050.4177 | -0.1118 | -0.1953 | 0.6850 | 0.0836 | -253.0827 | -267.7950 | 0.6333 | 0.5763 |
| 2024.9559 | 1.1 | 2100 | 2046.7593 | -0.1219 | -0.2083 | 0.6900 | 0.0864 | -254.3799 | -268.8073 | 0.6157 | 0.5590 |
| 2038.6354 | 1.15 | 2200 | 2043.5728 | -0.1205 | -0.2072 | 0.6880 | 0.0867 | -254.2731 | -268.6722 | 0.6083 | 0.5518 |
| 2022.9617 | 1.2 | 2300 | 2035.5857 | -0.1173 | -0.2041 | 0.6895 | 0.0868 | -253.9597 | -268.3491 | 0.6101 | 0.5535 |
| 1871.641 | 1.26 | 2400 | 2036.3373 | -0.1190 | -0.2073 | 0.6895 | 0.0884 | -254.2831 | -268.5161 | 0.6046 | 0.5482 |
| 1907.3463 | 1.31 | 2500 | 2034.7010 | -0.1216 | -0.2108 | 0.6880 | 0.0892 | -254.6297 | -268.7765 | 0.6022 | 0.5460 |
| 1884.6086 | 1.36 | 2600 | 2033.7977 | -0.1215 | -0.2105 | 0.6910 | 0.0890 | -254.6014 | -268.7708 | 0.6013 | 0.5451 |
| 2034.9129 | 1.41 | 2700 | 2032.5447 | -0.1235 | -0.2140 | 0.6900 | 0.0905 | -254.9471 | -268.9633 | 0.5987 | 0.5426 |
| 2068.2822 | 1.47 | 2800 | 2030.8698 | -0.1251 | -0.2162 | 0.6900 | 0.0911 | -255.1671 | -269.1270 | 0.5943 | 0.5383 |
| 1977.4029 | 1.52 | 2900 | 2030.6033 | -0.1251 | -0.2162 | 0.6895 | 0.0911 | -255.1690 | -269.1252 | 0.5941 | 0.5381 |
| 2110.2887 | 1.57 | 3000 | 2030.5707 | -0.1259 | -0.2173 | 0.6905 | 0.0915 | -255.2821 | -269.2050 | 0.5908 | 0.5348 |
| 2068.2863 | 1.62 | 3100 | 2029.4174 | -0.1242 | -0.2156 | 0.6935 | 0.0914 | -255.1087 | -269.0390 | 0.5913 | 0.5357 |
| 1977.8852 | 1.67 | 3200 | 2026.1289 | -0.1249 | -0.2165 | 0.6960 | 0.0916 | -255.2016 | -269.1071 | 0.5920 | 0.5364 |
| 2123.3787 | 1.73 | 3300 | 2027.3552 | -0.1248 | -0.2162 | 0.6930 | 0.0914 | -255.1666 | -269.0933 | 0.5926 | 0.5370 |
| 1945.4934 | 1.78 | 3400 | 2025.7804 | -0.1248 | -0.2164 | 0.6935 | 0.0916 | -255.1899 | -269.1010 | 0.5909 | 0.5353 |
| 1937.2627 | 1.83 | 3500 | 2027.8240 | -0.1247 | -0.2163 | 0.6930 | 0.0916 | -255.1750 | -269.0878 | 0.5903 | 0.5347 |
| 2007.2062 | 1.88 | 3600 | 2025.3228 | -0.1244 | -0.2164 | 0.6895 | 0.0919 | -255.1843 | -269.0623 | 0.5910 | 0.5352 |
| 2076.715 | 1.94 | 3700 | 2027.4857 | -0.1243 | -0.2159 | 0.6920 | 0.0916 | -255.1383 | -269.0487 | 0.5913 | 0.5358 |
| 2055.2201 | 1.99 | 3800 | 2027.8082 | -0.1244 | -0.2160 | 0.6920 | 0.0916 | -255.1455 | -269.0543 | 0.5902 | 0.5347 |
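
The last logged step is not necessarily the one with the lowest validation loss. As a small illustration, the sketch below picks the best step from a handful of (step, validation loss) pairs copied from the final rows of the table above. The variable names are illustrative, and the card does not say which checkpoint was actually published.

```python
# (step, validation_loss) pairs copied from the last rows of the results table.
eval_log = [
    (3200, 2026.1289),
    (3300, 2027.3552),
    (3400, 2025.7804),
    (3500, 2027.8240),
    (3600, 2025.3228),
    (3700, 2027.4857),
    (3800, 2027.8082),
]

# Select the entry with the smallest validation loss.
best_step, best_loss = min(eval_log, key=lambda row: row[1])

print(f"lowest validation loss among these rows: {best_loss} at step {best_step}")
```

The differences between these late evaluations are small (a few units on a loss of about 2000), so they may reflect evaluation noise rather than a meaningful gap between checkpoints.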

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • Pytorch 2.1.2
  • Datasets 2.14.6
  • Tokenizers 0.15.2

Model tree for DUAL-GPO-2/phi-2-ipo-renew1

  • Base model: microsoft/phi-2
  • This model: adapter
  • Training dataset: HuggingFaceH4/ultrafeedback_binarized