---
base_model: alignment-handbook/zephyr-7b-sft-full
datasets:
- HuggingFaceH4/ultrafeedback_binarized
library_name: peft
license: apache-2.0
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-dpop-qlora-uf-5e-7
  results: []
---
zephyr-dpop-qlora-uf-5e-7
This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:
- Loss: 0.6789
- Positive Losses: 0.2624
- Dpo Losses: 0.6395
- Rewards/chosen: 0.2321
- Rewards/rejected: 0.1098
- Rewards/accuracies: 0.7180
- Rewards/margins: 0.1223
- Rewards/margins Max: 0.4485
- Rewards/margins Min: -0.1548
- Rewards/margins Std: 0.2025
- Logps/rejected: -247.6011
- Logps/chosen: -261.3811
- Logits/rejected: -2.6202
- Logits/chosen: -2.6543
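Under the standard sigmoid DPO formulation used by TRL, a response's implicit reward is β times the gap between its policy and reference log-probabilities. The margin is the chosen reward minus the rejected reward (here 0.2321 - 0.1098 = 0.1223), and accuracy is the fraction of pairs with a positive margin. The plain-Python sketch below computes these quantities for a batch of preference pairs. It is illustrative only. `PairLogps` and `dpo_metrics` are hypothetical names, and β = 0.1 is an assumed value because the card does not report β. The margin standard deviation is taken as a population std, which is also an assumption. "Positive Losses" is modeled as a DPO-Positive-style penalty, max(0, log π_ref(y_w) - log π_θ(y_w)), but the card does not define that metric.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class PairLogps:
    """Summed token log-probs for one preference pair (hypothetical container)."""
    policy_chosen: float
    policy_rejected: float
    ref_chosen: float
    ref_rejected: float


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)) for large positive or negative x.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_metrics(pairs: list[PairLogps], beta: float = 0.1) -> dict[str, float]:
    """Batch-mean DPO metrics; beta=0.1 is an assumption, not from the card."""
    chosen = [beta * (p.policy_chosen - p.ref_chosen) for p in pairs]
    rejected = [beta * (p.policy_rejected - p.ref_rejected) for p in pairs]
    # margin = beta * (policy log-ratio - reference log-ratio)
    margins = [c - r for c, r in zip(chosen, rejected)]
    # Sigmoid DPO loss per pair: -log sigmoid(margin).
    dpo_losses = [-log_sigmoid(m) for m in margins]
    # ASSUMED DPO-Positive-style penalty: how far the policy's chosen
    # log-prob has dropped below the reference's (0 if it has not).
    positive = [max(0.0, p.ref_chosen - p.policy_chosen) for p in pairs]
    n = len(pairs)
    return {
        "rewards/chosen": sum(chosen) / n,
        "rewards/rejected": sum(rejected) / n,
        "rewards/accuracies": sum(m > 0 for m in margins) / n,
        "rewards/margins": sum(margins) / n,
        "rewards/margins_max": max(margins),
        "rewards/margins_min": min(margins),
        "rewards/margins_std": statistics.pstdev(margins),
        "dpo_losses": sum(dpo_losses) / n,
        "positive_losses": sum(positive) / n,
        "logps/chosen": sum(p.policy_chosen for p in pairs) / n,
        "logps/rejected": sum(p.policy_rejected for p in pairs) / n,
    }


# Made-up log-probs, for illustration only.
pairs = [
    PairLogps(policy_chosen=-260.0, policy_rejected=-250.0, ref_chosen=-262.0, ref_rejected=-251.0),
    PairLogps(policy_chosen=-270.0, policy_rejected=-240.0, ref_chosen=-269.0, ref_rejected=-239.5),
]
print(dpo_metrics(pairs)["rewards/accuracies"])  # 0.5
```

The sketch does not attempt to reproduce how the reported Loss combines the DPO and positive terms, because the card does not specify it.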
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
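The derived totals in this list follow from the per-device settings: 4 per device × 2 devices × 2 accumulation steps = 16 for training, and 8 × 2 = 16 for evaluation. The sketch below reproduces that arithmetic. It also shows how the learning rate would evolve under a `cosine` scheduler with `warmup_ratio`, assuming the usual shape of linear warmup followed by a half-cosine decay to zero. The function names are hypothetical, and rounding the warmup length up is an assumption. The total number of optimizer steps is not reported. The training-results table suggests a little over 3,800 for the single epoch, but that figure is inferred.

```python
import math


def effective_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    # Number of examples that contribute to one optimizer update across all devices.
    return per_device * num_devices * grad_accum


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    # Assumption: warmup length is warmup_ratio * total_steps, rounded up.
    return math.ceil(total_steps * warmup_ratio)


def cosine_lr(step: int, total_steps: int, peak_lr: float, warmup_ratio: float) -> float:
    """Linear warmup from 0 to peak_lr, then half-cosine decay to 0 at total_steps."""
    warmup = warmup_steps(total_steps, warmup_ratio)
    if step < warmup:
        return peak_lr * step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Values from the hyperparameter list above.
print(effective_batch_size(per_device=4, num_devices=2, grad_accum=2))  # 16
print(effective_batch_size(per_device=8, num_devices=2))  # 16 (no accumulation at eval)
```

With `peak_lr=5e-6` and `warmup_ratio=0.1`, the rate climbs linearly over the first 10% of steps, reaches 5e-6, and then falls along a half cosine toward 0 by the end of the epoch.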
Training results
Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.6911 | 0.03 | 100 | 0.6918 | 0.0097 | 0.6901 | 0.0256 | 0.0195 | 0.6670 | 0.0061 | 0.0282 | -0.0136 | 0.0138 | -256.6321 | -282.0331 | -2.7662 | -2.8054 |
0.6847 | 0.05 | 200 | 0.6919 | 0.0310 | 0.6806 | 0.0844 | 0.0583 | 0.6710 | 0.0261 | 0.1132 | -0.0512 | 0.0542 | -252.7455 | -276.1540 | -2.7592 | -2.7987 |
0.686 | 0.08 | 300 | 0.6901 | 0.0841 | 0.6693 | 0.1465 | 0.0956 | 0.6950 | 0.0509 | 0.2071 | -0.0915 | 0.0989 | -249.0196 | -269.9467 | -2.7474 | -2.7859 |
0.6944 | 0.1 | 400 | 0.6911 | 0.1510 | 0.6631 | 0.1581 | 0.0931 | 0.7100 | 0.0650 | 0.2490 | -0.1113 | 0.1195 | -249.2730 | -268.7827 | -2.7115 | -2.7504 |
0.6923 | 0.13 | 500 | 0.6788 | 0.0596 | 0.6647 | 0.1948 | 0.1332 | 0.6950 | 0.0617 | 0.2513 | -0.1077 | 0.1190 | -245.2602 | -265.1090 | -2.6843 | -2.7243 |
0.663 | 0.16 | 600 | 0.6892 | 0.1483 | 0.6607 | 0.1942 | 0.1226 | 0.6770 | 0.0716 | 0.3008 | -0.1286 | 0.1420 | -246.3230 | -265.1740 | -2.6660 | -2.7036 |
0.6784 | 0.18 | 700 | 0.6935 | 0.2142 | 0.6550 | 0.1892 | 0.1049 | 0.6970 | 0.0843 | 0.3275 | -0.1274 | 0.1516 | -248.0892 | -265.6756 | -2.6229 | -2.6624 |
0.661 | 0.21 | 800 | 0.6885 | 0.1770 | 0.6538 | 0.1994 | 0.1122 | 0.7020 | 0.0872 | 0.3388 | -0.1292 | 0.1549 | -247.3548 | -264.6508 | -2.6850 | -2.7245 |
0.6736 | 0.24 | 900 | 0.6827 | 0.1576 | 0.6557 | 0.2025 | 0.1192 | 0.6940 | 0.0833 | 0.3345 | -0.1335 | 0.1561 | -246.6593 | -264.3388 | -2.6814 | -2.7201 |
0.6998 | 0.26 | 1000 | 0.6806 | 0.2131 | 0.6517 | 0.2037 | 0.1115 | 0.7070 | 0.0922 | 0.3499 | -0.1335 | 0.1615 | -247.4245 | -264.2192 | -2.6830 | -2.7190 |
0.6943 | 0.29 | 1100 | 0.6808 | 0.2125 | 0.6503 | 0.2101 | 0.1144 | 0.7100 | 0.0957 | 0.3629 | -0.1371 | 0.1674 | -247.1344 | -263.5789 | -2.6633 | -2.6979 |
0.6761 | 0.31 | 1200 | 0.6793 | 0.1898 | 0.6511 | 0.2157 | 0.1215 | 0.7110 | 0.0942 | 0.3704 | -0.1366 | 0.1692 | -246.4255 | -263.0201 | -2.6573 | -2.6916 |
0.6976 | 0.34 | 1300 | 0.6730 | 0.1194 | 0.6535 | 0.2178 | 0.1297 | 0.7080 | 0.0881 | 0.3434 | -0.1322 | 0.1594 | -245.6055 | -262.8122 | -2.6282 | -2.6641 |
0.7536 | 0.37 | 1400 | 0.7005 | 0.3143 | 0.6471 | 0.2121 | 0.1083 | 0.7030 | 0.1038 | 0.3986 | -0.1530 | 0.1838 | -247.7509 | -263.3833 | -2.6211 | -2.6572 |
0.6711 | 0.39 | 1500 | 0.6918 | 0.2213 | 0.6489 | 0.2190 | 0.1197 | 0.7040 | 0.0994 | 0.3826 | -0.1451 | 0.1760 | -246.6128 | -262.6917 | -2.5983 | -2.6356 |
0.7428 | 0.42 | 1600 | 0.6867 | 0.1652 | 0.6501 | 0.2193 | 0.1228 | 0.7010 | 0.0965 | 0.3730 | -0.1448 | 0.1730 | -246.2957 | -262.6611 | -2.5979 | -2.6328 |
0.6593 | 0.44 | 1700 | 0.6785 | 0.2228 | 0.6467 | 0.2221 | 0.1173 | 0.7110 | 0.1048 | 0.3978 | -0.1483 | 0.1825 | -246.8526 | -262.3859 | -2.6262 | -2.6614 |
0.6856 | 0.47 | 1800 | 0.6702 | 0.1343 | 0.6504 | 0.2318 | 0.1356 | 0.6980 | 0.0962 | 0.3760 | -0.1454 | 0.1748 | -245.0162 | -261.4142 | -2.5972 | -2.6326 |
0.6552 | 0.5 | 1900 | 0.6743 | 0.1855 | 0.6484 | 0.2278 | 0.1267 | 0.6990 | 0.1011 | 0.3920 | -0.1494 | 0.1816 | -245.9063 | -261.8096 | -2.5761 | -2.6118 |
0.6577 | 0.52 | 2000 | 0.6748 | 0.2036 | 0.6461 | 0.2310 | 0.1247 | 0.7090 | 0.1063 | 0.4016 | -0.1526 | 0.1853 | -246.1064 | -261.4890 | -2.5869 | -2.6241 |
0.6695 | 0.55 | 2100 | 0.6841 | 0.2842 | 0.6443 | 0.2230 | 0.1124 | 0.7100 | 0.1106 | 0.4202 | -0.1537 | 0.1915 | -247.3420 | -262.2980 | -2.6033 | -2.6404 |
0.6633 | 0.58 | 2200 | 0.6799 | 0.2580 | 0.6435 | 0.2273 | 0.1147 | 0.7140 | 0.1126 | 0.4254 | -0.1549 | 0.1932 | -247.1040 | -261.8589 | -2.6014 | -2.6383 |
0.7136 | 0.6 | 2300 | 0.6781 | 0.2376 | 0.6443 | 0.2290 | 0.1183 | 0.7110 | 0.1107 | 0.4197 | -0.1532 | 0.1914 | -246.7446 | -261.6907 | -2.6118 | -2.6471 |
0.6631 | 0.63 | 2400 | 0.6769 | 0.2289 | 0.6450 | 0.2285 | 0.1195 | 0.7080 | 0.1090 | 0.4134 | -0.1509 | 0.1882 | -246.6301 | -261.7479 | -2.6072 | -2.6430 |
0.6884 | 0.65 | 2500 | 0.6854 | 0.3215 | 0.6404 | 0.2248 | 0.1047 | 0.7120 | 0.1201 | 0.4408 | -0.1583 | 0.2000 | -248.1103 | -262.1167 | -2.6064 | -2.6413 |
0.6701 | 0.68 | 2600 | 0.6817 | 0.2661 | 0.6432 | 0.2290 | 0.1154 | 0.7240 | 0.1136 | 0.4344 | -0.1554 | 0.1960 | -247.0384 | -261.6952 | -2.6116 | -2.6458 |
0.668 | 0.71 | 2700 | 0.6771 | 0.2209 | 0.6441 | 0.2330 | 0.1218 | 0.7190 | 0.1112 | 0.4213 | -0.1525 | 0.1911 | -246.4004 | -261.2966 | -2.6196 | -2.6533 |
0.6851 | 0.73 | 2800 | 0.6777 | 0.2299 | 0.6430 | 0.2330 | 0.1192 | 0.7090 | 0.1138 | 0.4274 | -0.1550 | 0.1946 | -246.6621 | -261.2937 | -2.6278 | -2.6613 |
0.678 | 0.76 | 2900 | 0.6856 | 0.2997 | 0.6402 | 0.2278 | 0.1072 | 0.7110 | 0.1207 | 0.4462 | -0.1603 | 0.2028 | -247.8615 | -261.8085 | -2.6269 | -2.6602 |
0.6605 | 0.79 | 3000 | 0.6807 | 0.2415 | 0.6412 | 0.2316 | 0.1134 | 0.7160 | 0.1182 | 0.4380 | -0.1547 | 0.1986 | -247.2367 | -261.4324 | -2.6275 | -2.6605 |
0.6874 | 0.81 | 3100 | 0.6753 | 0.2061 | 0.6425 | 0.2349 | 0.1199 | 0.7190 | 0.1150 | 0.4300 | -0.1520 | 0.1951 | -246.5852 | -261.0995 | -2.6151 | -2.6494 |
0.6516 | 0.84 | 3200 | 0.6828 | 0.3006 | 0.6385 | 0.2284 | 0.1036 | 0.7160 | 0.1248 | 0.4527 | -0.1586 | 0.2052 | -248.2176 | -261.7539 | -2.6158 | -2.6498 |
0.6627 | 0.86 | 3300 | 0.6773 | 0.2406 | 0.6403 | 0.2325 | 0.1123 | 0.7190 | 0.1203 | 0.4419 | -0.1545 | 0.2003 | -247.3520 | -261.3398 | -2.6184 | -2.6526 |
0.6517 | 0.89 | 3400 | 0.6814 | 0.2865 | 0.6386 | 0.2300 | 0.1056 | 0.7190 | 0.1244 | 0.4519 | -0.1569 | 0.2045 | -248.0181 | -261.5968 | -2.6213 | -2.6551 |
0.7267 | 0.92 | 3500 | 0.6810 | 0.2880 | 0.6385 | 0.2302 | 0.1056 | 0.7200 | 0.1246 | 0.4536 | -0.1569 | 0.2050 | -248.0208 | -261.5744 | -2.6222 | -2.6560 |
0.6563 | 0.94 | 3600 | 0.6790 | 0.2627 | 0.6394 | 0.2318 | 0.1093 | 0.7220 | 0.1225 | 0.4487 | -0.1550 | 0.2027 | -247.6491 | -261.4136 | -2.6216 | -2.6555 |
0.7039 | 0.97 | 3700 | 0.6790 | 0.2634 | 0.6396 | 0.2320 | 0.1099 | 0.7230 | 0.1222 | 0.4483 | -0.1550 | 0.2025 | -247.5927 | -261.3918 | -2.6220 | -2.6559 |
0.6622 | 0.99 | 3800 | 0.6789 | 0.2612 | 0.6395 | 0.2320 | 0.1098 | 0.7220 | 0.1222 | 0.4482 | -0.1549 | 0.2025 | -247.6030 | -261.3938 | -2.6204 | -2.6544 |
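In every row, Rewards/margins should equal Rewards/chosen minus Rewards/rejected up to rounding. The row with the lowest validation loss is not the final one. A small, hypothetical parser (`parse_rows`, `EvalRow`, `margins_consistent`) for the pipe-delimited rows above can check both. It reads only the first ten columns and assumes the column order of the header.

```python
from dataclasses import dataclass


@dataclass
class EvalRow:
    step: int
    val_loss: float
    chosen: float
    rejected: float
    accuracy: float
    margin: float


def parse_rows(table_text: str) -> list[EvalRow]:
    """Parse pipe-delimited rows of the training-results table.

    Assumed column order (from the header): 0 Training Loss, 1 Epoch, 2 Step,
    3 Validation Loss, 4 Positive Losses, 5 Dpo Losses, 6 Rewards/chosen,
    7 Rewards/rejected, 8 Rewards/accuracies, 9 Rewards/margins, ...
    """
    rows = []
    for line in table_text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        try:
            step = int(cells[2])
        except (IndexError, ValueError):
            continue  # header, ---|--- separator, or blank line
        rows.append(EvalRow(step=step, val_loss=float(cells[3]), chosen=float(cells[6]),
                            rejected=float(cells[7]), accuracy=float(cells[8]),
                            margin=float(cells[9])))
    return rows


def margins_consistent(rows: list[EvalRow], tol: float = 2e-4) -> bool:
    # Values are logged to 4 decimals, so allow a small rounding tolerance.
    return all(abs((r.chosen - r.rejected) - r.margin) <= tol for r in rows)


# Two rows copied from the table above.
SAMPLE = """
0.6856 | 0.47 | 1800 | 0.6702 | 0.1343 | 0.6504 | 0.2318 | 0.1356 | 0.6980 | 0.0962 | 0.3760 | -0.1454 | 0.1748 | -245.0162 | -261.4142 | -2.5972 | -2.6326 |
0.6622 | 0.99 | 3800 | 0.6789 | 0.2612 | 0.6395 | 0.2320 | 0.1098 | 0.7220 | 0.1222 | 0.4482 | -0.1549 | 0.2025 | -247.6030 | -261.3938 | -2.6204 | -2.6544 |
"""

rows = parse_rows(SAMPLE)
print(margins_consistent(rows))  # True
best = min(rows, key=lambda r: r.val_loss)
print(best.step, best.val_loss)  # 1800 0.6702
```

When the parser is applied to every row of the table, the same `min` call selects step 1800 (validation loss 0.6702). The final logged row at step 3800 reports 0.6789, which matches the evaluation loss quoted at the top of the card.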
Framework versions
- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2