# sft_dpo_p
This model is a fine-tuned version of mistralai/Mistral-Nemo-Instruct-2407 on the heat_transfer_dpo_p dataset. It achieves the following results on the evaluation set (the metric names are explained after the list):
- Loss: 0.1569
- Rewards/chosen: 0.3090
- Rewards/rejected: -5.2240
- Rewards/accuracies: 0.9520
- Rewards/margins: 5.5331
- Logps/chosen: -1.4012
- Logps/rejected: -57.0955
- Logits/chosen: -0.1708
- Logits/rejected: -0.2166
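
These are the metrics logged by TRL's `DPOTrainer`. In the standard DPO formulation they correspond to the quantities below; the β coefficient is not reported in this card (TRL's default is 0.1):

$$
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} \;-\; \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

$$
\text{rewards/chosen} = \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}, \qquad
\text{rewards/rejected} = \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
$$

Rewards/margins is the mean difference between the two, and rewards/accuracies is the fraction of evaluation pairs for which the chosen reward exceeds the rejected reward. The logps/* and logits/* rows are the policy's average log-probabilities and average logits on the chosen and rejected completions.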
## Model description
More information needed
## Intended uses & limitations
More information needed
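
The card gives no usage guidance, so the following is only a minimal loading sketch, not a tested recipe: it attaches the LoRA adapter to the base instruct model with `peft` and `transformers`. The adapter repo id is the one this card is published under, and the example prompt is purely illustrative.

```python
# Minimal inference sketch: load the DPO-tuned LoRA adapter on top of the base
# instruct model. The example prompt is illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "mistralai/Mistral-Nemo-Instruct-2407"
ADAPTER_ID = "Howard881010/heat_transfer_sft_dpo_p"  # adapter repo for this card

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER_ID)  # attach the LoRA weights
model.eval()

messages = [{"role": "user", "content": "Explain convective heat transfer in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=256)

print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```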
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a reproduction sketch in TRL follows the list):
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 8
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
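
These settings map onto TRL's `DPOConfig`/`DPOTrainer`. The sketch below is a hedged reconstruction: only the hyperparameters listed above come from this card, while the dataset location and format, the LoRA configuration, and the DPO β (left at TRL's default) are assumptions made for illustration.

```python
# Hedged reproduction sketch of the training setup with TRL's DPOTrainer.
# Dataset path/format, LoRA settings, and DPO beta are assumptions; only the
# hyperparameters listed above are taken from the card. The original run used
# 2 GPUs (e.g. via `accelerate launch`), giving a total batch size of 8.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

MODEL_ID = "mistralai/Mistral-Nemo-Instruct-2407"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

# Placeholder: heat_transfer_dpo_p is assumed to be a local preference dataset
# with "prompt", "chosen", and "rejected" columns.
train_dataset = load_dataset("json", data_files="heat_transfer_dpo_p.json", split="train")

peft_config = LoraConfig(  # illustrative LoRA settings; not reported in the card
    task_type="CAUSAL_LM",
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

training_args = DPOConfig(
    output_dir="sft_dpo_p",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    seed=42,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # `tokenizer=tokenizer` in older TRL releases
    peft_config=peft_config,     # with a PEFT adapter, no separate ref model is needed
)
trainer.train()
```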
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
---|---|---|---|---|---|---|---|---|---|---|---|
0.3669 | 0.0533 | 60 | 0.3126 | 0.3606 | -0.9629 | 0.9150 | 1.3235 | -0.8857 | -14.4843 | -0.5259 | -0.5415 |
0.2995 | 0.1067 | 120 | 0.2095 | 0.2729 | -3.2809 | 0.9320 | 3.5538 | -1.7626 | -37.6640 | -0.2224 | -0.2795 |
0.0686 | 0.16 | 180 | 0.2650 | 0.2280 | -4.0377 | 0.9220 | 4.2657 | -2.2109 | -45.2318 | -0.1560 | -0.2160 |
0.1007 | 0.2133 | 240 | 0.2294 | 0.2211 | -4.3632 | 0.9340 | 4.5843 | -2.2807 | -48.4872 | -0.1604 | -0.2090 |
0.2146 | 0.2667 | 300 | 0.1389 | 0.3621 | -3.4515 | 0.9390 | 3.8136 | -0.8700 | -39.3696 | -0.2215 | -0.2535 |
0.0175 | 0.32 | 360 | 0.1924 | 0.2508 | -4.5680 | 0.9430 | 4.8188 | -1.9836 | -50.5354 | -0.1839 | -0.2427 |
0.2375 | 0.3733 | 420 | 0.2330 | 0.2380 | -4.5576 | 0.9310 | 4.7956 | -2.1114 | -50.4313 | -0.1628 | -0.2199 |
0.2265 | 0.4267 | 480 | 0.2988 | 0.1994 | -4.5453 | 0.9190 | 4.7447 | -2.4975 | -50.3082 | -0.1496 | -0.2141 |
0.0854 | 0.48 | 540 | 0.1945 | 0.2575 | -4.3099 | 0.9370 | 4.5674 | -1.9162 | -47.9538 | -0.1301 | -0.1829 |
0.2707 | 0.5333 | 600 | 0.1508 | 0.3076 | -4.9413 | 0.9500 | 5.2489 | -1.4153 | -54.2679 | -0.1536 | -0.2036 |
0.161 | 0.5867 | 660 | 0.1841 | 0.2792 | -5.1292 | 0.9470 | 5.4084 | -1.6994 | -56.1473 | -0.1543 | -0.2038 |
0.4007 | 0.64 | 720 | 0.1888 | 0.2476 | -5.0702 | 0.9480 | 5.3178 | -2.0148 | -55.5571 | -0.1643 | -0.2078 |
0.1186 | 0.6933 | 780 | 0.2090 | 0.2271 | -5.1242 | 0.9450 | 5.3513 | -2.2203 | -56.0969 | -0.1519 | -0.1959 |
0.148 | 0.7467 | 840 | 0.1778 | 0.2731 | -5.1445 | 0.9470 | 5.4176 | -1.7601 | -56.3004 | -0.1673 | -0.2100 |
0.12 | 0.8 | 900 | 0.1519 | 0.3056 | -5.1776 | 0.9520 | 5.4832 | -1.4355 | -56.6311 | -0.1742 | -0.2169 |
0.1522 | 0.8533 | 960 | 0.1528 | 0.3085 | -5.2151 | 0.9520 | 5.5236 | -1.4062 | -57.0058 | -0.1666 | -0.2108 |
0.1224 | 0.9067 | 1020 | 0.1497 | 0.3084 | -5.2228 | 0.9550 | 5.5312 | -1.4068 | -57.0827 | -0.1706 | -0.2145 |
0.0707 | 0.96 | 1080 | 0.1587 | 0.3037 | -5.2156 | 0.9510 | 5.5192 | -1.4542 | -57.0105 | -0.1721 | -0.2193 |
### Framework versions
- PEFT 0.12.0
- Transformers 4.46.0
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.20.1