# Ministral-8B-Instruct-2410-dpo-1000

This model is a fine-tuned version of [mistralai/Ministral-8B-Instruct-2410](https://huggingface.co/mistralai/Ministral-8B-Instruct-2410) on the bct_non_cot_dpo_1000 dataset. It achieves the following results on the evaluation set:
- Loss: 0.3134
- Rewards/chosen: -0.2081
- Rewards/rejected: -2.0644
- Rewards/accuracies: 0.9000
- Rewards/margins: 1.8564
- Logps/chosen: -27.0565
- Logps/rejected: -47.3938
- Logits/chosen: -1.2929
- Logits/rejected: -1.3581
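
For reference, Rewards/chosen and Rewards/rejected are DPO's implicit rewards, and Rewards/margins is their gap. Below is a minimal sketch of how these metrics are typically derived, assuming TRL-style conventions; the beta value is an assumption, since this card does not report it (TRL's default is 0.1):

```python
# Minimal sketch of how DPO reward metrics relate to log-probabilities.
# beta is an assumption: the card does not report it (TRL's default is 0.1).
beta = 0.1

def implicit_reward(policy_logp: float, ref_logp: float) -> float:
    """DPO implicit reward: beta * (log pi_theta(y|x) - log pi_ref(y|x))."""
    return beta * (policy_logp - ref_logp)

def margin(reward_chosen: float, reward_rejected: float) -> float:
    """Rewards/margins is the chosen-vs-rejected reward gap."""
    return reward_chosen - reward_rejected

print(margin(-0.2081, -2.0644))  # ~1.8564, matching Rewards/margins above
```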
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch follows the list):
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
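
A minimal sketch of how these settings could be reproduced with TRL's DPOTrainer and a PEFT/LoRA adapter. The training stack, dataset file, and LoRA hyperparameters are assumptions; only the values mirrored from the list above come from this card.

```python
# Hedged reproduction sketch using TRL's DPOTrainer; the actual training
# stack is not stated in this card. Dataset path/format and LoRA settings
# are assumptions, not facts from the card.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "mistralai/Ministral-8B-Instruct-2410"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Assumed local file with "prompt"/"chosen"/"rejected" columns.
train_dataset = load_dataset("json", data_files="bct_non_cot_dpo_1000.json")["train"]

args = DPOConfig(
    output_dir="ministral-8b-dpo-1000",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,  # 4 x 8 = total train batch size 32
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

peft_config = LoraConfig(task_type="CAUSAL_LM")  # LoRA ranks/targets not reported

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # named `processing_class` in newer TRL releases
    peft_config=peft_config,
)
trainer.train()
```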
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5697 | 1.7778 | 50 | 0.5310 | 0.0503 | -0.3398 | 0.8400 | 0.3901 | -24.4728 | -30.1476 | -1.6708 | -1.7014 |
| 0.3795 | 3.5556 | 100 | 0.3455 | -0.1007 | -1.3901 | 0.9000 | 1.2894 | -25.9826 | -40.6504 | -1.4192 | -1.4765 |
| 0.2291 | 5.3333 | 150 | 0.3134 | -0.2081 | -2.0644 | 0.9000 | 1.8564 | -27.0565 | -47.3938 | -1.2929 | -1.3581 |
| 0.2516 | 7.1111 | 200 | 0.3151 | -0.2309 | -2.3286 | 0.8900 | 2.0977 | -27.2845 | -50.0355 | -1.2557 | -1.3219 |
| 0.1897 | 8.8889 | 250 | 0.3143 | -0.2393 | -2.3851 | 0.8900 | 2.1459 | -27.3683 | -50.6008 | -1.2463 | -1.3122 |
### Framework versions
- PEFT 0.12.0
- Transformers 4.45.2
- PyTorch 2.3.0
- Datasets 2.19.0
- Tokenizers 0.20.0
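
A hedged usage sketch for loading this model. That the repo ships a PEFT adapter (rather than merged weights) is an assumption based on the PEFT version listed above; the chat-template call follows the base model's standard interface.

```python
# Hedged usage sketch: load the base model and attach this repo's adapter.
# Whether the repo contains a PEFT adapter (vs. merged weights) is an
# assumption inferred from the PEFT framework version above.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "mistralai/Ministral-8B-Instruct-2410"
adapter_id = "chchen/Ministral-8B-Instruct-2410-dpo-1000"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

messages = [{"role": "user", "content": "Summarize DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```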