---
license: mit
base_model: facebook/m2m100_418M
tags:
- generated_from_trainer
metrics:
- bleu
model-index:
- name: m2m100_418M-finetuned-en-to-hi
  results: []
---

# m2m100_418M-finetuned-en-to-hi

This model is a fine-tuned version of [facebook/m2m100_418M](https://huggingface.co/facebook/m2m100_418M) for English-to-Hindi translation; the fine-tuning dataset is not documented.
It achieves the following results on the evaluation set:
- Loss: 1.0453
- Bleu: 17.4993
- Gen Len: 6.7284

## Model description

M2M100 is a multilingual encoder-decoder (seq2seq) model trained for many-to-many translation between 100 languages; this checkpoint adapts the 418M-parameter variant to the English-to-Hindi direction. At inference time the decoder must be forced to start with the Hindi language token (see the usage sketch below).

## Intended uses & limitations

The model is intended for English-to-Hindi machine translation. The average generation length on the evaluation set is only about 6.7 tokens, which suggests it was tuned on short segments; quality on long sentences or paragraphs may therefore be lower. No evaluation on domain-specific or noisy text is reported.
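
As a minimal inference sketch (the hub id below is an assumption based on this card's title), translation with the `transformers` M2M100 classes looks like this; the key detail is forcing the decoder to begin with the Hindi language token:

```python
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

# Hypothetical hub id, assumed from this card's title; substitute the
# actual repository path where this checkpoint is published.
model_id = "m2m100_418M-finetuned-en-to-hi"

tokenizer = M2M100Tokenizer.from_pretrained(model_id)
model = M2M100ForConditionalGeneration.from_pretrained(model_id)

# M2M100 is many-to-many: the source language is set on the tokenizer,
# and the target language is chosen by forcing the first decoder token
# to the Hindi language id.
tokenizer.src_lang = "en"
encoded = tokenizer("How are you today?", return_tensors="pt")
generated = model.generate(
    **encoded,
    forced_bos_token_id=tokenizer.get_lang_id("hi"),
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```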

## Training and evaluation data

The dataset used for fine-tuning is not documented. From the training log below, one epoch is roughly 3,030 steps at a batch size of 48, which implies a training set of roughly 145,000 sentence pairs.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 2e-05
- train_batch_size: 48
- eval_batch_size: 48
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5
- mixed_precision_training: Native AMP
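
For reference, a `Seq2SeqTrainingArguments` sketch reproducing the hyperparameters above might look like the following. Only the listed values are taken from this card; the output directory is an assumption, and the evaluation cadence is inferred from the 500-step intervals in the results table below:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="m2m100_418M-finetuned-en-to-hi",  # assumed, not from the card
    learning_rate=2e-5,
    per_device_train_batch_size=48,
    per_device_eval_batch_size=48,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=5,
    fp16=True,                    # "Native AMP" mixed precision
    predict_with_generate=True,   # required so Bleu / Gen Len can be computed
    evaluation_strategy="steps",
    eval_steps=500,               # matches the cadence in the results table
)
```

The Adam betas and epsilon in the list are the optimizer defaults, so they need no explicit arguments here.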

### Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|:-------:|
| 2.4274 | 0.16 | 500 | 2.1152 | 4.4935 | 6.8813 |
| 2.1915 | 0.33 | 1000 | 1.9722 | 5.8486 | 6.9727 |
| 2.1187 | 0.49 | 1500 | 1.8575 | 5.5802 | 6.9993 |
| 2.0151 | 0.66 | 2000 | 1.7686 | 8.8892 | 6.8233 |
| 1.9709 | 0.82 | 2500 | 1.6948 | 8.4082 | 6.8809 |
| 1.9376 | 0.99 | 3000 | 1.6341 | 10.0801 | 6.85 |
| 1.761 | 1.15 | 3500 | 1.5788 | 8.1916 | 6.8816 |
| 1.7269 | 1.32 | 4000 | 1.5380 | 10.2779 | 6.9447 |
| 1.7231 | 1.48 | 4500 | 1.4946 | 6.9244 | 6.9402 |
| 1.6925 | 1.65 | 5000 | 1.4456 | 13.7246 | 6.9018 |
| 1.6658 | 1.81 | 5500 | 1.4146 | 9.1181 | 6.9104 |
| 1.6673 | 1.98 | 6000 | 1.3727 | 8.6535 | 6.8682 |
| 1.5165 | 2.14 | 6500 | 1.3441 | 14.8146 | 6.9804 |
| 1.5111 | 2.31 | 7000 | 1.3101 | 11.192 | 6.92 |
| 1.4889 | 2.47 | 7500 | 1.2814 | 11.8364 | 6.9509 |
| 1.4903 | 2.64 | 8000 | 1.2510 | 16.8035 | 6.9316 |
| 1.4871 | 2.8 | 8500 | 1.2298 | 14.5766 | 6.9053 |
| 1.4854 | 2.97 | 9000 | 1.2051 | 14.2822 | 6.8438 |
| 1.3719 | 3.13 | 9500 | 1.1758 | 16.1779 | 6.8918 |
| 1.3481 | 3.3 | 10000 | 1.1612 | 20.1789 | 6.8138 |
| 1.3585 | 3.46 | 10500 | 1.1410 | 15.6937 | 6.8613 |
| 1.35 | 3.63 | 11000 | 1.1261 | 20.0808 | 6.832 |
| 1.3557 | 3.79 | 11500 | 1.1069 | 19.588 | 6.8242 |
| 1.3329 | 3.96 | 12000 | 1.0924 | 19.9913 | 6.796 |
| 1.2792 | 4.12 | 12500 | 1.0791 | 18.8275 | 6.7616 |
| 1.2568 | 4.29 | 13000 | 1.0701 | 16.7189 | 6.7676 |
| 1.2558 | 4.45 | 13500 | 1.0605 | 18.7687 | 6.7464 |
| 1.2533 | 4.62 | 14000 | 1.0541 | 19.1818 | 6.7693 |
| 1.2559 | 4.78 | 14500 | 1.0475 | 19.0462 | 6.738 |
| 1.2513 | 4.95 | 15000 | 1.0453 | 17.4993 | 6.7284 |
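
The Bleu and Gen Len columns are the standard metrics reported by `generated_from_trainer` runs. A sketch of a typical `compute_metrics` function for this setup with the `evaluate` library follows; it assumes `tokenizer` is the model's tokenizer, and this exact function is not part of the card:

```python
import evaluate
import numpy as np

sacrebleu = evaluate.load("sacrebleu")

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    # Labels are padded with -100 by the data collator; restore the pad id
    # so they can be decoded.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    result = sacrebleu.compute(
        predictions=[p.strip() for p in decoded_preds],
        references=[[l.strip()] for l in decoded_labels],
    )
    # "Gen Len" is the mean length of the generated sequences in tokens.
    gen_lens = [np.count_nonzero(p != tokenizer.pad_token_id) for p in preds]
    return {"bleu": result["score"], "gen_len": np.mean(gen_lens)}
```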

### Framework versions

- Transformers 4.36.2
- Pytorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0