metadata
license: apache-2.0
base_model: t5-small
tags:
- generated_from_trainer
metrics:
- bleu
model-index:
- name: war_tl_model
  results: []
war_tl_model
This model is a fine-tuned version of t5-small; the training dataset is not documented. On the evaluation set, the final checkpoint (epoch 200, step 10800) achieves the following results:
- Loss: 0.0083
- Bleu: 95.2691
- Gen Len: 5.3275
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 200
- mixed_precision_training: Native AMP
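As a rough illustration, the hyperparameters above can be written as a plain dictionary and the linear schedule evaluated by hand. This is a sketch, not the original training script: the key names are chosen to match the parameters of `Seq2SeqTrainingArguments` in Transformers, mapping `train_batch_size` to `per_device_train_batch_size` assumes a single device, and `WARMUP_STEPS = 0` is an assumption because the card lists no warmup. `TOTAL_STEPS` is the final step reported in the training results.

```python
# Hyperparameters copied from the list above. Key names follow
# transformers' Seq2SeqTrainingArguments (an assumption about the script used).
hparams = {
    "learning_rate": 1e-3,
    "per_device_train_batch_size": 16,  # assumes training on a single device
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 200,
    "fp16": True,  # "Native AMP" mixed precision
}

TOTAL_STEPS = 10_800  # final step in the training results table
WARMUP_STEPS = 0      # assumption: no warmup is listed in the card


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step under linear warmup then linear decay."""
    base = hparams["learning_rate"]
    if step < WARMUP_STEPS:
        return base * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return base * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


# Print the schedule at the start, quarter, midpoint, and end of training.
for step in (0, 2_700, 5_400, 10_800):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate falls steadily from 0.001 to zero by the last step, which is consistent with the validation loss and BLEU flattening out over the final epochs. The dictionary could be unpacked into `Seq2SeqTrainingArguments(output_dir=..., **hparams)`, where `output_dir` is a placeholder.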
Training results
Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len |
---|---|---|---|---|---|
No log | 1.0 | 54 | 2.8093 | 2.5958 | 6.0523 |
No log | 2.0 | 108 | 2.4043 | 3.1846 | 6.1382 |
No log | 3.0 | 162 | 1.9327 | 6.8308 | 6.4901 |
No log | 4.0 | 216 | 1.5969 | 13.8714 | 5.6562 |
No log | 5.0 | 270 | 1.2099 | 20.5562 | 5.9721 |
No log | 6.0 | 324 | 0.9304 | 31.1495 | 5.7038 |
No log | 7.0 | 378 | 0.7074 | 43.6407 | 5.7619 |
No log | 8.0 | 432 | 0.5408 | 49.2356 | 5.5772 |
No log | 9.0 | 486 | 0.3822 | 63.1038 | 5.5528 |
1.9648 | 10.0 | 540 | 0.2888 | 67.2835 | 5.5041 |
1.9648 | 11.0 | 594 | 0.1852 | 72.4324 | 5.3449 |
1.9648 | 12.0 | 648 | 0.1235 | 84.0315 | 5.36 |
1.9648 | 13.0 | 702 | 0.0831 | 88.3721 | 5.374 |
1.9648 | 14.0 | 756 | 0.0629 | 87.43 | 5.3531 |
1.9648 | 15.0 | 810 | 0.0515 | 88.0698 | 5.3577 |
1.9648 | 16.0 | 864 | 0.0526 | 89.6299 | 5.3902 |
1.9648 | 17.0 | 918 | 0.0454 | 89.7151 | 5.3879 |
1.9648 | 18.0 | 972 | 0.0434 | 88.0326 | 5.3879 |
0.4211 | 19.0 | 1026 | 0.0375 | 89.9125 | 5.3229 |
0.4211 | 20.0 | 1080 | 0.0295 | 91.976 | 5.3554 |
0.4211 | 21.0 | 1134 | 0.0441 | 91.7403 | 5.3693 |
0.4211 | 22.0 | 1188 | 0.0290 | 92.0153 | 5.3461 |
0.4211 | 23.0 | 1242 | 0.0318 | 90.8522 | 5.3391 |
0.4211 | 24.0 | 1296 | 0.0343 | 91.9239 | 5.3856 |
0.4211 | 25.0 | 1350 | 0.0260 | 87.7878 | 5.3519 |
0.4211 | 26.0 | 1404 | 0.0332 | 90.3633 | 5.3751 |
0.4211 | 27.0 | 1458 | 0.0269 | 92.1404 | 5.3717 |
0.1559 | 28.0 | 1512 | 0.0323 | 93.0887 | 5.36 |
0.1559 | 29.0 | 1566 | 0.0326 | 94.5354 | 5.3566 |
0.1559 | 30.0 | 1620 | 0.0314 | 93.4507 | 5.374 |
0.1559 | 31.0 | 1674 | 0.0297 | 94.7939 | 5.3357 |
0.1559 | 32.0 | 1728 | 0.0282 | 92.2858 | 5.3531 |
0.1559 | 33.0 | 1782 | 0.0258 | 92.4661 | 5.3508 |
0.1559 | 34.0 | 1836 | 0.0252 | 91.6147 | 5.3577 |
0.1559 | 35.0 | 1890 | 0.0240 | 93.2291 | 5.3728 |
0.1559 | 36.0 | 1944 | 0.0157 | 93.4177 | 5.3844 |
0.1559 | 37.0 | 1998 | 0.0212 | 94.0209 | 5.3589 |
0.093 | 38.0 | 2052 | 0.0199 | 93.1765 | 5.3728 |
0.093 | 39.0 | 2106 | 0.0257 | 93.9608 | 5.3624 |
0.093 | 40.0 | 2160 | 0.0232 | 93.9594 | 5.3717 |
0.093 | 41.0 | 2214 | 0.0198 | 93.5332 | 5.3519 |
0.093 | 42.0 | 2268 | 0.0150 | 93.9354 | 5.3682 |
0.093 | 43.0 | 2322 | 0.0156 | 94.5189 | 5.3566 |
0.093 | 44.0 | 2376 | 0.0170 | 92.767 | 5.36 |
0.093 | 45.0 | 2430 | 0.0178 | 95.2076 | 5.3519 |
0.093 | 46.0 | 2484 | 0.0217 | 93.4226 | 5.3995 |
0.0655 | 47.0 | 2538 | 0.0181 | 93.0419 | 5.3612 |
0.0655 | 48.0 | 2592 | 0.0185 | 94.4578 | 5.3589 |
0.0655 | 49.0 | 2646 | 0.0210 | 93.3838 | 5.3577 |
0.0655 | 50.0 | 2700 | 0.0152 | 93.883 | 5.331 |
0.0655 | 51.0 | 2754 | 0.0182 | 93.8614 | 5.3914 |
0.0655 | 52.0 | 2808 | 0.0160 | 94.1816 | 5.3426 |
0.0655 | 53.0 | 2862 | 0.0158 | 94.2294 | 5.3484 |
0.0655 | 54.0 | 2916 | 0.0135 | 94.4382 | 5.3508 |
0.0655 | 55.0 | 2970 | 0.0151 | 93.8986 | 5.3612 |
0.0517 | 56.0 | 3024 | 0.0113 | 95.2691 | 5.3484 |
0.0517 | 57.0 | 3078 | 0.0130 | 95.0307 | 5.3519 |
0.0517 | 58.0 | 3132 | 0.0137 | 95.2281 | 5.3705 |
0.0517 | 59.0 | 3186 | 0.0115 | 95.2281 | 5.3786 |
0.0517 | 60.0 | 3240 | 0.0130 | 95.2486 | 5.3589 |
0.0517 | 61.0 | 3294 | 0.0119 | 95.2486 | 5.3635 |
0.0517 | 62.0 | 3348 | 0.0134 | 95.2486 | 5.3473 |
0.0517 | 63.0 | 3402 | 0.0151 | 95.1871 | 5.3798 |
0.0517 | 64.0 | 3456 | 0.0141 | 95.2076 | 5.3566 |
0.0357 | 65.0 | 3510 | 0.0139 | 94.6668 | 5.3566 |
0.0357 | 66.0 | 3564 | 0.0122 | 95.2281 | 5.3403 |
0.0357 | 67.0 | 3618 | 0.0172 | 95.2076 | 5.3484 |
0.0357 | 68.0 | 3672 | 0.0162 | 94.7725 | 5.3403 |
0.0357 | 69.0 | 3726 | 0.0121 | 95.2281 | 5.3473 |
0.0357 | 70.0 | 3780 | 0.0163 | 94.6668 | 5.3624 |
0.0357 | 71.0 | 3834 | 0.0117 | 95.2486 | 5.3473 |
0.0357 | 72.0 | 3888 | 0.0151 | 95.2486 | 5.3566 |
0.0357 | 73.0 | 3942 | 0.0104 | 95.2691 | 5.3554 |
0.0357 | 74.0 | 3996 | 0.0098 | 95.2691 | 5.3415 |
0.0342 | 75.0 | 4050 | 0.0117 | 95.2486 | 5.3438 |
0.0342 | 76.0 | 4104 | 0.0125 | 94.6872 | 5.367 |
0.0342 | 77.0 | 4158 | 0.0103 | 95.2486 | 5.3461 |
0.0342 | 78.0 | 4212 | 0.0113 | 95.2281 | 5.3635 |
0.0342 | 79.0 | 4266 | 0.0119 | 95.2691 | 5.374 |
0.0342 | 80.0 | 4320 | 0.0132 | 93.4378 | 5.3577 |
0.0342 | 81.0 | 4374 | 0.0102 | 94.728 | 5.3496 |
0.0342 | 82.0 | 4428 | 0.0156 | 94.6872 | 5.3821 |
0.0342 | 83.0 | 4482 | 0.0097 | 94.728 | 5.3357 |
0.0292 | 84.0 | 4536 | 0.0096 | 95.2486 | 5.3693 |
0.0292 | 85.0 | 4590 | 0.0104 | 95.2691 | 5.3647 |
0.0292 | 86.0 | 4644 | 0.0110 | 94.7064 | 5.3612 |
0.0292 | 87.0 | 4698 | 0.0094 | 94.7268 | 5.3496 |
0.0292 | 88.0 | 4752 | 0.0115 | 95.2486 | 5.36 |
0.0292 | 89.0 | 4806 | 0.0098 | 95.2691 | 5.36 |
0.0292 | 90.0 | 4860 | 0.0104 | 94.5404 | 5.3461 |
0.0292 | 91.0 | 4914 | 0.0103 | 94.6538 | 5.36 |
0.0292 | 92.0 | 4968 | 0.0096 | 95.2691 | 5.3624 |
0.0243 | 93.0 | 5022 | 0.0092 | 95.2486 | 5.3647 |
0.0243 | 94.0 | 5076 | 0.0095 | 95.2691 | 5.3461 |
0.0243 | 95.0 | 5130 | 0.0105 | 95.0189 | 5.3508 |
0.0243 | 96.0 | 5184 | 0.0111 | 95.1994 | 5.3763 |
0.0243 | 97.0 | 5238 | 0.0099 | 95.2691 | 5.3717 |
0.0243 | 98.0 | 5292 | 0.0102 | 95.2691 | 5.3484 |
0.0243 | 99.0 | 5346 | 0.0101 | 95.2691 | 5.374 |
0.0243 | 100.0 | 5400 | 0.0097 | 95.2486 | 5.3426 |
0.0243 | 101.0 | 5454 | 0.0095 | 95.2691 | 5.3508 |
0.0233 | 102.0 | 5508 | 0.0098 | 95.2691 | 5.3531 |
0.0233 | 103.0 | 5562 | 0.0095 | 95.2691 | 5.3624 |
0.0233 | 104.0 | 5616 | 0.0091 | 95.2691 | 5.3461 |
0.0233 | 105.0 | 5670 | 0.0105 | 95.2691 | 5.36 |
0.0233 | 106.0 | 5724 | 0.0137 | 95.2486 | 5.3554 |
0.0233 | 107.0 | 5778 | 0.0108 | 95.2691 | 5.3577 |
0.0233 | 108.0 | 5832 | 0.0094 | 95.2691 | 5.3717 |
0.0233 | 109.0 | 5886 | 0.0095 | 95.2691 | 5.3531 |
0.0233 | 110.0 | 5940 | 0.0096 | 95.2691 | 5.3415 |
0.0233 | 111.0 | 5994 | 0.0094 | 95.2486 | 5.3589 |
0.02 | 112.0 | 6048 | 0.0092 | 95.2486 | 5.3519 |
0.02 | 113.0 | 6102 | 0.0091 | 94.905 | 5.3635 |
0.02 | 114.0 | 6156 | 0.0091 | 95.2691 | 5.3624 |
0.02 | 115.0 | 6210 | 0.0090 | 95.2691 | 5.3368 |
0.02 | 116.0 | 6264 | 0.0094 | 95.2486 | 5.3542 |
0.02 | 117.0 | 6318 | 0.0133 | 95.2486 | 5.3519 |
0.02 | 118.0 | 6372 | 0.0112 | 95.2691 | 5.3531 |
0.02 | 119.0 | 6426 | 0.0115 | 95.2486 | 5.3496 |
0.02 | 120.0 | 6480 | 0.0091 | 95.2691 | 5.3391 |
0.0181 | 121.0 | 6534 | 0.0089 | 95.2691 | 5.3368 |
0.0181 | 122.0 | 6588 | 0.0090 | 95.2691 | 5.3647 |
0.0181 | 123.0 | 6642 | 0.0096 | 95.2691 | 5.3786 |
0.0181 | 124.0 | 6696 | 0.0091 | 95.2691 | 5.381 |
0.0181 | 125.0 | 6750 | 0.0093 | 95.2691 | 5.3531 |
0.0181 | 126.0 | 6804 | 0.0098 | 95.2691 | 5.3554 |
0.0181 | 127.0 | 6858 | 0.0093 | 95.2691 | 5.3624 |
0.0181 | 128.0 | 6912 | 0.0089 | 95.2691 | 5.3693 |
0.0181 | 129.0 | 6966 | 0.0088 | 95.2691 | 5.374 |
0.0155 | 130.0 | 7020 | 0.0094 | 95.2691 | 5.36 |
0.0155 | 131.0 | 7074 | 0.0091 | 95.2691 | 5.3415 |
0.0155 | 132.0 | 7128 | 0.0088 | 95.2691 | 5.3484 |
0.0155 | 133.0 | 7182 | 0.0090 | 95.2691 | 5.3624 |
0.0155 | 134.0 | 7236 | 0.0088 | 95.2691 | 5.3554 |
0.0155 | 135.0 | 7290 | 0.0089 | 95.2691 | 5.3693 |
0.0155 | 136.0 | 7344 | 0.0090 | 95.2691 | 5.3577 |
0.0155 | 137.0 | 7398 | 0.0094 | 95.2486 | 5.3357 |
0.0155 | 138.0 | 7452 | 0.0092 | 95.2691 | 5.3368 |
0.0147 | 139.0 | 7506 | 0.0090 | 95.2691 | 5.3508 |
0.0147 | 140.0 | 7560 | 0.0089 | 95.2691 | 5.3647 |
0.0147 | 141.0 | 7614 | 0.0090 | 95.2691 | 5.3577 |
0.0147 | 142.0 | 7668 | 0.0089 | 95.2691 | 5.3531 |
0.0147 | 143.0 | 7722 | 0.0090 | 95.2691 | 5.3484 |
0.0147 | 144.0 | 7776 | 0.0096 | 94.112 | 5.3519 |
0.0147 | 145.0 | 7830 | 0.0090 | 95.2691 | 5.3624 |
0.0147 | 146.0 | 7884 | 0.0090 | 95.2691 | 5.3647 |
0.0147 | 147.0 | 7938 | 0.0090 | 95.2691 | 5.36 |
0.0147 | 148.0 | 7992 | 0.0090 | 95.2691 | 5.3647 |
0.0146 | 149.0 | 8046 | 0.0093 | 95.2691 | 5.3624 |
0.0146 | 150.0 | 8100 | 0.0090 | 95.2691 | 5.367 |
0.0146 | 151.0 | 8154 | 0.0087 | 95.2691 | 5.3531 |
0.0146 | 152.0 | 8208 | 0.0090 | 95.2691 | 5.3484 |
0.0146 | 153.0 | 8262 | 0.0088 | 95.2691 | 5.3554 |
0.0146 | 154.0 | 8316 | 0.0088 | 94.728 | 5.3612 |
0.0146 | 155.0 | 8370 | 0.0086 | 95.2691 | 5.3554 |
0.0146 | 156.0 | 8424 | 0.0085 | 95.2691 | 5.3461 |
0.0146 | 157.0 | 8478 | 0.0085 | 95.2691 | 5.3415 |
0.0125 | 158.0 | 8532 | 0.0084 | 95.2691 | 5.3484 |
0.0125 | 159.0 | 8586 | 0.0086 | 95.2691 | 5.3647 |
0.0125 | 160.0 | 8640 | 0.0088 | 95.2691 | 5.3368 |
0.0125 | 161.0 | 8694 | 0.0086 | 95.2691 | 5.3415 |
0.0125 | 162.0 | 8748 | 0.0086 | 95.2691 | 5.3508 |
0.0125 | 163.0 | 8802 | 0.0087 | 95.2691 | 5.3647 |
0.0125 | 164.0 | 8856 | 0.0086 | 95.2691 | 5.3531 |
0.0125 | 165.0 | 8910 | 0.0086 | 95.2691 | 5.3461 |
0.0125 | 166.0 | 8964 | 0.0086 | 95.2691 | 5.3508 |
0.012 | 167.0 | 9018 | 0.0087 | 95.2691 | 5.3415 |
0.012 | 168.0 | 9072 | 0.0087 | 95.2691 | 5.3577 |
0.012 | 169.0 | 9126 | 0.0087 | 95.2691 | 5.3508 |
0.012 | 170.0 | 9180 | 0.0086 | 95.2691 | 5.36 |
0.012 | 171.0 | 9234 | 0.0086 | 95.2691 | 5.3577 |
0.012 | 172.0 | 9288 | 0.0086 | 95.2691 | 5.3717 |
0.012 | 173.0 | 9342 | 0.0084 | 95.2691 | 5.3624 |
0.012 | 174.0 | 9396 | 0.0085 | 95.2691 | 5.3647 |
0.012 | 175.0 | 9450 | 0.0084 | 95.2691 | 5.3577 |
0.0116 | 176.0 | 9504 | 0.0084 | 95.2691 | 5.3554 |
0.0116 | 177.0 | 9558 | 0.0083 | 95.2691 | 5.3438 |
0.0116 | 178.0 | 9612 | 0.0084 | 95.2691 | 5.36 |
0.0116 | 179.0 | 9666 | 0.0084 | 95.2691 | 5.36 |
0.0116 | 180.0 | 9720 | 0.0085 | 95.2691 | 5.3415 |
0.0116 | 181.0 | 9774 | 0.0084 | 95.2691 | 5.3484 |
0.0116 | 182.0 | 9828 | 0.0084 | 95.2691 | 5.3484 |
0.0116 | 183.0 | 9882 | 0.0084 | 95.2691 | 5.3461 |
0.0116 | 184.0 | 9936 | 0.0084 | 95.2691 | 5.3508 |
0.0116 | 185.0 | 9990 | 0.0083 | 95.2691 | 5.3438 |
0.0103 | 186.0 | 10044 | 0.0082 | 95.2691 | 5.3438 |
0.0103 | 187.0 | 10098 | 0.0083 | 95.2691 | 5.3484 |
0.0103 | 188.0 | 10152 | 0.0083 | 95.2691 | 5.3368 |
0.0103 | 189.0 | 10206 | 0.0083 | 95.2691 | 5.3415 |
0.0103 | 190.0 | 10260 | 0.0083 | 95.2691 | 5.3298 |
0.0103 | 191.0 | 10314 | 0.0083 | 95.2691 | 5.3275 |
0.0103 | 192.0 | 10368 | 0.0083 | 95.2691 | 5.3275 |
0.0103 | 193.0 | 10422 | 0.0083 | 95.2691 | 5.3252 |
0.0103 | 194.0 | 10476 | 0.0083 | 95.2691 | 5.3275 |
0.0105 | 195.0 | 10530 | 0.0083 | 95.2691 | 5.3275 |
0.0105 | 196.0 | 10584 | 0.0083 | 95.2691 | 5.3275 |
0.0105 | 197.0 | 10638 | 0.0083 | 95.2691 | 5.3275 |
0.0105 | 198.0 | 10692 | 0.0083 | 95.2691 | 5.3275 |
0.0105 | 199.0 | 10746 | 0.0083 | 95.2691 | 5.3275 |
0.0105 | 200.0 | 10800 | 0.0083 | 95.2691 | 5.3275 |
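The step counts in the table allow a few sanity checks, sketched below. The assumptions are no gradient accumulation, a single device, and a kept final partial batch; `LOGGING_STEPS = 500` is the Trainer default and is assumed here because the card does not state it.

```python
import math

TRAIN_BATCH_SIZE = 16   # from the hyperparameter list
STEPS_PER_EPOCH = 54    # step count at epoch 1.0 in the table
NUM_EPOCHS = 200
LOGGING_STEPS = 500     # assumption: Trainer default logging interval

# Total optimizer steps over the whole run.
total_steps = STEPS_PER_EPOCH * NUM_EPOCHS

# Training-set sizes n for which ceil(n / 16) == 54.
min_examples = (STEPS_PER_EPOCH - 1) * TRAIN_BATCH_SIZE + 1
max_examples = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE

# First epoch whose evaluation row can show a training loss.
first_logged_epoch = math.ceil(LOGGING_STEPS / STEPS_PER_EPOCH)

print(total_steps, min_examples, max_examples, first_logged_epoch)
```

Under these assumptions the training set holds between 849 and 864 examples. The "No log" entries for epochs 1 to 9 arise because the first training-loss value is only recorded at step 500, which falls inside epoch 10.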
Framework versions
- Transformers 4.35.2
- Pytorch 2.1.0+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0