metadata
license: apache-2.0
base_model: t5-small
tags:
- generated_from_trainer
metrics:
- bleu
model-index:
- name: tl-war-model
  results: []
tl-war-model
This model is a fine-tuned version of t5-small on an unspecified dataset. It achieves the following results on the evaluation set after the final epoch (epoch 200, step 10800):
- Loss: 0.0084
- Bleu: 94.7937
- Gen Len: 5.5401
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 200
- mixed_precision_training: Native AMP
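The hyperparameters above can be sketched as keyword arguments for `transformers.Seq2SeqTrainingArguments`. This is a reconstruction, not the original training script. The `output_dir` value, `evaluation_strategy="epoch"` (inferred from the per-epoch evaluation rows) and `predict_with_generate=True` (needed to compute BLEU and Gen Len) are assumptions, as is a single training device, so that the per-device batch size equals the reported batch size. The card says "Adam", but the Trainer's default optimizer in this version is AdamW, so the beta and epsilon values are mapped to the `adam_*` arguments.

```python
# Keyword names follow transformers.Seq2SeqTrainingArguments (v4.35.x).
# Values come from the hyperparameter list; entries marked "assumption"
# are not stated in the card.
TRAINING_KWARGS = {
    "output_dir": "tl-war-model",        # assumption: placeholder path
    "learning_rate": 1e-3,
    "per_device_train_batch_size": 16,   # assumes a single device
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 200,
    "fp16": True,                        # reported as "Native AMP"
    "evaluation_strategy": "epoch",      # assumption: one eval row per epoch
    "predict_with_generate": True,       # assumption: required for BLEU / Gen Len
}


def build_training_args(**overrides):
    """Build Seq2SeqTrainingArguments from the kwargs above.

    The import is deferred so the dictionary can be inspected without
    transformers installed. Note that fp16=True needs a CUDA device.
    """
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(**{**TRAINING_KWARGS, **overrides})
```

On a machine without a GPU, `build_training_args(fp16=False)` avoids the mixed-precision device check.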
Training results
Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len |
---|---|---|---|---|---|
No log | 1.0 | 54 | 2.8430 | 1.2305 | 5.6469 |
No log | 2.0 | 108 | 2.4489 | 2.2133 | 5.9431 |
No log | 3.0 | 162 | 1.9890 | 2.4041 | 6.4425 |
No log | 4.0 | 216 | 1.6632 | 5.3183 | 6.2288 |
No log | 5.0 | 270 | 1.2998 | 11.2337 | 5.8688 |
No log | 6.0 | 324 | 0.9992 | 22.9227 | 5.9826 |
No log | 7.0 | 378 | 0.7938 | 40.8707 | 6.0523 |
No log | 8.0 | 432 | 0.6332 | 41.6658 | 5.8455 |
No log | 9.0 | 486 | 0.4849 | 57.7063 | 5.741 |
2.0554 | 10.0 | 540 | 0.3398 | 66.5916 | 5.7073 |
2.0554 | 11.0 | 594 | 0.2589 | 75.1398 | 5.5552 |
2.0554 | 12.0 | 648 | 0.1862 | 80.095 | 5.4901 |
2.0554 | 13.0 | 702 | 0.1188 | 82.7321 | 5.5656 |
2.0554 | 14.0 | 756 | 0.0992 | 84.2356 | 5.511 |
2.0554 | 15.0 | 810 | 0.0643 | 91.2032 | 5.5215 |
2.0554 | 16.0 | 864 | 0.0608 | 90.156 | 5.5621 |
2.0554 | 17.0 | 918 | 0.0461 | 87.3511 | 5.5726 |
2.0554 | 18.0 | 972 | 0.0555 | 88.5079 | 5.5621 |
0.4753 | 19.0 | 1026 | 0.0354 | 91.2536 | 5.5145 |
0.4753 | 20.0 | 1080 | 0.0423 | 92.0329 | 5.5505 |
0.4753 | 21.0 | 1134 | 0.0367 | 89.7566 | 5.5401 |
0.4753 | 22.0 | 1188 | 0.0319 | 92.3251 | 5.5424 |
0.4753 | 23.0 | 1242 | 0.0383 | 83.639 | 5.5842 |
0.4753 | 24.0 | 1296 | 0.0351 | 89.9239 | 5.5331 |
0.4753 | 25.0 | 1350 | 0.0397 | 90.785 | 5.5319 |
0.4753 | 26.0 | 1404 | 0.0269 | 89.6977 | 5.5273 |
0.4753 | 27.0 | 1458 | 0.0371 | 94.2434 | 5.5424 |
0.1679 | 28.0 | 1512 | 0.0281 | 93.1799 | 5.5389 |
0.1679 | 29.0 | 1566 | 0.0265 | 92.9805 | 5.5459 |
0.1679 | 30.0 | 1620 | 0.0240 | 93.4285 | 5.5401 |
0.1679 | 31.0 | 1674 | 0.0187 | 93.4675 | 5.5552 |
0.1679 | 32.0 | 1728 | 0.0228 | 91.1032 | 5.5389 |
0.1679 | 33.0 | 1782 | 0.0196 | 93.164 | 5.5528 |
0.1679 | 34.0 | 1836 | 0.0244 | 92.8435 | 5.5157 |
0.1679 | 35.0 | 1890 | 0.0224 | 93.3636 | 5.5447 |
0.1679 | 36.0 | 1944 | 0.0248 | 93.0376 | 5.5343 |
0.1679 | 37.0 | 1998 | 0.0205 | 94.3196 | 5.5354 |
0.096 | 38.0 | 2052 | 0.0211 | 93.2583 | 5.5343 |
0.096 | 39.0 | 2106 | 0.0200 | 91.9568 | 5.5343 |
0.096 | 40.0 | 2160 | 0.0201 | 91.1973 | 5.5587 |
0.096 | 41.0 | 2214 | 0.0227 | 94.0951 | 5.5424 |
0.096 | 42.0 | 2268 | 0.0202 | 94.1776 | 5.5482 |
0.096 | 43.0 | 2322 | 0.0198 | 93.2822 | 5.5273 |
0.096 | 44.0 | 2376 | 0.0187 | 93.1389 | 5.5412 |
0.096 | 45.0 | 2430 | 0.0203 | 93.566 | 5.5285 |
0.096 | 46.0 | 2484 | 0.0272 | 94.3114 | 5.583 |
0.0649 | 47.0 | 2538 | 0.0177 | 91.3008 | 5.518 |
0.0649 | 48.0 | 2592 | 0.0189 | 91.7827 | 5.5285 |
0.0649 | 49.0 | 2646 | 0.0222 | 94.3196 | 5.5517 |
0.0649 | 50.0 | 2700 | 0.0145 | 94.1234 | 5.5273 |
0.0649 | 51.0 | 2754 | 0.0150 | 93.531 | 5.5494 |
0.0649 | 52.0 | 2808 | 0.0178 | 92.7418 | 5.5273 |
0.0649 | 53.0 | 2862 | 0.0186 | 94.4449 | 5.5308 |
0.0649 | 54.0 | 2916 | 0.0170 | 93.4147 | 5.5343 |
0.0649 | 55.0 | 2970 | 0.0147 | 93.0869 | 5.5203 |
0.054 | 56.0 | 3024 | 0.0142 | 94.5277 | 5.5494 |
0.054 | 57.0 | 3078 | 0.0116 | 94.773 | 5.5528 |
0.054 | 58.0 | 3132 | 0.0145 | 94.5484 | 5.5343 |
0.054 | 59.0 | 3186 | 0.0180 | 94.7317 | 5.5343 |
0.054 | 60.0 | 3240 | 0.0149 | 93.3068 | 5.5296 |
0.054 | 61.0 | 3294 | 0.0133 | 94.7317 | 5.5377 |
0.054 | 62.0 | 3348 | 0.0130 | 94.7524 | 5.5308 |
0.054 | 63.0 | 3402 | 0.0161 | 94.7524 | 5.5343 |
0.054 | 64.0 | 3456 | 0.0143 | 94.3074 | 5.518 |
0.0432 | 65.0 | 3510 | 0.0162 | 94.5484 | 5.5319 |
0.0432 | 66.0 | 3564 | 0.0121 | 94.773 | 5.5296 |
0.0432 | 67.0 | 3618 | 0.0128 | 94.773 | 5.5377 |
0.0432 | 68.0 | 3672 | 0.0111 | 94.773 | 5.5436 |
0.0432 | 69.0 | 3726 | 0.0225 | 93.3009 | 5.5528 |
0.0432 | 70.0 | 3780 | 0.0131 | 93.7534 | 5.5377 |
0.0432 | 71.0 | 3834 | 0.0126 | 94.3251 | 5.547 |
0.0432 | 72.0 | 3888 | 0.0113 | 94.5484 | 5.5226 |
0.0432 | 73.0 | 3942 | 0.0116 | 94.569 | 5.547 |
0.0432 | 74.0 | 3996 | 0.0122 | 94.773 | 5.5459 |
0.0318 | 75.0 | 4050 | 0.0108 | 94.773 | 5.547 |
0.0318 | 76.0 | 4104 | 0.0106 | 94.7937 | 5.5424 |
0.0318 | 77.0 | 4158 | 0.0143 | 94.6754 | 5.5261 |
0.0318 | 78.0 | 4212 | 0.0118 | 94.5484 | 5.5319 |
0.0318 | 79.0 | 4266 | 0.0124 | 94.7317 | 5.5366 |
0.0318 | 80.0 | 4320 | 0.0150 | 94.773 | 5.5436 |
0.0318 | 81.0 | 4374 | 0.0111 | 94.5095 | 5.5656 |
0.0318 | 82.0 | 4428 | 0.0179 | 94.5277 | 5.5482 |
0.0318 | 83.0 | 4482 | 0.0126 | 94.7524 | 5.5412 |
0.0285 | 84.0 | 4536 | 0.0122 | 94.5277 | 5.5366 |
0.0285 | 85.0 | 4590 | 0.0160 | 94.7524 | 5.5494 |
0.0285 | 86.0 | 4644 | 0.0127 | 93.455 | 5.5366 |
0.0285 | 87.0 | 4698 | 0.0100 | 94.7937 | 5.5377 |
0.0285 | 88.0 | 4752 | 0.0123 | 94.7524 | 5.5447 |
0.0285 | 89.0 | 4806 | 0.0108 | 94.773 | 5.5528 |
0.0285 | 90.0 | 4860 | 0.0111 | 94.773 | 5.5412 |
0.0285 | 91.0 | 4914 | 0.0102 | 94.7937 | 5.5354 |
0.0285 | 92.0 | 4968 | 0.0103 | 94.773 | 5.5494 |
0.0246 | 93.0 | 5022 | 0.0101 | 94.773 | 5.5296 |
0.0246 | 94.0 | 5076 | 0.0119 | 94.773 | 5.5331 |
0.0246 | 95.0 | 5130 | 0.0100 | 94.3503 | 5.5401 |
0.0246 | 96.0 | 5184 | 0.0110 | 94.773 | 5.5412 |
0.0246 | 97.0 | 5238 | 0.0097 | 94.7937 | 5.5192 |
0.0246 | 98.0 | 5292 | 0.0109 | 94.2228 | 5.5366 |
0.0246 | 99.0 | 5346 | 0.0106 | 94.7937 | 5.5447 |
0.0246 | 100.0 | 5400 | 0.0100 | 94.7937 | 5.5424 |
0.0246 | 101.0 | 5454 | 0.0097 | 94.7937 | 5.5447 |
0.0235 | 102.0 | 5508 | 0.0100 | 94.3327 | 5.5482 |
0.0235 | 103.0 | 5562 | 0.0103 | 94.773 | 5.5494 |
0.0235 | 104.0 | 5616 | 0.0094 | 94.3327 | 5.5587 |
0.0235 | 105.0 | 5670 | 0.0096 | 94.7937 | 5.547 |
0.0235 | 106.0 | 5724 | 0.0111 | 94.773 | 5.5494 |
0.0235 | 107.0 | 5778 | 0.0112 | 94.773 | 5.5447 |
0.0235 | 108.0 | 5832 | 0.0095 | 94.7937 | 5.5494 |
0.0235 | 109.0 | 5886 | 0.0100 | 94.7937 | 5.5308 |
0.0235 | 110.0 | 5940 | 0.0099 | 94.7937 | 5.5494 |
0.0235 | 111.0 | 5994 | 0.0120 | 94.7524 | 5.5377 |
0.0194 | 112.0 | 6048 | 0.0112 | 94.773 | 5.5563 |
0.0194 | 113.0 | 6102 | 0.0106 | 94.0307 | 5.5331 |
0.0194 | 114.0 | 6156 | 0.0093 | 94.7937 | 5.5424 |
0.0194 | 115.0 | 6210 | 0.0108 | 94.773 | 5.5377 |
0.0194 | 116.0 | 6264 | 0.0129 | 94.773 | 5.5273 |
0.0194 | 117.0 | 6318 | 0.0152 | 94.7524 | 5.5389 |
0.0194 | 118.0 | 6372 | 0.0120 | 94.7524 | 5.5482 |
0.0194 | 119.0 | 6426 | 0.0111 | 94.773 | 5.5459 |
0.0194 | 120.0 | 6480 | 0.0102 | 94.7937 | 5.5401 |
0.0188 | 121.0 | 6534 | 0.0096 | 94.7937 | 5.5285 |
0.0188 | 122.0 | 6588 | 0.0093 | 94.7937 | 5.5401 |
0.0188 | 123.0 | 6642 | 0.0096 | 94.7937 | 5.5447 |
0.0188 | 124.0 | 6696 | 0.0097 | 94.7937 | 5.5377 |
0.0188 | 125.0 | 6750 | 0.0094 | 94.7937 | 5.5354 |
0.0188 | 126.0 | 6804 | 0.0092 | 94.7937 | 5.554 |
0.0188 | 127.0 | 6858 | 0.0104 | 94.5183 | 5.5401 |
0.0188 | 128.0 | 6912 | 0.0107 | 93.7969 | 5.5261 |
0.0188 | 129.0 | 6966 | 0.0089 | 94.7937 | 5.5192 |
0.0165 | 130.0 | 7020 | 0.0093 | 94.7937 | 5.5308 |
0.0165 | 131.0 | 7074 | 0.0096 | 94.7937 | 5.5261 |
0.0165 | 132.0 | 7128 | 0.0091 | 94.7937 | 5.5447 |
0.0165 | 133.0 | 7182 | 0.0096 | 94.7937 | 5.5377 |
0.0165 | 134.0 | 7236 | 0.0091 | 94.7937 | 5.5377 |
0.0165 | 135.0 | 7290 | 0.0104 | 94.569 | 5.5354 |
0.0165 | 136.0 | 7344 | 0.0090 | 94.7937 | 5.5285 |
0.0165 | 137.0 | 7398 | 0.0092 | 94.7937 | 5.5261 |
0.0165 | 138.0 | 7452 | 0.0090 | 94.7937 | 5.5168 |
0.0151 | 139.0 | 7506 | 0.0093 | 94.7937 | 5.5215 |
0.0151 | 140.0 | 7560 | 0.0089 | 94.7937 | 5.5215 |
0.0151 | 141.0 | 7614 | 0.0092 | 94.7937 | 5.5401 |
0.0151 | 142.0 | 7668 | 0.0089 | 94.7937 | 5.5215 |
0.0151 | 143.0 | 7722 | 0.0091 | 94.7937 | 5.5377 |
0.0151 | 144.0 | 7776 | 0.0089 | 94.7937 | 5.5377 |
0.0151 | 145.0 | 7830 | 0.0097 | 94.7937 | 5.5308 |
0.0151 | 146.0 | 7884 | 0.0091 | 94.7937 | 5.5308 |
0.0151 | 147.0 | 7938 | 0.0087 | 94.7937 | 5.5331 |
0.0151 | 148.0 | 7992 | 0.0089 | 94.7937 | 5.5285 |
0.0132 | 149.0 | 8046 | 0.0088 | 94.7937 | 5.5401 |
0.0132 | 150.0 | 8100 | 0.0090 | 94.7937 | 5.5354 |
0.0132 | 151.0 | 8154 | 0.0086 | 94.7937 | 5.5331 |
0.0132 | 152.0 | 8208 | 0.0087 | 94.7937 | 5.5285 |
0.0132 | 153.0 | 8262 | 0.0089 | 94.7937 | 5.5285 |
0.0132 | 154.0 | 8316 | 0.0088 | 94.7937 | 5.5261 |
0.0132 | 155.0 | 8370 | 0.0089 | 94.7937 | 5.5401 |
0.0132 | 156.0 | 8424 | 0.0086 | 94.7937 | 5.5331 |
0.0132 | 157.0 | 8478 | 0.0088 | 94.7937 | 5.554 |
0.0121 | 158.0 | 8532 | 0.0088 | 94.7937 | 5.5401 |
0.0121 | 159.0 | 8586 | 0.0089 | 94.7937 | 5.5401 |
0.0121 | 160.0 | 8640 | 0.0092 | 94.7937 | 5.5261 |
0.0121 | 161.0 | 8694 | 0.0089 | 94.7937 | 5.5354 |
0.0121 | 162.0 | 8748 | 0.0089 | 94.7937 | 5.5238 |
0.0121 | 163.0 | 8802 | 0.0088 | 94.7937 | 5.5261 |
0.0121 | 164.0 | 8856 | 0.0087 | 94.7937 | 5.5331 |
0.0121 | 165.0 | 8910 | 0.0087 | 94.7937 | 5.5285 |
0.0121 | 166.0 | 8964 | 0.0090 | 94.7937 | 5.5261 |
0.0117 | 167.0 | 9018 | 0.0088 | 94.7937 | 5.5308 |
0.0117 | 168.0 | 9072 | 0.0085 | 94.7937 | 5.5377 |
0.0117 | 169.0 | 9126 | 0.0086 | 94.7937 | 5.5354 |
0.0117 | 170.0 | 9180 | 0.0086 | 94.7937 | 5.5192 |
0.0117 | 171.0 | 9234 | 0.0087 | 94.7937 | 5.5424 |
0.0117 | 172.0 | 9288 | 0.0090 | 94.4227 | 5.5354 |
0.0117 | 173.0 | 9342 | 0.0089 | 94.7937 | 5.5285 |
0.0117 | 174.0 | 9396 | 0.0087 | 94.7937 | 5.5261 |
0.0117 | 175.0 | 9450 | 0.0087 | 94.7937 | 5.5377 |
0.0107 | 176.0 | 9504 | 0.0087 | 94.7937 | 5.5261 |
0.0107 | 177.0 | 9558 | 0.0086 | 94.7937 | 5.5261 |
0.0107 | 178.0 | 9612 | 0.0088 | 94.7937 | 5.5377 |
0.0107 | 179.0 | 9666 | 0.0085 | 94.7937 | 5.5215 |
0.0107 | 180.0 | 9720 | 0.0085 | 94.7937 | 5.5377 |
0.0107 | 181.0 | 9774 | 0.0085 | 94.7937 | 5.5308 |
0.0107 | 182.0 | 9828 | 0.0085 | 94.7937 | 5.5285 |
0.0107 | 183.0 | 9882 | 0.0085 | 94.7937 | 5.5308 |
0.0107 | 184.0 | 9936 | 0.0085 | 94.7937 | 5.5261 |
0.0107 | 185.0 | 9990 | 0.0084 | 94.7937 | 5.5331 |
0.0106 | 186.0 | 10044 | 0.0084 | 94.7937 | 5.5354 |
0.0106 | 187.0 | 10098 | 0.0084 | 94.7937 | 5.5447 |
0.0106 | 188.0 | 10152 | 0.0085 | 94.7937 | 5.5354 |
0.0106 | 189.0 | 10206 | 0.0084 | 94.7937 | 5.5377 |
0.0106 | 190.0 | 10260 | 0.0084 | 94.7937 | 5.5354 |
0.0106 | 191.0 | 10314 | 0.0085 | 94.7937 | 5.5377 |
0.0106 | 192.0 | 10368 | 0.0084 | 94.7937 | 5.5377 |
0.0106 | 193.0 | 10422 | 0.0084 | 94.7937 | 5.5401 |
0.0106 | 194.0 | 10476 | 0.0085 | 94.7937 | 5.5401 |
0.0091 | 195.0 | 10530 | 0.0084 | 94.7937 | 5.5331 |
0.0091 | 196.0 | 10584 | 0.0084 | 94.7937 | 5.5401 |
0.0091 | 197.0 | 10638 | 0.0084 | 94.7937 | 5.5401 |
0.0091 | 198.0 | 10692 | 0.0084 | 94.7937 | 5.5401 |
0.0091 | 199.0 | 10746 | 0.0084 | 94.7937 | 5.5401 |
0.0091 | 200.0 | 10800 | 0.0084 | 94.7937 | 5.5401 |
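A few quantities can be derived from the table with simple arithmetic. The sketch below assumes a single device, no gradient accumulation, no dropped final batch, zero warmup steps, and the Trainer's default logging interval of 500 steps. None of these are stated in the card. Under those assumptions, 54 steps per epoch at batch size 16 bounds the training set size, the linear schedule gives the learning rate at any step, and the logging interval explains why the Training Loss column reads "No log" until step 540 and then changes only after each multiple of 500.

```python
STEPS_PER_EPOCH = 54      # from the table: step 54 at epoch 1.0
NUM_EPOCHS = 200
TRAIN_BATCH_SIZE = 16
BASE_LR = 0.001
LOGGING_STEPS = 500       # assumption: Trainer default logging interval
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS


def train_set_size_bounds(steps_per_epoch, batch_size):
    """Smallest and largest dataset size n with ceil(n / batch_size) == steps_per_epoch."""
    return (steps_per_epoch - 1) * batch_size + 1, steps_per_epoch * batch_size


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


def last_logged_step(eval_step, logging_steps=LOGGING_STEPS):
    """Most recent step at which training loss was logged, or None if none yet."""
    last = (eval_step // logging_steps) * logging_steps
    return last if last > 0 else None


if __name__ == "__main__":
    print("total optimizer steps:", TOTAL_STEPS)
    print("training set size bounds:", train_set_size_bounds(STEPS_PER_EPOCH, TRAIN_BATCH_SIZE))
    print("learning rate at step 5400:", linear_lr(5400))
    print("loss shown at step 486 was logged at:", last_logged_step(486))
    print("loss shown at step 540 was logged at:", last_logged_step(540))
```

The total of 10800 steps matches the last row of the table, and the implied training set holds between 849 and 864 examples if the assumptions hold.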
Framework versions
- Transformers 4.35.2
- Pytorch 2.1.0+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0