byt5-small-finetuned-yiddish-experiment-8

This model is a fine-tuned version of google/byt5-small; the training dataset is not specified in this card. It achieves the following results on the evaluation set:

  • Loss: 0.3482
  • CER (character error rate): 0.1504
  • WER (word error rate): 0.4654
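
As a reference for how this checkpoint can be loaded and run, here is a minimal inference sketch using the standard transformers seq2seq API. The sample input string and the task framing (correcting or normalizing Yiddish text) are assumptions; the card does not document the training data or the expected input format.

```python
# Minimal inference sketch. The task framing and input text are
# assumptions; only the model ID comes from this card.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "Addaci/byt5-small-finetuned-yiddish-experiment-8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "א ביישפיל זאץ אויף יידיש"  # hypothetical Yiddish input
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```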

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 600
  • num_epochs: 30
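
For reproducibility, these settings map onto transformers' Seq2SeqTrainingArguments roughly as in the sketch below. Only the listed hyperparameters come from this card; the output directory is an assumption, the 100-step evaluation cadence is inferred from the results table, and the dataset/Trainer wiring is omitted.

```python
# Rough reconstruction of the training configuration listed above.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="byt5-small-finetuned-yiddish-experiment-8",  # assumed
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,  # effective train batch size: 8
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=600,
    num_train_epochs=30,
    eval_strategy="steps",  # assumed; the table reports eval every 100 steps
    eval_steps=100,
)
```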

Training results

| Training Loss | Epoch | Step | Validation Loss | CER | WER |
|--------------:|------:|-----:|----------------:|----:|----:|
| 10.7996 | 0.4728 | 100 | 10.9325 | 0.2905 | 0.7232 |
| 7.586 | 0.9456 | 200 | 10.5771 | 0.2698 | 0.6850 |
| 8.641 | 1.4161 | 300 | 10.0041 | 0.2570 | 0.6571 |
| 8.2901 | 1.8889 | 400 | 9.1435 | 0.2478 | 0.6396 |
| 8.076 | 2.3593 | 500 | 8.1677 | 0.2394 | 0.6277 |
| 7.8061 | 2.8322 | 600 | 7.0784 | 0.2317 | 0.6142 |
| 5.6823 | 3.3026 | 700 | 6.0599 | 0.2232 | 0.6094 |
| 5.3586 | 3.7754 | 800 | 5.1075 | 0.2181 | 0.6038 |
| 4.9348 | 4.2459 | 900 | 4.2898 | 0.2155 | 0.6038 |
| 3.9539 | 4.7187 | 1000 | 3.6152 | 0.2119 | 0.5967 |
| 3.5873 | 5.1891 | 1100 | 2.9509 | 0.2096 | 0.5935 |
| 2.9099 | 5.6619 | 1200 | 2.4046 | 0.2062 | 0.5903 |
| 2.3472 | 6.1324 | 1300 | 1.9122 | 0.2044 | 0.5911 |
| 1.9884 | 6.6052 | 1400 | 1.4625 | 0.2007 | 0.5792 |
| 1.7857 | 7.0757 | 1500 | 1.2051 | 0.1973 | 0.5744 |
| 1.4299 | 7.5485 | 1600 | 1.1644 | 0.1950 | 0.5712 |
| 1.2853 | 8.0189 | 1700 | 1.1406 | 0.1928 | 0.5696 |
| 1.1917 | 8.4917 | 1800 | 1.0735 | 0.1910 | 0.5680 |
| 1.0714 | 8.9645 | 1900 | 0.9061 | 0.1910 | 0.5680 |
| 0.8871 | 9.4350 | 2000 | 0.7903 | 0.1684 | 0.4996 |
| 0.8589 | 9.9078 | 2100 | 0.7640 | 0.1667 | 0.4964 |
| 0.8172 | 10.3783 | 2200 | 0.7431 | 0.1646 | 0.4940 |
| 0.7284 | 10.8511 | 2300 | 0.7017 | 0.1622 | 0.4893 |
| 0.7358 | 11.3215 | 2400 | 0.6680 | 0.1613 | 0.4869 |
| 0.6926 | 11.7943 | 2500 | 0.6318 | 0.1595 | 0.4813 |
| 0.6425 | 12.2648 | 2600 | 0.5897 | 0.1601 | 0.4837 |
| 0.6201 | 12.7376 | 2700 | 0.5611 | 0.1585 | 0.4797 |
| 0.5984 | 13.2080 | 2800 | 0.5155 | 0.1585 | 0.4837 |
| 0.5619 | 13.6809 | 2900 | 0.4781 | 0.1575 | 0.4797 |
| 0.5316 | 14.1513 | 3000 | 0.4500 | 0.1562 | 0.4773 |
| 0.5086 | 14.6241 | 3100 | 0.4255 | 0.1558 | 0.4757 |
| 0.4776 | 15.0946 | 3200 | 0.4101 | 0.1551 | 0.4757 |
| 0.4841 | 15.5674 | 3300 | 0.4005 | 0.1558 | 0.4765 |
| 0.4533 | 16.0378 | 3400 | 0.3891 | 0.1544 | 0.4741 |
| 0.4599 | 16.5106 | 3500 | 0.3794 | 0.1542 | 0.4749 |
| 0.435 | 16.9835 | 3600 | 0.3801 | 0.1538 | 0.4718 |
| 0.4272 | 17.4539 | 3700 | 0.3748 | 0.1541 | 0.4718 |
| 0.4327 | 17.9267 | 3800 | 0.3685 | 0.1536 | 0.4718 |
| 0.418 | 18.3972 | 3900 | 0.3682 | 0.1542 | 0.4741 |
| 0.4082 | 18.8700 | 4000 | 0.3671 | 0.1541 | 0.4718 |
| 0.406 | 19.3404 | 4100 | 0.3625 | 0.1530 | 0.4694 |
| 0.4079 | 19.8132 | 4200 | 0.3605 | 0.1522 | 0.4686 |
| 0.3961 | 20.2837 | 4300 | 0.3592 | 0.1517 | 0.4678 |
| 0.3913 | 20.7565 | 4400 | 0.3575 | 0.1516 | 0.4678 |
| 0.391 | 21.2270 | 4500 | 0.3566 | 0.1514 | 0.4686 |
| 0.3865 | 21.6998 | 4600 | 0.3564 | 0.1507 | 0.4662 |
| 0.3884 | 22.1702 | 4700 | 0.3541 | 0.1510 | 0.4654 |
| 0.3855 | 22.6430 | 4800 | 0.3533 | 0.1508 | 0.4654 |
| 0.3794 | 23.1135 | 4900 | 0.3511 | 0.1508 | 0.4662 |
| 0.3926 | 23.5863 | 5000 | 0.3497 | 0.1507 | 0.4662 |
| 0.3802 | 24.0567 | 5100 | 0.3497 | 0.1508 | 0.4654 |
| 0.3798 | 24.5296 | 5200 | 0.3490 | 0.1508 | 0.4662 |
| 0.3722 | 25.0 | 5300 | 0.3489 | 0.1510 | 0.4654 |
| 0.3824 | 25.4728 | 5400 | 0.3484 | 0.1505 | 0.4654 |
| 0.3729 | 25.9456 | 5500 | 0.3482 | 0.1504 | 0.4654 |
| 0.3635 | 26.4161 | 5600 | 0.3486 | 0.1505 | 0.4654 |
| 0.3834 | 26.8889 | 5700 | 0.3475 | 0.1505 | 0.4654 |
| 0.3692 | 27.3593 | 5800 | 0.3470 | 0.1505 | 0.4654 |
| 0.3722 | 27.8322 | 5900 | 0.3466 | 0.1504 | 0.4654 |
| 0.3657 | 28.3026 | 6000 | 0.3461 | 0.1505 | 0.4654 |
| 0.3729 | 28.7754 | 6100 | 0.3466 | 0.1505 | 0.4646 |
| 0.3632 | 29.2459 | 6200 | 0.3464 | 0.1505 | 0.4646 |
| 0.372 | 29.7187 | 6300 | 0.3464 | 0.1504 | 0.4646 |
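
The CER and WER columns are the character and word error rates on the validation set. A minimal sketch of how such metrics are typically computed with the Hugging Face evaluate library follows; the prediction and reference strings are placeholders, since the actual evaluation script is not part of this card.

```python
# Hypothetical CER/WER computation (pip install evaluate jiwer).
import evaluate

cer = evaluate.load("cer")
wer = evaluate.load("wer")

predictions = ["model output text"]   # placeholder
references = ["gold reference text"]  # placeholder

print("CER:", cer.compute(predictions=predictions, references=references))
print("WER:", wer.compute(predictions=predictions, references=references))
```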

Framework versions

  • Transformers 4.47.0
  • Pytorch 2.5.1+cu121
  • Datasets 2.14.4
  • Tokenizers 0.21.0
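
To check that a local environment matches these versions, a quick runtime sketch (expected values in the comments are taken from the list above):

```python
# Print the installed versions of the frameworks listed above.
import datasets
import tokenizers
import torch
import transformers

print("Transformers:", transformers.__version__)  # expected 4.47.0
print("PyTorch:", torch.__version__)              # expected 2.5.1+cu121
print("Datasets:", datasets.__version__)          # expected 2.14.4
print("Tokenizers:", tokenizers.__version__)      # expected 0.21.0
```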

Model details

  • Format: Safetensors
  • Model size: 300M params
  • Tensor type: F32
