byt5-small-finetuned-yiddish-experiment-8

This model is a fine-tuned version of google/byt5-small; the training dataset is not specified in this card. It achieves the following results on the evaluation set:

  • Loss: 0.3482
  • CER (character error rate): 0.1504
  • WER (word error rate): 0.4654
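
As a reference for how this checkpoint can be loaded and run, here is a minimal inference sketch using the standard transformers seq2seq API. The sample input string and the task framing (correcting or normalizing Yiddish text) are assumptions; the card does not document the training data or the expected input format.

```python
# Minimal inference sketch. The task framing and input text are
# assumptions; only the model ID comes from this card.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "Addaci/byt5-small-finetuned-yiddish-experiment-8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "א ביישפיל זאץ אויף יידיש"  # hypothetical Yiddish input
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```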

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 600
  • num_epochs: 30
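
For reproducibility, these settings map onto transformers' Seq2SeqTrainingArguments roughly as in the sketch below. Only the listed hyperparameters come from this card; the output directory is an assumption, the 100-step evaluation cadence is inferred from the results table, and the dataset/Trainer wiring is omitted.

```python
# Rough reconstruction of the training configuration listed above.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="byt5-small-finetuned-yiddish-experiment-8",  # assumed
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,  # effective train batch size: 8
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=600,
    num_train_epochs=30,
    eval_strategy="steps",  # assumed; the table reports eval every 100 steps
    eval_steps=100,
)
```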

Training results

| Training Loss | Epoch | Step | Validation Loss | CER | WER |
|--------------:|------:|-----:|----------------:|----:|----:|
| 10.7996 | 0.4728 | 100 | 10.9325 | 0.2905 | 0.7232 |
| 7.586 | 0.9456 | 200 | 10.5771 | 0.2698 | 0.6850 |
| 8.641 | 1.4161 | 300 | 10.0041 | 0.2570 | 0.6571 |
| 8.2901 | 1.8889 | 400 | 9.1435 | 0.2478 | 0.6396 |
| 8.076 | 2.3593 | 500 | 8.1677 | 0.2394 | 0.6277 |
| 7.8061 | 2.8322 | 600 | 7.0784 | 0.2317 | 0.6142 |
| 5.6823 | 3.3026 | 700 | 6.0599 | 0.2232 | 0.6094 |
| 5.3586 | 3.7754 | 800 | 5.1075 | 0.2181 | 0.6038 |
| 4.9348 | 4.2459 | 900 | 4.2898 | 0.2155 | 0.6038 |
| 3.9539 | 4.7187 | 1000 | 3.6152 | 0.2119 | 0.5967 |
| 3.5873 | 5.1891 | 1100 | 2.9509 | 0.2096 | 0.5935 |
| 2.9099 | 5.6619 | 1200 | 2.4046 | 0.2062 | 0.5903 |
| 2.3472 | 6.1324 | 1300 | 1.9122 | 0.2044 | 0.5911 |
| 1.9884 | 6.6052 | 1400 | 1.4625 | 0.2007 | 0.5792 |
| 1.7857 | 7.0757 | 1500 | 1.2051 | 0.1973 | 0.5744 |
| 1.4299 | 7.5485 | 1600 | 1.1644 | 0.1950 | 0.5712 |
| 1.2853 | 8.0189 | 1700 | 1.1406 | 0.1928 | 0.5696 |
| 1.1917 | 8.4917 | 1800 | 1.0735 | 0.1910 | 0.5680 |
| 1.0714 | 8.9645 | 1900 | 0.9061 | 0.1910 | 0.5680 |
| 0.8871 | 9.4350 | 2000 | 0.7903 | 0.1684 | 0.4996 |
| 0.8589 | 9.9078 | 2100 | 0.7640 | 0.1667 | 0.4964 |
| 0.8172 | 10.3783 | 2200 | 0.7431 | 0.1646 | 0.4940 |
| 0.7284 | 10.8511 | 2300 | 0.7017 | 0.1622 | 0.4893 |
| 0.7358 | 11.3215 | 2400 | 0.6680 | 0.1613 | 0.4869 |
| 0.6926 | 11.7943 | 2500 | 0.6318 | 0.1595 | 0.4813 |
| 0.6425 | 12.2648 | 2600 | 0.5897 | 0.1601 | 0.4837 |
| 0.6201 | 12.7376 | 2700 | 0.5611 | 0.1585 | 0.4797 |
| 0.5984 | 13.2080 | 2800 | 0.5155 | 0.1585 | 0.4837 |
| 0.5619 | 13.6809 | 2900 | 0.4781 | 0.1575 | 0.4797 |
| 0.5316 | 14.1513 | 3000 | 0.4500 | 0.1562 | 0.4773 |
| 0.5086 | 14.6241 | 3100 | 0.4255 | 0.1558 | 0.4757 |
| 0.4776 | 15.0946 | 3200 | 0.4101 | 0.1551 | 0.4757 |
| 0.4841 | 15.5674 | 3300 | 0.4005 | 0.1558 | 0.4765 |
| 0.4533 | 16.0378 | 3400 | 0.3891 | 0.1544 | 0.4741 |
| 0.4599 | 16.5106 | 3500 | 0.3794 | 0.1542 | 0.4749 |
| 0.435 | 16.9835 | 3600 | 0.3801 | 0.1538 | 0.4718 |
| 0.4272 | 17.4539 | 3700 | 0.3748 | 0.1541 | 0.4718 |
| 0.4327 | 17.9267 | 3800 | 0.3685 | 0.1536 | 0.4718 |
| 0.418 | 18.3972 | 3900 | 0.3682 | 0.1542 | 0.4741 |
| 0.4082 | 18.8700 | 4000 | 0.3671 | 0.1541 | 0.4718 |
| 0.406 | 19.3404 | 4100 | 0.3625 | 0.1530 | 0.4694 |
| 0.4079 | 19.8132 | 4200 | 0.3605 | 0.1522 | 0.4686 |
| 0.3961 | 20.2837 | 4300 | 0.3592 | 0.1517 | 0.4678 |
| 0.3913 | 20.7565 | 4400 | 0.3575 | 0.1516 | 0.4678 |
| 0.391 | 21.2270 | 4500 | 0.3566 | 0.1514 | 0.4686 |
| 0.3865 | 21.6998 | 4600 | 0.3564 | 0.1507 | 0.4662 |
| 0.3884 | 22.1702 | 4700 | 0.3541 | 0.1510 | 0.4654 |
| 0.3855 | 22.6430 | 4800 | 0.3533 | 0.1508 | 0.4654 |
| 0.3794 | 23.1135 | 4900 | 0.3511 | 0.1508 | 0.4662 |
| 0.3926 | 23.5863 | 5000 | 0.3497 | 0.1507 | 0.4662 |
| 0.3802 | 24.0567 | 5100 | 0.3497 | 0.1508 | 0.4654 |
| 0.3798 | 24.5296 | 5200 | 0.3490 | 0.1508 | 0.4662 |
| 0.3722 | 25.0 | 5300 | 0.3489 | 0.1510 | 0.4654 |
| 0.3824 | 25.4728 | 5400 | 0.3484 | 0.1505 | 0.4654 |
| 0.3729 | 25.9456 | 5500 | 0.3482 | 0.1504 | 0.4654 |
| 0.3635 | 26.4161 | 5600 | 0.3486 | 0.1505 | 0.4654 |
| 0.3834 | 26.8889 | 5700 | 0.3475 | 0.1505 | 0.4654 |
| 0.3692 | 27.3593 | 5800 | 0.3470 | 0.1505 | 0.4654 |
| 0.3722 | 27.8322 | 5900 | 0.3466 | 0.1504 | 0.4654 |
| 0.3657 | 28.3026 | 6000 | 0.3461 | 0.1505 | 0.4654 |
| 0.3729 | 28.7754 | 6100 | 0.3466 | 0.1505 | 0.4646 |
| 0.3632 | 29.2459 | 6200 | 0.3464 | 0.1505 | 0.4646 |
| 0.372 | 29.7187 | 6300 | 0.3464 | 0.1504 | 0.4646 |
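
The CER and WER columns are the character and word error rates on the validation set. A minimal sketch of how such metrics are typically computed with the Hugging Face evaluate library follows; the prediction and reference strings are placeholders, since the actual evaluation script is not part of this card.

```python
# Hypothetical CER/WER computation (pip install evaluate jiwer).
import evaluate

cer = evaluate.load("cer")
wer = evaluate.load("wer")

predictions = ["model output text"]   # placeholder
references = ["gold reference text"]  # placeholder

print("CER:", cer.compute(predictions=predictions, references=references))
print("WER:", wer.compute(predictions=predictions, references=references))
```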

Framework versions

  • Transformers 4.47.0
  • Pytorch 2.5.1+cu121
  • Datasets 2.14.4
  • Tokenizers 0.21.0
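
To check that a local environment matches these versions, a quick runtime sketch (expected values in the comments are taken from the list above):

```python
# Print the installed versions of the frameworks listed above.
import datasets
import tokenizers
import torch
import transformers

print("Transformers:", transformers.__version__)  # expected 4.47.0
print("PyTorch:", torch.__version__)              # expected 2.5.1+cu121
print("Datasets:", datasets.__version__)          # expected 2.14.4
print("Tokenizers:", tokenizers.__version__)      # expected 0.21.0
```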

Model details

  • Format: Safetensors
  • Model size: 300M params
  • Tensor type: F32
