metadata

license: apache-2.0
base_model: facebook/wav2vec2-lv-60-espeak-cv-ft
tags:
  - generated_from_trainer
datasets:
  - voxpopuli
metrics:
  - wer
model-index:
  - name: cs2fi_wav2vec2-large-xls-r-300m-czech-colab
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: voxpopuli
          type: voxpopuli
          config: fi
          split: test
          args: fi
        metrics:
          - name: Wer
            type: wer
            value: 1.0859538784067087

cs2fi_wav2vec2-large-xls-r-300m-czech-colab

This model is a fine-tuned version of facebook/wav2vec2-lv-60-espeak-cv-ft on the fi (Finnish) configuration of the voxpopuli dataset. It achieves the following results on the evaluation set at the final logged step (1400):

Loss: 507.5248
WER (word error rate): 1.0860
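
A WER above 1.0 can look like a reporting error, so a short illustration may help. WER is commonly defined as (substitutions + deletions + insertions) divided by the number of reference words, so it exceeds 1.0 whenever the hypothesis contains more errors than the reference has words, typically because of many inserted words. The sketch below is a minimal pure-Python version of that definition. The function name `wer` and the example sentences are illustrative assumptions, and this is not the exact metric implementation used to produce the numbers above.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion of a reference word
                    curr[j - 1] + 1,     # insertion of a hypothesis word
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentences: two reference words, four hypothesis words.
print(wer("hello world", "hi there big world"))
```

In this made-up example, one substitution and two insertions against a two-word reference give 3 / 2 = 1.5, which shows how a model that emits too many tokens can score above 1.0.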

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

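According to the metadata, evaluation used the test split of the voxpopuli dataset (fi configuration). More information needed.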

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
num_epochs: 50
mixed_precision_training: Native AMP
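
To make the relationship between these values concrete, the sketch below recomputes the total train batch size and traces a linear learning-rate schedule with warmup. It is a rough illustration and not the training script: it assumes a single device (consistent with 8 x 2 = 16), assumes training ended at step 1400 (the last step logged in the results table), and uses the helper names `total_train_batch_size` and `linear_lr`, which are hypothetical. The schedule shape follows the usual linear warmup followed by linear decay, as in the `get_linear_schedule_with_warmup` helper in Transformers.

```python
def total_train_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Number of examples contributing to one optimizer update."""
    return per_device * accumulation_steps * num_devices


def linear_lr(
    step: int,
    base_lr: float = 3e-4,     # learning_rate from the list above
    warmup_steps: int = 500,   # lr_scheduler_warmup_steps from the list above
    total_steps: int = 1400,   # ASSUMPTION: last logged step in the results table
) -> float:
    """Learning rate at a given optimizer step under linear warmup and linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: ramp linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


print(total_train_batch_size(per_device=8, accumulation_steps=2))
for step in (0, 250, 500, 950, 1400):
    print(step, linear_lr(step))
```

Under these assumptions the warmup covers the first 500 of 1400 optimizer steps, so the peak learning rate of 0.0003 is reached a little over a third of the way through training.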

Training results

Training Loss	Epoch	Step	Validation Loss	WER
3042.8738	3.51	100	422.1938	0.9518
362.1554	7.02	200	231.7486	1.0000
208.0920	10.53	300	196.4194	0.9958
189.1354	14.04	400	211.6223	0.9350
163.6355	17.54	500	235.3201	0.9182
140.7959	21.05	600	256.4028	0.9539
115.5506	24.56	700	311.4562	1.0147
93.6629	28.07	800	304.0882	1.2243
78.9694	31.58	900	354.5415	1.1279
67.4151	35.09	1000	423.6178	1.0860
55.1471	38.60	1100	468.3192	1.0922
55.8001	42.11	1200	408.8039	1.0839
46.9208	45.61	1300	524.1367	1.0650
43.7264	49.12	1400	507.5248	1.0860
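
The headline numbers on this card come from the last logged step, which is not the best row in the table. The sketch below copies the rows above into a small Python structure and picks out the steps with the lowest WER and the lowest validation loss. The `Row` type and the `best_by` helper are hypothetical names introduced for illustration, and nothing here indicates which checkpoints were actually saved.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    training_loss: float
    epoch: float
    step: int
    validation_loss: float
    wer: float


# Values copied from the training results table.
ROWS: List[Row] = [
    Row(3042.8738, 3.51, 100, 422.1938, 0.9518),
    Row(362.1554, 7.02, 200, 231.7486, 1.0),
    Row(208.092, 10.53, 300, 196.4194, 0.9958),
    Row(189.1354, 14.04, 400, 211.6223, 0.9350),
    Row(163.6355, 17.54, 500, 235.3201, 0.9182),
    Row(140.7959, 21.05, 600, 256.4028, 0.9539),
    Row(115.5506, 24.56, 700, 311.4562, 1.0147),
    Row(93.6629, 28.07, 800, 304.0882, 1.2243),
    Row(78.9694, 31.58, 900, 354.5415, 1.1279),
    Row(67.4151, 35.09, 1000, 423.6178, 1.0860),
    Row(55.1471, 38.6, 1100, 468.3192, 1.0922),
    Row(55.8001, 42.11, 1200, 408.8039, 1.0839),
    Row(46.9208, 45.61, 1300, 524.1367, 1.0650),
    Row(43.7264, 49.12, 1400, 507.5248, 1.0860),
]


def best_by(rows: List[Row], field: str) -> Row:
    """Return the row with the smallest value in the named column."""
    return min(rows, key=lambda row: getattr(row, field))


best_wer = best_by(ROWS, "wer")
best_loss = best_by(ROWS, "validation_loss")
final = ROWS[-1]

print("lowest WER:", best_wer.step, best_wer.wer)
print("lowest validation loss:", best_loss.step, best_loss.validation_loss)
print("final step:", final.step, final.wer)
```

On these rows the lowest WER (0.9182) occurs at step 500 and the lowest validation loss (196.4194) at step 300, while training loss keeps falling through step 1400. That pattern is often a sign of overfitting, although the table alone cannot confirm the cause.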

Framework versions

Transformers 4.35.2
PyTorch 2.1.0+cu118
Datasets 2.15.0
Tokenizers 0.15.0