metadata

license: apache-2.0
base_model: facebook/wav2vec2-lv-60-espeak-cv-ft
tags:
  - generated_from_trainer
datasets:
  - voxpopuli
metrics:
  - wer
model-index:
  - name: cs2fi_wav2vec2-large-xls-r-300m-czech-colab
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: voxpopuli
          type: voxpopuli
          config: fi
          split: test
          args: fi
        metrics:
          - name: Wer
            type: wer
            value: 1.0754716981132075

cs2fi_wav2vec2-large-xls-r-300m-czech-colab

This model is a fine-tuned version of facebook/wav2vec2-lv-60-espeak-cv-ft on the Finnish (fi) configuration of the voxpopuli dataset. It achieves the following results on the evaluation set, taken from the final evaluation at step 1400:

Loss: 485.7458
Wer: 1.0755
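
A WER above 1.0 can look like a reporting error, so a short illustration may help. Word error rate is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, and it exceeds 1.0 whenever the hypothesis needs more edits than the reference has words. The sketch below is a minimal pure-Python version for illustration only. The function name `word_error_rate` and the sample strings are hypothetical, and this is not necessarily the exact implementation used to produce the numbers in this card.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: 2 reference words, 3 wrong hypothesis words.
# Two substitutions plus one insertion give 3 edits over 2 words.
print(word_error_rate("a b", "x y z"))  # 1.5
```

Read this way, a WER of 1.0755 means the transcripts needed slightly more word edits than there were reference words.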

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
num_epochs: 50
mixed_precision_training: Native AMP
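
Two of the values above follow from the others, and a small sketch can make the arithmetic explicit. The code below assumes a single training device (the card does not state the device count) and assumes the linear scheduler ramps the learning rate up from zero over the warmup steps and then decays it linearly to zero. The total of 1400 steps is only the last step logged in the results table, used here as a stand-in for the real total, which the card does not give. The function names are hypothetical.

```python
def effective_batch_size(per_device_batch: int,
                         accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Samples contributing to each optimizer update."""
    return per_device_batch * accumulation_steps * num_devices


def linear_lr(step: int,
              total_steps: int,
              peak_lr: float = 3e-4,
              warmup_steps: int = 500) -> float:
    """Linear warmup to peak_lr, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# train_batch_size 8 with gradient_accumulation_steps 2 on one device.
print(effective_batch_size(8, 2))  # 16

# Assumed total of 1400 steps (last logged step, not a confirmed total).
for step in (100, 500, 950, 1400):
    print(step, linear_lr(step, total_steps=1400))
```

Under these assumptions the learning rate only reaches 0.0003 at step 500, so more than a third of the logged training steps fall inside the warmup phase.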

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
3007.5297	3.51	100	523.8923	0.9706
353.1859	7.02	200	270.8087	0.9665
207.1084	10.53	300	215.3542	0.9350
186.3063	14.04	400	210.1422	0.9119
171.7259	17.54	500	291.5182	1.0629
142.6091	21.05	600	219.2806	0.9602
118.6791	24.56	700	312.2755	1.1132
96.153	28.07	800	320.7119	1.0545
82.968	31.58	900	357.5117	1.0629
71.2426	35.09	1000	421.3889	0.9916
58.8083	38.6	1100	433.8375	1.1048
54.5225	42.11	1200	482.5988	1.0566
48.12	45.61	1300	479.3787	1.0860
43.3324	49.12	1400	485.7458	1.0755
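
The table can be scanned programmatically to see where evaluation quality peaked. The sketch below copies the rows above into a string and picks the checkpoints with the lowest validation loss and lowest WER. The names `RESULTS` and `Row` are hypothetical, and nothing here is part of the original training script.

```python
from typing import NamedTuple


class Row(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    wer: float


# Rows copied from the training results table above.
RESULTS = """
3007.5297 3.51 100 523.8923 0.9706
353.1859 7.02 200 270.8087 0.9665
207.1084 10.53 300 215.3542 0.9350
186.3063 14.04 400 210.1422 0.9119
171.7259 17.54 500 291.5182 1.0629
142.6091 21.05 600 219.2806 0.9602
118.6791 24.56 700 312.2755 1.1132
96.153 28.07 800 320.7119 1.0545
82.968 31.58 900 357.5117 1.0629
71.2426 35.09 1000 421.3889 0.9916
58.8083 38.6 1100 433.8375 1.1048
54.5225 42.11 1200 482.5988 1.0566
48.12 45.61 1300 479.3787 1.0860
43.3324 49.12 1400 485.7458 1.0755
"""

rows = []
for line in RESULTS.strip().splitlines():
    train_loss, epoch, step, val_loss, wer = line.split()
    rows.append(Row(float(train_loss), float(epoch), int(step),
                    float(val_loss), float(wer)))

best_wer = min(rows, key=lambda r: r.wer)
best_val = min(rows, key=lambda r: r.val_loss)

# True when the training loss drops at every logged step.
train_loss_always_falls = all(
    later.train_loss < earlier.train_loss
    for earlier, later in zip(rows, rows[1:])
)

print(best_wer.step, best_wer.wer)       # 400 0.9119
print(best_val.step, best_val.val_loss)  # 400 210.1422
print(train_loss_always_falls)           # True
```

Both the lowest validation loss and the lowest WER occur at step 400, while the training loss keeps falling through step 1400. That pattern is consistent with overfitting after step 400, and the final checkpoint reported at the top of this card is not the best one in the table.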

Framework versions

Transformers 4.35.2
Pytorch 2.1.0+cu118
Datasets 2.15.0
Tokenizers 0.15.0