vitouphy's picture
Update README.md
8c7fe04
|
raw
history blame
3.11 kB
metadata
language:
  - km
license: apache-2.0
tags:
  - automatic-speech-recognition
  - openslr
  - robust-speech-event
  - km
  - generated_from_trainer
model-index:
  - name: wav2vec2-xls-r-1b-km
    results:
      - task:
          name: Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: OpenSLR km
          type: openslr
          args: km
        metrics:
          - name: Test WER
            type: wer
            value: 44.9
          - name: Test CER
            type: cer
            value: 12.2
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Robust Speech Event - Dev Data
          type: speech-recognition-community-v2/dev_data
          args: km
        metrics:
          - name: Test WER
            type: wer
            value: 44.9
          - name: Test CER
            type: cer
            value: 12.2

This model is a fine-tuned version of facebook/wav2vec2-xls-r-1b on the openslr dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4239
  • Wer: 0.4221

Evaluation results on OpenSLR "test" (self-split 10%) (Running ./eval.py):

  • WER: 0.4490281634272114
  • CER: 0.12198285179047481

Note

  • Since this dataset is small (4 hours of voice recording), we decided not to train that for too long to avoid overfitting and under-generalization.
  • This model performs worse than its 300M-variant. Probably, we don't explore the hyper-parameter enough?

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 2000
  • num_epochs: 75
  • mixed_precision_training: Native AMP

Training results

Training Loss Epoch Step Validation Loss Wer
3.5671 5.47 400 12.0218 1.0
3.5159 10.95 800 10.6337 1.0
2.4543 16.43 1200 1.8256 0.9839
1.9437 21.91 1600 1.1237 0.9173
1.696 27.39 2000 0.8246 0.7700
1.5342 32.87 2400 0.6433 0.6594
1.4509 38.35 2800 0.5500 0.5787
1.3478 43.83 3200 0.5070 0.4907
1.3096 49.31 3600 0.4692 0.4726
1.2532 54.79 4000 0.4448 0.4479
1.2291 60.27 4400 0.4374 0.4366
1.196 65.75 4800 0.4314 0.4310
1.1862 71.23 5200 0.4239 0.4221

Framework versions

  • Transformers 4.17.0.dev0
  • Pytorch 1.10.2+cu102
  • Datasets 1.18.2.dev0
  • Tokenizers 0.11.0