metadata
language:
  - 'no'
license: apache-2.0
tags:
  - audio
  - asr
  - automatic-speech-recognition
  - hf-asr-leaderboard
model-index:
  - name: scream_sextusdecimus_virtual_tsfix_medium_1e5
    results: []

scream_sextusdecimus_virtual_tsfix_medium_1e5

This model is a fine-tuned version of openai/whisper-medium on the NbAiLab/ncc_speech dataset.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • lr_scheduler_type: linear
  • per_device_train_batch_size: 16
  • total_train_batch_size_per_node: 64
  • total_train_batch_size: 512
  • total_optimization_steps: 20,000
  • starting_optimization_step: None
  • finishing_optimization_step: 20,000
  • num_train_dataset_workers: 32
  • num_hosts: 8
  • total_num_training_examples: 10,240,000
  • steps_per_epoch: To be computed after first epoch
  • num_beams: None
  • dropout: True
  • bpe_dropout_probability: 0.1
  • activation_dropout_probability: 0.1
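The batch-size and step figures above are mutually consistent, and the arithmetic can be checked with a short sketch. The constants are copied from the list; everything else is an assumption made for illustration. In particular, the number of devices per host is inferred from the ratio of the two batch sizes, and the `linear_lr` helper is hypothetical: it assumes no warmup and a decay to zero at the final step, neither of which is stated in this card.

```python
# Values copied from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH_SIZE = 16
TOTAL_TRAIN_BATCH_SIZE_PER_NODE = 64
TOTAL_TRAIN_BATCH_SIZE = 512
NUM_HOSTS = 8
TOTAL_OPTIMIZATION_STEPS = 20_000
PEAK_LEARNING_RATE = 1e-5

# Derived quantities (inferred from the ratios, not stated in the card).
devices_per_node = TOTAL_TRAIN_BATCH_SIZE_PER_NODE // PER_DEVICE_TRAIN_BATCH_SIZE
global_batch_size = TOTAL_TRAIN_BATCH_SIZE_PER_NODE * NUM_HOSTS
total_examples = global_batch_size * TOTAL_OPTIMIZATION_STEPS


def linear_lr(step, peak=PEAK_LEARNING_RATE,
              total_steps=TOTAL_OPTIMIZATION_STEPS, warmup_steps=0):
    """Hypothetical linear schedule: optional linear warmup, then linear decay.

    ASSUMPTION: warmup_steps defaults to 0 and the rate decays to 0 at
    total_steps; the card only says lr_scheduler_type is "linear".
    """
    if warmup_steps > 0 and step < warmup_steps:
        return peak * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print("devices per node:", devices_per_node)
    print("global batch size:", global_batch_size)
    print("examples seen over the full run:", total_examples)
    print("assumed learning rate at step 10000:", linear_lr(10_000))
```

Under these assumptions the derived global batch size matches total_train_batch_size, and batch size times steps matches total_num_training_examples.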

Training results

| step | eval_loss | train_loss | eval_wer | eval_cer | eval_exact_wer | eval_exact_cer |
|------|-----------|------------|----------|----------|----------------|----------------|
| 0 | 5.5890 | 2.8362 | 17.4598 | 5.3906 | 17.4598 | 5.3906 |
| 1000 | 5.2798 | 1.0896 | 12.4926 | 3.8321 | 12.4926 | 3.8321 |
| 2000 | 5.2432 | 0.9018 | 11.0351 | 3.9899 | 11.0351 | 3.9899 |
| 3000 | 4.1719 | 0.8159 | 9.8453 | 3.8173 | 9.8453 | 3.8173 |
| 4000 | 3.0758 | 0.7799 | 9.6371 | 3.8716 | 9.6371 | 3.8716 |
| 5000 | 2.2223 | 0.7803 | 9.7264 | 3.9110 | 9.7264 | 3.9110 |
| 6000 | 2.0574 | 0.7206 | 9.5181 | 3.8864 | 9.5181 | 3.8864 |
| 7000 | 1.7271 | 0.7088 | 8.7745 | 3.7039 | 8.7745 | 3.7039 |
| 8000 | 1.5868 | 0.7528 | 8.2391 | 3.5362 | 8.2391 | 3.5362 |
| 9000 | 1.5781 | 0.6747 | 8.2094 | 3.5313 | 8.2094 | 3.5313 |
| 10000 | 1.6658 | 0.6830 | 8.1499 | 3.4277 | 8.1499 | 3.4277 |
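To read the table programmatically, a minimal sketch like the following may help. It embeds the first five columns of the table as text, then reports the step with the lowest value for a metric and the relative reduction between the first and last logged steps. The helper names (`parse_results`, `best_step`, `relative_reduction`) are hypothetical and not part of any training script referenced by this card.

```python
# First five columns of the results table, copied verbatim.
RESULTS = """\
step eval_loss train_loss eval_wer eval_cer
0 5.5890 2.8362 17.4598 5.3906
1000 5.2798 1.0896 12.4926 3.8321
2000 5.2432 0.9018 11.0351 3.9899
3000 4.1719 0.8159 9.8453 3.8173
4000 3.0758 0.7799 9.6371 3.8716
5000 2.2223 0.7803 9.7264 3.9110
6000 2.0574 0.7206 9.5181 3.8864
7000 1.7271 0.7088 8.7745 3.7039
8000 1.5868 0.7528 8.2391 3.5362
9000 1.5781 0.6747 8.2094 3.5313
10000 1.6658 0.6830 8.1499 3.4277
"""


def parse_results(text):
    """Turn the whitespace-separated table into a list of row dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        row = dict(zip(header, (float(v) for v in line.split())))
        row["step"] = int(row["step"])
        rows.append(row)
    return rows


def best_step(rows, metric):
    """Step at which `metric` is lowest (all metrics here are lower-is-better)."""
    return min(rows, key=lambda r: r[metric])["step"]


def relative_reduction(rows, metric):
    """Fractional drop in `metric` from the first to the last logged step."""
    first, last = rows[0][metric], rows[-1][metric]
    return (first - last) / first


if __name__ == "__main__":
    rows = parse_results(RESULTS)
    for metric in ("eval_loss", "eval_wer", "eval_cer"):
        print(metric, "lowest at step", best_step(rows, metric),
              "relative reduction:", round(relative_reduction(rows, metric), 4))
```

On the logged values, eval_loss is lowest at step 9000 while eval_wer and eval_cer are lowest at step 10000, so the choice of checkpoint depends on which metric is used for selection.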

Framework versions

  • Transformers 4.30.0.dev0
  • Datasets 2.12.1.dev0
  • Tokenizers 0.13.3