Whisper Medium GA-EN Speech Translation

This model is a fine-tuned version of openai/whisper-medium on the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, Wikimedia, and EUbookshop datasets. It achieves the following results on the evaluation set:

  • Loss: 1.0552
  • BLEU: 33.24
  • chrF: 55.16
  • WER: 61.5038
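
The WER figure above is on a percentage scale, and the training results table later in this card contains values above 100. The sketch below is a minimal, dependency-free illustration of how word error rate is commonly defined (word-level edit distance divided by the number of reference words), which shows why extra inserted words can push the score past 100. It is not the evaluation script used for this model; the function name and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: 100 * (S + D + I) / N.

    S, D, I are word substitutions, deletions, and insertions needed to
    turn the reference into the hypothesis; N is the reference length.
    Tokenization is plain whitespace splitting (an assumption).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Classic dynamic-programming edit distance, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)
```

Because the denominator is the reference length only, a hypothesis that repeats or hallucinates words can accumulate more errors than there are reference words.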

Model description

More information needed

Intended uses & limitations

More information needed
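
Until this section is filled in, the following is a minimal sketch of how a Whisper checkpoint like this one is typically loaded for Irish speech to English text with the Transformers `pipeline` API. It assumes the model ID shown on this model's page, that the checkpoint's saved generation settings already produce English translations, and that the audio file can be decoded locally (file decoding usually requires ffmpeg). The function name and the file name in the comment are hypothetical.

```python
# Model ID as shown on the model page.
MODEL_ID = "ymoslem/whisper-medium-ga2en-v6.3.0-4k-r"


def translate_irish_audio(audio_path: str) -> str:
    """Return the model's English output for one Irish audio file."""
    # Imported lazily so that defining this helper does not require
    # transformers to be installed or the weights to be downloaded.
    from transformers import pipeline

    translator = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = translator(audio_path)  # dict containing a "text" field
    return result["text"]


# Example call (hypothetical file name):
# print(translate_irish_audio("sample_irish.wav"))
```

Given the BLEU and WER reported above, outputs should be treated as draft translations and reviewed before use.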

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.03
  • training_steps: 4000
  • mixed_precision_training: Native AMP
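
To make the scheduler settings above concrete, here is a small sketch of the learning rate they imply at a given step. It assumes the usual Transformers behavior for a linear schedule: the warmup length is the ceiling of `warmup_ratio * training_steps`, the rate rises linearly from 0 to the peak during warmup, then decays linearly to 0 at the final step. The function names are illustrative, not part of any library.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAINING_STEPS = 4000
WARMUP_RATIO = 0.03


def warmup_steps(total_steps: int = TRAINING_STEPS,
                 ratio: float = WARMUP_RATIO) -> int:
    """Number of warmup steps implied by the warmup ratio."""
    return math.ceil(total_steps * ratio)


def linear_lr(step: int,
              peak_lr: float = LEARNING_RATE,
              total_steps: int = TRAINING_STEPS,
              ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup = warmup_steps(total_steps, ratio)
    if step < warmup:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup)
    # Linear decay from the peak down to 0 at the last step.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup)
```

Under these assumptions, warmup lasts 120 steps, the rate peaks at 0.0001 at step 120, and it reaches 0 at step 4000.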

Training results

| Training Loss | Epoch | Step | Validation Loss | BLEU | chrF | WER |
|---|---|---|---|---|---|---|
| 2.5219 | 0.0138 | 100 | 2.1106 | 0.44 | 10.48 | 107.2490 |
| 2.4608 | 0.0276 | 200 | 2.1816 | 3.3 | 20.43 | 179.1535 |
| 2.3008 | 0.0414 | 300 | 2.0587 | 3.66 | 21.59 | 206.4836 |
| 2.2095 | 0.0552 | 400 | 1.9459 | 8.79 | 27.66 | 100.3602 |
| 2.0454 | 0.0690 | 500 | 1.8681 | 8.14 | 27.36 | 122.1522 |
| 1.9937 | 0.0828 | 600 | 1.8717 | 11.05 | 30.26 | 97.2535 |
| 1.868 | 0.0966 | 700 | 1.7917 | 9.14 | 29.03 | 129.0410 |
| 1.9924 | 0.1103 | 800 | 1.7170 | 12.62 | 33.2 | 89.6443 |
| 1.8646 | 0.1241 | 900 | 1.7252 | 11.98 | 30.77 | 97.8838 |
| 1.7644 | 0.1379 | 1000 | 1.6832 | 10.87 | 31.0 | 109.1851 |
| 1.692 | 0.1517 | 1100 | 1.6837 | 13.05 | 34.46 | 93.3814 |
| 1.7044 | 0.1655 | 1200 | 1.5527 | 20.95 | 37.42 | 75.2364 |
| 1.6824 | 0.1793 | 1300 | 1.5611 | 14.91 | 35.56 | 92.6159 |
| 1.6557 | 0.1931 | 1400 | 1.5554 | 14.0 | 36.54 | 99.8199 |
| 1.5456 | 0.2069 | 1500 | 1.5058 | 19.72 | 39.81 | 83.5660 |
| 1.3755 | 0.2207 | 1600 | 1.5039 | 18.04 | 37.95 | 82.9806 |
| 1.3959 | 0.2345 | 1700 | 1.4374 | 17.01 | 39.5 | 85.2319 |
| 1.5012 | 0.2483 | 1800 | 1.4242 | 14.93 | 39.24 | 114.4079 |
| 1.4278 | 0.2621 | 1900 | 1.3904 | 23.85 | 42.69 | 73.0302 |
| 1.3285 | 0.2759 | 2000 | 1.4493 | 17.7 | 37.23 | 83.8811 |
| 1.2655 | 0.2897 | 2100 | 1.3661 | 20.1 | 40.32 | 79.7839 |
| 1.2074 | 0.3034 | 2200 | 1.3387 | 24.45 | 43.79 | 72.9851 |
| 1.1893 | 0.3172 | 2300 | 1.3308 | 21.45 | 42.61 | 82.3953 |
| 1.1236 | 0.3310 | 2400 | 1.3050 | 22.77 | 44.17 | 77.3075 |
| 1.0934 | 0.3448 | 2500 | 1.2793 | 25.54 | 46.32 | 72.2647 |
| 1.06 | 0.3586 | 2600 | 1.2396 | 28.27 | 47.32 | 65.6911 |
| 1.0327 | 0.3724 | 2700 | 1.2577 | 28.45 | 47.01 | 67.3570 |
| 1.1623 | 0.3862 | 2800 | 1.2194 | 24.54 | 47.43 | 73.6155 |
| 1.0215 | 0.4 | 2900 | 1.2039 | 27.4 | 49.6 | 69.2481 |
| 0.9185 | 0.4138 | 3000 | 1.1724 | 27.04 | 49.24 | 67.8973 |
| 0.9003 | 0.4276 | 3100 | 1.1674 | 31.08 | 50.11 | 63.8001 |
| 0.9839 | 0.4414 | 3200 | 1.1580 | 30.24 | 50.63 | 64.5655 |
| 0.9396 | 0.4552 | 3300 | 1.1202 | 30.79 | 51.72 | 64.9257 |
| 0.9051 | 0.4690 | 3400 | 1.1180 | 30.34 | 53.08 | 66.4566 |
| 0.8621 | 0.4828 | 3500 | 1.1042 | 33.3 | 53.86 | 60.7834 |
| 0.8236 | 0.4966 | 3600 | 1.1070 | 32.77 | 53.21 | 62.0441 |
| 0.829 | 0.5103 | 3700 | 1.0771 | 32.49 | 54.21 | 62.5844 |
| 0.8375 | 0.5241 | 3800 | 1.0780 | 32.27 | 53.98 | 63.0797 |
| 0.8206 | 0.5379 | 3900 | 1.0615 | 33.26 | 55.07 | 61.6389 |
| 0.8059 | 0.5517 | 4000 | 1.0552 | 33.24 | 55.16 | 61.5038 |
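
The headline results reported at the top of this card match the final row (step 4000). As a small illustration of reading the table, the sketch below takes the last six evaluation rows, copied from the table above, and finds the best step for each metric. The data structure and helper name are illustrative only and are not part of the training code.

```python
# Last six evaluation rows copied from the training results table.
ROWS = [
    {"step": 3500, "val_loss": 1.1042, "bleu": 33.30, "chrf": 53.86, "wer": 60.7834},
    {"step": 3600, "val_loss": 1.1070, "bleu": 32.77, "chrf": 53.21, "wer": 62.0441},
    {"step": 3700, "val_loss": 1.0771, "bleu": 32.49, "chrf": 54.21, "wer": 62.5844},
    {"step": 3800, "val_loss": 1.0780, "bleu": 32.27, "chrf": 53.98, "wer": 63.0797},
    {"step": 3900, "val_loss": 1.0615, "bleu": 33.26, "chrf": 55.07, "wer": 61.6389},
    {"step": 4000, "val_loss": 1.0552, "bleu": 33.24, "chrf": 55.16, "wer": 61.5038},
]


def best_step(metric: str, higher_is_better: bool) -> int:
    """Return the step whose value for `metric` is best among ROWS."""
    pick = max if higher_is_better else min
    return pick(ROWS, key=lambda row: row[metric])["step"]


# BLEU and chrF are better when higher; loss and WER are better when lower.
summary = {
    "bleu": best_step("bleu", higher_is_better=True),
    "chrf": best_step("chrf", higher_is_better=True),
    "val_loss": best_step("val_loss", higher_is_better=False),
    "wer": best_step("wer", higher_is_better=False),
}
print(summary)
```

On these rows, step 3500 has the best BLEU and WER while step 4000 has the best validation loss and chrF, with differences of well under one point for each metric.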

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.2.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size: 764M params (Safetensors, tensor type F32)
Model ID: ymoslem/whisper-medium-ga2en-v6.3.0-4k-r (fine-tuned from openai/whisper-medium)

Evaluation results

  • BLEU on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, Wikimedia, and EUbookshop (self-reported): 33.240
  • WER on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, Wikimedia, and EUbookshop (self-reported): 61.504