---
language:
- el
license: apache-2.0
tags:
- whisper-event
- generated_from_trainer
- whisper-large
- mozilla-foundation/common_voice_11_0
- greek
datasets:
- mozilla-foundation/common_voice_11_0
- google/fleurs
metrics:
- wer
model-index:
- name: whisper-lg-el-intlv-xs-2
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: mozilla-foundation/common_voice_11_0 el
      type: mozilla-foundation/common_voice_11_0
      config: el
      split: test
    metrics:
    - name: Wer
      type: wer
      value: 9.50037147102526
---
# whisper-lg-el-intlv-xs-2
This model is a fine-tuned version of [farsipal/whisper-lg-el-intlv-xs](https://huggingface.co/farsipal/whisper-lg-el-intlv-xs) on the mozilla-foundation/common_voice_11_0 (el) and google/fleurs (el_gr) datasets.
It achieves the following results on the evaluation set:
- Loss: 0.2872
- Wer: 9.5004
## Model description
The model was trained on two interleaved datasets (Common Voice 11.0 and Google FLEURS) for speech transcription in the Greek language.
## Intended uses & limitations
The model is intended for automatic speech recognition (transcription) of Greek-language audio.
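A minimal inference sketch, assuming the `transformers` ASR pipeline and a recent release whose pipeline accepts `generate_kwargs` with `language`/`task`; the repository id `farsipal/whisper-lg-el-intlv-xs-2` and the audio file path are assumptions, not values recorded in this card:
```python
from transformers import pipeline

# Load the fine-tuned checkpoint for automatic speech recognition.
asr = pipeline(
    "automatic-speech-recognition",
    model="farsipal/whisper-lg-el-intlv-xs-2",  # assumed repo id for this model
    chunk_length_s=30,  # Whisper operates on 30-second audio windows
)

# Ask the model to transcribe (not translate) Greek audio.
result = asr(
    "greek_sample.wav",  # hypothetical local audio file
    generate_kwargs={"language": "greek", "task": "transcribe"},
)
print(result["text"])
```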
## Training and evaluation data
Training was performed on the interleaved train+validation splits of Common Voice 11.0 (el) and Google FLEURS (el_gr). Evaluation was performed on the Common Voice 11.0 (el) test split only.
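A rough sketch of how the two corpora can be interleaved with the `datasets` library; column names follow the `--text_column_name`/`--audio_column_name` arguments below, but the exact preprocessing used in the original run may differ:
```python
from datasets import Audio, interleave_datasets, load_dataset

# Load the Greek portions of both corpora (train+validation, as in the run below).
# Note: Common Voice 11.0 is gated and may require an authenticated download.
cv = load_dataset("mozilla-foundation/common_voice_11_0", "el", split="train+validation")
fleurs = load_dataset("google/fleurs", "el_gr", split="train+validation")

# Common Voice stores text in "sentence", FLEURS in "transcription"; align them.
fleurs = fleurs.rename_column("transcription", "sentence")

# Drop unused columns and resample both corpora to Whisper's 16 kHz input rate.
cv = cv.remove_columns([c for c in cv.column_names if c not in ("audio", "sentence")])
fleurs = fleurs.remove_columns(
    [c for c in fleurs.column_names if c not in ("audio", "sentence")]
)
cv = cv.cast_column("audio", Audio(sampling_rate=16_000))
fleurs = fleurs.cast_column("audio", Audio(sampling_rate=16_000))

# Alternate examples from the two datasets into a single training stream.
train_dataset = interleave_datasets([cv, fleurs])
```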
## Training procedure
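The run was configured with the arguments below; the flag names suggest the Hugging Face Whisper fine-tuning event script (e.g. `run_speech_recognition_seq2seq_streaming.py`), although the exact training script is not recorded in this card.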
```
--model_name_or_path 'farsipal/whisper-lg-el-intlv-xs' \
--model_revision main \
--do_train True \
--do_eval True \
--use_auth_token False \
--freeze_feature_encoder False \
--freeze_encoder False \
--model_index_name 'whisper-lg-el-intlv-xs-2' \
--dataset_name 'mozilla-foundation/common_voice_11_0,google/fleurs' \
--dataset_config_name 'el,el_gr' \
--train_split_name 'train+validation,train+validation' \
--eval_split_name 'test,-' \
--text_column_name 'sentence,transcription' \
--audio_column_name 'audio,audio' \
--streaming False \
--max_duration_in_seconds 30 \
--do_lower_case False \
--do_remove_punctuation False \
--do_normalize_eval True \
--language greek \
--task transcribe \
--shuffle_buffer_size 500 \
--output_dir './data/finetuningRuns/whisper-lg-el-intlv-xs-2' \
--overwrite_output_dir True \
--per_device_train_batch_size 8 \
--gradient_accumulation_steps 4 \
--learning_rate 3.5e-6 \
--dropout 0.15 \
--attention_dropout 0.05 \
--warmup_steps 500 \
--max_steps 5000 \
--eval_steps 1000 \
--gradient_checkpointing True \
--cache_dir '~/.cache' \
--fp16 True \
--evaluation_strategy steps \
--per_device_eval_batch_size 8 \
--predict_with_generate True \
--generation_max_length 225 \
--save_steps 1000 \
--logging_steps 25 \
--report_to tensorboard \
--load_best_model_at_end True \
--metric_for_best_model wer \
--greater_is_better False \
--push_to_hub False \
--dataloader_num_workers 6
```
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3.5e-06
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 5000
- mixed_precision_training: Native AMP
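For reference, a sketch of how these hyperparameters might be expressed with the `transformers` `Seq2SeqTrainingArguments` API; the output path is illustrative, and the dropout values used in the run are model-config settings, so they do not appear here:
```python
from transformers import Seq2SeqTrainingArguments

# Mirrors the hyperparameters listed above (effective batch size 8 x 4 = 32).
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-lg-el-intlv-xs-2",  # illustrative path
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=3.5e-6,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=5000,
    fp16=True,  # Native AMP mixed precision
    gradient_checkpointing=True,
    evaluation_strategy="steps",
    eval_steps=1000,
    save_steps=1000,
    logging_steps=25,
    report_to=["tensorboard"],
    predict_with_generate=True,
    generation_max_length=225,
    load_best_model_at_end=True,
    metric_for_best_model="wer",
    greater_is_better=False,
    seed=42,
)
```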
### Training results
| Training Loss | Epoch | Step | Validation Loss | Wer |
|:-------------:|:-----:|:----:|:---------------:|:-------:|
| 0.0813 | 2.49 | 1000 | 0.2147 | 10.8284 |
| 0.0379 | 4.98 | 2000 | 0.2439 | 10.0111 |
| 0.0195 | 7.46 | 3000 | 0.2767 | 9.8811 |
| 0.0126 | 9.95 | 4000 | 0.2872 | 9.5004 |
| 0.0103 | 12.44 | 5000 | 0.3021 | 9.6954 |
### Framework versions
- Transformers 4.26.0.dev0
- Pytorch 1.13.0+cu117
- Datasets 2.8.1.dev0
- Tokenizers 0.13.2