anuragshas committed on
Commit
3a5b628
·
1 Parent(s): e9bd3f1

Update README.md

Files changed (1)
  1. README.md +72 -4
README.md CHANGED
@@ -6,17 +6,46 @@ tags:
6
  - automatic-speech-recognition
7
  - mozilla-foundation/common_voice_8_0
8
  - generated_from_trainer
 
9
  datasets:
10
- - common_voice
11
  model-index:
12
- - name: ''
13
-   results: []
14
  ---
15
 
16
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
17
  should probably proofread and complete it, then remove this comment. -->
18
 
19
- #
20
 
21
  This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - SK dataset.
22
  It achieves the following results on the evaluation set:
@@ -86,3 +115,42 @@ The following hyperparameters were used during training:
86
  - Pytorch 1.10.2+cu102
87
  - Datasets 1.18.4.dev0
88
  - Tokenizers 0.11.0
6
  - automatic-speech-recognition
7
  - mozilla-foundation/common_voice_8_0
8
  - generated_from_trainer
9
+ - robust-speech-event
10
  datasets:
11
+ - mozilla-foundation/common_voice_8_0
12
  model-index:
13
+ - name: XLS-R-300M - Slovak
14
+   results:
15
+   - task:
16
+       name: Automatic Speech Recognition
17
+       type: automatic-speech-recognition
18
+     dataset:
19
+       name: Common Voice 8
20
+       type: mozilla-foundation/common_voice_8_0
21
+       args: sk
22
+     metrics:
23
+     - name: Test WER
24
+       type: wer
25
+       value: 18.609
26
+     - name: Test CER
27
+       type: cer
28
+       value: 5.488
29
+   - task:
30
+       name: Automatic Speech Recognition
31
+       type: automatic-speech-recognition
32
+     dataset:
33
+       name: Robust Speech Event - Dev Data
34
+       type: speech-recognition-community-v2/dev_data
35
+       args: sk
36
+     metrics:
37
+     - name: Test WER
38
+       type: wer
39
+       value: 40.548
40
+     - name: Test CER
41
+       type: cer
42
+       value: 17.733
43
  ---
44
 
45
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
46
  should probably proofread and complete it, then remove this comment. -->
47
 
48
+ # XLS-R-300M - Slovak
49
 
50
  This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - SK dataset.
51
  It achieves the following results on the evaluation set:
 
115
  - Pytorch 1.10.2+cu102
116
  - Datasets 1.18.4.dev0
117
  - Tokenizers 0.11.0
118
+
119
+ #### Evaluation Commands
120
+ 1. To evaluate on `mozilla-foundation/common_voice_8_0` with split `test`
121
+
122
+ ```bash
123
+ python eval.py --model_id anuragshas/wav2vec2-xls-r-300m-sk-cv8-with-lm --dataset mozilla-foundation/common_voice_8_0 --config sk --split test
124
+ ```
125
+
126
+ 2. To evaluate on `speech-recognition-community-v2/dev_data`
127
+
128
+ ```bash
129
+ python eval.py --model_id anuragshas/wav2vec2-xls-r-300m-sk-cv8-with-lm --dataset speech-recognition-community-v2/dev_data --config sk --split validation --chunk_length_s 5.0 --stride_length_s 1.0
130
+ ```
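The second command passes `--chunk_length_s 5.0 --stride_length_s 1.0`, which suggests that long recordings are transcribed in overlapping windows. `eval.py` is not part of this commit, so the following is only a simplified sketch of that idea, not the script's actual logic: `chunk_bounds` is a hypothetical helper, and the 16 kHz sampling rate and the "advance by chunk length minus twice the stride" rule are assumptions.

```python
def chunk_bounds(num_samples, sampling_rate=16_000,
                 chunk_length_s=5.0, stride_length_s=1.0):
    """Return (start, end) sample indices of overlapping windows.

    Hypothetical helper: each window is chunk_length_s long and consecutive
    windows advance by chunk_length_s - 2 * stride_length_s, so neighbouring
    windows share context on both sides.
    """
    chunk_len = int(round(chunk_length_s * sampling_rate))
    stride = int(round(stride_length_s * sampling_rate))
    step = chunk_len - 2 * stride
    if step <= 0:
        raise ValueError("chunk length must exceed twice the stride")

    bounds = []
    start = 0
    while True:
        # Clip the final window to the end of the recording.
        end = min(start + chunk_len, num_samples)
        bounds.append((start, end))
        if end >= num_samples:
            break
        start += step
    return bounds


# A 12-second recording at 16 kHz (192,000 samples).
for start, end in chunk_bounds(192_000):
    print(start, end)
```

With these assumed settings each window covers 5 s of audio and starts 3 s after the previous one, so the model sees 1 s of extra context on each side of a window's central region.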
131
+
132
+ ### Inference With LM
133
+
134
+ ```python
135
+ import torch
136
+ from datasets import load_dataset
137
+ from transformers import AutoModelForCTC, AutoProcessor
138
+ import torchaudio.functional as F
139
+ model_id = "anuragshas/wav2vec2-xls-r-300m-sk-cv8-with-lm"
140
+ sample_iter = iter(load_dataset("mozilla-foundation/common_voice_8_0", "sk", split="test", streaming=True, use_auth_token=True))
141
+ sample = next(sample_iter)
142
+ resampled_audio = F.resample(torch.tensor(sample["audio"]["array"]), 48_000, 16_000).numpy()
143
+ model = AutoModelForCTC.from_pretrained(model_id)
144
+ processor = AutoProcessor.from_pretrained(model_id)
145
+ input_values = processor(resampled_audio, return_tensors="pt").input_values
146
+ with torch.no_grad():
147
+     logits = model(input_values).logits
148
+ transcription = processor.batch_decode(logits.numpy()).text
149
+ # => ""
150
+ ```
151
+
152
+ ### Eval results on Common Voice 8 "test" (WER):
153
+
154
+ | Without LM | With LM (run `./eval.py`) |
155
+ |---|---|
156
+ | 26.707 | 18.609 |
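
To put the two WER figures in the table above in context, here is a minimal sketch of how a word error rate can be computed (word-level edit distance divided by the reference length) and how large the relative reduction from the language model is. The helper names `word_error_rate` and `relative_reduction` are illustrative and are not taken from `eval.py`, which may normalise text differently.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before, after):
    """Fraction by which an error rate dropped, relative to the starting value."""
    return (before - after) / before


# WER values reported in the table (in percent).
wer_without_lm = 26.707
wer_with_lm = 18.609
print(f"{relative_reduction(wer_without_lm, wer_with_lm):.1%}")  # prints 30.3%
```

On the reported numbers, decoding with the language model removes roughly 30% of the word errors made without it.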