poonehmousavi committed 77ec483 · 1 parent: 488cfd4

Update README.md

Files changed (1): README.md (+16 -16)
README.md CHANGED
@@ -1,6 +1,6 @@
  ---
  language:
- - en
+ - fr
  thumbnail: null
  pipeline_tag: automatic-speech-recognition
  tags:
@@ -15,31 +15,31 @@ metrics:
  - wer
  - cer
  model-index:
- - name: asr-wav2vec2-commonvoice-14-en
+ - name: asr-wav2vec2-commonvoice-14-fr
    results:
    - task:
        name: Automatic Speech Recognition
        type: automatic-speech-recognition
      dataset:
-       name: CommonVoice Corpus 14.0 (English)
+       name: CommonVoice Corpus 14.0 (French)
        type: mozilla-foundation/common_voice_14.0
-       config: en
+       config: fr
        split: test
        args:
-         language: en
+         language: fr
      metrics:
      - name: Test WER
        type: wer
-       value: '16.68'
+       value: '10.24'
  ---

  <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
  <br/><br/>

- # wav2vec 2.0 with CTC trained on CommonVoice English (No LM)
+ # wav2vec 2.0 with CTC trained on CommonVoice French (No LM)

  This repository provides all the necessary tools to perform automatic speech
- recognition from an end-to-end system pretrained on CommonVoice (English Language) within
+ recognition from an end-to-end system pretrained on CommonVoice (French Language) within
  SpeechBrain. For a better experience, we encourage you to learn more about
  [SpeechBrain](https://speechbrain.github.io).
 
@@ -47,14 +47,14 @@ The performance of the model is the following:

  | Release | Test CER | Test WER | GPUs |
  |:-------------:|:--------------:|:--------------:|:--------:|
- | 15-08-23 | 7.92 | 16.86 | 1xV100 32GB |
+ | 15-08-23 | 3.44 | 10.24 | 1xV100 32GB |

  ## Pipeline description

  This ASR system is composed of two different but linked blocks:
  - Tokenizer (unigram) that transforms words into unigrams and is trained on
-   the train transcriptions (train.tsv) of CommonVoice (en).
- - Acoustic model (wav2vec2.0 + CTC). A pretrained wav2vec 2.0 model ([wav2vec2-large-lv60](https://huggingface.co/facebook/wav2vec2-large-lv60)) is combined with two DNN layers and fine-tuned on CommonVoice EN.
+   the train transcriptions (train.tsv) of CommonVoice (fr).
+ - Acoustic model (wav2vec2.0 + CTC). A pretrained wav2vec 2.0 model ([wav2vec2-FR-7K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-7K-large)) is combined with two DNN layers and fine-tuned on CommonVoice FR.
    The obtained final acoustic representation is given to the CTC decoder.

  The system is trained with recordings sampled at 16 kHz (single channel).
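
Both versions of the card require 16 kHz, single-channel input. `transcribe_file` normalizes audio on load in recent SpeechBrain releases, but if you feed tensors to the model yourself you must resample first. A minimal sketch with torchaudio; the file names are placeholders, not files shipped with the model:

```python
import torchaudio

# Load an arbitrary recording and convert it to the 16 kHz mono format
# the acoustic model expects (file names are hypothetical).
signal, fs = torchaudio.load("my_recording.wav")
signal = signal.mean(dim=0, keepdim=True)  # downmix stereo to mono
resampler = torchaudio.transforms.Resample(orig_freq=fs, new_freq=16000)
torchaudio.save("my_recording_16k.wav", resampler(signal), 16000)
```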
@@ -71,20 +71,20 @@ pip install speechbrain transformers

  Please note that we encourage you to read our tutorials and learn more about
  [SpeechBrain](https://speechbrain.github.io).

- ### Transcribing your own audio files (in English)
+ ### Transcribing your own audio files (in French)

  ```python
  from speechbrain.pretrained import EncoderASR

- asr_model = EncoderASR.from_hparams(source="speechbrain/asr-wav2vec2-commonvoice-14-en", savedir="pretrained_models/asr-wav2vec2-commonvoice-14-en")
- asr_model.transcribe_file("speechbrain/asr-wav2vec2-commonvoice-14-en/example-en.wav")
+ asr_model = EncoderASR.from_hparams(source="speechbrain/asr-wav2vec2-commonvoice-14-fr", savedir="pretrained_models/asr-wav2vec2-commonvoice-14-fr")
+ asr_model.transcribe_file("speechbrain/asr-wav2vec2-commonvoice-14-fr/example-fr.wav")
  ```

  ### Inference on GPU
  To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.

  ### Parallel Inference on a Batch
- Please [see this Colab notebook](https://colab.research.google.com/drive/1hX5ZI9S4jHIjahFCZnhwwQmFoGAi3tmu?usp=sharing) to learn how to transcribe a batch of input sentences in parallel using a pre-trained model.
+ Please [see this notebook](https://www.dropbox.com/sh/0i7esfa8jp3rxpp/AAArdi8IuCRmob2WAS7lg6M4a?dl=0) to learn how to transcribe a batch of input sentences in parallel using a pre-trained model.

  ### Training
  The model was trained with SpeechBrain.
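
The two notes above, GPU inference via `run_opts` and batched transcription, combine naturally. A minimal sketch using SpeechBrain's `EncoderASR.transcribe_batch`; the audio file names are hypothetical, and the device falls back to CPU when no GPU is present:

```python
import torch
from speechbrain.pretrained import EncoderASR

# run_opts forwards the device choice to the loaded modules.
device = "cuda" if torch.cuda.is_available() else "cpu"
asr_model = EncoderASR.from_hparams(
    source="speechbrain/asr-wav2vec2-commonvoice-14-fr",
    savedir="pretrained_models/asr-wav2vec2-commonvoice-14-fr",
    run_opts={"device": device},
)

# load_audio resamples each file to the rate the model expects.
wav1 = asr_model.load_audio("utt1.wav")  # hypothetical paths
wav2 = asr_model.load_audio("utt2.wav")

# transcribe_batch takes a padded batch plus lengths relative to the
# longest waveform in the batch.
batch = torch.nn.utils.rnn.pad_sequence([wav1, wav2], batch_first=True)
rel_lens = torch.tensor([wav1.shape[0], wav2.shape[0]]) / batch.shape[1]
transcripts, _ = asr_model.transcribe_batch(batch, rel_lens)
print(transcripts)
```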
@@ -103,7 +103,7 @@ pip install -e .
  3. Run Training:
  ```bash
  cd recipes/CommonVoice/ASR/CTC/
- python train_with_wav2vec.py hparams/train_en_with_wav2vec.yaml --data_folder=your_data_folder
+ python train_with_wav2vec.py hparams/train_fr_with_wav2vec.yaml --data_folder=your_data_folder
  ```

  You can find our training results (models, logs, etc.) [here](https://www.dropbox.com/sh/ch10cnbhf1faz3w/AACdHFG65LC6582H0Tet_glTa?dl=0).
 