Update README.md
README.md
CHANGED
@@ -90,7 +90,7 @@ img {
 | [![Riva Compatible](https://img.shields.io/badge/NVIDIA%20Riva-compatible-brightgreen#model-badge)](#deployment-with-nvidia-riva) |
 
 
-This model was trained on a composite dataset
+This model was trained on a composite dataset comprising over 1,500 hours of French speech.
 It is a non-autoregressive "large" variant of Conformer, with around 120 million parameters.
 See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#conformer-ctc) for complete architecture details.
 It is also compatible with NVIDIA Riva for [production-grade server deployments](#deployment-with-nvidia-riva).
@@ -127,7 +127,7 @@ asr_model.transcribe(['2086-149220-0033.wav'])
 
 ```shell
 python [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py \
-pretrained_name="nvidia/
+ pretrained_name="nvidia/stt_fr_conformer_ctc_large" \
 audio_dir="<DIRECTORY CONTAINING AUDIO FILES>"
 ```
 
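For reference, the batch-transcription command above has a direct Python equivalent. The hunk context already shows the `asr_model.transcribe(...)` call from the model card's quick-start, so this sketch only restates it end to end; the sample file name is the card's own example:

```python
# Minimal Python equivalent of the transcribe_speech.py invocation above.
import nemo.collections.asr as nemo_asr

# Fetch the checkpoint by its pretrained name.
asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained("nvidia/stt_fr_conformer_ctc_large")

# Transcribe one or more 16 kHz mono WAV files; the file name below comes
# from the model card's own quick-start snippet.
print(asr_model.transcribe(["2086-149220-0033.wav"]))
```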
@@ -149,14 +149,14 @@ The NeMo toolkit [3] was used for training the models for over several hundred e
 
 The tokenizers for these models were built using the text transcripts of the train set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
 
-The checkpoint of the language model used
+The checkpoint of the language model used for rescoring can be found [here](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/stt_fr_conformer_ctc_large). More information on how to train and use language models for ASR models is available here: [ASR Language Modeling](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/asr_language_modeling.html).
 
 ## Datasets
 All the models in this collection are trained on a composite dataset (NeMo ASRSET) comprising over a thousand hours of French speech:
 
--
--
-- VoxPopuli 182 hours
+- MozillaCommonVoice 7.0 - 356 hours
+- Multilingual LibriSpeech - 1036 hours
+- VoxPopuli - 182 hours
 
 Both models use the same dataset, except for a preprocessing step that strips hyphens from the data for the secondary model's training.
 
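To make the rescoring step above concrete, here is a hedged sketch of how a 4-gram KenLM model scores competing hypotheses during beam-search decoding. The `.arpa` path and both example sentences are placeholders for illustration, not artifacts shipped with this checkpoint:

```python
# Illustrative only: scoring ASR hypotheses with a 4-gram KenLM model,
# as happens inside beam-search rescoring.
import kenlm

lm = kenlm.Model("4gram_fr.arpa")  # placeholder path to a French 4-gram LM

hypotheses = [
    "bonjour tout le monde",    # well-formed French
    "bon jour toute le monde",  # acoustically similar but less likely
]

# score() returns a log10 probability; the decoder adds a weighted version
# of this to the acoustic (CTC) score when ranking beams.
for hyp in hypotheses:
    print(hyp, lm.score(hyp, bos=True, eos=True))
```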
@@ -170,7 +170,7 @@ The latest model obtains the following greedy scores on the following evaluation
 - 5.88 % on MLS dev
 - 4.91 % on MLS test
 
-With 128 beam search and 4gram KenLM model
+With beam search (width 128) and a 4-gram KenLM model:
 
 - 7.95 % on MCV7.0 dev
 - 9.16 % on MCV7.0 test
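The percentages above are word error rates (WER). As a minimal sketch (not NVIDIA's exact evaluation pipeline), NeMo ships a helper that computes the same metric, shown here on a hypothetical one-word mismatch:

```python
# WER = (substitutions + deletions + insertions) / words in the reference.
from nemo.collections.asr.metrics.wer import word_error_rate

references = ["c'est une phrase de référence"]  # ground-truth transcript
hypotheses = ["c'est une phrase de reference"]  # model output (one word wrong)

print(f"WER: {word_error_rate(hypotheses=hypotheses, references=references):.2%}")
```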
@@ -205,5 +205,3 @@ Check out [Riva live demo](https://developer.nvidia.com/riva#demos).
 
 - [3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
 
-
----