angel-poc committed on
Commit
3a69b38
·
2 Parent(s): a8f0f3d fd93bac

Merge branch 'main' of https://huggingface.co/projecte-aina/stt-ca-citrinet-512 into main

Files changed (1)
  1. README.md +81 -9
README.md CHANGED
@@ -9,24 +9,97 @@ tags:
9
  - automatic-speech-recognition
10
  - speech
11
  - audio
 
12
  - citrinet
13
  - pytorch
14
  - NeMo
15
- - hf-asr-leaderboard
16
  widget:
17
- - example_title: Common Voice sample 1
18
- src: https://huggingface.co/projecte-aina/stt-ca-citrinet-512/samples/common_voice_ca_34667058.wav
19
  ---
20
 
21
- # Aina Project's Catalan multi-speaker text-to-speech model
22
  ## Model description
23
 
24
- This model was fine-tuned from a pre-trained Spanish [stt-es-citrinet-512](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/stt_es_citrinet_512) model using the [NeMo](https://github.com/NVIDIA/NeMo) toolkit
25
 
26
  ## Intended uses and limitations
27
 
28
- You can use this model for Automatic Speech Recognition (ASR) in catalan.
29
 
30
 
31
  ## Additional information
32
 
@@ -39,12 +112,11 @@ For further information, send an email to [email protected]
39
  ### Copyright
40
  Copyright (c) 2022 Text Mining Unit at Barcelona Supercomputing Center
41
 
42
-
43
  ### Licensing Information
44
- [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)
45
 
46
  ### Funding
47
- This work was funded by the [Departament de la Vicepresidència i de Polítiques Digitals i Territori de la Generalitat de Catalunya](https://politiquesdigitals.gencat.cat/ca/inici/index.html#googtrans(ca|en) within the framework of [Projecte AINA](https://politiquesdigitals.gencat.cat/ca/economia/catalonia-ai/aina).
48
 
49
 
50
  ## Disclaimer
 
9
  - automatic-speech-recognition
10
  - speech
11
  - audio
12
+ - CTC
13
  - citrinet
14
  - pytorch
15
  - NeMo
16
+ license: cc-by-4.0
17
  widget:
18
+ - example_title: CV sample 1
19
+ src: https://huggingface.co/projecte-aina/stt-ca-citrinet-512/tree/main/samples/common_voice_ca_34667058.wav
20
+ model-index:
21
+ - name: stt-ca-citrinet-512
22
+ results:
23
+ - task:
24
+ name: Automatic Speech Recognition
25
+ type: automatic-speech-recognition
26
+ dataset:
27
+ name: Mozilla Common Voice 11.0
28
+ type: mozilla-foundation/common_voice_11_0
29
+ config: ca
30
+ split: test
31
+ args:
32
+ language: ca
33
+ metrics:
34
+ - name: Test WER
35
+ type: wer
36
+ value: 6.684
37
  ---
38
 
39
+ # Aina Project's Catalan speech-to-text model
40
  ## Model description
41
 
42
+ This model transcribes Catalan audio into lowercase text without punctuation. It was fine-tuned from a pre-trained Spanish [stt-es-citrinet-512](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/stt_es_citrinet_512) model using the [NeMo](https://github.com/NVIDIA/NeMo) toolkit. It has around 36.5M parameters and was trained on [Common Voice 11.0](https://commonvoice.mozilla.org/en/datasets).
43
 
44
  ## Intended uses and limitations
45
 
46
+ You can use this model for Automatic Speech Recognition (ASR) in Catalan, to transcribe audio files in Catalan to plain text without punctuation.
47
 
48
+ ## How to use
49
+ ### Usage
50
+
51
+ Required libraries:
52
+
53
+ ```bash
54
+ pip install "nemo_toolkit[all]"
55
+ ```
56
+
57
+ Clone the repository to download the model:
58
+
59
+ ```bash
60
+ git clone https://huggingface.co/projecte-aina/stt-ca-citrinet-512
61
+ ```
62
+
63
+ Assuming `NEMO_PATH` points to the downloaded `stt-ca-citrinet-512.nemo` file, run inference over a set of `.wav` files as follows:
64
+
65
+ ```python
66
+ import nemo.collections.asr as nemo_asr
+ # Load the model
67
+ model = nemo_asr.models.EncDecCTCModel.restore_from(NEMO_PATH)
68
+
69
+ # Create a list pointing to the audio files
70
+ paths2audio_files = ["audio_1.wav", ..., "audio_n.wav"]
71
+
72
+ # Fix the batch size to whatever number suits your purpose
73
+ batch_size = 8
74
+
75
+ # Transcribe the audio files
76
+ transcriptions = model.transcribe(paths2audio_files=paths2audio_files,
77
+ batch_size=batch_size)
78
+ # Visualize the transcriptions
79
+ print(transcriptions)
80
+
81
+ ```
82
+
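
The list of audio paths in the snippet above can also be built programmatically. The following is a minimal sketch, assuming all recordings sit directly inside one directory; the helper name `collect_wav_files` is hypothetical and not part of NeMo.

```python
from pathlib import Path


def collect_wav_files(audio_dir):
    """Return the sorted paths (as strings) of all .wav files directly inside audio_dir."""
    # Path.glob does not recurse with this pattern; subdirectories are ignored.
    return sorted(str(p) for p in Path(audio_dir).glob("*.wav"))
```

The result can be passed as `paths2audio_files` to `model.transcribe`, for example `paths2audio_files = collect_wav_files("my_recordings")`, where `my_recordings` is a placeholder directory name.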
83
+ ## Training data
84
+
85
+ This model has been trained on the training split of the Catalan version of [Common Voice 11.0](https://commonvoice.mozilla.org/en/datasets).
86
+
87
+ ## Training
88
+ ### Data preparation
89
+ We processed [Common Voice 11.0](https://commonvoice.mozilla.org/en/datasets) using the NeMo toolkit. We used [get_commonvoice_data.py](https://github.com/NVIDIA/NeMo/blob/main/scripts/dataset_processing/get_commonvoice_data.py) to process the manifests and then applied a data cleaning step.
90
+
91
+ After cleaning the dataset and normalizing the `ñ` character to `ny`, we used the following charset to create the final NeMo manifests for training:
92
+ ```python
93
+ ['c', ' ', 'ó', 'g', 'a', 'o', 'ü', 'v', 'p', 't', "'", '—', 'f', 'k', 'à', 'ï', 'í', 'ú', 'd', 'l', 'z', 'é', 'w', 'm', 'r', 'n', 'y', '-', 'u', 'i', 'h', 'ç', 'e', '·', 'q', 'è', 'ò', 'j', 'x', 's', 'b']
94
+ ```
95
+
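
To illustrate how a transcript could be brought into the charset above, here is a minimal sketch. The model card does not describe the actual cleaning step, so the behavior shown (lowercasing, mapping `ñ` to `ny`, dropping characters outside the charset, collapsing spaces) is an assumption, and the names `CHARSET` and `normalize_transcript` are hypothetical.

```python
import re

# Charset copied from the list above.
CHARSET = set([
    'c', ' ', 'ó', 'g', 'a', 'o', 'ü', 'v', 'p', 't', "'", '—', 'f', 'k',
    'à', 'ï', 'í', 'ú', 'd', 'l', 'z', 'é', 'w', 'm', 'r', 'n', 'y', '-',
    'u', 'i', 'h', 'ç', 'e', '·', 'q', 'è', 'ò', 'j', 'x', 's', 'b',
])


def normalize_transcript(text):
    """Lowercase, map ñ to ny, drop out-of-charset characters, collapse spaces.

    Assumption: this is an illustrative cleaning policy, not the authors' script.
    """
    text = text.lower().replace("ñ", "ny")
    text = "".join(ch for ch in text if ch in CHARSET)
    return re.sub(r" +", " ", text).strip()
```

A real pipeline might instead discard whole utterances that contain unexpected characters; dropping single characters is only the simplest policy to show.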
96
+ ### Training procedure
97
+ This model was trained starting from a pre-trained Spanish [stt-es-citrinet-512](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/stt_es_citrinet_512) model. The initial learning rate was set to 0.005 and the minimum lr for weight decay was set to 1e-7.
98
+
99
+ The model was trained for 90 steps, and training then continued for another 90 steps starting from a learning rate of 1e-4.
100
+
101
+ ## Evaluation
102
+ Evaluated on the test split of Common Voice 11.0, the model obtains a WER of 6.684.
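
For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between reference and hypothesis divided by the number of reference words. The sketch below is a plain-Python illustration, not the evaluation script used for this model; the function name `word_error_rate` is hypothetical, and reporting the value as a percentage is an assumption, since the source does not state the unit.

```python
def word_error_rate(references, hypotheses):
    """Corpus-level WER in percent: total word edits / total reference words."""
    total_edits = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words, hyp_words = ref.split(), hyp.split()
        # Row-by-row Levenshtein distance over words.
        prev = list(range(len(hyp_words) + 1))
        for i, r in enumerate(ref_words, start=1):
            curr = [i]
            for j, h in enumerate(hyp_words, start=1):
                cost = 0 if r == h else 1
                curr.append(min(prev[j] + 1,          # deletion
                                curr[j - 1] + 1,      # insertion
                                prev[j - 1] + cost))  # substitution or match
            prev = curr
        total_edits += prev[-1]
        total_words += len(ref_words)
    return 100.0 * total_edits / total_words
```

For example, `word_error_rate(["bon dia a tothom"], ["bon dia tothom"])` counts one deleted word out of four reference words.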
103
 
104
  ## Additional information
105
 
 
112
  ### Copyright
113
  Copyright (c) 2022 Text Mining Unit at Barcelona Supercomputing Center
114
 
 
115
  ### Licensing Information
116
+ [Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/)
117
 
118
  ### Funding
119
+ This work was funded by the [Generalitat de Catalunya](https://politiquesdigitals.gencat.cat/ca/inici/index.html#googtrans(ca|en) within the framework of [Projecte AINA](https://politiquesdigitals.gencat.cat/ca/economia/catalonia-ai/aina).
120
 
121
 
122
  ## Disclaimer