ksingla025 committed · Commit 1ec76a9 · 1 Parent(s): e80128c

Update README.md

Files changed (1)
  1. README.md +69 -180
README.md CHANGED
@@ -1,215 +1,104 @@
  ---
  model-index:
- - name: ksingla025/1step_ctc_ner_emotion_commonvoice500hrs
-   results:
-   - task:
-       type: automatic-speech-recognition
-     dataset:
-       name: commonvoice
-       type: commonvoice
-       config: other
-       split: test
-       args:
-         language: en
-     metrics:
-     - type: wer
-       value: 15.1
-       name: WER
- ---
-
- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
-
-
- ## Model Details
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
-
-
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.

- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
-
-
- ## Model Examination [optional]

- <!-- Relevant interpretability work for the model goes here -->

- [More Information Needed]

- ## Environmental Impact

- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]

- ## Technical Specifications [optional]

- ### Model Architecture and Objective

- [More Information Needed]

- ### Compute Infrastructure

- [More Information Needed]

- #### Hardware

- [More Information Needed]

- #### Software

- [More Information Needed]

- ## Citation [optional]

- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

- **BibTeX:**

- [More Information Needed]

- **APA:**

- [More Information Needed]

- ## Glossary [optional]

- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

- [More Information Needed]

- ## More Information [optional]

- [More Information Needed]

- ## Model Card Authors [optional]

- [More Information Needed]

- ## Model Card Contact

- [More Information Needed]

  ---
+ language:
+ - en
+ license: cc-by-nc-nd-4.0
+ library_name: nemo
+ datasets:
+ - commonvoice
+ thumbnail: null
+ tags:
+ - automatic-speech-recognition
+ - speech
+ - audio
+ - CTC
+ - named-entity-recognition
+ - emotion-classification
+ - Transformer
+ - NeMo
+ - pytorch
  model-index:
+ - name: 1step_ctc_ner_emotion_commonvoice500hrs
+   results: []

+ ---

+ ## ASR+NL Model Overview

+ This model recognizes the beginning and end of digit sequences in speech and also transcribes the audio.

+ ## NVIDIA NeMo: Training

+ To train, fine-tune, or play with the model you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend you install it after you have installed the latest PyTorch version.
+ ```shell
+ pip install nemo_toolkit['all']
+ ```
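
As a quick sanity check that the toolkit and its ASR collection are importable (an editor-added suggestion, not part of the original card), you can run:

```python
# Verify that NeMo and its ASR collection installed correctly.
import nemo
import nemo.collections.asr as nemo_asr

print(nemo.__version__)
```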

+ ## How to Use this Model

+ The model is available for use in the NeMo toolkit [1], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.

+ ### Automatically instantiate the model

+ ```python
+ import nemo.collections.asr as nemo_asr
+ asr_model = nemo_asr.models.ASRModel.from_pretrained("ksingla025/1step_ctc_ner_emotion_commonvoice500hrs")
+ ```
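
If you intend to fine-tune the restored checkpoint rather than only run inference, a minimal data-and-trainer setup might look like the sketch below. This is an editor-added illustration, not part of the original card: the manifest path, batch size, and trainer settings are placeholders, and the exact dataset-config keys for this CTC + tagging model may differ from the standard NeMo ASR ones.

```python
# Editor-added sketch: fine-tuning setup assuming a standard NeMo ASR JSON-lines
# manifest ("audio_filepath", "duration", "text"). All paths and hyperparameters
# below are hypothetical placeholders.
import pytorch_lightning as pl
from omegaconf import OmegaConf
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained("ksingla025/1step_ctc_ner_emotion_commonvoice500hrs")

train_cfg = OmegaConf.create({
    "manifest_filepath": "train_manifest.json",  # hypothetical manifest path
    "sample_rate": 16000,
    "batch_size": 16,
    "shuffle": True,
})
asr_model.setup_training_data(train_data_config=train_cfg)

trainer = pl.Trainer(max_epochs=5, accelerator="gpu", devices=1)
asr_model.set_trainer(trainer)
trainer.fit(asr_model)
```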

+ ### Transcribe and tag using Python

+ First, let's get a sample:
+ ```shell
+ wget -O audio.zip "https://www.dropbox.com/s/fmre0xkl3ism62e/audio.zip?dl=0"
+ unzip audio.zip
+ ```
+ Then simply do:
+ ```python
+ asr_model.transcribe(['audio/digits1.wav'])
+ ```

+ ### Transcribing many audio files

+ ```shell
+ python [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py pretrained_name="ksingla025/1step_ctc_ner_emotion_commonvoice500hrs" audio_dir="<DIRECTORY CONTAINING AUDIO FILES>"
+ ```
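
Recent NeMo versions of `transcribe_speech.py` can also read a JSON-lines manifest via a `dataset_manifest=` argument instead of `audio_dir=`. The snippet below is an editor-added sketch (not from the original card) of how such a manifest is typically built; the output filename and audio folder are placeholders.

```python
# Build a NeMo-style JSON-lines manifest for a folder of WAV files.
# Each line records the audio path, its duration, and (for inference) empty text.
import glob
import json

import soundfile as sf

with open("test_manifest.json", "w") as fout:        # hypothetical output path
    for path in sorted(glob.glob("audio/*.wav")):    # folder from the sample archive above
        info = sf.info(path)
        entry = {"audio_filepath": path, "duration": info.duration, "text": ""}
        fout.write(json.dumps(entry) + "\n")
```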

+ ### Input

+ This model accepts 16 kHz, mono-channel audio (WAV files) as input.
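
If your recordings are not already 16 kHz mono, convert them before transcription. A minimal editor-added sketch using librosa and soundfile (a common choice, but not prescribed by the card):

```python
# Resample an arbitrary audio file to 16 kHz mono WAV before passing it to the model.
import librosa
import soundfile as sf

audio, sr = librosa.load("recording.m4a", sr=16000, mono=True)  # hypothetical input file
sf.write("recording_16k_mono.wav", audio, sr)
```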

+ ### Output

+ This model provides transcribed speech as a string for a given audio sample.
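
For completeness, an editor-added sketch of consuming that output in Python, reusing the `asr_model` restored above. Depending on the installed NeMo version, `transcribe()` returns either plain strings or hypothesis objects with a `.text` field, so the snippet handles both.

```python
# Print the transcript returned for each input file.
files = ["audio/digits1.wav"]
results = asr_model.transcribe(files)  # `asr_model` restored as shown earlier
for path, hyp in zip(files, results):
    text = hyp if isinstance(hyp, str) else getattr(hyp, "text", str(hyp))
    print(f"{path}: {text}")
```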

+ ## Model Architecture

+ <ADD SOME INFORMATION ABOUT THE ARCHITECTURE>

+ ## Training

+ <ADD INFORMATION ABOUT HOW THE MODEL WAS TRAINED - HOW MANY EPOCHS, AMOUNT OF COMPUTE ETC>

+ ### Datasets

+ <LIST THE NAME AND SPLITS OF DATASETS USED TO TRAIN THIS MODEL (ALONG WITH LANGUAGE AND ANY ADDITIONAL INFORMATION)>

+ ## Performance

+ <LIST THE SCORES OF THE MODEL -
+ OR
+ USE THE Hugging Face Evaluate LIBRARY TO UPLOAD METRICS>

+ ## Limitations

+ <DECLARE ANY POTENTIAL LIMITATIONS OF THE MODEL>

+ E.g.:
+ Since this model was trained on publicly available speech datasets, its performance might degrade for speech that includes technical terms or vernacular the model has not been trained on. The model might also perform worse on accented speech.

+ ## References

+ <ADD ANY REFERENCES HERE AS NEEDED>

+ [1] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
104