ypluit committed on
Commit
8c1a38e
·
1 Parent(s): 99312fd

Update README.md

Files changed (1)
  1. README.md +89 -1
README.md CHANGED
@@ -1,3 +1,91 @@
1
- ---
 
2
  license: cc-by-4.0
3
  ---
1
+ language:
2
+ - ko
3
  license: cc-by-4.0
4
+ library_name: nemo
5
+ datasets:
6
+ - RealCallData
7
+ thumbnail: null
8
+ tags:
9
+ - automatic-speech-recognition
10
+ - speech
11
+ - audio
12
+ - Citrinet1024
13
+ - NeMo
14
+ - pytorch
15
+ model-index:
16
+ - name: stt_kr_citrinet1024_PublicCallCenter_1000H_0.26
17
+ results: []
18
  ---
19
+
20
+ ## Model Overview
21
+
22
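+ A Korean speech-to-text model based on Citrinet-1024, trained on call center telephone speech and usable in NVIDIA NeMo for inference or fine-tuning.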
23
+
24
+ ## NVIDIA NeMo: Training
25
+
26
+ To train, fine-tune, or play with the model, you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend installing it after you have installed the latest PyTorch version.
27
+ ```
28
+ pip install "nemo_toolkit[all]"
29
+ ```
30
+
31
+ ## How to Use this Model
32
+
33
+ The model is available for use in the NeMo toolkit [1], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.
34
+
35
+
36
+ ### Automatically instantiate the model
37
+
38
+ ```python
39
+ import nemo.collections.asr as nemo_asr
40
+ asr_model = nemo_asr.models.ASRModel.from_pretrained("ypluit/stt_kr_citrinet1024_PublicCallCenter_1000H_0.26")
41
+ ```
42
+
43
+
44
+ ### Transcribing using Python
45
+ First, let's get a sample:
46
+ ```
47
+ Use any Korean telephone speech recording saved as a WAV file, for example sample-kr.wav
48
+ ```
49
+ Then simply do:
50
+ ```
51
+ asr_model.transcribe(['sample-kr.wav'])
52
+ ```
53
+
54
+ ### Transcribing many audio files
55
+
56
+ ```shell
57
+ python [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py pretrained_name="ypluit/stt_kr_citrinet1024_PublicCallCenter_1000H_0.26" audio_dir="<DIRECTORY CONTAINING AUDIO FILES>"
58
+ ```
59
+
60
+ ### Input
61
+
62
+ This model accepts 16 kHz mono-channel audio (WAV files) as input.
63
+
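Because the model expects 16 kHz mono WAV input, it can help to check a file before transcribing it. The following is a minimal sketch using only Python's standard `wave` module; the helper name `is_16k_mono` is hypothetical and not part of NeMo, and it assumes the file is an uncompressed PCM WAV.

```python
import wave


def is_16k_mono(path):
    """Return True if the WAV file at `path` is sampled at 16 kHz with one channel.

    Hypothetical helper for illustration; it is not part of the NeMo toolkit.
    """
    with wave.open(path, "rb") as wav:
        return wav.getframerate() == 16000 and wav.getnchannels() == 1
```

Files that fail this check would likely need to be resampled or downmixed with an audio tool before being passed to `asr_model.transcribe`.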
64
+ ### Output
65
+
66
+ This model provides transcribed speech as a string for a given audio sample.
67
+
68
+
69
+ ## Model Architecture
70
+
71
+ See the NeMo toolkit [1] and its reference papers.
72
+ ## Training
73
+
74
+ Trained for about 20 days on 2 A6000 GPUs.
75
+
76
+ ### Datasets
77
+
78
+ Private real call center data (1,200 hours)
79
+
80
+ ## Performance
81
+
82
+ 0.26 WER
83
+
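For readers unfamiliar with the metric, here is a rough sketch of how a word error rate can be computed, assuming the conventional definition: word-level edit distance divided by the number of reference words. The README does not say how the reported figure was measured or how the text was tokenized, so the function `word_error_rate` and its whitespace tokenization are illustrative assumptions only.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch; splits on whitespace, which is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, one wrong word in a four-word reference gives a rate of 0.25.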
84
+ ## Limitations
85
+
86
+ This model was trained on 1,200 hours of Korean telephone voice data from call center customer service. It may perform poorly on general-purpose dialogue and on specific accents.
87
+
88
+ ## References
89
+
90
+
91
+ [1] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)