Yurii Paniv committed on
Commit
0b43409
·
1 Parent(s): a979b42

Add README with instructions how to create a model

  1. README.md +224 -5
README.md CHANGED
@@ -1,11 +1,230 @@
1
  # voice-recognition-ua
2
- How to run:
3
- 1. Make sure to download:
4
- 2. https://github.com/robinhad/voice-recognition-ua/releases/download/v0.2/uk.tflite
5
  3. https://github.com/mozilla/DeepSpeech/releases/download/v0.9.1/deepspeech-0.9.1-models.tflite
6
 
7
- How to launch:
8
  ```
9
  export FLASK_APP=main.py
10
  flask run
11
- ```
1
  # voice-recognition-ua
2
+ This repository aims to apply state-of-the-art speech recognition technologies to the Ukrainian language.
3
+ You can try the online demo at https://voice-recognition-ua.herokuapp.com/ (your voice is not stored).
4
+ The source code is in this repository, together with the auto-deploy pipeline scripts.
5
+
6
+ ## Pre-run requirements
7
+ Make sure to download:
8
+ 1. https://github.com/robinhad/voice-recognition-ua/releases/download/v0.2/uk.tflite
9
  3. https://github.com/mozilla/DeepSpeech/releases/download/v0.9.1/deepspeech-0.9.1-models.tflite
10
 
11
+ ## How to launch
12
  ```
13
  export FLASK_APP=main.py
14
  flask run
15
+ ```
16
+
17
+ # How to train your own model
18
+
19
+ Most of this guide is taken from the DeepSpeech training documentation:
20
+ https://deepspeech.readthedocs.io/en/v0.9.1/TRAINING.html
21
+
22
+ ## Steps:
23
+ 1. Create a g4dn.xlarge instance on AWS with the Deep Learning AMI (Ubuntu 18.04) and 150 GB of disk space.
24
+
25
+ 2. Install the required system packages:
26
+ ```
27
+ sudo apt-get install python3-dev sox libsox-fmt-mp3 # sox is used for audio reading
28
+ ```
29
+
30
+ 3. Clone DeepSpeech v0.9.1:
31
+ ```
32
+ git clone --branch v0.9.1 https://github.com/mozilla/DeepSpeech
33
+ ```
34
+ 4. Go into the DeepSpeech directory:
35
+ ```
36
+ cd DeepSpeech
37
+ ```
38
+ 5. Create a virtual environment using conda (this makes the CUDA libraries easier to manage):
39
+ ```
40
+ conda create --prefix $HOME/tmp/deepspeech-train-venv/ python=3.7
41
+ ```
42
+ 6. Activate it:
43
+ ```
44
+ conda activate $HOME/tmp/deepspeech-train-venv
45
+ ```
46
+ 7. Install DeepSpeech requirements:
47
+ ```
48
+ pip3 install --upgrade pip==20.2.2 wheel==0.34.2 setuptools==49.6.0
49
+ pip3 install --upgrade -e .
50
+ ```
51
+ 8. Install the required CUDA libraries:
52
+ ```
53
+ conda install cudnn=7.6=cuda10.1_0
54
+ pip3 install 'tensorflow-gpu==1.15.4'
55
+ ```
56
+ 9. Open https://commonvoice.mozilla.org/uk/datasets and copy the link to the Ukrainian dataset, then download and extract it:
57
+ ```
58
+ cd ..
59
+ wget <your_link_to_dataset>
60
+ tar -xf uk.tar.gz
61
+ ```
62
+ This produces a folder named `cv-corpus-5.1-2020-06-22`.
63
+ 10. Download the alphabet used for the dataset.
64
+ The alphabet is a file listing every symbol that may appear in the dataset. The model's outputs are formed directly from the alphabet. The alphabet is also used for filtering: samples that contain symbols not in the alphabet are skipped.
65
+ ```
66
+ cd ./DeepSpeech
67
+ mkdir data_uk
68
+ cd ./data_uk
69
+ wget https://github.com/robinhad/voice-recognition-ua/releases/download/v0.2/alphabet.txt
70
+ ```
71
+ NOTE: if you create your own alphabet, make sure the file is UTF-8 encoded.
72
+
73
+ 11. Import the dataset, filtering out samples that contain symbols not in the alphabet:
74
+ ```
75
+ cd .. # DeepSpeech
76
+ bin/import_cv2.py --filter_alphabet ./data_uk/alphabet.txt ../cv-corpus-5.1-2020-06-22/uk
77
+ ```
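To make the filtering rule concrete, here is a minimal Python sketch of the idea. It is not the code of `bin/import_cv2.py`; the helper names and the toy alphabet are hypothetical, and it assumes an alphabet file with one symbol per line where lines starting with `#` are comments.

```python
def load_alphabet(lines):
    """Collect allowed symbols from alphabet-file lines (assumed: one symbol per line)."""
    symbols = set()
    for line in lines:
        if line.startswith("#"):  # assumed comment convention
            continue
        symbols.add(line.rstrip("\n"))  # keep a lone space: it is a valid symbol
    symbols.discard("")  # ignore empty lines
    return symbols


def is_in_alphabet(transcript, alphabet):
    """True if every character of the transcript is an allowed symbol."""
    return all(ch in alphabet for ch in transcript)


def filter_samples(samples, alphabet):
    """Keep only (wav, transcript) pairs whose transcript fits the alphabet."""
    return [(wav, text) for wav, text in samples if is_in_alphabet(text, alphabet)]


# Toy alphabet and samples, for illustration only.
alphabet = load_alphabet(["# toy alphabet\n", " \n", "н\n", "і\n", "щ\n", "о\n"])
samples = [("a.wav", "ні"), ("b.wav", "що"), ("c.wav", "ok")]
kept = filter_samples(samples, alphabet)
print(kept)  # the Latin-script sample "ok" is skipped
```

With a real alphabet you would read the lines from the file opened with `encoding="utf-8"`, which is why the encoding of a custom alphabet matters.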
78
+ 12. (Optional) Train a model from scratch. Expect low performance because the dataset is small (~20 hours for Ukrainian):
79
+ ```
80
+ python3 DeepSpeech.py --alphabet_config_path ./data_uk/alphabet.txt --train_files ../cv-corpus-5.1-2020-06-22/uk/clips/train.csv --dev_files ../cv-corpus-5.1-2020-06-22/uk/clips/dev.csv --test_files ../cv-corpus-5.1-2020-06-22/uk/clips/test.csv
81
+ ```
82
+ 13. Transfer learning
83
+ Transfer learning means taking a model that was pre-trained on one dataset and adapting it to a similar but different one. In speech recognition, we can rely on the fact that each layer deals with a different level of abstraction: early layers recognize sounds and low-level patterns, whereas later layers are more involved in the final output (letters). So we reuse the pre-trained weights of all layers except the specified last ones (`--drop_source_layers`), which are re-initialized so that the output matches the Ukrainian alphabet instead of the English one.
84
+ Below we download the English model checkpoint and create a folder for the Ukrainian one.
85
+ ```
86
+ mkdir checkpoints
87
+ cd ./checkpoints
88
+ wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.1/deepspeech-0.9.1-checkpoint.tar.gz
89
+ tar -xf deepspeech-0.9.1-checkpoint.tar.gz
90
+ mkdir uk_transfer_checkpoint
91
+ cd ..
92
+ ```
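The idea of dropping source layers can be sketched in plain Python as follows. This is a toy illustration under stated assumptions, not DeepSpeech's implementation: the layer names, the layer sizes, and the `transfer` helper are hypothetical, and weights are plain nested lists. The only detail taken from how CTC models work is that the output layer needs one unit per alphabet symbol plus one for the blank label.

```python
import random


def init_layer(n_in, n_out, rng):
    """Create a freshly initialized weight matrix (n_in rows, n_out columns)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]


def transfer(source_layers, drop_source_layers, n_target_symbols, rng):
    """Reuse all source layers except the last `drop_source_layers`.

    Dropped layers are re-initialized; the final layer is resized to the
    target alphabet (+1 for the CTC blank label).
    """
    names = list(source_layers)
    n_keep = len(names) - drop_source_layers
    target = {name: source_layers[name] for name in names[:n_keep]}
    for name in names[n_keep:]:
        n_in = len(source_layers[name])
        n_out = len(source_layers[name][0])
        if name == names[-1]:
            n_out = n_target_symbols + 1  # output layer follows the new alphabet
        target[name] = init_layer(n_in, n_out, rng)
    return target


rng = random.Random(0)
# Hypothetical tiny "English" model: 28 symbols + blank = 29 outputs.
source = {
    "layer_1": init_layer(4, 4, rng),
    "layer_5": init_layer(4, 4, rng),
    "layer_6": init_layer(4, 29, rng),
}
# Hypothetical target alphabet of 35 symbols.
target = transfer(source, drop_source_layers=2, n_target_symbols=35, rng=rng)
print({name: (len(w), len(w[0])) for name, w in target.items()})
```

Here `layer_1` keeps its pre-trained weights, while `layer_5` and `layer_6` start from fresh values and the output width changes from 29 to 36.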
93
+ 14. Start the training itself (to change training parameters, run `python3 DeepSpeech.py --helpfull` for a list of all parameters).
94
+ When the model finishes training, an error will occur due to a bug in DeepSpeech that prevents evaluating performance at this point; the next step works around it.
95
+ Training takes a while, about 11 minutes per epoch.
96
+ ```
97
+ python3 DeepSpeech.py \
98
+ --train_cudnn \
99
+ --drop_source_layers 2 \
100
+ --alphabet_config_path ./data_uk/alphabet.txt \
101
+ --save_checkpoint_dir ./checkpoints/uk_transfer_checkpoint \
102
+ --load_checkpoint_dir ./checkpoints/deepspeech-0.9.1-checkpoint \
103
+ --train_files ../cv-corpus-5.1-2020-06-22/uk/clips/train.csv \
104
+ --dev_files ../cv-corpus-5.1-2020-06-22/uk/clips/dev.csv \
105
+ --test_files ../cv-corpus-5.1-2020-06-22/uk/clips/test.csv \
106
+ --epochs 10
107
+ ```
108
+ 15. Evaluate the model:
109
+ ```
110
+ python3 DeepSpeech.py \
111
+ --train_cudnn \
112
+ --alphabet_config_path ./data_uk/alphabet.txt \
113
+ --load_checkpoint_dir ./checkpoints/uk_transfer_checkpoint \
114
+ --train_files ../cv-corpus-5.1-2020-06-22/uk/clips/train.csv \
115
+ --dev_files ../cv-corpus-5.1-2020-06-22/uk/clips/dev.csv \
116
+ --test_files ../cv-corpus-5.1-2020-06-22/uk/clips/test.csv \
117
+ --test_batch_size 40 \
118
+ --epochs 0
119
+ ```
120
+ This takes approximately 20-30 minutes.
121
+
122
+ You will get a performance report:
123
+ WER - Word Error Rate, the share of words that were recognized incorrectly (word-level edit distance divided by the number of words in the reference; it can exceed 1.0 when the output contains extra words).
124
+ CER - Character Error Rate, the same measure computed over characters.
125
+ Here we have a WER of 95% and a CER of 36%.
126
+ Both are high because no scorer (a language model that maps a character sequence to the closest word match) is used during decoding. You can improve performance by creating a scorer for the Ukrainian language; Wikipedia articles can serve as a text corpus.
127
+ ```
128
+ Test on ../cv-corpus-5.1-2020-06-22/uk/clips/test.csv - WER: 0.950863, CER: 0.357779, loss: 59.444176
129
+ --------------------------------------------------------------------------------
130
+ Best WER:
131
+ --------------------------------------------------------------------------------
132
+ WER: 0.000000, CER: 0.000000, loss: 2.696858
133
+ - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_21203420.wav
134
+ - src: "я замер"
135
+ - res: "я замер"
136
+ --------------------------------------------------------------------------------
137
+ WER: 0.000000, CER: 0.000000, loss: 1.772630
138
+ - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_21755897.wav
139
+ - src: "що саме"
140
+ - res: "що саме"
141
+ --------------------------------------------------------------------------------
142
+ WER: 0.000000, CER: 0.000000, loss: 0.269474
143
+ - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_21350648.wav
144
+ - src: "ні"
145
+ - res: "ні"
146
+ --------------------------------------------------------------------------------
147
+ WER: 0.250000, CER: 0.066667, loss: 7.652889
148
+ - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_22161067.wav
149
+ - src: "і вухом не веде"
150
+ - res: "і вухом не виде"
151
+ --------------------------------------------------------------------------------
152
+ WER: 0.333333, CER: 0.142857, loss: 22.727850
153
+ - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_20894315.wav
154
+ - src: "подробиці наразі уточнюються"
155
+ - res: "подробиці наразі удочнвітцся"
156
+ --------------------------------------------------------------------------------
157
+ Median WER:
158
+ --------------------------------------------------------------------------------
159
+ WER: 1.000000, CER: 0.408163, loss: 77.099953
160
+ - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_21565481.wav
161
+ - src: "це було висвітлено і в засобах масової інформації"
162
+ - res: "сцеболовистітоно ів засовавнасавинсерматції"
163
+ --------------------------------------------------------------------------------
164
+ WER: 1.000000, CER: 0.304878, loss: 76.661797
165
+ - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_21568626.wav
166
+ - src: "всі ці зірки для тебе сказав хлопчик і ударив дівчинку металевим тазіком по голові"
167
+ - res: "сицізяртідлетебе сказавни хлобчик юдаревдів чимкуметалевимтазіком поговолі"
168
+ --------------------------------------------------------------------------------
169
+ WER: 1.000000, CER: 0.261364, loss: 76.638161
170
+ - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_22071941.wav
171
+ - src: "кабінет міністрів україни складає повноваження перед новообраною верховною радою україни"
172
+ - res: "кабіна міністрівукаїни колале повнваженя перебновообрануюварховли радийву країни"
173
+ --------------------------------------------------------------------------------
174
+ WER: 1.000000, CER: 0.403846, loss: 76.634865
175
+ - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_21381457.wav
176
+ - src: "механізм формування агатів остаточно не встановлений"
177
+ - res: "махенізаформовання оатья востотачномистоновлими"
178
+ --------------------------------------------------------------------------------
179
+ WER: 1.000000, CER: 0.415094, loss: 76.133347
180
+ - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_21567387.wav
181
+ - src: "засідання верховної ради україни проводяться відкрито"
182
+ - res: "засі веневорковмаградиукраїне проодізівікрипо"
183
+ --------------------------------------------------------------------------------
184
+ Worst WER:
185
+ --------------------------------------------------------------------------------
186
+ WER: 1.500000, CER: 0.266667, loss: 18.258444
187
+ - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_20900153.wav
188
+ - src: "вона віддасться"
189
+ - res: "пона віддас ця"
190
+ --------------------------------------------------------------------------------
191
+ WER: 1.500000, CER: 0.307692, loss: 15.984250
192
+ - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_22247322.wav
193
+ - src: "ескулап лікар"
194
+ - res: "е скула лліка"
195
+ --------------------------------------------------------------------------------
196
+ WER: 1.500000, CER: 0.277778, loss: 15.076320
197
+ - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_21582521.wav
198
+ - src: "цензура заборонена"
199
+ - res: "зан зура забороонено"
200
+ --------------------------------------------------------------------------------
201
+ WER: 1.666667, CER: 0.478261, loss: 42.762665
202
+ - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_21568871.wav
203
+ - src: "пегас символізує поезію"
204
+ - res: "веляс це волі зуя поєсі"
205
+ --------------------------------------------------------------------------------
206
+ WER: 2.000000, CER: 0.333333, loss: 10.796988
207
+ - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_21563967.wav
208
+ - src: "легітимність"
209
+ - res: "вегі пимнсть"
210
+ --------------------------------------------------------------------------------
211
+ ```
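The WER and CER values in the report can be reproduced with a short Python sketch based on the Levenshtein edit distance. This is an independent illustration, not the evaluation code DeepSpeech uses; the function names are chosen here for clarity.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(src, res):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = src.split()
    return edit_distance(ref_words, res.split()) / len(ref_words)


def cer(src, res):
    """Character-level edit distance divided by the reference length."""
    return edit_distance(src, res) / len(src)


# Two samples taken from the report above.
print(wer("і вухом не веде", "і вухом не виде"))            # 0.25
print(round(cer("і вухом не веде", "і вухом не виде"), 6))  # 0.066667
print(wer("легітимність", "вегі пимнсть"))                  # 2.0
```

The last sample shows why WER can be above 1.0: the one-word reference was recognized as two wrong words, which costs one substitution and one insertion.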
212
+ 16. Export the model for later use:
213
+ ```
214
+ mkdir model
215
+ # export .pb file
216
+ python3 DeepSpeech.py \
217
+ --train_cudnn \
218
+ --alphabet_config_path ./data_uk/alphabet.txt \
219
+ --load_checkpoint_dir ./checkpoints/uk_transfer_checkpoint \
220
+ --export_dir ./model \
221
+ --epochs 0
222
+ # export .tflite file for embedded usage
223
+ python3 DeepSpeech.py \
224
+ --train_cudnn \
225
+ --alphabet_config_path ./data_uk/alphabet.txt \
226
+ --load_checkpoint_dir ./checkpoints/uk_transfer_checkpoint \
227
+ --export_tflite --export_dir ./model \
228
+ --epochs 0
229
+ ```
230
+ For advanced usage, please refer to https://deepspeech.readthedocs.io/en/v0.9.1/USING.html