Create README.md
---
license: apache-2.0
language:
- ko
library_name: nemo
pipeline_tag: automatic-speech-recognition
tags:
- conformer-ctc
metrics:
- wer
---
# Conformer-ctc-medium-ko
This model was fine-tuned from [RIVA Conformer ASR Korean](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/speechtotext_ko_kr_conformer) on AI Hub datasets. <br>
Unlike attention-based models such as Whisper, Conformer-based models lose little accuracy when run in streaming mode, and they are fast.
### Dataset
| Dataset name | Number of samples (train/test) |
| --- | --- |
| Customer service speech (고객응대음성) | 2067668/21092 |
| Korean speech (한국어 음성) | 620000/3000 |
| Korean conversational speech (한국인 대화 음성) | 2483570/142399 |
| Free conversation speech, general men and women (자유대화음성(일반남녀)) | 1886882/263371 |
| Welfare-sector call center consultation data (복지 분야 콜센터 상담데이터) | 1096704/206470 |
| In-vehicle conversation data (차량내 대화 데이터) | 2624132/332787 |
| Command speech, elderly men and women (명령어 음성(노인남여)) | 137467/237469 |
| Total | 10916423 (13946 hours)/1206588 (1474 hours) |
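
As a quick sanity check on the table above, the per-dataset counts can be summed and compared with the total row, and the reported hours can be turned into an average clip length. This is a minimal sketch: the dictionary keys and variable names are illustrative, and the average durations are derived from the table rather than stated in the original.

```python
# (train, test) sample counts copied from the dataset table.
counts = {
    "customer_service_speech": (2067668, 21092),
    "korean_speech": (620000, 3000),
    "korean_conversational_speech": (2483570, 142399),
    "free_conversation_speech": (1886882, 263371),
    "welfare_call_center": (1096704, 206470),
    "in_vehicle_conversation": (2624132, 332787),
    "command_speech_elderly": (137467, 237469),
}

train_total = sum(train for train, _ in counts.values())
test_total = sum(test for _, test in counts.values())

# Hours reported in the total row of the table.
train_hours, test_hours = 13946, 1474

# Average clip length in seconds, derived from the totals.
avg_train_seconds = train_hours * 3600 / train_total
avg_test_seconds = test_hours * 3600 / test_total

print(train_total, test_total)
print(round(avg_train_seconds, 2), round(avg_test_seconds, 2))
```

The sums match the total row, and the average clip comes out at roughly 4.6 seconds for the training split and 4.4 seconds for the test split.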
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 16
- num_train_epoch: 2
- sample_rate: 16000
- max_duration: 20.0
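
To make `max_duration` and `train_batch_size` concrete, the sketch below filters a NeMo-style JSON-lines manifest (one object per line with `audio_filepath`, `duration`, and `text`) by duration and estimates the number of batches per epoch. The helper names, the sample entries, and the single-device, no-gradient-accumulation assumption are illustrative and are not taken from the original training setup.

```python
import json
import math

MAX_DURATION = 20.0     # seconds, from the hyperparameters above
TRAIN_BATCH_SIZE = 16   # from the hyperparameters above


def filter_manifest(lines, max_duration=MAX_DURATION):
    """Keep manifest entries whose duration does not exceed max_duration."""
    entries = (json.loads(line) for line in lines if line.strip())
    return [e for e in entries if e["duration"] <= max_duration]


def steps_per_epoch(num_samples, batch_size=TRAIN_BATCH_SIZE):
    """Batches per epoch, assuming one device and no gradient accumulation."""
    return math.ceil(num_samples / batch_size)


# Hypothetical manifest lines, for illustration only.
sample_lines = [
    '{"audio_filepath": "a.wav", "duration": 4.6, "text": "..."}',
    '{"audio_filepath": "b.wav", "duration": 23.1, "text": "..."}',
]
kept = filter_manifest(sample_lines)
print(len(kept))                   # the 23.1 s clip is dropped
print(steps_per_epoch(10916423))   # upper bound before duration filtering
```

With the full training set and a batch size of 16, this gives at most 682277 batches per epoch; the real number depends on how many clips exceed `max_duration` and on the actual device count.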
### Training results
| Training Loss | Epoch | WER |
|:-------------:|:-----:|:-------:|
| 9.09 | 1.0 | 11.51 |
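
The last column of the table above reports word error rate (WER). As a reminder of how that metric is commonly defined, the sketch below computes WER as the word-level edit distance divided by the number of reference words. It is a plain-Python illustration, not the evaluation code used for this model; the function name and the sample transcripts are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: one substituted word out of four.
print(word_error_rate("turn on the radio", "turn off the radio"))  # 0.25
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.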