---
license: apache-2.0
language:
- ko
library_name: nemo
pipeline_tag: automatic-speech-recognition
tags:
- conformer-ctc
metrics:
- wer
---

# Conformer-ctc-medium-ko

This model was obtained by fine-tuning [RIVA Conformer ASR Korean](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/speechtotext_ko_kr_conformer) on AI Hub datasets.

Unlike attention-based models such as Whisper, Conformer-based models lose little accuracy when run in streaming mode, and they are fast.

We measured an RTF of about 0.05 on a V100 GPU and about 0.35 on a CPU (7 cores).

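RTF here presumably means the real-time factor, that is, processing time divided by audio duration, so values below 1 are faster than real time. A minimal sketch of that arithmetic follows; the function names and the 60-second clip are illustrative assumptions, not part of the original benchmark.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def estimated_processing_seconds(rtf: float, audio_seconds: float) -> float:
    """Estimate how long transcription takes for a clip at a given RTF."""
    return rtf * audio_seconds


if __name__ == "__main__":
    # Hypothetical 60-second clip, using the RTF values reported in this card.
    for device, rtf in (("V100 GPU", 0.05), ("CPU (7 cores)", 0.35)):
        seconds = estimated_processing_seconds(rtf, 60.0)
        print(f"{device}: about {seconds:.1f} s for 60 s of audio")
```
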
In a streaming test with a 2-second audio chunk size, performance degrades by about 20% compared with feeding in the entire audio at once, but it remains good enough for practical use.

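Chunked streaming can be pictured as cutting the waveform into fixed-size pieces and feeding them to the model one at a time. The sketch below shows only the chunking step, assuming 16 kHz audio (the `sample_rate` listed under the training hyperparameters) and non-overlapping chunks. `iter_chunks` is a hypothetical helper, and the actual streaming code in the linked repository may buffer context differently.

```python
from typing import Iterator, Sequence

SAMPLE_RATE = 16_000   # Hz, matches the training sample rate
CHUNK_SECONDS = 2.0    # chunk size used in the streaming test


def iter_chunks(samples: Sequence[float],
                sample_rate: int = SAMPLE_RATE,
                chunk_seconds: float = CHUNK_SECONDS) -> Iterator[Sequence[float]]:
    """Yield consecutive, non-overlapping chunks; the last one may be shorter."""
    chunk_len = int(sample_rate * chunk_seconds)
    if chunk_len <= 0:
        raise ValueError("chunk length must be at least one sample")
    for start in range(0, len(samples), chunk_len):
        yield samples[start:start + chunk_len]


if __name__ == "__main__":
    audio = [0.0] * (5 * SAMPLE_RATE)  # 5 seconds of silence as stand-in audio
    print([len(chunk) for chunk in iter_chunks(audio)])
```

With 5 seconds of audio this yields two full 2-second chunks and one 1-second remainder.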
In addition, on a non-open domain such as customer-service speech, adding KenLM improved WER substantially, from 13.45 to 5.27. In other domains, however, adding KenLM did not lead to a large improvement.

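For reference, WER is the word-level edit distance between hypothesis and reference, divided by the number of reference words. The sketch below is a generic, whitespace-tokenized implementation, not the evaluation script behind the numbers in this card (tokenization and text normalization for Korean may differ). It also computes the relative reduction implied by going from 13.45 to 5.27.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev holds the distances for the previous reference prefix.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Relative WER reduction in percent."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
    print(round(relative_reduction(13.45, 5.27), 1))
```
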
The streaming code, along with code that includes a denoising model, is available in the GitHub repository below.

[https://github.com/SUNGBEOMCHOI/Korean-Streaming-ASR](https://github.com/SUNGBEOMCHOI/Korean-Streaming-ASR)

### Training results

| Training Loss | Epoch | WER   |
|:-------------:|:-----:|:-----:|
| 9.09          | 1.0   | 11.51 |

### Dataset

| Dataset name | Number of samples (train/test) |
| --- | --- |
| Customer service speech | 2067668/21092 |
| Korean speech | 620000/3000 |
| Korean conversational speech | 2483570/142399 |
| Free conversation speech (general, male and female) | 1886882/263371 |
| Welfare-sector call center consultation data | 1096704/206470 |
| In-vehicle conversation data | 2624132/332787 |
| Command speech (elderly, male and female) | 137467/237469 |
| Total | 10916423 (13946 hours)/1206588 (1474 hours) |

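As a quick consistency check, the per-corpus counts in the table sum to the reported totals, and the hour figures imply an average utterance length of roughly 4.6 seconds for the training data. The sketch below reproduces that arithmetic; the English corpus labels are translations used only as dictionary keys.

```python
# (train, test) utterance counts copied from the dataset table.
SPLITS = {
    "Customer service speech": (2_067_668, 21_092),
    "Korean speech": (620_000, 3_000),
    "Korean conversational speech": (2_483_570, 142_399),
    "Free conversation speech": (1_886_882, 263_371),
    "Welfare call center consultations": (1_096_704, 206_470),
    "In-vehicle conversations": (2_624_132, 332_787),
    "Command speech": (137_467, 237_469),
}
REPORTED_TOTAL = (10_916_423, 1_206_588)   # utterances (train, test)
REPORTED_HOURS = (13_946, 1_474)           # hours (train, test)


def totals(splits):
    """Sum train and test counts across all corpora."""
    train = sum(t for t, _ in splits.values())
    test = sum(s for _, s in splits.values())
    return train, test


def mean_utterance_seconds(hours, utterances):
    """Average utterance duration in seconds."""
    return hours * 3600 / utterances


if __name__ == "__main__":
    print(totals(SPLITS) == REPORTED_TOTAL)
    for name, hours, count in zip(("train", "test"), REPORTED_HOURS, REPORTED_TOTAL):
        print(f"{name}: {mean_utterance_seconds(hours, count):.2f} s per utterance")
```
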
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 16
- num_train_epoch: 1
- sample_rate: 16000
- max_duration: 20.0
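
`max_duration: 20.0` most likely means that utterances longer than 20 seconds were excluded from training, which is how the `max_duration` dataset option in NeMo generally behaves. A minimal sketch of such a filter over NeMo-style JSON-lines manifest entries (`audio_filepath`, `duration`, `text`) follows; the file names and the helper `filter_by_duration` are hypothetical, and whether the boundary value itself is kept is an assumption.

```python
import json
from typing import Iterable, Iterator

MAX_DURATION = 20.0  # seconds, from the hyperparameters listed in this card


def filter_by_duration(lines: Iterable[str],
                       max_duration: float = MAX_DURATION) -> Iterator[dict]:
    """Yield manifest entries whose duration does not exceed max_duration."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        if float(entry["duration"]) <= max_duration:
            yield entry


if __name__ == "__main__":
    # Hypothetical manifest lines; real manifests point at actual audio files.
    manifest = [
        '{"audio_filepath": "a.wav", "duration": 4.2, "text": "..."}',
        '{"audio_filepath": "b.wav", "duration": 27.5, "text": "..."}',
    ]
    kept = list(filter_by_duration(manifest))
    print([entry["audio_filepath"] for entry in kept])
```
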