license: apache-2.0
language:
  - ko
library_name: nemo
pipeline_tag: automatic-speech-recognition
tags:
  - conformer-ctc
metrics:
  - wer

Conformer-ctc-medium-ko

This model was obtained by fine-tuning RIVA Conformer ASR Korean on AI Hub datasets.
Unlike attention-based models such as Whisper, Conformer-based models lose little accuracy when run in streaming mode, and they are fast.
We measured a real-time factor (RTF) of about 0.05 on a V100 GPU and about 0.35 on a CPU (7 cores).
In a streaming test with an audio chunk size of 2 seconds, performance was about 20% worse than when the full audio was fed at once, but it remains good enough for practical use.
In addition, for domains that are not open-domain, such as customer service speech, adding kenlm gave a large improvement, from WER 13.45 to WER 5.27.
In other domains, however, adding kenlm did not lead to a large improvement.
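The two quantities mentioned above, RTF and a 2-second chunk size, can be illustrated with a small sketch. It assumes RTF is defined as processing time divided by audio duration, and that streaming uses consecutive, non-overlapping chunks at 16 kHz; the helper names `real_time_factor` and `iter_chunks` are hypothetical, and the actual streaming setup used for the test (for example, any look-ahead or overlap) is not described in this card.

```python
from typing import Iterator, Sequence

SAMPLE_RATE = 16000   # Hz, the sample_rate listed in the hyperparameters
CHUNK_SECONDS = 2.0   # chunk size used in the streaming test


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent decoding / duration of the audio (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def iter_chunks(
    samples: Sequence[float],
    sample_rate: int = SAMPLE_RATE,
    chunk_seconds: float = CHUNK_SECONDS,
) -> Iterator[Sequence[float]]:
    """Yield consecutive, non-overlapping chunks; the last one may be shorter."""
    chunk_len = int(sample_rate * chunk_seconds)
    for start in range(0, len(samples), chunk_len):
        yield samples[start:start + chunk_len]


# 5 seconds of silence is split into chunks of 2 s, 2 s and 1 s.
audio = [0.0] * (5 * SAMPLE_RATE)
chunk_lengths = [len(chunk) for chunk in iter_chunks(audio)]
print(chunk_lengths)                 # [32000, 32000, 16000]

# An RTF of 0.05 means 60 s of audio is decoded in about 3 s.
print(real_time_factor(3.0, 60.0))   # 0.05
```

In a real streaming setup each chunk would be passed to the model in turn; that step is left out here because it depends on the inference stack.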

Dataset

| Dataset name | Number of samples (train/test) |
| --- | --- |
| Customer service speech (고객응대음성) | 2067668/21092 |
| Korean speech (한국어 음성) | 620000/3000 |
| Korean conversational speech (한국인 대화 음성) | 2483570/142399 |
| Free conversation speech, general men and women (자유대화음성(일반남녀)) | 1886882/263371 |
| Welfare-sector call center consultation data (복지 분야 콜센터 상담데이터) | 1096704/206470 |
| In-vehicle conversation data (차량내 대화 데이터) | 2624132/332787 |
| Command speech, elderly men and women (명령어 음성(노인남여)) | 137467/237469 |
| Total | 10916423 (13946 hours)/1206588 (1474 hours) |
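As a quick sanity check on the table, the per-dataset counts can be summed and compared with the totals, and the stated hours can be turned into an average clip length. This is only an arithmetic sketch over the numbers copied from the table; the variable names are illustrative.

```python
# (train, test) sample counts copied from the table, one tuple per dataset.
SAMPLES = [
    (2067668, 21092),
    (620000, 3000),
    (2483570, 142399),
    (1886882, 263371),
    (1096704, 206470),
    (2624132, 332787),
    (137467, 237469),
]
TRAIN_HOURS, TEST_HOURS = 13946, 1474  # totals stated in the last row

train_total = sum(train for train, _ in SAMPLES)
test_total = sum(test for _, test in SAMPLES)
print(train_total, test_total)  # 10916423 1206588

# Average utterance length in seconds, derived from the stated hours.
avg_train_seconds = TRAIN_HOURS * 3600 / train_total
avg_test_seconds = TEST_HOURS * 3600 / test_total
print(f"{avg_train_seconds:.2f} {avg_test_seconds:.2f}")  # 4.60 4.40
```

The average of roughly 4 to 5 seconds per sample is well below the `max_duration` of 20.0 seconds used in training.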

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • num_train_epoch: 1
  • sample_rate: 16000
  • max_duration: 20.0
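To make the `max_duration` setting concrete, the sketch below drops utterances longer than 20.0 seconds from a JSON-lines manifest. It assumes the training data is described in the NeMo-style manifest format (one JSON object per line with `audio_filepath`, `duration`, and `text`); the file names and transcripts are placeholders, and `filter_manifest` is a hypothetical helper, not part of NeMo.

```python
import json
from typing import Dict, Iterable, List

MAX_DURATION = 20.0  # seconds, from the hyperparameter list


def filter_manifest(lines: Iterable[str], max_duration: float = MAX_DURATION) -> List[Dict]:
    """Parse JSON-lines manifest entries and keep those within max_duration."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        if float(entry["duration"]) <= max_duration:
            kept.append(entry)
    return kept


# Placeholder manifest lines; paths and transcripts are made up for illustration.
manifest_lines = [
    '{"audio_filepath": "clip_0001.wav", "duration": 4.6, "text": "placeholder one"}',
    '{"audio_filepath": "clip_0002.wav", "duration": 27.3, "text": "placeholder two"}',
    '{"audio_filepath": "clip_0003.wav", "duration": 20.0, "text": "placeholder three"}',
]

kept_entries = filter_manifest(manifest_lines)
print([entry["audio_filepath"] for entry in kept_entries])
# ['clip_0001.wav', 'clip_0003.wav']
```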

Training results

| Training Loss | Epoch | WER |
| --- | --- | --- |
| 9.09 | 1.0 | 11.51 |
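For readers unfamiliar with the metric, WER can be sketched as the word-level edit distance between reference and hypothesis, divided by the number of reference words and expressed as a percentage. This is a minimal illustration that splits on whitespace; the card does not state which text normalization or tooling was used to produce the numbers above, so results from this sketch are not directly comparable.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance counting substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent for a single utterance."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hyp_words) / len(ref_words)


# One substitution (sat -> sit) and one deletion (the) over six reference words.
print(f"{wer('the cat sat on the mat', 'the cat sit on mat'):.2f}")  # 33.33
```

For a whole test set, the usual practice is to sum the edit distances over all utterances and divide by the total number of reference words, rather than averaging per-utterance scores.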