
Note: This recipe was trained with the code from this PR: https://github.com/k2-fsa/icefall/pull/428

Pre-trained Transducer-Stateless5 models for the TAL_CSASR dataset with icefall.

The model was trained on the far data of TAL_CSASR using the scripts in icefall, based on the latest version of k2.

Training procedure

The main repositories are listed below. We will update the training and decoding scripts as new versions are released.
k2: https://github.com/k2-fsa/k2
icefall: https://github.com/k2-fsa/icefall
lhotse: https://github.com/lhotse-speech/lhotse

git clone https://github.com/k2-fsa/icefall
cd icefall
  • Preparing data.
cd egs/tal_csasr/ASR
bash ./prepare.sh
  • Training
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5"
./pruned_transducer_stateless5/train.py \
                  --world-size 6 \
                  --num-epochs 30 \
                  --start-epoch 1 \
                  --exp-dir pruned_transducer_stateless5/exp \
                  --lang-dir data/lang_char \
                  --max-duration 90
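
The results below were obtained with three decoding methods. As a rough sketch of how those runs could be launched, the loop below only prints one command per method (a dry run); remove the `echo` to execute them. It assumes that `pruned_transducer_stateless5/decode.py` accepts the flags shown, which follows the usual icefall convention but is not stated in this README, so check `./pruned_transducer_stateless5/decode.py --help` first. The epoch and avg values are taken from the results table.

```shell
# Dry-run sketch: print (do not run) one decode command per decoding method.
# Assumption: decode.py accepts the flags below; verify with --help.
print_decode_cmds() {
  epoch=30   # checkpoint epoch, as in the results table
  avg=24     # number of checkpoints averaged, as in the results table
  for method in greedy_search modified_beam_search fast_beam_search; do
    echo ./pruned_transducer_stateless5/decode.py \
      --epoch "$epoch" \
      --avg "$avg" \
      --exp-dir pruned_transducer_stateless5/exp \
      --lang-dir data/lang_char \
      --decoding-method "$method" \
      --use-averaged-model True
  done
}

print_decode_cmds
```

Keeping the `echo` makes it easy to review the exact commands before spending GPU time on them.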

Evaluation results

The decoding results (CER, %) on the TAL_CSASR dev and test sets are listed below:

| decoding-method | epoch (iter) | avg | dev | test |
|---|---|---|---|---|
| greedy_search | 30 | 24 | 7.49 | 7.58 |
| modified_beam_search | 30 | 24 | 7.33 | 7.38 |
| fast_beam_search | 30 | 24 | 7.32 | 7.42 |
| greedy_search (use-averaged-model=True) | 30 | 24 | 7.30 | 7.39 |
| modified_beam_search (use-averaged-model=True) | 30 | 24 | 7.15 | 7.22 |
| fast_beam_search (use-averaged-model=True) | 30 | 24 | 7.18 | 7.27 |
| greedy_search | 348000 | 30 | 7.46 | 7.54 |
| modified_beam_search | 348000 | 30 | 7.24 | 7.36 |
| fast_beam_search | 348000 | 30 | 7.25 | 7.39 |
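
To put the effect of `use-averaged-model=True` in perspective, the relative CER reduction on the test set can be computed from the epoch-30, avg-24 rows above. This is only a small arithmetic sketch; the helper name `rel_reduction` is made up for illustration and is not part of icefall.

```shell
# rel_reduction OLD CUR: relative reduction from OLD to CUR, in percent.
# Hypothetical helper for illustration; not part of icefall.
rel_reduction() {
  awk -v old="$1" -v cur="$2" 'BEGIN { printf "%.2f\n", (old - cur) / old * 100 }'
}

# Test-set CER without vs. with use-averaged-model=True (epoch 30, avg 24).
echo "greedy_search:        $(rel_reduction 7.58 7.39)%"
echo "modified_beam_search: $(rel_reduction 7.38 7.22)%"
echo "fast_beam_search:     $(rel_reduction 7.42 7.27)%"
```

For these rows the relative reduction comes out at roughly 2 to 2.5 percent for each decoding method.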

The results broken down by language, with CER (%) for Chinese and WER (%) for English (zh: Chinese, en: English):

| decoding-method | epoch (iter) | avg | dev | dev_zh | dev_en | test | test_zh | test_en |
|---|---|---|---|---|---|---|---|---|
| greedy_search (use-averaged-model=True) | 30 | 24 | 7.30 | 6.48 | 19.19 | 7.39 | 6.66 | 19.13 |
| modified_beam_search (use-averaged-model=True) | 30 | 24 | 7.15 | 6.35 | 18.95 | 7.22 | 6.50 | 18.70 |
| fast_beam_search (use-averaged-model=True) | 30 | 24 | 7.18 | 6.39 | 18.90 | 7.27 | 6.55 | 18.77 |