Note: This recipe was trained with the code from this PR: https://github.com/k2-fsa/icefall/pull/428
Pre-trained Transducer-Stateless5 models for the TAL_CSASR dataset with icefall.
The model was trained on the far data of TAL_CSASR with the scripts in icefall, based on the latest version of k2.
Training procedure
The main repositories are listed below. We will update the training and decoding scripts as new versions are released.
k2: https://github.com/k2-fsa/k2
icefall: https://github.com/k2-fsa/icefall
lhotse: https://github.com/lhotse-speech/lhotse
- Install k2 and lhotse. For k2, see the installation guide at https://k2.readthedocs.io/en/latest/installation/index.html; for lhotse, see https://lhotse.readthedocs.io/en/latest/getting-started.html#installation. The latest versions should work. Also install the requirements listed in icefall.
- Clone icefall (https://github.com/k2-fsa/icefall) and check out the commit referenced above.
git clone https://github.com/k2-fsa/icefall
cd icefall
- Prepare the data.
cd egs/tal_csasr/ASR
bash ./prepare.sh
- Training
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5"
./pruned_transducer_stateless5/train.py \
--world-size 6 \
--num-epochs 30 \
--start-epoch 1 \
--exp-dir pruned_transducer_stateless5/exp \
--lang-dir data/lang_char \
--max-duration 90
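In the command above, `--world-size` is normally set to the number of GPUs exposed through `CUDA_VISIBLE_DEVICES`. The following is a minimal sketch, assuming the usual comma-separated device list, that derives one from the other and prints the resulting upper bound on audio per training step. It assumes `--max-duration` is a per-GPU cap in seconds of audio, which is how icefall recipes generally use it. The variable names are illustrative and not part of the recipe.

```shell
#!/usr/bin/env bash
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5"

# Count the comma-separated device IDs (illustrative helper variable).
world_size=$(echo "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | grep -c .)
max_duration=90   # seconds of audio per batch, assumed to apply per GPU

echo "world_size=${world_size}"
echo "max_audio_seconds_per_step=$((world_size * max_duration))"
```

The derived value could then be passed as `--world-size "$world_size"` so that the two settings cannot drift apart when the device list changes.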
Evaluation results
The decoding results (CER, %) on the TAL_CSASR dev and test sets are listed below:
decoding-method | epoch(iter) | avg | dev | test |
---|---|---|---|---|
greedy_search | 30 | 24 | 7.49 | 7.58 |
modified_beam_search | 30 | 24 | 7.33 | 7.38 |
fast_beam_search | 30 | 24 | 7.32 | 7.42 |
greedy_search(use-averaged-model=True) | 30 | 24 | 7.30 | 7.39 |
modified_beam_search(use-averaged-model=True) | 30 | 24 | 7.15 | 7.22 |
fast_beam_search(use-averaged-model=True) | 30 | 24 | 7.18 | 7.27 |
greedy_search | 348000 | 30 | 7.46 | 7.54 |
modified_beam_search | 348000 | 30 | 7.24 | 7.36 |
fast_beam_search | 348000 | 30 | 7.25 | 7.39 |
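To put the effect of `use-averaged-model=True` in relative terms, the test-set numbers from the epoch-30, avg-24 rows can be compared pairwise. The sketch below uses `awk` on values copied from the table above; it only does arithmetic on the reported results and adds no new measurements.

```shell
#!/usr/bin/env bash
# Columns: decoding method, test CER without averaging, test CER with averaging.
# Values are copied from the epoch-30, avg-24 rows of the table above.
rel_reduction=$(awk '
  { printf "%s %.1f\n", $1, 100 * ($2 - $3) / $2 }
' <<'EOF'
greedy_search 7.58 7.39
modified_beam_search 7.38 7.22
fast_beam_search 7.42 7.27
EOF
)

# One line per method: name and relative CER reduction in percent.
echo "$rel_reduction"
```

For these rows the relative reduction on the test set comes out at roughly 2.0 to 2.5 percent for each decoding method.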
The results broken down by language, with CER (%) for Chinese and WER (%) for English (zh: Chinese, en: English):
decoding-method | epoch(iter) | avg | dev | dev_zh | dev_en | test | test_zh | test_en |
---|---|---|---|---|---|---|---|---|
greedy_search(use-averaged-model=True) | 30 | 24 | 7.30 | 6.48 | 19.19 | 7.39 | 6.66 | 19.13 |
modified_beam_search(use-averaged-model=True) | 30 | 24 | 7.15 | 6.35 | 18.95 | 7.22 | 6.50 | 18.70 |
fast_beam_search(use-averaged-model=True) | 30 | 24 | 7.18 | 6.39 | 18.90 | 7.27 | 6.55 | 18.77 |
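The averaged-model rows above correspond to decoding the epoch-30 checkpoint with `avg` 24 under three decoding methods. A hedged sketch of such a loop follows. The flag names (`--epoch`, `--avg`, `--use-averaged-model`, `--decoding-method`) are assumed to match the recipe's `decode.py` and should be checked against its `--help`. The value `--max-duration 90` is simply reused from the training command and is not taken from the original decoding runs. The helper `decode_cmd` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: flags are assumed to match pruned_transducer_stateless5/decode.py.
decode_cmd() {
  # $1: decoding method, e.g. greedy_search
  echo ./pruned_transducer_stateless5/decode.py \
    --epoch 30 \
    --avg 24 \
    --use-averaged-model True \
    --exp-dir pruned_transducer_stateless5/exp \
    --lang-dir data/lang_char \
    --max-duration 90 \
    --decoding-method "$1"
}

for method in greedy_search modified_beam_search fast_beam_search; do
  if [ -f ./pruned_transducer_stateless5/decode.py ]; then
    # Inside egs/tal_csasr/ASR: run the decoding job.
    # Unquoted on purpose so the printed command is split into words.
    $(decode_cmd "$method")
  else
    # Anywhere else: only print what would be run.
    echo "dry run: $(decode_cmd "$method")"
  fi
done
```

Run from `egs/tal_csasr/ASR`, the loop would start one decoding job per method; from any other directory it only prints the commands.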