---
license: other
license_name: espnet
license_link: LICENSE
language:
- ko
library_name: espnet
tags:
- asr
pipeline_tag: automatic-speech-recognition
---

## ESPnet2 ASR model

### `espnet/shihlun_asr_whisper_medium_finetuned_chime4`

This model was trained by Shih-Lun Wu (slseanwu) using the chime4 recipe in [espnet](https://github.com/espnet/espnet/).

### Demo: How to use in ESPnet2

```bash
#!/usr/bin/env bash
# Set bash to 'debug' mode, it will exit on :
# -e 'error', -u 'undefined variable', -o ... 'error in pipeline', -x 'print commands',
set -e
set -u
set -o pipefail

train_set=train
valid_set=dev
test_sets="dev test1"

asr_config=conf/train_asr_whisper_large_lora_finetune.yaml
inference_config=conf/decode_asr_whisper_noctc_beam10.yaml
lm_config=conf/train_lm_transformer.yaml

use_lm=false
use_wordlm=false

# speed perturbation related
# (train_set will be "${train_set}_sp" if speed_perturb_factors is specified)
speed_perturb_factors="0.9 1.0 1.1"

./asr.sh \
    --skip_data_prep false \
    --skip_train false \
    --gpu_inference true \
    --ngpu 4 \
    --lang ko \
    --token_type whisper_multilingual \
    --feats_normalize "" \
    --stage 11 \
    --use_lm ${use_lm} \
    --use_word_lm ${use_wordlm} \
    --lm_config "${lm_config}" \
    --cleaner whisper_basic \
    --asr_config "${asr_config}" \
    --inference_config "${inference_config}" \
    --train_set "${train_set}" \
    --valid_set "${valid_set}" \
    --test_sets "${test_sets}" \
    --speed_perturb_factors "${speed_perturb_factors}" \
    --asr_speech_fold_length 512 \
    --asr_text_fold_length 150 \
    --lm_fold_length 150 \
    --lm_train_text "data/${train_set}/text" "$@"
```

# RESULTS

## Environments

- date: `Tue Jan 10 04:15:30 CST 2023`
- python version: `3.9.13 (main, Aug 25 2022, 23:26:10) [GCC 11.2.0]`
- espnet version: `espnet 202211`
- pytorch version: `pytorch 1.12.1`
- Git hash: `d89be931dcc8f61437ac49cbe39a773f2054c50c`
- Commit date: `Mon Jan 9 11:06:45 2023 -0600`

## whisper_large_v2_lora_fintuning

### WER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_whisper_noctc_beam20_asr_model_valid.acc.ave/dt05_real_isolated_1ch_track|1640|24791|97.8|1.7|0.5|0.3|2.5|24.5|
|decode_asr_whisper_noctc_beam20_asr_model_valid.acc.ave/dt05_simu_isolated_1ch_track|1640|24792|96.1|3.0|0.9|0.5|4.4|35.6|
|decode_asr_whisper_noctc_beam20_asr_model_valid.acc.ave/et05_real_isolated_1ch_track|1320|19341|96.4|2.9|0.7|0.5|4.1|33.0|
|decode_asr_whisper_noctc_beam20_asr_model_valid.acc.ave/et05_simu_isolated_1ch_track|1320|19344|93.4|5.0|1.7|0.8|7.4|41.8|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dt05_real_isolated_1ch_track|1640|24791|97.7|1.8|0.5|0.4|2.8|25.5|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dt05_simu_isolated_1ch_track|1640|24792|96.0|3.3|0.8|0.7|4.8|36.0|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/et05_real_isolated_1ch_track|1320|19341|96.1|3.3|0.6|0.7|4.6|34.9|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/et05_simu_isolated_1ch_track|1320|19344|92.9|5.8|1.3|1.2|8.3|43.2|

### CER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_whisper_noctc_beam20_asr_model_valid.acc.ave/dt05_real_isolated_1ch_track|1640|141889|99.1|0.3|0.5|0.3|1.2|24.5|
|decode_asr_whisper_noctc_beam20_asr_model_valid.acc.ave/dt05_simu_isolated_1ch_track|1640|141900|98.2|0.8|1.0|0.5|2.3|35.6|
|decode_asr_whisper_noctc_beam20_asr_model_valid.acc.ave/et05_real_isolated_1ch_track|1320|110558|98.5|0.7|0.8|0.5|1.9|33.0|
|decode_asr_whisper_noctc_beam20_asr_model_valid.acc.ave/et05_simu_isolated_1ch_track|1320|110572|96.5|1.6|1.9|0.8|4.3|41.8|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dt05_real_isolated_1ch_track|1640|141889|99.1|0.4|0.5|0.5|1.3|25.5|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dt05_simu_isolated_1ch_track|1640|141900|98.2|0.9|0.9|0.6|2.4|36.0|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/et05_real_isolated_1ch_track|1320|110558|98.4|0.9|0.7|0.6|2.2|34.9|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/et05_simu_isolated_1ch_track|1320|110572|96.3|2.0|1.7|1.2|4.9|43.2|
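
As a reading aid for the tables above: in sclite-style scoring, the `Err` column is the sum of the `Sub`, `Del`, and `Ins` percentages, all measured against the reference length. The following is a minimal sketch of that arithmetic, not part of the recipe. The helper name `err_rate` is hypothetical and introduced only for illustration; the input values are copied from the WER table.

```bash
#!/usr/bin/env bash
set -euo pipefail

# err_rate SUB DEL INS
# Hypothetical helper (not part of ESPnet): prints SUB + DEL + INS
# rounded to one decimal place, mirroring how the Err column is formed.
err_rate() {
    awk -v s="$1" -v d="$2" -v i="$3" 'BEGIN { printf "%.1f\n", s + d + i }'
}

# beam20, dt05_real row of the WER table: Sub=1.7 Del=0.5 Ins=0.3
err_rate 1.7 0.5 0.3

# beam20, et05_real row of the WER table: Sub=2.9 Del=0.7 Ins=0.5
err_rate 2.9 0.7 0.5
```

Because each column is rounded independently, the sum of the rounded values can differ from the reported `Err` by 0.1. For example, the beam20 `et05_simu` WER row has 5.0 + 1.7 + 0.8 = 7.5, while the table reports 7.4.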