zipformer-streaming-robust-sw/exp-causal/greedy_search/log-decode-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model-2024-03-07-08-38-15
2024-03-07 08:38:15,365 INFO [decode.py:764] Decoding started
2024-03-07 08:38:15,365 INFO [decode.py:770] Device: cuda:0
2024-03-07 08:38:15,366 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
2024-03-07 08:38:15,369 INFO [decode.py:778] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'f6919c0ddb311bea7b53a50f3afdcb3c18b8ccc8', 'k2-git-date': 'Sat Feb 10 09:23:09 2024', 'lhotse-version': '1.22.0.dev+git.9355bd72.clean', 'torch-version': '2.0.0+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'b35406b0-clean', 'icefall-git-date': 'Thu Mar 7 06:20:34 2024', 'icefall-path': '/root/icefall', 'k2-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/k2/__init__.py', 'lhotse-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'bookbot-h100', 'IP address': '127.0.0.1'}, 'epoch': 40, 'iter': 0, 'avg': 7, 'use_averaged_model': True, 'exp_dir': PosixPath('zipformer/exp-causal'), 'lang_dir': PosixPath('data/lang_phone'), 'decoding_method': 'greedy_search', 'beam_size': 4, 'beam': 20.0, 'ngram_lm_scale': 0.01, 'max_contexts': 8, 'max_states': 64, 'context_size': 2, 'max_sym_per_frame': 1, 'num_paths': 200, 'nbest_scale': 0.5, 'use_shallow_fusion': False, 'lm_type': 'rnn', 'lm_scale': 0.3, 'tokens_ngram': 3, 'backoff_id': 500, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '32', 'left_context_frames': '128', 'use_transducer': True, 'use_ctc': True, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 200.0, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'lm_vocab_size': 500, 'lm_epoch': 7, 'lm_avg': 1, 'lm_exp_dir': None, 'rnn_lm_embedding_dim': 2048, 'rnn_lm_hidden_dim': 2048, 'rnn_lm_num_layers': 3, 'rnn_lm_tie_weights': True, 'transformer_lm_exp_dir': None, 'transformer_lm_dim_feedforward': 2048, 'transformer_lm_encoder_dim': 768, 'transformer_lm_embedding_dim': 768, 'transformer_lm_nhead': 8, 'transformer_lm_num_layers': 16, 'transformer_lm_tie_weights': True, 'res_dir': PosixPath('zipformer/exp-causal/greedy_search'), 'suffix': 'epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model', 'blank_id': 0, 'unk_id': 19, 'vocab_size': 38}
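
The key settings above for this run are decoding_method='greedy_search' with context_size=2 (a stateless decoder conditioned on the last two non-blank tokens) and max_sym_per_frame=1 (at most one symbol emitted per encoder frame), on a causal model decoded with chunk_size=32 and left_context_frames=128. Roughly, greedy transducer search under these settings reduces to the loop sketched below; this is a minimal illustration, not icefall's beam_search.py implementation, and the decoder/joiner call signatures are assumed.

    import torch

    def greedy_search_sketch(model, encoder_out, blank_id=0, context_size=2):
        """Minimal greedy transducer search for one utterance, emitting at most
        one non-blank symbol per encoder frame. The decoder/joiner interfaces
        are assumed for illustration, not icefall's exact API."""
        device = encoder_out.device
        hyp = [blank_id] * context_size                      # initial decoder context
        decoder_out = model.decoder(torch.tensor([hyp], device=device))

        for t in range(encoder_out.size(0)):                 # encoder_out: (T, encoder_dim)
            cur = encoder_out[t : t + 1]                     # current frame, (1, encoder_dim)
            logits = model.joiner(cur, decoder_out)          # (1, vocab_size)
            y = logits.argmax(dim=-1).item()
            if y != blank_id:                                # max_sym_per_frame=1
                hyp.append(y)
                context = torch.tensor([hyp[-context_size:]], device=device)
                decoder_out = model.decoder(context)
        return hyp[context_size:]                            # drop the initial blank context
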
2024-03-07 08:38:15,369 INFO [decode.py:780] About to create model
2024-03-07 08:38:15,616 INFO [decode.py:847] Calculating the averaged model over epoch range from 33 (excluded) to 40
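
With epoch=40, avg=7, and use_averaged_model=True, the weights used for decoding are effectively an average over the checkpoints after epoch 33 up to and including epoch 40. icefall computes this from the running model_avg stored in the checkpoints; the plain state-dict averaging below is a simplified sketch of the same idea, with the checkpoint file names assumed.

    import torch

    def average_checkpoints_sketch(epoch=40, avg=7, exp_dir="zipformer/exp-causal"):
        # Average epoch-34.pt .. epoch-40.pt (7 checkpoints), i.e. the epoch range
        # from 33 (excluded) to 40. Simplified: icefall's --use-averaged-model
        # actually combines the running `model_avg` of the start/end checkpoints.
        epochs = range(epoch - avg + 1, epoch + 1)
        avg_state = None
        for e in epochs:
            state = torch.load(f"{exp_dir}/epoch-{e}.pt", map_location="cpu")["model"]
            if avg_state is None:
                avg_state = {k: v.clone().float() for k, v in state.items()}
            else:
                for k, v in state.items():
                    avg_state[k] += v.float()
        return {k: v / len(epochs) for k, v in avg_state.items()}
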
2024-03-07 08:38:16,521 INFO [decode.py:908] Number of model parameters: 65182863
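
The reported parameter count is the usual sum over all tensor sizes; a one-line check along the lines decode.py uses (with `model` being the constructed transducer):

    num_param = sum(p.numel() for p in model.parameters())   # 65,182,863 for this config
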
2024-03-07 08:38:16,521 INFO [multidataset.py:81] About to get FLEURS test cuts
2024-03-07 08:38:16,521 INFO [multidataset.py:83] Loading FLEURS in lazy mode
2024-03-07 08:38:16,522 INFO [multidataset.py:90] About to get Common Voice test cuts
2024-03-07 08:38:16,522 INFO [multidataset.py:92] Loading Common Voice in lazy mode
2024-03-07 08:38:17,254 INFO [decode.py:651] batch 0/?, cuts processed until now is 11
2024-03-07 08:38:21,746 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([1.6919, 1.8132, 1.6442, 1.9278, 2.0388, 1.9352, 2.1089, 1.7078],
device='cuda:0')
2024-03-07 08:38:23,390 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([4.4804, 4.7386, 4.4316, 4.4934], device='cuda:0')
2024-03-07 08:38:23,574 INFO [decode.py:665] The transcripts are stored in zipformer/exp-causal/greedy_search/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt
2024-03-07 08:38:23,615 INFO [utils.py:656] [test-fleurs-greedy_search] %WER 6.58% [4054 / 61587, 1612 ins, 1154 del, 1288 sub ]
2024-03-07 08:38:23,708 INFO [decode.py:676] Wrote detailed error stats to zipformer/exp-causal/greedy_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt
2024-03-07 08:38:23,708 INFO [decode.py:690]
For test-fleurs, WER of different settings are:
greedy_search 6.58 best for test-fleurs
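
The WER above follows directly from the bracketed error counts: (insertions + deletions + substitutions) / reference words, i.e. (1612 + 1154 + 1288) / 61587 = 4054 / 61587 ≈ 6.58% for test-fleurs. A quick check:

    ins, dels, subs, ref_words = 1612, 1154, 1288, 61587
    print(f"{100 * (ins + dels + subs) / ref_words:.2f}%")    # -> 6.58%
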
2024-03-07 08:38:24,424 INFO [decode.py:651] batch 0/?, cuts processed until now is 28
2024-03-07 08:38:26,567 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([3.7743, 3.4609, 3.1923, 3.4857], device='cuda:0')
2024-03-07 08:38:29,582 INFO [decode.py:651] batch 50/?, cuts processed until now is 1611
2024-03-07 08:38:34,671 INFO [decode.py:651] batch 100/?, cuts processed until now is 3210
2024-03-07 08:38:38,604 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([1.8230, 2.9017, 2.8725, 2.2266], device='cuda:0')
2024-03-07 08:38:39,685 INFO [decode.py:651] batch 150/?, cuts processed until now is 4896
2024-03-07 08:38:44,671 INFO [decode.py:651] batch 200/?, cuts processed until now is 6582
2024-03-07 08:38:49,727 INFO [decode.py:651] batch 250/?, cuts processed until now is 8173
2024-03-07 08:38:50,536 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([3.2378, 3.0733, 2.4014, 2.7650], device='cuda:0')
2024-03-07 08:38:50,823 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([1.5433, 1.8018, 1.6968, 1.5210, 1.7601, 1.7079, 1.5636, 1.8637],
device='cuda:0')
2024-03-07 08:38:51,210 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([3.0157, 2.9472, 2.5216, 2.4810], device='cuda:0')
2024-03-07 08:38:54,561 INFO [decode.py:651] batch 300/?, cuts processed until now is 9972
2024-03-07 08:38:55,938 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([4.0921, 4.2424, 3.6664, 3.7351], device='cuda:0')
2024-03-07 08:38:59,584 INFO [decode.py:651] batch 350/?, cuts processed until now is 11592
2024-03-07 08:39:00,118 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([4.5326, 4.0145, 4.0918, 4.3280], device='cuda:0')
2024-03-07 08:39:02,056 INFO [decode.py:665] The transcripts are stored in zipformer/exp-causal/greedy_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt
2024-03-07 08:39:02,445 INFO [utils.py:656] [test-commonvoice-greedy_search] %WER 7.71% [48192 / 624874, 10414 ins, 21811 del, 15967 sub ]
2024-03-07 08:39:03,298 INFO [decode.py:676] Wrote detailed error stats to zipformer/exp-causal/greedy_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt
2024-03-07 08:39:03,298 INFO [decode.py:690]
For test-commonvoice, WER of different settings are:
greedy_search 7.71 best for test-commonvoice
2024-03-07 08:39:03,299 INFO [decode.py:944] Done!
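
In summary, this run reaches 6.58% WER on test-fleurs and 7.71% WER on test-commonvoice with streaming greedy search. The snippet below is a hypothetical helper (not part of icefall) for collecting the %WER summaries from a decode log in this format:

    import re

    # Matches e.g. "[test-fleurs-greedy_search] %WER 6.58% [4054 / 61587, ...]"
    WER_RE = re.compile(r"\[(?P<name>[\w-]+)\] %WER (?P<wer>[\d.]+)%")

    def collect_wers(log_path):
        """Return {test_set_name: wer} parsed from a decode log like this one."""
        results = {}
        with open(log_path) as f:
            for line in f:
                m = WER_RE.search(line)
                if m:
                    results[m.group("name")] = float(m.group("wer"))
        return results

    # For this log: {'test-fleurs-greedy_search': 6.58,
    #                'test-commonvoice-greedy_search': 7.71}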