w11wo committed on
Commit
d4ce303
·
1 Parent(s): f2b1dfb

Added Model
Files changed (50)
  1. README.md +195 -0
  2. data/lang_phone/L.pt +3 -0
  3. data/lang_phone/L_disambig.pt +3 -0
  4. data/lang_phone/Linv.pt +3 -0
  5. data/lang_phone/lexicon.txt +37 -0
  6. data/lang_phone/lexicon_disambig.txt +37 -0
  7. data/lang_phone/tokens.txt +39 -0
  8. data/lang_phone/words.txt +41 -0
  9. exp-causal/ctc-decoding/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt +0 -0
  10. exp-causal/ctc-decoding/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt +0 -0
  11. exp-causal/ctc-decoding/log-decode-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model-2024-03-07-10-57-38 +7 -0
  12. exp-causal/ctc-decoding/log-decode-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model-2024-03-07-10-59-38 +37 -0
  13. exp-causal/ctc-decoding/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt +0 -0
  14. exp-causal/ctc-decoding/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt +0 -0
  15. exp-causal/ctc-decoding/wer-summary-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt +2 -0
  16. exp-causal/ctc-decoding/wer-summary-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt +2 -0
  17. exp-causal/fast_beam_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +0 -0
  18. exp-causal/fast_beam_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +0 -0
  19. exp-causal/fast_beam_search/log-decode-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model-2024-03-07-08-39-05 +66 -0
  20. exp-causal/fast_beam_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +0 -0
  21. exp-causal/fast_beam_search/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +0 -0
  22. exp-causal/fast_beam_search/wer-summary-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +2 -0
  23. exp-causal/fast_beam_search/wer-summary-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +2 -0
  24. exp-causal/greedy_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt +0 -0
  25. exp-causal/greedy_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt +0 -0
  26. exp-causal/greedy_search/log-decode-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model-2024-03-07-08-38-15 +46 -0
  27. exp-causal/greedy_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt +0 -0
  28. exp-causal/greedy_search/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt +0 -0
  29. exp-causal/greedy_search/wer-summary-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt +2 -0
  30. exp-causal/greedy_search/wer-summary-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt +2 -0
  31. exp-causal/jit_script_chunk_32_left_128.pt +3 -0
  32. exp-causal/modified_beam_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt +0 -0
  33. exp-causal/modified_beam_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt +0 -0
  34. exp-causal/modified_beam_search/log-decode-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model-2024-03-07-08-41-07 +56 -0
  35. exp-causal/modified_beam_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt +0 -0
  36. exp-causal/modified_beam_search/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt +0 -0
  37. exp-causal/modified_beam_search/wer-summary-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt +2 -0
  38. exp-causal/modified_beam_search/wer-summary-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt +2 -0
  39. exp-causal/pretrained.pt +3 -0
  40. exp-causal/streaming/fast_beam_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt +0 -0
  41. exp-causal/streaming/fast_beam_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt +0 -0
  42. exp-causal/streaming/fast_beam_search/log-decode-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model-2024-03-07-08-56-46 +154 -0
  43. exp-causal/streaming/fast_beam_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt +0 -0
  44. exp-causal/streaming/fast_beam_search/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt +0 -0
  45. exp-causal/streaming/fast_beam_search/wer-summary-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt +2 -0
  46. exp-causal/streaming/fast_beam_search/wer-summary-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt +2 -0
  47. exp-causal/streaming/greedy_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt +0 -0
  48. exp-causal/streaming/greedy_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt +0 -0
  49. exp-causal/streaming/greedy_search/log-decode-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model-2024-03-07-08-54-56 +154 -0
  50. exp-causal/streaming/greedy_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt +0 -0
README.md CHANGED
@@ -1,3 +1,198 @@
1
  ---
2
+ language: sw
3
  license: apache-2.0
4
+ tags:
5
+ - icefall
6
+ - phoneme-recognition
7
+ - automatic-speech-recognition
8
+ datasets:
9
+ - bookbot/ALFFA_swahili
10
+ - bookbot/fleurs_sw
11
+ - bookbot/common_voice_16_1_sw
12
  ---
13
+
14
+ # Pruned Stateless Zipformer RNN-T Streaming Robust SW
15
+
16
+ Pruned Stateless Zipformer RNN-T Streaming Robust SW is an automatic speech recognition model trained on the following datasets:
17
+
18
+ - [ALFFA Swahili](https://huggingface.co/datasets/bookbot/ALFFA_swahili)
19
+ - [FLEURS Swahili](https://huggingface.co/datasets/bookbot/fleurs_sw)
20
+ - [Common Voice 16.1 Swahili](https://huggingface.co/datasets/bookbot/common_voice_16_1_sw)
21
+
22
+ Instead of being trained to predict sequences of words, this model was trained to predict sequences of phonemes, e.g. `["w", "ɑ", "ʃ", "i", "ɑ"]`. The model's [vocabulary](https://huggingface.co/bookbot/zipformer-streaming-robust-sw/blob/main/data/lang_phone/tokens.txt) therefore consists of the IPA phonemes produced by [gruut](https://github.com/rhasspy/gruut).
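As an illustration of the vocabulary format, each line of `tokens.txt` maps one phoneme symbol to an integer id (`symbol id`). A minimal sketch of parsing that format; the sample entries mirror this repo's file, and `load_tokens` is a hypothetical helper, not part of icefall:

```python
# Sketch: parse an icefall-style tokens.txt (one "symbol id" pair per line)
# into an id -> symbol map. Sample entries copied from this repo's file.
sample = """<eps> 0
s 1
ð 2
| 14
<UNK> 19
ɑ 37
#0 38"""

def load_tokens(text: str) -> dict[int, str]:
    """Return {id: symbol} for each "symbol id" line."""
    id2sym = {}
    for line in text.splitlines():
        sym, idx = line.rsplit(maxsplit=1)  # last field is the integer id
        id2sym[int(idx)] = sym
    return id2sym

id2sym = load_tokens(sample)
print(id2sym[14])  # → | (the word-separator token)
```

The same parsing works for the full 39-entry file shipped under `data/lang_phone/`.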
23
+
24
+ This model was trained with the [icefall](https://github.com/k2-fsa/icefall) framework on a Scaleway RENDER-S VM with an NVIDIA H100 GPU. All scripts used for training can be found in the [Files and versions](https://huggingface.co/bookbot/zipformer-streaming-robust-sw/tree/main) tab, along with the [training metrics](https://huggingface.co/bookbot/zipformer-streaming-robust-sw/tensorboard) logged via TensorBoard.
25
+
26
+ ## Evaluation Results
27
+
28
+ ### Simulated Streaming
29
+
30
+ ```sh
31
+ for m in greedy_search fast_beam_search modified_beam_search; do
32
+ ./zipformer/decode.py \
33
+ --epoch 40 \
34
+ --avg 7 \
35
+ --causal 1 \
36
+ --chunk-size 32 \
37
+ --left-context-frames 128 \
38
+ --exp-dir zipformer/exp-causal \
39
+ --use-transducer True --use-ctc True \
40
+ --decoding-method $m
41
+ done
42
+ ```
43
+
44
+ ```sh
45
+ ./zipformer/ctc_decode.py \
46
+ --epoch 40 \
47
+ --avg 7 \
48
+ --causal 1 \
49
+ --chunk-size 32 \
50
+ --left-context-frames 128 \
51
+ --exp-dir zipformer/exp-causal \
52
+ --decoding-method ctc-decoding \
53
+ --use-transducer True --use-ctc True
54
+ ```
55
+
56
+ The model achieves the following phoneme error rates on the different test sets:
57
+
58
+ | Decoding | Common Voice 16.1 | FLEURS |
59
+ | -------------------- | :---------------: | :----: |
60
+ | Greedy Search | 7.71 | 6.58 |
61
+ | Modified Beam Search | 7.53 | 6.40 |
62
+ | Fast Beam Search | 7.73 | 6.61 |
63
+ | CTC Greedy Search | 7.78 | 6.72 |
64
+
65
+ ### Chunk-wise Streaming
66
+
67
+ ```sh
68
+ for m in greedy_search fast_beam_search modified_beam_search; do
69
+ ./zipformer/streaming_decode.py \
70
+ --epoch 40 \
71
+ --avg 7 \
72
+ --causal 1 \
73
+ --chunk-size 32 \
74
+ --left-context-frames 128 \
75
+ --exp-dir zipformer/exp-causal \
76
+ --use-transducer True --use-ctc True \
77
+ --decoding-method $m \
78
+ --num-decode-streams 1000
79
+ done
80
+ ```
81
+
82
+ The model achieves the following phoneme error rates on the different test sets:
83
+
84
+ | Decoding | Common Voice 16.1 | FLEURS |
85
+ | -------------------- | :---------------: | :----: |
86
+ | Greedy Search | 7.75 | 6.59 |
87
+ | Modified Beam Search | 7.57 | 6.37 |
88
+ | Fast Beam Search | 7.72 | 6.44 |
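The phoneme error rates in both tables are computed like word error rate, but over phoneme tokens: Levenshtein edit distance (insertions + deletions + substitutions) divided by the reference length. A minimal sketch of that metric, not icefall's actual scoring code:

```python
# Sketch: Levenshtein-based error rate over phoneme tokens, analogous to the
# PER numbers reported above. Illustrative only; icefall's scoring also
# breaks the distance down into ins/del/sub counts.
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[m][n]

def phoneme_error_rate(ref: list[str], hyp: list[str]) -> float:
    """Error rate in percent, as in the tables above."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)

ref = ["w", "ɑ", "ʃ", "i", "ɑ"]
hyp = ["w", "ɑ", "s", "i"]          # one substitution, one deletion
print(round(phoneme_error_rate(ref, hyp), 2))  # → 40.0
```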
89
+
90
+ ## Usage
91
+
92
+ ### Download Pre-trained Model
93
+
94
+ ```sh
95
+ cd egs/bookbot_sw/ASR
96
+ mkdir tmp
97
+ cd tmp
98
+ git lfs install
99
+ git clone https://huggingface.co/bookbot/zipformer-streaming-robust-sw/
100
+ ```
101
+
102
+ ### Inference
103
+
104
+ To decode with greedy search, run:
105
+
106
+ ```sh
107
+ ./zipformer/jit_pretrained_streaming.py \
108
+ --nn-model-filename ./tmp/zipformer-streaming-robust-sw/exp-causal/jit_script_chunk_32_left_128.pt \
109
+ --tokens ./tmp/zipformer-streaming-robust-sw/data/lang_phone/tokens.txt \
110
+ ./tmp/zipformer-streaming-robust-sw/test_waves/sample1.wav
111
+ ```
112
+
113
+ <details>
114
+ <summary>Decoding Output</summary>
115
+
116
+ ```
117
+ 2024-03-07 11:07:41,231 INFO [jit_pretrained_streaming.py:184] device: cuda:0
118
+ 2024-03-07 11:07:41,865 INFO [jit_pretrained_streaming.py:197] Constructing Fbank computer
119
+ 2024-03-07 11:07:41,866 INFO [jit_pretrained_streaming.py:200] Reading sound files: ./tmp/zipformer-streaming-robust-sw/test_waves/sample1.wav
120
+ 2024-03-07 11:07:41,866 INFO [jit_pretrained_streaming.py:205] torch.Size([125568])
121
+ 2024-03-07 11:07:41,866 INFO [jit_pretrained_streaming.py:207] Decoding started
122
+ 2024-03-07 11:07:41,866 INFO [jit_pretrained_streaming.py:212] chunk_length: 64
123
+ 2024-03-07 11:07:41,866 INFO [jit_pretrained_streaming.py:213] T: 77
124
+ 2024-03-07 11:07:41,876 INFO [jit_pretrained_streaming.py:229] 0/130368
125
+ 2024-03-07 11:07:41,877 INFO [jit_pretrained_streaming.py:229] 4000/130368
126
+ 2024-03-07 11:07:41,878 INFO [jit_pretrained_streaming.py:229] 8000/130368
127
+ 2024-03-07 11:07:41,879 INFO [jit_pretrained_streaming.py:229] 12000/130368
128
+ 2024-03-07 11:07:42,103 INFO [jit_pretrained_streaming.py:229] 16000/130368
129
+ 2024-03-07 11:07:42,104 INFO [jit_pretrained_streaming.py:229] 20000/130368
130
+ 2024-03-07 11:07:42,126 INFO [jit_pretrained_streaming.py:229] 24000/130368
131
+ 2024-03-07 11:07:42,127 INFO [jit_pretrained_streaming.py:229] 28000/130368
132
+ 2024-03-07 11:07:42,128 INFO [jit_pretrained_streaming.py:229] 32000/130368
133
+ 2024-03-07 11:07:42,151 INFO [jit_pretrained_streaming.py:229] 36000/130368
134
+ 2024-03-07 11:07:42,152 INFO [jit_pretrained_streaming.py:229] 40000/130368
135
+ 2024-03-07 11:07:42,175 INFO [jit_pretrained_streaming.py:229] 44000/130368
136
+ 2024-03-07 11:07:42,176 INFO [jit_pretrained_streaming.py:229] 48000/130368
137
+ 2024-03-07 11:07:42,177 INFO [jit_pretrained_streaming.py:229] 52000/130368
138
+ 2024-03-07 11:07:42,200 INFO [jit_pretrained_streaming.py:229] 56000/130368
139
+ 2024-03-07 11:07:42,201 INFO [jit_pretrained_streaming.py:229] 60000/130368
140
+ 2024-03-07 11:07:42,224 INFO [jit_pretrained_streaming.py:229] 64000/130368
141
+ 2024-03-07 11:07:42,226 INFO [jit_pretrained_streaming.py:229] 68000/130368
142
+ 2024-03-07 11:07:42,226 INFO [jit_pretrained_streaming.py:229] 72000/130368
143
+ 2024-03-07 11:07:42,250 INFO [jit_pretrained_streaming.py:229] 76000/130368
144
+ 2024-03-07 11:07:42,251 INFO [jit_pretrained_streaming.py:229] 80000/130368
145
+ 2024-03-07 11:07:42,252 INFO [jit_pretrained_streaming.py:229] 84000/130368
146
+ 2024-03-07 11:07:42,275 INFO [jit_pretrained_streaming.py:229] 88000/130368
147
+ 2024-03-07 11:07:42,276 INFO [jit_pretrained_streaming.py:229] 92000/130368
148
+ 2024-03-07 11:07:42,299 INFO [jit_pretrained_streaming.py:229] 96000/130368
149
+ 2024-03-07 11:07:42,300 INFO [jit_pretrained_streaming.py:229] 100000/130368
150
+ 2024-03-07 11:07:42,301 INFO [jit_pretrained_streaming.py:229] 104000/130368
151
+ 2024-03-07 11:07:42,325 INFO [jit_pretrained_streaming.py:229] 108000/130368
152
+ 2024-03-07 11:07:42,326 INFO [jit_pretrained_streaming.py:229] 112000/130368
153
+ 2024-03-07 11:07:42,349 INFO [jit_pretrained_streaming.py:229] 116000/130368
154
+ 2024-03-07 11:07:42,350 INFO [jit_pretrained_streaming.py:229] 120000/130368
155
+ 2024-03-07 11:07:42,351 INFO [jit_pretrained_streaming.py:229] 124000/130368
156
+ 2024-03-07 11:07:42,373 INFO [jit_pretrained_streaming.py:229] 128000/130368
157
+ 2024-03-07 11:07:42,374 INFO [jit_pretrained_streaming.py:259] ./tmp/zipformer-streaming-robust-sw/test_waves/sample1.wav
158
+ 2024-03-07 11:07:42,374 INFO [jit_pretrained_streaming.py:260] ʃiɑ|ɑᵐɓɑɔ|wɑnɑiʃi|hɑsɑ|kɑtikɑ|ɛnɛɔ|lɑ|mɑʃɑɾiki|kɑtikɑ|ufɑlmɛ|huɔ|wɛnjɛ|utɑʄiɾi|wɑ|mɑfutɑ
159
+ 2024-03-07 11:07:42,374 INFO [jit_pretrained_streaming.py:262] Decoding Done
160
+ ```
161
+
162
+ </details>
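In the decoding output above, `|` is the word-separator token from the model's vocabulary, so word boundaries can be recovered by splitting the decoded phoneme string on it. A small illustrative sketch (the input is a truncated excerpt of the log output above):

```python
# Sketch: recover word boundaries from a decoded phoneme string by splitting
# on the "|" word-separator token, as seen in the streaming decode output.
decoded = "ʃiɑ|ɑᵐɓɑɔ|wɑnɑiʃi|hɑsɑ"
words = [w for w in decoded.split("|") if w]
print(words)  # → ['ʃiɑ', 'ɑᵐɓɑɔ', 'wɑnɑiʃi', 'hɑsɑ']
```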
163
+
164
+ ## Training procedure
165
+
166
+ ### Install icefall
167
+
168
+ ```sh
169
+ git clone https://github.com/bookbot-hive/icefall
170
+ cd icefall
171
+ export PYTHONPATH=`pwd`:$PYTHONPATH
172
+ ```
173
+
174
+ ### Prepare Data
175
+
176
+ ```sh
177
+ cd egs/bookbot_sw/ASR
178
+ ./prepare.sh
179
+ ```
180
+
181
+ ### Train
182
+
183
+ ```sh
184
+ export CUDA_VISIBLE_DEVICES="0"
185
+ ./zipformer/train.py \
186
+ --num-epochs 40 \
187
+ --use-fp16 1 \
188
+ --exp-dir zipformer/exp-causal \
189
+ --causal 1 \
190
+ --max-duration 800 \
191
+ --use-transducer True --use-ctc True
192
+ ```
193
+
194
+ ## Frameworks
195
+
196
+ - [k2](https://github.com/k2-fsa/k2)
197
+ - [icefall](https://github.com/bookbot-hive/icefall)
198
+ - [lhotse](https://github.com/bookbot-hive/lhotse)
data/lang_phone/L.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:521562864ec9620dcf30c713f16614a861b4570d6f633e1c5a006b8743a3a304
3
+ size 1679
data/lang_phone/L_disambig.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a2fb6bfaace3c1d9b8c0472e64a5621422eb0222ec4917875bde509e5ace233a
3
+ size 1715
data/lang_phone/Linv.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:29794d6988b503cfcec0bd6e7dcbe1f0450442c31e162820214429accafaaa3d
3
+ size 1691
data/lang_phone/lexicon.txt ADDED
@@ -0,0 +1,37 @@
1
+ f f
2
+ h h
3
+ i i
4
+ j j
5
+ k k
6
+ l l
7
+ m m
8
+ n n
9
+ p p
10
+ s s
11
+ t t
12
+ t͡ʃ t͡ʃ
13
+ u u
14
+ v v
15
+ w w
16
+ x x
17
+ z z
18
+ | |
19
+ ð ð
20
+ ɑ ɑ
21
+ ɓ ɓ
22
+ ɔ ɔ
23
+ ɗ ɗ
24
+ ɛ ɛ
25
+ ɠ ɠ
26
+ ɣ ɣ
27
+ ɾ ɾ
28
+ ʃ ʃ
29
+ ʄ ʄ
30
+ θ θ
31
+ ᵐɓ ᵐɓ
32
+ ᵑg ᵑg
33
+ ᶬv ᶬv
34
+ ⁿz ⁿz
35
+ ⁿɗ ⁿɗ
36
+ ⁿɗ͡ʒ ⁿɗ͡ʒ
37
+ <UNK> <UNK>
data/lang_phone/lexicon_disambig.txt ADDED
@@ -0,0 +1,37 @@
1
+ f f
2
+ h h
3
+ i i
4
+ j j
5
+ k k
6
+ l l
7
+ m m
8
+ n n
9
+ p p
10
+ s s
11
+ t t
12
+ t͡ʃ t͡ʃ
13
+ u u
14
+ v v
15
+ w w
16
+ x x
17
+ z z
18
+ | |
19
+ ð ð
20
+ ɑ ɑ
21
+ ɓ ɓ
22
+ ɔ ɔ
23
+ ɗ ɗ
24
+ ɛ ɛ
25
+ ɠ ɠ
26
+ ɣ ɣ
27
+ ɾ ɾ
28
+ ʃ ʃ
29
+ ʄ ʄ
30
+ θ θ
31
+ ᵐɓ ᵐɓ
32
+ ᵑg ᵑg
33
+ ᶬv ᶬv
34
+ ⁿz ⁿz
35
+ ⁿɗ ⁿɗ
36
+ ⁿɗ͡ʒ ⁿɗ͡ʒ
37
+ <UNK> <UNK>
data/lang_phone/tokens.txt ADDED
@@ -0,0 +1,39 @@
1
+ <eps> 0
2
+ s 1
3
+ ð 2
4
+ ᵑg 3
5
+ ᶬv 4
6
+ ʃ 5
7
+ ɔ 6
8
+ x 7
9
+ t 8
10
+ ɛ 9
11
+ v 10
12
+ ⁿɗ͡ʒ 11
13
+ f 12
14
+ n 13
15
+ | 14
16
+ ⁿz 15
17
+ k 16
18
+ h 17
19
+ t͡ʃ 18
20
+ <UNK> 19
21
+ ɗ 20
22
+ z 21
23
+ m 22
24
+ ʄ 23
25
+ ɠ 24
26
+ θ 25
27
+ j 26
28
+ ᵐɓ 27
29
+ u 28
30
+ ɣ 29
31
+ ɓ 30
32
+ i 31
33
+ l 32
34
+ ɾ 33
35
+ ⁿɗ 34
36
+ w 35
37
+ p 36
38
+ ɑ 37
39
+ #0 38
data/lang_phone/words.txt ADDED
@@ -0,0 +1,41 @@
1
+ <eps> 0
2
+ <UNK> 1
3
+ f 2
4
+ h 3
5
+ i 4
6
+ j 5
7
+ k 6
8
+ l 7
9
+ m 8
10
+ n 9
11
+ p 10
12
+ s 11
13
+ t 12
14
+ t͡ʃ 13
15
+ u 14
16
+ v 15
17
+ w 16
18
+ x 17
19
+ z 18
20
+ | 19
21
+ ð 20
22
+ ɑ 21
23
+ ɓ 22
24
+ ɔ 23
25
+ ɗ 24
26
+ ɛ 25
27
+ ɠ 26
28
+ ɣ 27
29
+ ɾ 28
30
+ ʃ 29
31
+ ʄ 30
32
+ θ 31
33
+ ᵐɓ 32
34
+ ᵑg 33
35
+ ᶬv 34
36
+ ⁿz 35
37
+ ⁿɗ 36
38
+ ⁿɗ͡ʒ 37
39
+ #0 38
40
+ <s> 39
41
+ </s> 40
exp-causal/ctc-decoding/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp-causal/ctc-decoding/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp-causal/ctc-decoding/log-decode-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model-2024-03-07-10-57-38 ADDED
@@ -0,0 +1,7 @@
1
+ 2024-03-07 10:57:38,784 INFO [ctc_decode.py:631] Decoding started
2
+ 2024-03-07 10:57:38,784 INFO [ctc_decode.py:637] Device: cuda:0
3
+ 2024-03-07 10:57:38,784 INFO [ctc_decode.py:638] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'f6919c0ddb311bea7b53a50f3afdcb3c18b8ccc8', 'k2-git-date': 'Sat Feb 10 09:23:09 2024', 'lhotse-version': '1.22.0.dev+git.9355bd72.clean', 'torch-version': '2.0.0+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'b35406b0-dirty', 'icefall-git-date': 'Thu Mar 7 06:20:34 2024', 'icefall-path': '/root/icefall', 'k2-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/k2/__init__.py', 'lhotse-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'bookbot-h100', 'IP address': '127.0.0.1'}, 'frame_shift_ms': 10, 'search_beam': 20, 'output_beam': 8, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'epoch': 40, 'iter': 0, 'avg': 7, 'use_averaged_model': True, 'exp_dir': PosixPath('zipformer/exp-causal'), 'lang_dir': PosixPath('data/lang_phone'), 'context_size': 2, 'decoding_method': 'ctc-decoding', 'num_paths': 100, 'nbest_scale': 1.0, 'hlg_scale': 0.6, 'lm_dir': PosixPath('data/lm'), 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '32', 'left_context_frames': '128', 'use_transducer': True, 'use_ctc': True, 
'manifest_dir': PosixPath('data/fbank'), 'max_duration': 200.0, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'res_dir': PosixPath('zipformer/exp-causal/ctc-decoding'), 'suffix': 'epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model'}
4
+ 2024-03-07 10:57:38,784 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
5
+ 2024-03-07 10:57:38,948 INFO [ctc_decode.py:713] About to create model
6
+ 2024-03-07 10:57:39,170 INFO [ctc_decode.py:780] Calculating the averaged model over epoch range from 33 (excluded) to 40
7
+ 2024-03-07 10:57:39,809 INFO [ctc_decode.py:797] Number of model parameters: 65182863
exp-causal/ctc-decoding/log-decode-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model-2024-03-07-10-59-38 ADDED
@@ -0,0 +1,37 @@
1
+ 2024-03-07 10:59:38,354 INFO [ctc_decode.py:621] Decoding started
2
+ 2024-03-07 10:59:38,354 INFO [ctc_decode.py:627] Device: cuda:0
3
+ 2024-03-07 10:59:38,354 INFO [ctc_decode.py:628] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'f6919c0ddb311bea7b53a50f3afdcb3c18b8ccc8', 'k2-git-date': 'Sat Feb 10 09:23:09 2024', 'lhotse-version': '1.22.0.dev+git.9355bd72.clean', 'torch-version': '2.0.0+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'b35406b0-dirty', 'icefall-git-date': 'Thu Mar 7 06:20:34 2024', 'icefall-path': '/root/icefall', 'k2-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/k2/__init__.py', 'lhotse-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'bookbot-h100', 'IP address': '127.0.0.1'}, 'frame_shift_ms': 10, 'search_beam': 20, 'output_beam': 8, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'epoch': 40, 'iter': 0, 'avg': 7, 'use_averaged_model': True, 'exp_dir': PosixPath('zipformer/exp-causal'), 'lang_dir': PosixPath('data/lang_phone'), 'context_size': 2, 'decoding_method': 'ctc-decoding', 'num_paths': 100, 'nbest_scale': 1.0, 'hlg_scale': 0.6, 'lm_dir': PosixPath('data/lm'), 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '32', 'left_context_frames': '128', 'use_transducer': True, 'use_ctc': True, 
'manifest_dir': PosixPath('data/fbank'), 'max_duration': 200.0, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'res_dir': PosixPath('zipformer/exp-causal/ctc-decoding'), 'suffix': 'epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model'}
4
+ 2024-03-07 10:59:38,355 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
5
+ 2024-03-07 10:59:38,522 INFO [ctc_decode.py:701] About to create model
6
+ 2024-03-07 10:59:38,744 INFO [ctc_decode.py:756] Calculating the averaged model over epoch range from 33 (excluded) to 40
7
+ 2024-03-07 10:59:39,391 INFO [ctc_decode.py:772] Number of model parameters: 65182863
8
+ 2024-03-07 10:59:39,392 INFO [multidataset.py:81] About to get FLEURS test cuts
9
+ 2024-03-07 10:59:39,392 INFO [multidataset.py:83] Loading FLEURS in lazy mode
10
+ 2024-03-07 10:59:39,392 INFO [multidataset.py:90] About to get Common Voice test cuts
11
+ 2024-03-07 10:59:39,392 INFO [multidataset.py:92] Loading Common Voice in lazy mode
12
+ 2024-03-07 10:59:39,992 INFO [ctc_decode.py:542] batch 0/?, cuts processed until now is 11
13
+ 2024-03-07 10:59:44,584 INFO [ctc_decode.py:556] The transcripts are stored in zipformer/exp-causal/ctc-decoding/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt
14
+ 2024-03-07 10:59:44,625 INFO [utils.py:656] [test-fleurs-ctc-decoding] %WER 6.72% [4137 / 61587, 1757 ins, 1036 del, 1344 sub ]
15
+ 2024-03-07 10:59:44,719 INFO [ctc_decode.py:565] Wrote detailed error stats to zipformer/exp-causal/ctc-decoding/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt
16
+ 2024-03-07 10:59:44,719 INFO [ctc_decode.py:579]
17
+ For test-fleurs, WER of different settings are:
18
+ ctc-decoding 6.72 best for test-fleurs
19
+
20
+ 2024-03-07 10:59:45,379 INFO [ctc_decode.py:542] batch 0/?, cuts processed until now is 28
21
+ 2024-03-07 10:59:52,644 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([1.4930, 1.6467, 1.4691, 1.4399, 1.6736, 1.6094, 1.5041, 1.7942],
22
+ device='cuda:0')
23
+ 2024-03-07 10:59:53,068 INFO [ctc_decode.py:542] batch 100/?, cuts processed until now is 3210
24
+ 2024-03-07 10:59:57,567 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([1.6248, 2.7026, 2.6528, 1.9603], device='cuda:0')
25
+ 2024-03-07 10:59:58,159 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([3.1807, 3.0239, 2.3377, 2.7052], device='cuda:0')
26
+ 2024-03-07 10:59:58,511 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([1.7430, 1.7438, 1.6404, 1.8630, 1.9495, 1.9508, 2.0193, 1.6035],
27
+ device='cuda:0')
28
+ 2024-03-07 11:00:00,641 INFO [ctc_decode.py:542] batch 200/?, cuts processed until now is 6582
29
+ 2024-03-07 11:00:08,194 INFO [ctc_decode.py:542] batch 300/?, cuts processed until now is 9972
30
+ 2024-03-07 11:00:13,903 INFO [ctc_decode.py:556] The transcripts are stored in zipformer/exp-causal/ctc-decoding/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt
31
+ 2024-03-07 11:00:14,287 INFO [utils.py:656] [test-commonvoice-ctc-decoding] %WER 7.78% [48615 / 624874, 12396 ins, 20104 del, 16115 sub ]
32
+ 2024-03-07 11:00:15,129 INFO [ctc_decode.py:565] Wrote detailed error stats to zipformer/exp-causal/ctc-decoding/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt
33
+ 2024-03-07 11:00:15,130 INFO [ctc_decode.py:579]
34
+ For test-commonvoice, WER of different settings are:
35
+ ctc-decoding 7.78 best for test-commonvoice
36
+
37
+ 2024-03-07 11:00:15,130 INFO [ctc_decode.py:806] Done!
exp-causal/ctc-decoding/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp-causal/ctc-decoding/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp-causal/ctc-decoding/wer-summary-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt ADDED
@@ -0,0 +1,2 @@
1
+ settings WER
2
+ ctc-decoding 7.78
exp-causal/ctc-decoding/wer-summary-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt ADDED
@@ -0,0 +1,2 @@
1
+ settings WER
2
+ ctc-decoding 6.72
exp-causal/fast_beam_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp-causal/fast_beam_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp-causal/fast_beam_search/log-decode-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model-2024-03-07-08-39-05 ADDED
@@ -0,0 +1,66 @@
1
+ 2024-03-07 08:39:05,727 INFO [decode.py:764] Decoding started
2
+ 2024-03-07 08:39:05,727 INFO [decode.py:770] Device: cuda:0
3
+ 2024-03-07 08:39:05,727 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
4
+ 2024-03-07 08:39:05,728 INFO [decode.py:778] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'f6919c0ddb311bea7b53a50f3afdcb3c18b8ccc8', 'k2-git-date': 'Sat Feb 10 09:23:09 2024', 'lhotse-version': '1.22.0.dev+git.9355bd72.clean', 'torch-version': '2.0.0+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'b35406b0-clean', 'icefall-git-date': 'Thu Mar 7 06:20:34 2024', 'icefall-path': '/root/icefall', 'k2-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/k2/__init__.py', 'lhotse-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'bookbot-h100', 'IP address': '127.0.0.1'}, 'epoch': 40, 'iter': 0, 'avg': 7, 'use_averaged_model': True, 'exp_dir': PosixPath('zipformer/exp-causal'), 'lang_dir': PosixPath('data/lang_phone'), 'decoding_method': 'fast_beam_search', 'beam_size': 4, 'beam': 20.0, 'ngram_lm_scale': 0.01, 'max_contexts': 8, 'max_states': 64, 'context_size': 2, 'max_sym_per_frame': 1, 'num_paths': 200, 'nbest_scale': 0.5, 'use_shallow_fusion': False, 'lm_type': 'rnn', 'lm_scale': 0.3, 'tokens_ngram': 3, 'backoff_id': 500, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '32', 'left_context_frames': '128', 'use_transducer': 
True, 'use_ctc': True, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 200.0, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'lm_vocab_size': 500, 'lm_epoch': 7, 'lm_avg': 1, 'lm_exp_dir': None, 'rnn_lm_embedding_dim': 2048, 'rnn_lm_hidden_dim': 2048, 'rnn_lm_num_layers': 3, 'rnn_lm_tie_weights': True, 'transformer_lm_exp_dir': None, 'transformer_lm_dim_feedforward': 2048, 'transformer_lm_encoder_dim': 768, 'transformer_lm_embedding_dim': 768, 'transformer_lm_nhead': 8, 'transformer_lm_num_layers': 16, 'transformer_lm_tie_weights': True, 'res_dir': PosixPath('zipformer/exp-causal/fast_beam_search'), 'suffix': 'epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model', 'blank_id': 0, 'unk_id': 19, 'vocab_size': 38}
+ 2024-03-07 08:39:05,728 INFO [decode.py:780] About to create model
+ 2024-03-07 08:39:05,976 INFO [decode.py:847] Calculating the averaged model over epoch range from 33 (excluded) to 40
+ 2024-03-07 08:39:06,839 INFO [decode.py:908] Number of model parameters: 65182863
+ 2024-03-07 08:39:06,839 INFO [multidataset.py:81] About to get FLEURS test cuts
+ 2024-03-07 08:39:06,839 INFO [multidataset.py:83] Loading FLEURS in lazy mode
+ 2024-03-07 08:39:06,839 INFO [multidataset.py:90] About to get Common Voice test cuts
+ 2024-03-07 08:39:06,839 INFO [multidataset.py:92] Loading Common Voice in lazy mode
+ 2024-03-07 08:39:07,885 INFO [decode.py:651] batch 0/?, cuts processed until now is 11
+ 2024-03-07 08:39:09,245 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([3.2843, 3.2412, 2.7568, 2.6566], device='cuda:0')
+ 2024-03-07 08:39:17,182 INFO [decode.py:651] batch 20/?, cuts processed until now is 270
+ 2024-03-07 08:39:26,436 INFO [decode.py:651] batch 40/?, cuts processed until now is 487
+ 2024-03-07 08:39:26,499 INFO [decode.py:665] The transcripts are stored in zipformer/exp-causal/fast_beam_search/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
+ 2024-03-07 08:39:26,539 INFO [utils.py:656] [test-fleurs-beam_20.0_max_contexts_8_max_states_64] %WER 6.61% [4072 / 61587, 1548 ins, 1229 del, 1295 sub ]
+ 2024-03-07 08:39:26,632 INFO [decode.py:676] Wrote detailed error stats to zipformer/exp-causal/fast_beam_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
+ 2024-03-07 08:39:26,633 INFO [decode.py:690]
+ For test-fleurs, WER of different settings are:
+ beam_20.0_max_contexts_8_max_states_64 6.61 best for test-fleurs
+
+ 2024-03-07 08:39:27,522 INFO [decode.py:651] batch 0/?, cuts processed until now is 28
+ 2024-03-07 08:39:31,747 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([2.3396, 2.5719, 2.6157, 2.8669], device='cuda:0')
+ 2024-03-07 08:39:32,952 INFO [decode.py:651] batch 20/?, cuts processed until now is 628
+ 2024-03-07 08:39:38,196 INFO [decode.py:651] batch 40/?, cuts processed until now is 1253
+ 2024-03-07 08:39:43,193 INFO [decode.py:651] batch 60/?, cuts processed until now is 1940
+ 2024-03-07 08:39:45,893 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([3.1773, 2.6102, 2.7401, 2.4829], device='cuda:0')
+ 2024-03-07 08:39:46,485 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([1.7060, 2.8868, 2.8695, 2.0508], device='cuda:0')
+ 2024-03-07 08:39:48,212 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([4.4578, 4.5392, 4.1744, 4.1141], device='cuda:0')
+ 2024-03-07 08:39:48,733 INFO [decode.py:651] batch 80/?, cuts processed until now is 2513
+ 2024-03-07 08:39:49,794 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([3.1706, 2.6006, 2.7836, 2.5051], device='cuda:0')
+ 2024-03-07 08:39:51,031 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([4.5083, 4.5611, 4.1912, 4.1362], device='cuda:0')
+ 2024-03-07 08:39:52,617 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([2.1998, 2.3700, 2.7924, 2.7577], device='cuda:0')
+ 2024-03-07 08:39:53,711 INFO [decode.py:651] batch 100/?, cuts processed until now is 3210
+ 2024-03-07 08:39:59,086 INFO [decode.py:651] batch 120/?, cuts processed until now is 3814
+ 2024-03-07 08:40:04,040 INFO [decode.py:651] batch 140/?, cuts processed until now is 4529
+ 2024-03-07 08:40:08,988 INFO [decode.py:651] batch 160/?, cuts processed until now is 5256
+ 2024-03-07 08:40:09,916 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([4.5227, 4.0025, 4.1148, 4.3108], device='cuda:0')
+ 2024-03-07 08:40:11,126 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([1.4430, 1.6413, 1.6686, 1.8403, 1.5950, 1.9201, 2.0724, 1.6969],
+ device='cuda:0')
+ 2024-03-07 08:40:14,084 INFO [decode.py:651] batch 180/?, cuts processed until now is 5927
+ 2024-03-07 08:40:18,458 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([3.8823, 3.2936, 3.0880, 3.2756], device='cuda:0')
+ 2024-03-07 08:40:19,214 INFO [decode.py:651] batch 200/?, cuts processed until now is 6582
+ 2024-03-07 08:40:22,724 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([1.5198, 1.6440, 1.5930, 1.8840, 1.5783, 1.8650, 2.0609, 1.6564],
+ device='cuda:0')
+ 2024-03-07 08:40:24,453 INFO [decode.py:651] batch 220/?, cuts processed until now is 7221
+ 2024-03-07 08:40:29,626 INFO [decode.py:651] batch 240/?, cuts processed until now is 7878
+ 2024-03-07 08:40:33,319 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([2.3668, 2.7808, 2.7429, 2.0601], device='cuda:0')
+ 2024-03-07 08:40:34,832 INFO [decode.py:651] batch 260/?, cuts processed until now is 8528
+ 2024-03-07 08:40:39,681 INFO [decode.py:651] batch 280/?, cuts processed until now is 9263
+ 2024-03-07 08:40:44,602 INFO [decode.py:651] batch 300/?, cuts processed until now is 9972
+ 2024-03-07 08:40:49,952 INFO [decode.py:651] batch 320/?, cuts processed until now is 10574
+ 2024-03-07 08:40:51,376 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([3.0097, 2.9823, 2.5350, 2.4735], device='cuda:0')
+ 2024-03-07 08:40:54,529 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([1.4511, 1.6635, 1.4021, 1.3549, 1.6671, 1.8209, 1.7242, 1.7077],
+ device='cuda:0')
+ 2024-03-07 08:40:54,975 INFO [decode.py:651] batch 340/?, cuts processed until now is 11255
+ 2024-03-07 08:41:00,108 INFO [decode.py:651] batch 360/?, cuts processed until now is 11900
+ 2024-03-07 08:41:03,743 INFO [decode.py:665] The transcripts are stored in zipformer/exp-causal/fast_beam_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
+ 2024-03-07 08:41:04,136 INFO [utils.py:656] [test-commonvoice-beam_20.0_max_contexts_8_max_states_64] %WER 7.73% [48322 / 624874, 9678 ins, 23552 del, 15092 sub ]
+ 2024-03-07 08:41:04,996 INFO [decode.py:676] Wrote detailed error stats to zipformer/exp-causal/fast_beam_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
+ 2024-03-07 08:41:04,997 INFO [decode.py:690]
+ For test-commonvoice, WER of different settings are:
+ beam_20.0_max_contexts_8_max_states_64 7.73 best for test-commonvoice
+
+ 2024-03-07 08:41:04,997 INFO [decode.py:944] Done!
exp-causal/fast_beam_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp-causal/fast_beam_search/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp-causal/fast_beam_search/wer-summary-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt ADDED
@@ -0,0 +1,2 @@
+ settings WER
+ beam_20.0_max_contexts_8_max_states_64 7.73
exp-causal/fast_beam_search/wer-summary-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt ADDED
@@ -0,0 +1,2 @@
+ settings WER
+ beam_20.0_max_contexts_8_max_states_64 6.61
exp-causal/greedy_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp-causal/greedy_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp-causal/greedy_search/log-decode-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model-2024-03-07-08-38-15 ADDED
@@ -0,0 +1,46 @@
+ 2024-03-07 08:38:15,365 INFO [decode.py:764] Decoding started
+ 2024-03-07 08:38:15,365 INFO [decode.py:770] Device: cuda:0
+ 2024-03-07 08:38:15,366 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
+ 2024-03-07 08:38:15,369 INFO [decode.py:778] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'f6919c0ddb311bea7b53a50f3afdcb3c18b8ccc8', 'k2-git-date': 'Sat Feb 10 09:23:09 2024', 'lhotse-version': '1.22.0.dev+git.9355bd72.clean', 'torch-version': '2.0.0+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'b35406b0-clean', 'icefall-git-date': 'Thu Mar 7 06:20:34 2024', 'icefall-path': '/root/icefall', 'k2-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/k2/__init__.py', 'lhotse-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'bookbot-h100', 'IP address': '127.0.0.1'}, 'epoch': 40, 'iter': 0, 'avg': 7, 'use_averaged_model': True, 'exp_dir': PosixPath('zipformer/exp-causal'), 'lang_dir': PosixPath('data/lang_phone'), 'decoding_method': 'greedy_search', 'beam_size': 4, 'beam': 20.0, 'ngram_lm_scale': 0.01, 'max_contexts': 8, 'max_states': 64, 'context_size': 2, 'max_sym_per_frame': 1, 'num_paths': 200, 'nbest_scale': 0.5, 'use_shallow_fusion': False, 'lm_type': 'rnn', 'lm_scale': 0.3, 'tokens_ngram': 3, 'backoff_id': 500, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '32', 'left_context_frames': '128', 'use_transducer': 
True, 'use_ctc': True, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 200.0, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'lm_vocab_size': 500, 'lm_epoch': 7, 'lm_avg': 1, 'lm_exp_dir': None, 'rnn_lm_embedding_dim': 2048, 'rnn_lm_hidden_dim': 2048, 'rnn_lm_num_layers': 3, 'rnn_lm_tie_weights': True, 'transformer_lm_exp_dir': None, 'transformer_lm_dim_feedforward': 2048, 'transformer_lm_encoder_dim': 768, 'transformer_lm_embedding_dim': 768, 'transformer_lm_nhead': 8, 'transformer_lm_num_layers': 16, 'transformer_lm_tie_weights': True, 'res_dir': PosixPath('zipformer/exp-causal/greedy_search'), 'suffix': 'epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model', 'blank_id': 0, 'unk_id': 19, 'vocab_size': 38}
+ 2024-03-07 08:38:15,369 INFO [decode.py:780] About to create model
+ 2024-03-07 08:38:15,616 INFO [decode.py:847] Calculating the averaged model over epoch range from 33 (excluded) to 40
+ 2024-03-07 08:38:16,521 INFO [decode.py:908] Number of model parameters: 65182863
+ 2024-03-07 08:38:16,521 INFO [multidataset.py:81] About to get FLEURS test cuts
+ 2024-03-07 08:38:16,521 INFO [multidataset.py:83] Loading FLEURS in lazy mode
+ 2024-03-07 08:38:16,522 INFO [multidataset.py:90] About to get Common Voice test cuts
+ 2024-03-07 08:38:16,522 INFO [multidataset.py:92] Loading Common Voice in lazy mode
+ 2024-03-07 08:38:17,254 INFO [decode.py:651] batch 0/?, cuts processed until now is 11
+ 2024-03-07 08:38:21,746 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([1.6919, 1.8132, 1.6442, 1.9278, 2.0388, 1.9352, 2.1089, 1.7078],
+ device='cuda:0')
+ 2024-03-07 08:38:23,390 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([4.4804, 4.7386, 4.4316, 4.4934], device='cuda:0')
+ 2024-03-07 08:38:23,574 INFO [decode.py:665] The transcripts are stored in zipformer/exp-causal/greedy_search/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt
+ 2024-03-07 08:38:23,615 INFO [utils.py:656] [test-fleurs-greedy_search] %WER 6.58% [4054 / 61587, 1612 ins, 1154 del, 1288 sub ]
+ 2024-03-07 08:38:23,708 INFO [decode.py:676] Wrote detailed error stats to zipformer/exp-causal/greedy_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt
+ 2024-03-07 08:38:23,708 INFO [decode.py:690]
+ For test-fleurs, WER of different settings are:
+ greedy_search 6.58 best for test-fleurs
+
+ 2024-03-07 08:38:24,424 INFO [decode.py:651] batch 0/?, cuts processed until now is 28
+ 2024-03-07 08:38:26,567 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([3.7743, 3.4609, 3.1923, 3.4857], device='cuda:0')
+ 2024-03-07 08:38:29,582 INFO [decode.py:651] batch 50/?, cuts processed until now is 1611
+ 2024-03-07 08:38:34,671 INFO [decode.py:651] batch 100/?, cuts processed until now is 3210
+ 2024-03-07 08:38:38,604 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([1.8230, 2.9017, 2.8725, 2.2266], device='cuda:0')
+ 2024-03-07 08:38:39,685 INFO [decode.py:651] batch 150/?, cuts processed until now is 4896
+ 2024-03-07 08:38:44,671 INFO [decode.py:651] batch 200/?, cuts processed until now is 6582
+ 2024-03-07 08:38:49,727 INFO [decode.py:651] batch 250/?, cuts processed until now is 8173
+ 2024-03-07 08:38:50,536 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([3.2378, 3.0733, 2.4014, 2.7650], device='cuda:0')
+ 2024-03-07 08:38:50,823 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([1.5433, 1.8018, 1.6968, 1.5210, 1.7601, 1.7079, 1.5636, 1.8637],
+ device='cuda:0')
+ 2024-03-07 08:38:51,210 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([3.0157, 2.9472, 2.5216, 2.4810], device='cuda:0')
+ 2024-03-07 08:38:54,561 INFO [decode.py:651] batch 300/?, cuts processed until now is 9972
+ 2024-03-07 08:38:55,938 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([4.0921, 4.2424, 3.6664, 3.7351], device='cuda:0')
+ 2024-03-07 08:38:59,584 INFO [decode.py:651] batch 350/?, cuts processed until now is 11592
+ 2024-03-07 08:39:00,118 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([4.5326, 4.0145, 4.0918, 4.3280], device='cuda:0')
+ 2024-03-07 08:39:02,056 INFO [decode.py:665] The transcripts are stored in zipformer/exp-causal/greedy_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt
+ 2024-03-07 08:39:02,445 INFO [utils.py:656] [test-commonvoice-greedy_search] %WER 7.71% [48192 / 624874, 10414 ins, 21811 del, 15967 sub ]
+ 2024-03-07 08:39:03,298 INFO [decode.py:676] Wrote detailed error stats to zipformer/exp-causal/greedy_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt
+ 2024-03-07 08:39:03,298 INFO [decode.py:690]
+ For test-commonvoice, WER of different settings are:
+ greedy_search 7.71 best for test-commonvoice
+
+ 2024-03-07 08:39:03,299 INFO [decode.py:944] Done!
exp-causal/greedy_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp-causal/greedy_search/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp-causal/greedy_search/wer-summary-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt ADDED
@@ -0,0 +1,2 @@
+ settings WER
+ greedy_search 7.71
exp-causal/greedy_search/wer-summary-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt ADDED
@@ -0,0 +1,2 @@
+ settings WER
+ greedy_search 6.58
exp-causal/jit_script_chunk_32_left_128.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1f7af6590fca59f1026cad0a0a2abee9332f146142d8e5c6d0b4975b6ce35f97
+ size 263594678
exp-causal/modified_beam_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp-causal/modified_beam_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp-causal/modified_beam_search/log-decode-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model-2024-03-07-08-41-07 ADDED
@@ -0,0 +1,56 @@
+ 2024-03-07 08:41:07,436 INFO [decode.py:764] Decoding started
+ 2024-03-07 08:41:07,436 INFO [decode.py:770] Device: cuda:0
+ 2024-03-07 08:41:07,437 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
+ 2024-03-07 08:41:07,437 INFO [decode.py:778] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'f6919c0ddb311bea7b53a50f3afdcb3c18b8ccc8', 'k2-git-date': 'Sat Feb 10 09:23:09 2024', 'lhotse-version': '1.22.0.dev+git.9355bd72.clean', 'torch-version': '2.0.0+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'b35406b0-clean', 'icefall-git-date': 'Thu Mar 7 06:20:34 2024', 'icefall-path': '/root/icefall', 'k2-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/k2/__init__.py', 'lhotse-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'bookbot-h100', 'IP address': '127.0.0.1'}, 'epoch': 40, 'iter': 0, 'avg': 7, 'use_averaged_model': True, 'exp_dir': PosixPath('zipformer/exp-causal'), 'lang_dir': PosixPath('data/lang_phone'), 'decoding_method': 'modified_beam_search', 'beam_size': 4, 'beam': 20.0, 'ngram_lm_scale': 0.01, 'max_contexts': 8, 'max_states': 64, 'context_size': 2, 'max_sym_per_frame': 1, 'num_paths': 200, 'nbest_scale': 0.5, 'use_shallow_fusion': False, 'lm_type': 'rnn', 'lm_scale': 0.3, 'tokens_ngram': 3, 'backoff_id': 500, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '32', 'left_context_frames': '128', 
'use_transducer': True, 'use_ctc': True, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 200.0, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'lm_vocab_size': 500, 'lm_epoch': 7, 'lm_avg': 1, 'lm_exp_dir': None, 'rnn_lm_embedding_dim': 2048, 'rnn_lm_hidden_dim': 2048, 'rnn_lm_num_layers': 3, 'rnn_lm_tie_weights': True, 'transformer_lm_exp_dir': None, 'transformer_lm_dim_feedforward': 2048, 'transformer_lm_encoder_dim': 768, 'transformer_lm_embedding_dim': 768, 'transformer_lm_nhead': 8, 'transformer_lm_num_layers': 16, 'transformer_lm_tie_weights': True, 'res_dir': PosixPath('zipformer/exp-causal/modified_beam_search'), 'suffix': 'epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model', 'blank_id': 0, 'unk_id': 19, 'vocab_size': 38}
+ 2024-03-07 08:41:07,438 INFO [decode.py:780] About to create model
+ 2024-03-07 08:41:07,691 INFO [decode.py:847] Calculating the averaged model over epoch range from 33 (excluded) to 40
+ 2024-03-07 08:41:08,529 INFO [decode.py:908] Number of model parameters: 65182863
+ 2024-03-07 08:41:08,529 INFO [multidataset.py:81] About to get FLEURS test cuts
+ 2024-03-07 08:41:08,529 INFO [multidataset.py:83] Loading FLEURS in lazy mode
+ 2024-03-07 08:41:08,530 INFO [multidataset.py:90] About to get Common Voice test cuts
+ 2024-03-07 08:41:08,530 INFO [multidataset.py:92] Loading Common Voice in lazy mode
+ 2024-03-07 08:41:10,007 INFO [decode.py:651] batch 0/?, cuts processed until now is 11
+ 2024-03-07 08:41:24,636 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([3.9650, 3.6353, 3.4530, 3.6656], device='cuda:0')
+ 2024-03-07 08:41:27,015 INFO [decode.py:651] batch 20/?, cuts processed until now is 270
+ 2024-03-07 08:41:41,955 INFO [decode.py:651] batch 40/?, cuts processed until now is 487
+ 2024-03-07 08:41:42,017 INFO [decode.py:665] The transcripts are stored in zipformer/exp-causal/modified_beam_search/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt
+ 2024-03-07 08:41:42,058 INFO [utils.py:656] [test-fleurs-beam_size_4] %WER 6.40% [3942 / 61587, 1687 ins, 950 del, 1305 sub ]
+ 2024-03-07 08:41:42,153 INFO [decode.py:676] Wrote detailed error stats to zipformer/exp-causal/modified_beam_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt
+ 2024-03-07 08:41:42,153 INFO [decode.py:690]
+ For test-fleurs, WER of different settings are:
+ beam_size_4 6.4 best for test-fleurs
+
+ 2024-03-07 08:41:43,500 INFO [decode.py:651] batch 0/?, cuts processed until now is 28
+ 2024-03-07 08:41:57,852 INFO [decode.py:651] batch 20/?, cuts processed until now is 628
+ 2024-03-07 08:42:07,812 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([2.1801, 2.3482, 2.7994, 2.7716], device='cuda:0')
+ 2024-03-07 08:42:12,022 INFO [decode.py:651] batch 40/?, cuts processed until now is 1253
+ 2024-03-07 08:42:21,068 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([1.6309, 1.6829, 1.5267, 1.7630, 1.8531, 1.8195, 1.8756, 1.6031],
+ device='cuda:0')
+ 2024-03-07 08:42:25,949 INFO [decode.py:651] batch 60/?, cuts processed until now is 1940
+ 2024-03-07 08:42:36,123 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([3.2243, 3.0776, 2.3565, 2.7223], device='cuda:0')
+ 2024-03-07 08:42:40,387 INFO [decode.py:651] batch 80/?, cuts processed until now is 2513
+ 2024-03-07 08:42:54,535 INFO [decode.py:651] batch 100/?, cuts processed until now is 3210
+ 2024-03-07 08:43:08,702 INFO [decode.py:651] batch 120/?, cuts processed until now is 3814
+ 2024-03-07 08:43:22,478 INFO [decode.py:651] batch 140/?, cuts processed until now is 4529
+ 2024-03-07 08:43:30,734 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([3.2739, 2.6799, 2.8831, 2.6336], device='cuda:0')
+ 2024-03-07 08:43:36,235 INFO [decode.py:651] batch 160/?, cuts processed until now is 5256
+ 2024-03-07 08:43:38,339 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([3.0916, 2.5364, 2.6418, 2.3536], device='cuda:0')
+ 2024-03-07 08:43:50,472 INFO [decode.py:651] batch 180/?, cuts processed until now is 5927
+ 2024-03-07 08:44:04,600 INFO [decode.py:651] batch 200/?, cuts processed until now is 6582
+ 2024-03-07 08:44:18,698 INFO [decode.py:651] batch 220/?, cuts processed until now is 7221
+ 2024-03-07 08:44:32,720 INFO [decode.py:651] batch 240/?, cuts processed until now is 7878
+ 2024-03-07 08:44:46,812 INFO [decode.py:651] batch 260/?, cuts processed until now is 8528
+ 2024-03-07 08:45:00,678 INFO [decode.py:651] batch 280/?, cuts processed until now is 9263
+ 2024-03-07 08:45:02,883 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([2.2417, 2.2166, 2.4622, 2.6655], device='cuda:0')
+ 2024-03-07 08:45:14,478 INFO [decode.py:651] batch 300/?, cuts processed until now is 9972
+ 2024-03-07 08:45:28,719 INFO [decode.py:651] batch 320/?, cuts processed until now is 10574
+ 2024-03-07 08:45:42,779 INFO [decode.py:651] batch 340/?, cuts processed until now is 11255
+ 2024-03-07 08:45:56,922 INFO [decode.py:651] batch 360/?, cuts processed until now is 11900
+ 2024-03-07 08:46:05,025 INFO [decode.py:665] The transcripts are stored in zipformer/exp-causal/modified_beam_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt
+ 2024-03-07 08:46:05,434 INFO [utils.py:656] [test-commonvoice-beam_size_4] %WER 7.53% [47039 / 624874, 11050 ins, 20152 del, 15837 sub ]
+ 2024-03-07 08:46:06,304 INFO [decode.py:676] Wrote detailed error stats to zipformer/exp-causal/modified_beam_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt
+ 2024-03-07 08:46:06,304 INFO [decode.py:690]
+ For test-commonvoice, WER of different settings are:
+ beam_size_4 7.53 best for test-commonvoice
+
+ 2024-03-07 08:46:06,304 INFO [decode.py:944] Done!
exp-causal/modified_beam_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp-causal/modified_beam_search/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp-causal/modified_beam_search/wer-summary-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt ADDED
@@ -0,0 +1,2 @@
+ settings WER
+ beam_size_4 7.53
exp-causal/modified_beam_search/wer-summary-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt ADDED
@@ -0,0 +1,2 @@
+ settings WER
+ beam_size_4 6.4
exp-causal/pretrained.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1a57d5529d997fe029a18bccae562257f58bdc66d4eb2648dc4526c66618e8d8
+ size 261184016
exp-causal/streaming/fast_beam_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp-causal/streaming/fast_beam_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp-causal/streaming/fast_beam_search/log-decode-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model-2024-03-07-08-56-46 ADDED
@@ -0,0 +1,154 @@
+ 2024-03-07 08:56:46,168 INFO [streaming_decode.py:723] Decoding started
+ 2024-03-07 08:56:46,168 INFO [streaming_decode.py:729] Device: cuda:0
+ 2024-03-07 08:56:46,168 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
+ 2024-03-07 08:56:46,170 INFO [streaming_decode.py:737] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'f6919c0ddb311bea7b53a50f3afdcb3c18b8ccc8', 'k2-git-date': 'Sat Feb 10 09:23:09 2024', 'lhotse-version': '1.22.0.dev+git.9355bd72.clean', 'torch-version': '2.0.0+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'b35406b0-dirty', 'icefall-git-date': 'Thu Mar 7 06:20:34 2024', 'icefall-path': '/root/icefall', 'k2-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/k2/__init__.py', 'lhotse-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'bookbot-h100', 'IP address': '127.0.0.1'}, 'epoch': 40, 'iter': 0, 'avg': 7, 'use_averaged_model': True, 'exp_dir': PosixPath('zipformer/exp-causal'), 'lang_dir': PosixPath('data/lang_phone'), 'decoding_method': 'fast_beam_search', 'num_active_paths': 4, 'beam': 4, 'max_contexts': 4, 'max_states': 32, 'context_size': 2, 'num_decode_streams': 1000, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '32', 'left_context_frames': '128', 'use_transducer': True, 'use_ctc': True, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 200.0, 'bucketing_sampler': True, 'num_buckets': 30, 
'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'res_dir': PosixPath('zipformer/exp-causal/streaming/fast_beam_search'), 'suffix': 'epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model', 'blank_id': 0, 'unk_id': 19, 'vocab_size': 38}
+ 2024-03-07 08:56:46,170 INFO [streaming_decode.py:739] About to create model
+ 2024-03-07 08:56:46,400 INFO [streaming_decode.py:806] Calculating the averaged model over epoch range from 33 (excluded) to 40
+ 2024-03-07 08:56:47,216 INFO [streaming_decode.py:828] Number of model parameters: 65182863
+ 2024-03-07 08:56:47,216 INFO [multidataset.py:81] About to get FLEURS test cuts
+ 2024-03-07 08:56:47,216 INFO [multidataset.py:83] Loading FLEURS in lazy mode
+ 2024-03-07 08:56:47,217 INFO [multidataset.py:90] About to get Common Voice test cuts
+ 2024-03-07 08:56:47,217 INFO [multidataset.py:92] Loading Common Voice in lazy mode
+ 2024-03-07 08:56:47,250 INFO [streaming_decode.py:615] Cuts processed until now is 0.
13
+ 2024-03-07 08:56:47,505 INFO [streaming_decode.py:615] Cuts processed until now is 100.
14
+ 2024-03-07 08:56:47,761 INFO [streaming_decode.py:615] Cuts processed until now is 200.
15
+ 2024-03-07 08:56:48,125 INFO [streaming_decode.py:615] Cuts processed until now is 300.
16
+ 2024-03-07 08:56:48,388 INFO [streaming_decode.py:615] Cuts processed until now is 400.
17
+ 2024-03-07 08:56:59,328 INFO [streaming_decode.py:660] The transcripts are stored in zipformer/exp-causal/streaming/fast_beam_search/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
18
+ 2024-03-07 08:56:59,374 INFO [utils.py:656] [test-fleurs-beam_4_max_contexts_4_max_states_32] %WER 6.44% [3966 / 61587, 1562 ins, 1114 del, 1290 sub ]
19
+ 2024-03-07 08:56:59,473 INFO [streaming_decode.py:671] Wrote detailed error stats to zipformer/exp-causal/streaming/fast_beam_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
20
+ 2024-03-07 08:56:59,473 INFO [streaming_decode.py:685]
21
+ For test-fleurs, WER of different settings are:
22
+ beam_4_max_contexts_4_max_states_32 6.44 best for test-fleurs
23
+
24
+ 2024-03-07 08:56:59,533 INFO [streaming_decode.py:615] Cuts processed until now is 0.
25
+ 2024-03-07 08:56:59,987 INFO [streaming_decode.py:615] Cuts processed until now is 100.
26
+ 2024-03-07 08:57:00,402 INFO [streaming_decode.py:615] Cuts processed until now is 200.
27
+ 2024-03-07 08:57:00,805 INFO [streaming_decode.py:615] Cuts processed until now is 300.
28
+ 2024-03-07 08:57:01,205 INFO [streaming_decode.py:615] Cuts processed until now is 400.
29
+ 2024-03-07 08:57:01,610 INFO [streaming_decode.py:615] Cuts processed until now is 500.
30
+ 2024-03-07 08:57:02,035 INFO [streaming_decode.py:615] Cuts processed until now is 600.
31
+ 2024-03-07 08:57:02,444 INFO [streaming_decode.py:615] Cuts processed until now is 700.
32
+ 2024-03-07 08:57:02,848 INFO [streaming_decode.py:615] Cuts processed until now is 800.
33
+ 2024-03-07 08:57:03,267 INFO [streaming_decode.py:615] Cuts processed until now is 900.
34
+ 2024-03-07 08:57:06,208 INFO [streaming_decode.py:615] Cuts processed until now is 1000.
35
+ 2024-03-07 08:57:07,867 INFO [streaming_decode.py:615] Cuts processed until now is 1100.
36
+ 2024-03-07 08:57:09,022 INFO [streaming_decode.py:615] Cuts processed until now is 1200.
37
+ 2024-03-07 08:57:10,227 INFO [streaming_decode.py:615] Cuts processed until now is 1300.
38
+ 2024-03-07 08:57:11,297 INFO [streaming_decode.py:615] Cuts processed until now is 1400.
39
+ 2024-03-07 08:57:12,537 INFO [streaming_decode.py:615] Cuts processed until now is 1500.
40
+ 2024-03-07 08:57:13,834 INFO [streaming_decode.py:615] Cuts processed until now is 1600.
41
+ 2024-03-07 08:57:14,242 INFO [streaming_decode.py:615] Cuts processed until now is 1700.
42
+ 2024-03-07 08:57:15,348 INFO [streaming_decode.py:615] Cuts processed until now is 1800.
43
+ 2024-03-07 08:57:16,629 INFO [streaming_decode.py:615] Cuts processed until now is 1900.
44
+ 2024-03-07 08:57:17,927 INFO [streaming_decode.py:615] Cuts processed until now is 2000.
45
+ 2024-03-07 08:57:19,033 INFO [streaming_decode.py:615] Cuts processed until now is 2100.
46
+ 2024-03-07 08:57:20,293 INFO [streaming_decode.py:615] Cuts processed until now is 2200.
47
+ 2024-03-07 08:57:21,399 INFO [streaming_decode.py:615] Cuts processed until now is 2300.
48
+ 2024-03-07 08:57:23,520 INFO [streaming_decode.py:615] Cuts processed until now is 2400.
49
+ 2024-03-07 08:57:24,627 INFO [streaming_decode.py:615] Cuts processed until now is 2500.
50
+ 2024-03-07 08:57:25,903 INFO [streaming_decode.py:615] Cuts processed until now is 2600.
51
+ 2024-03-07 08:57:27,193 INFO [streaming_decode.py:615] Cuts processed until now is 2700.
52
+ 2024-03-07 08:57:28,345 INFO [streaming_decode.py:615] Cuts processed until now is 2800.
53
+ 2024-03-07 08:57:28,729 INFO [streaming_decode.py:615] Cuts processed until now is 2900.
54
+ 2024-03-07 08:57:30,011 INFO [streaming_decode.py:615] Cuts processed until now is 3000.
55
+ 2024-03-07 08:57:31,313 INFO [streaming_decode.py:615] Cuts processed until now is 3100.
56
+ 2024-03-07 08:57:32,611 INFO [streaming_decode.py:615] Cuts processed until now is 3200.
57
+ 2024-03-07 08:57:33,723 INFO [streaming_decode.py:615] Cuts processed until now is 3300.
58
+ 2024-03-07 08:57:35,022 INFO [streaming_decode.py:615] Cuts processed until now is 3400.
59
+ 2024-03-07 08:57:36,346 INFO [streaming_decode.py:615] Cuts processed until now is 3500.
60
+ 2024-03-07 08:57:37,464 INFO [streaming_decode.py:615] Cuts processed until now is 3600.
61
+ 2024-03-07 08:57:38,782 INFO [streaming_decode.py:615] Cuts processed until now is 3700.
62
+ 2024-03-07 08:57:40,089 INFO [streaming_decode.py:615] Cuts processed until now is 3800.
63
+ 2024-03-07 08:57:41,219 INFO [streaming_decode.py:615] Cuts processed until now is 3900.
64
+ 2024-03-07 08:57:42,516 INFO [streaming_decode.py:615] Cuts processed until now is 4000.
65
+ 2024-03-07 08:57:43,861 INFO [streaming_decode.py:615] Cuts processed until now is 4100.
66
+ 2024-03-07 08:57:44,985 INFO [streaming_decode.py:615] Cuts processed until now is 4200.
67
+ 2024-03-07 08:57:46,301 INFO [streaming_decode.py:615] Cuts processed until now is 4300.
68
+ 2024-03-07 08:57:47,589 INFO [streaming_decode.py:615] Cuts processed until now is 4400.
69
+ 2024-03-07 08:57:48,724 INFO [streaming_decode.py:615] Cuts processed until now is 4500.
70
+ 2024-03-07 08:57:50,031 INFO [streaming_decode.py:615] Cuts processed until now is 4600.
71
+ 2024-03-07 08:57:51,375 INFO [streaming_decode.py:615] Cuts processed until now is 4700.
72
+ 2024-03-07 08:57:52,513 INFO [streaming_decode.py:615] Cuts processed until now is 4800.
73
+ 2024-03-07 08:57:53,839 INFO [streaming_decode.py:615] Cuts processed until now is 4900.
74
+ 2024-03-07 08:57:54,958 INFO [streaming_decode.py:615] Cuts processed until now is 5000.
75
+ 2024-03-07 08:57:56,282 INFO [streaming_decode.py:615] Cuts processed until now is 5100.
76
+ 2024-03-07 08:57:57,610 INFO [streaming_decode.py:615] Cuts processed until now is 5200.
77
+ 2024-03-07 08:57:58,744 INFO [streaming_decode.py:615] Cuts processed until now is 5300.
78
+ 2024-03-07 08:58:00,031 INFO [streaming_decode.py:615] Cuts processed until now is 5400.
79
+ 2024-03-07 08:58:01,344 INFO [streaming_decode.py:615] Cuts processed until now is 5500.
80
+ 2024-03-07 08:58:02,494 INFO [streaming_decode.py:615] Cuts processed until now is 5600.
81
+ 2024-03-07 08:58:03,800 INFO [streaming_decode.py:615] Cuts processed until now is 5700.
82
+ 2024-03-07 08:58:05,143 INFO [streaming_decode.py:615] Cuts processed until now is 5800.
83
+ 2024-03-07 08:58:06,296 INFO [streaming_decode.py:615] Cuts processed until now is 5900.
84
+ 2024-03-07 08:58:07,598 INFO [streaming_decode.py:615] Cuts processed until now is 6000.
85
+ 2024-03-07 08:58:08,934 INFO [streaming_decode.py:615] Cuts processed until now is 6100.
86
+ 2024-03-07 08:58:10,061 INFO [streaming_decode.py:615] Cuts processed until now is 6200.
87
+ 2024-03-07 08:58:11,363 INFO [streaming_decode.py:615] Cuts processed until now is 6300.
88
+ 2024-03-07 08:58:12,697 INFO [streaming_decode.py:615] Cuts processed until now is 6400.
89
+ 2024-03-07 08:58:13,841 INFO [streaming_decode.py:615] Cuts processed until now is 6500.
90
+ 2024-03-07 08:58:15,142 INFO [streaming_decode.py:615] Cuts processed until now is 6600.
91
+ 2024-03-07 08:58:16,473 INFO [streaming_decode.py:615] Cuts processed until now is 6700.
92
+ 2024-03-07 08:58:17,624 INFO [streaming_decode.py:615] Cuts processed until now is 6800.
93
+ 2024-03-07 08:58:18,925 INFO [streaming_decode.py:615] Cuts processed until now is 6900.
94
+ 2024-03-07 08:58:20,264 INFO [streaming_decode.py:615] Cuts processed until now is 7000.
95
+ 2024-03-07 08:58:21,416 INFO [streaming_decode.py:615] Cuts processed until now is 7100.
96
+ 2024-03-07 08:58:22,705 INFO [streaming_decode.py:615] Cuts processed until now is 7200.
97
+ 2024-03-07 08:58:24,056 INFO [streaming_decode.py:615] Cuts processed until now is 7300.
98
+ 2024-03-07 08:58:25,222 INFO [streaming_decode.py:615] Cuts processed until now is 7400.
99
+ 2024-03-07 08:58:26,532 INFO [streaming_decode.py:615] Cuts processed until now is 7500.
100
+ 2024-03-07 08:58:27,880 INFO [streaming_decode.py:615] Cuts processed until now is 7600.
101
+ 2024-03-07 08:58:29,014 INFO [streaming_decode.py:615] Cuts processed until now is 7700.
102
+ 2024-03-07 08:58:30,320 INFO [streaming_decode.py:615] Cuts processed until now is 7800.
103
+ 2024-03-07 08:58:31,679 INFO [streaming_decode.py:615] Cuts processed until now is 7900.
104
+ 2024-03-07 08:58:32,790 INFO [streaming_decode.py:615] Cuts processed until now is 8000.
105
+ 2024-03-07 08:58:34,120 INFO [streaming_decode.py:615] Cuts processed until now is 8100.
106
+ 2024-03-07 08:58:35,488 INFO [streaming_decode.py:615] Cuts processed until now is 8200.
107
+ 2024-03-07 08:58:36,616 INFO [streaming_decode.py:615] Cuts processed until now is 8300.
108
+ 2024-03-07 08:58:37,953 INFO [streaming_decode.py:615] Cuts processed until now is 8400.
109
+ 2024-03-07 08:58:39,086 INFO [streaming_decode.py:615] Cuts processed until now is 8500.
110
+ 2024-03-07 08:58:40,391 INFO [streaming_decode.py:615] Cuts processed until now is 8600.
111
+ 2024-03-07 08:58:41,731 INFO [streaming_decode.py:615] Cuts processed until now is 8700.
112
+ 2024-03-07 08:58:42,864 INFO [streaming_decode.py:615] Cuts processed until now is 8800.
113
+ 2024-03-07 08:58:44,169 INFO [streaming_decode.py:615] Cuts processed until now is 8900.
114
+ 2024-03-07 08:58:45,542 INFO [streaming_decode.py:615] Cuts processed until now is 9000.
115
+ 2024-03-07 08:58:46,679 INFO [streaming_decode.py:615] Cuts processed until now is 9100.
116
+ 2024-03-07 08:58:47,998 INFO [streaming_decode.py:615] Cuts processed until now is 9200.
117
+ 2024-03-07 08:58:49,366 INFO [streaming_decode.py:615] Cuts processed until now is 9300.
118
+ 2024-03-07 08:58:50,494 INFO [streaming_decode.py:615] Cuts processed until now is 9400.
119
+ 2024-03-07 08:58:51,824 INFO [streaming_decode.py:615] Cuts processed until now is 9500.
120
+ 2024-03-07 08:58:53,228 INFO [streaming_decode.py:615] Cuts processed until now is 9600.
121
+ 2024-03-07 08:58:54,385 INFO [streaming_decode.py:615] Cuts processed until now is 9700.
122
+ 2024-03-07 08:58:55,761 INFO [streaming_decode.py:615] Cuts processed until now is 9800.
123
+ 2024-03-07 08:58:56,178 INFO [streaming_decode.py:615] Cuts processed until now is 9900.
124
+ 2024-03-07 08:58:57,326 INFO [streaming_decode.py:615] Cuts processed until now is 10000.
125
+ 2024-03-07 08:58:59,584 INFO [streaming_decode.py:615] Cuts processed until now is 10100.
126
+ 2024-03-07 08:59:00,723 INFO [streaming_decode.py:615] Cuts processed until now is 10200.
127
+ 2024-03-07 08:59:02,083 INFO [streaming_decode.py:615] Cuts processed until now is 10300.
128
+ 2024-03-07 08:59:03,461 INFO [streaming_decode.py:615] Cuts processed until now is 10400.
129
+ 2024-03-07 08:59:04,588 INFO [streaming_decode.py:615] Cuts processed until now is 10500.
130
+ 2024-03-07 08:59:05,955 INFO [streaming_decode.py:615] Cuts processed until now is 10600.
131
+ 2024-03-07 08:59:07,086 INFO [streaming_decode.py:615] Cuts processed until now is 10700.
132
+ 2024-03-07 08:59:08,428 INFO [streaming_decode.py:615] Cuts processed until now is 10800.
133
+ 2024-03-07 08:59:09,807 INFO [streaming_decode.py:615] Cuts processed until now is 10900.
134
+ 2024-03-07 08:59:10,955 INFO [streaming_decode.py:615] Cuts processed until now is 11000.
135
+ 2024-03-07 08:59:12,305 INFO [streaming_decode.py:615] Cuts processed until now is 11100.
136
+ 2024-03-07 08:59:13,713 INFO [streaming_decode.py:615] Cuts processed until now is 11200.
137
+ 2024-03-07 08:59:14,126 INFO [streaming_decode.py:615] Cuts processed until now is 11300.
138
+ 2024-03-07 08:59:16,212 INFO [streaming_decode.py:615] Cuts processed until now is 11400.
139
+ 2024-03-07 08:59:17,370 INFO [streaming_decode.py:615] Cuts processed until now is 11500.
140
+ 2024-03-07 08:59:18,717 INFO [streaming_decode.py:615] Cuts processed until now is 11600.
141
+ 2024-03-07 08:59:19,119 INFO [streaming_decode.py:615] Cuts processed until now is 11700.
142
+ 2024-03-07 08:59:20,499 INFO [streaming_decode.py:615] Cuts processed until now is 11800.
143
+ 2024-03-07 08:59:21,634 INFO [streaming_decode.py:615] Cuts processed until now is 11900.
144
+ 2024-03-07 08:59:22,973 INFO [streaming_decode.py:615] Cuts processed until now is 12000.
145
+ 2024-03-07 08:59:25,070 INFO [streaming_decode.py:615] Cuts processed until now is 12100.
146
+ 2024-03-07 08:59:26,456 INFO [streaming_decode.py:615] Cuts processed until now is 12200.
147
+ 2024-03-07 08:59:32,292 INFO [streaming_decode.py:660] The transcripts are stored in zipformer/exp-causal/streaming/fast_beam_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
148
+ 2024-03-07 08:59:32,774 INFO [utils.py:656] [test-commonvoice-beam_4_max_contexts_4_max_states_32] %WER 7.72% [48248 / 624874, 9842 ins, 22929 del, 15477 sub ]
149
+ 2024-03-07 08:59:33,754 INFO [streaming_decode.py:671] Wrote detailed error stats to zipformer/exp-causal/streaming/fast_beam_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
150
+ 2024-03-07 08:59:33,754 INFO [streaming_decode.py:685]
151
+ For test-commonvoice, WER of different settings are:
152
+ beam_4_max_contexts_4_max_states_32 7.72 best for test-commonvoice
153
+
154
+ 2024-03-07 08:59:33,754 INFO [streaming_decode.py:853] Done!
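As a quick sanity check on the `%WER` summary lines above, the bracketed counts follow the standard word-error-rate formula, WER = (insertions + deletions + substitutions) / reference word count. The sketch below (not part of the repo; the helper name `wer_percent` is our own) recomputes both figures from the counts printed by `utils.py:656`:

```python
# Recompute the WER percentages reported in the log above.
# WER = (insertions + deletions + substitutions) / reference word count.

def wer_percent(ins: int, dels: int, subs: int, ref_words: int) -> float:
    """Word error rate as a percentage, rounded to two decimals."""
    return round(100.0 * (ins + dels + subs) / ref_words, 2)

# test-fleurs: %WER 6.44% [3966 / 61587, 1562 ins, 1114 del, 1290 sub]
fleurs = wer_percent(1562, 1114, 1290, 61587)

# test-commonvoice: %WER 7.72% [48248 / 624874, 9842 ins, 22929 del, 15477 sub]
commonvoice = wer_percent(9842, 22929, 15477, 624874)

print(fleurs, commonvoice)  # → 6.44 7.72
```

Note that the numerators in the log (3966 and 48248) are exactly the sums of the three error counts, confirming the decomposition.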
exp-causal/streaming/fast_beam_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp-causal/streaming/fast_beam_search/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp-causal/streaming/fast_beam_search/wer-summary-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt ADDED
@@ -0,0 +1,2 @@
+ settings WER
+ beam_4_max_contexts_4_max_states_32 7.72
exp-causal/streaming/fast_beam_search/wer-summary-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt ADDED
@@ -0,0 +1,2 @@
+ settings WER
+ beam_4_max_contexts_4_max_states_32 6.44
exp-causal/streaming/greedy_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp-causal/streaming/greedy_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff
 
exp-causal/streaming/greedy_search/log-decode-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model-2024-03-07-08-54-56 ADDED
@@ -0,0 +1,154 @@
+ 2024-03-07 08:54:56,454 INFO [streaming_decode.py:723] Decoding started
+ 2024-03-07 08:54:56,455 INFO [streaming_decode.py:729] Device: cuda:0
+ 2024-03-07 08:54:56,455 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
+ 2024-03-07 08:54:56,457 INFO [streaming_decode.py:737] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'f6919c0ddb311bea7b53a50f3afdcb3c18b8ccc8', 'k2-git-date': 'Sat Feb 10 09:23:09 2024', 'lhotse-version': '1.22.0.dev+git.9355bd72.clean', 'torch-version': '2.0.0+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'b35406b0-dirty', 'icefall-git-date': 'Thu Mar 7 06:20:34 2024', 'icefall-path': '/root/icefall', 'k2-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/k2/__init__.py', 'lhotse-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'bookbot-h100', 'IP address': '127.0.0.1'}, 'epoch': 40, 'iter': 0, 'avg': 7, 'use_averaged_model': True, 'exp_dir': PosixPath('zipformer/exp-causal'), 'lang_dir': PosixPath('data/lang_phone'), 'decoding_method': 'greedy_search', 'num_active_paths': 4, 'beam': 4, 'max_contexts': 4, 'max_states': 32, 'context_size': 2, 'num_decode_streams': 1000, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '32', 'left_context_frames': '128', 'use_transducer': True, 'use_ctc': True, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 200.0, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'res_dir': PosixPath('zipformer/exp-causal/streaming/greedy_search'), 'suffix': 'epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model', 'blank_id': 0, 'unk_id': 19, 'vocab_size': 38}
+ 2024-03-07 08:54:56,457 INFO [streaming_decode.py:739] About to create model
+ 2024-03-07 08:54:56,690 INFO [streaming_decode.py:806] Calculating the averaged model over epoch range from 33 (excluded) to 40
+ 2024-03-07 08:54:57,485 INFO [streaming_decode.py:828] Number of model parameters: 65182863
+ 2024-03-07 08:54:57,485 INFO [multidataset.py:81] About to get FLEURS test cuts
+ 2024-03-07 08:54:57,485 INFO [multidataset.py:83] Loading FLEURS in lazy mode
+ 2024-03-07 08:54:57,486 INFO [multidataset.py:90] About to get Common Voice test cuts
+ 2024-03-07 08:54:57,486 INFO [multidataset.py:92] Loading Common Voice in lazy mode
+ 2024-03-07 08:54:57,520 INFO [streaming_decode.py:615] Cuts processed until now is 0.
+ 2024-03-07 08:54:57,771 INFO [streaming_decode.py:615] Cuts processed until now is 100.
+ 2024-03-07 08:54:58,025 INFO [streaming_decode.py:615] Cuts processed until now is 200.
+ 2024-03-07 08:54:58,389 INFO [streaming_decode.py:615] Cuts processed until now is 300.
+ 2024-03-07 08:54:58,649 INFO [streaming_decode.py:615] Cuts processed until now is 400.
+ 2024-03-07 08:55:03,961 INFO [streaming_decode.py:660] The transcripts are stored in zipformer/exp-causal/streaming/greedy_search/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt
+ 2024-03-07 08:55:04,004 INFO [utils.py:656] [test-fleurs-greedy_search] %WER 6.59% [4058 / 61587, 1608 ins, 1149 del, 1301 sub ]
+ 2024-03-07 08:55:04,100 INFO [streaming_decode.py:671] Wrote detailed error stats to zipformer/exp-causal/streaming/greedy_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt
+ 2024-03-07 08:55:04,101 INFO [streaming_decode.py:685]
+ For test-fleurs, WER of different settings are:
+ greedy_search 6.59 best for test-fleurs
+
+ 2024-03-07 08:55:04,160 INFO [streaming_decode.py:615] Cuts processed until now is 0.
+ 2024-03-07 08:55:04,586 INFO [streaming_decode.py:615] Cuts processed until now is 100.
+ 2024-03-07 08:55:04,991 INFO [streaming_decode.py:615] Cuts processed until now is 200.
+ 2024-03-07 08:55:05,379 INFO [streaming_decode.py:615] Cuts processed until now is 300.
+ 2024-03-07 08:55:05,766 INFO [streaming_decode.py:615] Cuts processed until now is 400.
+ 2024-03-07 08:55:06,160 INFO [streaming_decode.py:615] Cuts processed until now is 500.
+ 2024-03-07 08:55:06,568 INFO [streaming_decode.py:615] Cuts processed until now is 600.
+ 2024-03-07 08:55:06,968 INFO [streaming_decode.py:615] Cuts processed until now is 700.
+ 2024-03-07 08:55:07,371 INFO [streaming_decode.py:615] Cuts processed until now is 800.
+ 2024-03-07 08:55:07,784 INFO [streaming_decode.py:615] Cuts processed until now is 900.
+ 2024-03-07 08:55:09,627 INFO [streaming_decode.py:615] Cuts processed until now is 1000.
+ 2024-03-07 08:55:10,682 INFO [streaming_decode.py:615] Cuts processed until now is 1100.
+ 2024-03-07 08:55:11,495 INFO [streaming_decode.py:615] Cuts processed until now is 1200.
+ 2024-03-07 08:55:12,174 INFO [streaming_decode.py:615] Cuts processed until now is 1300.
+ 2024-03-07 08:55:12,992 INFO [streaming_decode.py:615] Cuts processed until now is 1400.
+ 2024-03-07 08:55:13,818 INFO [streaming_decode.py:615] Cuts processed until now is 1500.
+ 2024-03-07 08:55:14,510 INFO [streaming_decode.py:615] Cuts processed until now is 1600.
+ 2024-03-07 08:55:15,045 INFO [streaming_decode.py:615] Cuts processed until now is 1700.
+ 2024-03-07 08:55:15,725 INFO [streaming_decode.py:615] Cuts processed until now is 1800.
+ 2024-03-07 08:55:16,557 INFO [streaming_decode.py:615] Cuts processed until now is 1900.
+ 2024-03-07 08:55:17,411 INFO [streaming_decode.py:615] Cuts processed until now is 2000.
+ 2024-03-07 08:55:18,087 INFO [streaming_decode.py:615] Cuts processed until now is 2100.
+ 2024-03-07 08:55:18,919 INFO [streaming_decode.py:615] Cuts processed until now is 2200.
+ 2024-03-07 08:55:19,591 INFO [streaming_decode.py:615] Cuts processed until now is 2300.
+ 2024-03-07 08:55:20,850 INFO [streaming_decode.py:615] Cuts processed until now is 2400.
+ 2024-03-07 08:55:21,527 INFO [streaming_decode.py:615] Cuts processed until now is 2500.
+ 2024-03-07 08:55:22,359 INFO [streaming_decode.py:615] Cuts processed until now is 2600.
+ 2024-03-07 08:55:23,201 INFO [streaming_decode.py:615] Cuts processed until now is 2700.
+ 2024-03-07 08:55:23,888 INFO [streaming_decode.py:615] Cuts processed until now is 2800.
+ 2024-03-07 08:55:24,270 INFO [streaming_decode.py:615] Cuts processed until now is 2900.
+ 2024-03-07 08:55:25,106 INFO [streaming_decode.py:615] Cuts processed until now is 3000.
+ 2024-03-07 08:55:25,958 INFO [streaming_decode.py:615] Cuts processed until now is 3100.
+ 2024-03-07 08:55:26,804 INFO [streaming_decode.py:615] Cuts processed until now is 3200.
+ 2024-03-07 08:55:27,480 INFO [streaming_decode.py:615] Cuts processed until now is 3300.
+ 2024-03-07 08:55:28,327 INFO [streaming_decode.py:615] Cuts processed until now is 3400.
+ 2024-03-07 08:55:29,199 INFO [streaming_decode.py:615] Cuts processed until now is 3500.
+ 2024-03-07 08:55:29,882 INFO [streaming_decode.py:615] Cuts processed until now is 3600.
+ 2024-03-07 08:55:30,749 INFO [streaming_decode.py:615] Cuts processed until now is 3700.
+ 2024-03-07 08:55:31,445 INFO [streaming_decode.py:615] Cuts processed until now is 3800.
+ 2024-03-07 08:55:32,285 INFO [streaming_decode.py:615] Cuts processed until now is 3900.
+ 2024-03-07 08:55:33,146 INFO [streaming_decode.py:615] Cuts processed until now is 4000.
+ 2024-03-07 08:55:33,839 INFO [streaming_decode.py:615] Cuts processed until now is 4100.
+ 2024-03-07 08:55:34,687 INFO [streaming_decode.py:615] Cuts processed until now is 4200.
+ 2024-03-07 08:55:35,556 INFO [streaming_decode.py:615] Cuts processed until now is 4300.
+ 2024-03-07 08:55:36,242 INFO [streaming_decode.py:615] Cuts processed until now is 4400.
+ 2024-03-07 08:55:37,118 INFO [streaming_decode.py:615] Cuts processed until now is 4500.
+ 2024-03-07 08:55:37,989 INFO [streaming_decode.py:615] Cuts processed until now is 4600.
+ 2024-03-07 08:55:38,683 INFO [streaming_decode.py:615] Cuts processed until now is 4700.
+ 2024-03-07 08:55:39,534 INFO [streaming_decode.py:615] Cuts processed until now is 4800.
+ 2024-03-07 08:55:40,399 INFO [streaming_decode.py:615] Cuts processed until now is 4900.
+ 2024-03-07 08:55:41,089 INFO [streaming_decode.py:615] Cuts processed until now is 5000.
+ 2024-03-07 08:55:41,951 INFO [streaming_decode.py:615] Cuts processed until now is 5100.
+ 2024-03-07 08:55:42,815 INFO [streaming_decode.py:615] Cuts processed until now is 5200.
+ 2024-03-07 08:55:43,506 INFO [streaming_decode.py:615] Cuts processed until now is 5300.
+ 2024-03-07 08:55:44,345 INFO [streaming_decode.py:615] Cuts processed until now is 5400.
+ 2024-03-07 08:55:45,218 INFO [streaming_decode.py:615] Cuts processed until now is 5500.
+ 2024-03-07 08:55:45,911 INFO [streaming_decode.py:615] Cuts processed until now is 5600.
+ 2024-03-07 08:55:46,761 INFO [streaming_decode.py:615] Cuts processed until now is 5700.
+ 2024-03-07 08:55:47,641 INFO [streaming_decode.py:615] Cuts processed until now is 5800.
+ 2024-03-07 08:55:48,336 INFO [streaming_decode.py:615] Cuts processed until now is 5900.
+ 2024-03-07 08:55:49,182 INFO [streaming_decode.py:615] Cuts processed until now is 6000.
+ 2024-03-07 08:55:50,051 INFO [streaming_decode.py:615] Cuts processed until now is 6100.
+ 2024-03-07 08:55:50,744 INFO [streaming_decode.py:615] Cuts processed until now is 6200.
+ 2024-03-07 08:55:51,596 INFO [streaming_decode.py:615] Cuts processed until now is 6300.
+ 2024-03-07 08:55:52,469 INFO [streaming_decode.py:615] Cuts processed until now is 6400.
+ 2024-03-07 08:55:53,170 INFO [streaming_decode.py:615] Cuts processed until now is 6500.
+ 2024-03-07 08:55:54,019 INFO [streaming_decode.py:615] Cuts processed until now is 6600.
+ 2024-03-07 08:55:54,889 INFO [streaming_decode.py:615] Cuts processed until now is 6700.
+ 2024-03-07 08:55:55,582 INFO [streaming_decode.py:615] Cuts processed until now is 6800.
+ 2024-03-07 08:55:56,430 INFO [streaming_decode.py:615] Cuts processed until now is 6900.
+ 2024-03-07 08:55:57,304 INFO [streaming_decode.py:615] Cuts processed until now is 7000.
+ 2024-03-07 08:55:58,003 INFO [streaming_decode.py:615] Cuts processed until now is 7100.
+ 2024-03-07 08:55:58,859 INFO [streaming_decode.py:615] Cuts processed until now is 7200.
+ 2024-03-07 08:55:59,737 INFO [streaming_decode.py:615] Cuts processed until now is 7300.
+ 2024-03-07 08:56:00,438 INFO [streaming_decode.py:615] Cuts processed until now is 7400.
+ 2024-03-07 08:56:01,288 INFO [streaming_decode.py:615] Cuts processed until now is 7500.
+ 2024-03-07 08:56:02,163 INFO [streaming_decode.py:615] Cuts processed until now is 7600.
+ 2024-03-07 08:56:02,862 INFO [streaming_decode.py:615] Cuts processed until now is 7700.
+ 2024-03-07 08:56:03,730 INFO [streaming_decode.py:615] Cuts processed until now is 7800.
+ 2024-03-07 08:56:04,615 INFO [streaming_decode.py:615] Cuts processed until now is 7900.
+ 2024-03-07 08:56:05,298 INFO [streaming_decode.py:615] Cuts processed until now is 8000.
+ 2024-03-07 08:56:06,159 INFO [streaming_decode.py:615] Cuts processed until now is 8100.
+ 2024-03-07 08:56:07,036 INFO [streaming_decode.py:615] Cuts processed until now is 8200.
+ 2024-03-07 08:56:07,727 INFO [streaming_decode.py:615] Cuts processed until now is 8300.
+ 2024-03-07 08:56:08,586 INFO [streaming_decode.py:615] Cuts processed until now is 8400.
+ 2024-03-07 08:56:09,472 INFO [streaming_decode.py:615] Cuts processed until now is 8500.
+ 2024-03-07 08:56:10,163 INFO [streaming_decode.py:615] Cuts processed until now is 8600.
+ 2024-03-07 08:56:11,024 INFO [streaming_decode.py:615] Cuts processed until now is 8700.
+ 2024-03-07 08:56:11,913 INFO [streaming_decode.py:615] Cuts processed until now is 8800.
+ 2024-03-07 08:56:12,600 INFO [streaming_decode.py:615] Cuts processed until now is 8900.
+ 2024-03-07 08:56:13,467 INFO [streaming_decode.py:615] Cuts processed until now is 9000.
+ 2024-03-07 08:56:14,370 INFO [streaming_decode.py:615] Cuts processed until now is 9100.
+ 2024-03-07 08:56:15,069 INFO [streaming_decode.py:615] Cuts processed until now is 9200.
+ 2024-03-07 08:56:15,959 INFO [streaming_decode.py:615] Cuts processed until now is 9300.
+ 2024-03-07 08:56:16,866 INFO [streaming_decode.py:615] Cuts processed until now is 9400.
+ 2024-03-07 08:56:17,556 INFO [streaming_decode.py:615] Cuts processed until now is 9500.
+ 2024-03-07 08:56:18,460 INFO [streaming_decode.py:615] Cuts processed until now is 9600.
+ 2024-03-07 08:56:19,157 INFO [streaming_decode.py:615] Cuts processed until now is 9700.
+ 2024-03-07 08:56:20,036 INFO [streaming_decode.py:615] Cuts processed until now is 9800.
+ 2024-03-07 08:56:20,439 INFO [streaming_decode.py:615] Cuts processed until now is 9900.
+ 2024-03-07 08:56:21,347 INFO [streaming_decode.py:615] Cuts processed until now is 10000.
+ 2024-03-07 08:56:22,519 INFO [streaming_decode.py:615] Cuts processed until now is 10100.
+ 2024-03-07 08:56:23,219 INFO [streaming_decode.py:615] Cuts processed until now is 10200.
+ 2024-03-07 08:56:24,092 INFO [streaming_decode.py:615] Cuts processed until now is 10300.
+ 2024-03-07 08:56:24,984 INFO [streaming_decode.py:615] Cuts processed until now is 10400.
+ 2024-03-07 08:56:25,673 INFO [streaming_decode.py:615] Cuts processed until now is 10500.
+ 2024-03-07 08:56:26,535 INFO [streaming_decode.py:615] Cuts processed until now is 10600.
+ 2024-03-07 08:56:27,422 INFO [streaming_decode.py:615] Cuts processed until now is 10700.
+ 2024-03-07 08:56:28,110 INFO [streaming_decode.py:615] Cuts processed until now is 10800.
+ 2024-03-07 08:56:28,983 INFO [streaming_decode.py:615] Cuts processed until now is 10900.
+ 2024-03-07 08:56:29,887 INFO [streaming_decode.py:615] Cuts processed until now is 11000.
+ 2024-03-07 08:56:30,582 INFO [streaming_decode.py:615] Cuts processed until now is 11100.
+ 2024-03-07 08:56:31,468 INFO [streaming_decode.py:615] Cuts processed until now is 11200.
+ 2024-03-07 08:56:31,870 INFO [streaming_decode.py:615] Cuts processed until now is 11300.
+ 2024-03-07 08:56:33,076 INFO [streaming_decode.py:615] Cuts processed until now is 11400.
+ 2024-03-07 08:56:33,986 INFO [streaming_decode.py:615] Cuts processed until now is 11500.
+ 2024-03-07 08:56:34,684 INFO [streaming_decode.py:615] Cuts processed until now is 11600.
+ 2024-03-07 08:56:35,077 INFO [streaming_decode.py:615] Cuts processed until now is 11700.
+ 2024-03-07 08:56:35,970 INFO [streaming_decode.py:615] Cuts processed until now is 11800.
+ 2024-03-07 08:56:36,873 INFO [streaming_decode.py:615] Cuts processed until now is 11900.
+ 2024-03-07 08:56:37,568 INFO [streaming_decode.py:615] Cuts processed until now is 12000.
+ 2024-03-07 08:56:38,745 INFO [streaming_decode.py:615] Cuts processed until now is 12100.
+ 2024-03-07 08:56:39,618 INFO [streaming_decode.py:615] Cuts processed until now is 12200.
+ 2024-03-07 08:56:42,452 INFO [streaming_decode.py:660] The transcripts are stored in zipformer/exp-causal/streaming/greedy_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt
+ 2024-03-07 08:56:42,883 INFO [utils.py:656] [test-commonvoice-greedy_search] %WER 7.75% [48447 / 624874, 10530 ins, 21899 del, 16018 sub ]
+ 2024-03-07 08:56:43,788 INFO [streaming_decode.py:671] Wrote detailed error stats to zipformer/exp-causal/streaming/greedy_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt
+ 2024-03-07 08:56:43,788 INFO [streaming_decode.py:685]
+ For test-commonvoice, WER of different settings are:
+ greedy_search 7.75 best for test-commonvoice
+
+ 2024-03-07 08:56:43,788 INFO [streaming_decode.py:853] Done!
exp-causal/streaming/greedy_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt ADDED
The diff for this file is too large to render. See raw diff