Added model
- README.md +195 -0
- data/lang_phone/L.pt +3 -0
- data/lang_phone/L_disambig.pt +3 -0
- data/lang_phone/Linv.pt +3 -0
- data/lang_phone/lexicon.txt +37 -0
- data/lang_phone/lexicon_disambig.txt +37 -0
- data/lang_phone/tokens.txt +39 -0
- data/lang_phone/words.txt +41 -0
- exp-causal/ctc-decoding/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt +0 -0
- exp-causal/ctc-decoding/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt +0 -0
- exp-causal/ctc-decoding/log-decode-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model-2024-03-07-10-57-38 +7 -0
- exp-causal/ctc-decoding/log-decode-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model-2024-03-07-10-59-38 +37 -0
- exp-causal/ctc-decoding/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt +0 -0
- exp-causal/ctc-decoding/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt +0 -0
- exp-causal/ctc-decoding/wer-summary-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt +2 -0
- exp-causal/ctc-decoding/wer-summary-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt +2 -0
- exp-causal/fast_beam_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +0 -0
- exp-causal/fast_beam_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +0 -0
- exp-causal/fast_beam_search/log-decode-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model-2024-03-07-08-39-05 +66 -0
- exp-causal/fast_beam_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +0 -0
- exp-causal/fast_beam_search/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +0 -0
- exp-causal/fast_beam_search/wer-summary-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +2 -0
- exp-causal/fast_beam_search/wer-summary-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +2 -0
- exp-causal/greedy_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt +0 -0
- exp-causal/greedy_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt +0 -0
- exp-causal/greedy_search/log-decode-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model-2024-03-07-08-38-15 +46 -0
- exp-causal/greedy_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt +0 -0
- exp-causal/greedy_search/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt +0 -0
- exp-causal/greedy_search/wer-summary-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt +2 -0
- exp-causal/greedy_search/wer-summary-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt +2 -0
- exp-causal/jit_script_chunk_32_left_128.pt +3 -0
- exp-causal/modified_beam_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt +0 -0
- exp-causal/modified_beam_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt +0 -0
- exp-causal/modified_beam_search/log-decode-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model-2024-03-07-08-41-07 +56 -0
- exp-causal/modified_beam_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt +0 -0
- exp-causal/modified_beam_search/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt +0 -0
- exp-causal/modified_beam_search/wer-summary-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt +2 -0
- exp-causal/modified_beam_search/wer-summary-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt +2 -0
- exp-causal/pretrained.pt +3 -0
- exp-causal/streaming/fast_beam_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt +0 -0
- exp-causal/streaming/fast_beam_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt +0 -0
- exp-causal/streaming/fast_beam_search/log-decode-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model-2024-03-07-08-56-46 +154 -0
- exp-causal/streaming/fast_beam_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt +0 -0
- exp-causal/streaming/fast_beam_search/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt +0 -0
- exp-causal/streaming/fast_beam_search/wer-summary-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt +2 -0
- exp-causal/streaming/fast_beam_search/wer-summary-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt +2 -0
- exp-causal/streaming/greedy_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt +0 -0
- exp-causal/streaming/greedy_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt +0 -0
- exp-causal/streaming/greedy_search/log-decode-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model-2024-03-07-08-54-56 +154 -0
- exp-causal/streaming/greedy_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt +0 -0
README.md
CHANGED
@@ -1,3 +1,198 @@
---
language: sw
license: apache-2.0
tags:
- icefall
- phoneme-recognition
- automatic-speech-recognition
datasets:
- bookbot/ALFFA_swahili
- bookbot/fleurs_sw
- bookbot/common_voice_16_1_sw
---

# Pruned Stateless Zipformer RNN-T Streaming Robust SW

Pruned Stateless Zipformer RNN-T Streaming Robust SW is an automatic speech recognition model trained on the following datasets:

- [ALFFA Swahili](https://huggingface.co/datasets/bookbot/ALFFA_swahili)
- [FLEURS Swahili](https://huggingface.co/datasets/bookbot/fleurs_sw)
- [Common Voice 16.1 Swahili](https://huggingface.co/datasets/bookbot/common_voice_16_1_sw)

Instead of being trained to predict sequences of words, this model was trained to predict sequences of phonemes, e.g. `["w", "ɑ", "ʃ", "i", "ɑ"]`. Therefore, the model's [vocabulary](https://huggingface.co/bookbot/zipformer-streaming-robust-sw/blob/main/data/lang_phone/tokens.txt) contains the IPA phonemes found in [gruut](https://github.com/rhasspy/gruut).

This model was trained using the [icefall](https://github.com/k2-fsa/icefall) framework. All training was done on a Scaleway RENDER-S VM with an NVIDIA H100 GPU. All scripts used for training can be found in the [Files and versions](https://huggingface.co/bookbot/zipformer-streaming-robust-sw/tree/main) tab, as well as the training metrics logged via [TensorBoard](https://huggingface.co/bookbot/zipformer-streaming-robust-sw/tensorboard).

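The vocabulary file follows icefall's `tokens.txt` convention: one `<token> <id>` pair per line. A minimal Python sketch of loading it into lookup tables — the `SAMPLE` string below is a short hypothetical excerpt, not the full 39-entry file:

```python
# Parse an icefall-style tokens.txt into token <-> id mappings.
# SAMPLE is an excerpt for illustration; in practice read the file from
# data/lang_phone/tokens.txt.

SAMPLE = """<eps> 0
s 1
ð 2
<UNK> 19
ɑ 37
#0 38"""

def load_tokens(text: str):
    token2id, id2token = {}, {}
    for line in text.splitlines():
        # Split once from the right: the id is the last field.
        token, idx = line.rsplit(maxsplit=1)
        token2id[token] = int(idx)
        id2token[int(idx)] = token
    return token2id, id2token

token2id, id2token = load_tokens(SAMPLE)
print(token2id["ɑ"])   # -> 37
print(id2token[0])     # -> <eps>
```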
## Evaluation Results

### Simulated Streaming

```sh
for m in greedy_search fast_beam_search modified_beam_search; do
  ./zipformer/decode.py \
    --epoch 40 \
    --avg 7 \
    --causal 1 \
    --chunk-size 32 \
    --left-context-frames 128 \
    --exp-dir zipformer/exp-causal \
    --use-transducer True --use-ctc True \
    --decoding-method $m
done
```

```sh
./zipformer/ctc_decode.py \
  --epoch 40 \
  --avg 7 \
  --causal 1 \
  --chunk-size 32 \
  --left-context-frames 128 \
  --exp-dir zipformer/exp-causal \
  --decoding-method ctc-decoding \
  --use-transducer True --use-ctc True
```

The model achieves the following phoneme error rates on the different test sets:

| Decoding             | Common Voice 16.1 | FLEURS |
| -------------------- | :---------------: | :----: |
| Greedy Search        |       7.71        |  6.58  |
| Modified Beam Search |       7.53        |  6.40  |
| Fast Beam Search     |       7.73        |  6.61  |
| CTC Greedy Search    |       7.78        |  6.72  |

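The phoneme error rates above are standard edit-distance error rates. As a sanity check, a minimal Python sketch recomputes the two CTC-decoding numbers from the insertion/deletion/substitution counts printed in this repo's decode logs:

```python
# Recompute PER (%) from error counts, as reported by icefall's utils.py:
#   PER = (insertions + deletions + substitutions) / reference length.
# Counts below are taken from the CTC decoding log in this repo.

def per(ins: int, dels: int, subs: int, ref_len: int) -> float:
    return round(100 * (ins + dels + subs) / ref_len, 2)

print(per(1757, 1036, 1344, 61587))      # test-fleurs       -> 6.72
print(per(12396, 20104, 16115, 624874))  # test-commonvoice  -> 7.78
```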
### Chunk-wise Streaming

```sh
for m in greedy_search fast_beam_search modified_beam_search; do
  ./zipformer/streaming_decode.py \
    --epoch 40 \
    --avg 7 \
    --causal 1 \
    --chunk-size 32 \
    --left-context-frames 128 \
    --exp-dir zipformer/exp-causal \
    --use-transducer True --use-ctc True \
    --decoding-method $m \
    --num-decode-streams 1000
done
```

The model achieves the following phoneme error rates on the different test sets:

| Decoding             | Common Voice 16.1 | FLEURS |
| -------------------- | :---------------: | :----: |
| Greedy Search        |       7.75        |  6.59  |
| Modified Beam Search |       7.57        |  6.37  |
| Fast Beam Search     |       7.72        |  6.44  |

## Usage

### Download Pre-trained Model

```sh
cd egs/bookbot_sw/ASR
mkdir tmp
cd tmp
git lfs install
git clone https://huggingface.co/bookbot/zipformer-streaming-robust-sw/
```

### Inference

To decode with greedy search, run:

```sh
./zipformer/jit_pretrained_streaming.py \
  --nn-model-filename ./tmp/zipformer-streaming-robust-sw/exp-causal/jit_script_chunk_32_left_128.pt \
  --tokens ./tmp/zipformer-streaming-robust-sw/data/lang_phone/tokens.txt \
  ./tmp/zipformer-streaming-robust-sw/test_waves/sample1.wav
```

<details>
<summary>Decoding Output</summary>

```
2024-03-07 11:07:41,231 INFO [jit_pretrained_streaming.py:184] device: cuda:0
2024-03-07 11:07:41,865 INFO [jit_pretrained_streaming.py:197] Constructing Fbank computer
2024-03-07 11:07:41,866 INFO [jit_pretrained_streaming.py:200] Reading sound files: ./tmp/zipformer-streaming-robust-sw/test_waves/sample1.wav
2024-03-07 11:07:41,866 INFO [jit_pretrained_streaming.py:205] torch.Size([125568])
2024-03-07 11:07:41,866 INFO [jit_pretrained_streaming.py:207] Decoding started
2024-03-07 11:07:41,866 INFO [jit_pretrained_streaming.py:212] chunk_length: 64
2024-03-07 11:07:41,866 INFO [jit_pretrained_streaming.py:213] T: 77
2024-03-07 11:07:41,876 INFO [jit_pretrained_streaming.py:229] 0/130368
2024-03-07 11:07:41,877 INFO [jit_pretrained_streaming.py:229] 4000/130368
2024-03-07 11:07:41,878 INFO [jit_pretrained_streaming.py:229] 8000/130368
2024-03-07 11:07:41,879 INFO [jit_pretrained_streaming.py:229] 12000/130368
2024-03-07 11:07:42,103 INFO [jit_pretrained_streaming.py:229] 16000/130368
2024-03-07 11:07:42,104 INFO [jit_pretrained_streaming.py:229] 20000/130368
2024-03-07 11:07:42,126 INFO [jit_pretrained_streaming.py:229] 24000/130368
2024-03-07 11:07:42,127 INFO [jit_pretrained_streaming.py:229] 28000/130368
2024-03-07 11:07:42,128 INFO [jit_pretrained_streaming.py:229] 32000/130368
2024-03-07 11:07:42,151 INFO [jit_pretrained_streaming.py:229] 36000/130368
2024-03-07 11:07:42,152 INFO [jit_pretrained_streaming.py:229] 40000/130368
2024-03-07 11:07:42,175 INFO [jit_pretrained_streaming.py:229] 44000/130368
2024-03-07 11:07:42,176 INFO [jit_pretrained_streaming.py:229] 48000/130368
2024-03-07 11:07:42,177 INFO [jit_pretrained_streaming.py:229] 52000/130368
2024-03-07 11:07:42,200 INFO [jit_pretrained_streaming.py:229] 56000/130368
2024-03-07 11:07:42,201 INFO [jit_pretrained_streaming.py:229] 60000/130368
2024-03-07 11:07:42,224 INFO [jit_pretrained_streaming.py:229] 64000/130368
2024-03-07 11:07:42,226 INFO [jit_pretrained_streaming.py:229] 68000/130368
2024-03-07 11:07:42,226 INFO [jit_pretrained_streaming.py:229] 72000/130368
2024-03-07 11:07:42,250 INFO [jit_pretrained_streaming.py:229] 76000/130368
2024-03-07 11:07:42,251 INFO [jit_pretrained_streaming.py:229] 80000/130368
2024-03-07 11:07:42,252 INFO [jit_pretrained_streaming.py:229] 84000/130368
2024-03-07 11:07:42,275 INFO [jit_pretrained_streaming.py:229] 88000/130368
2024-03-07 11:07:42,276 INFO [jit_pretrained_streaming.py:229] 92000/130368
2024-03-07 11:07:42,299 INFO [jit_pretrained_streaming.py:229] 96000/130368
2024-03-07 11:07:42,300 INFO [jit_pretrained_streaming.py:229] 100000/130368
2024-03-07 11:07:42,301 INFO [jit_pretrained_streaming.py:229] 104000/130368
2024-03-07 11:07:42,325 INFO [jit_pretrained_streaming.py:229] 108000/130368
2024-03-07 11:07:42,326 INFO [jit_pretrained_streaming.py:229] 112000/130368
2024-03-07 11:07:42,349 INFO [jit_pretrained_streaming.py:229] 116000/130368
2024-03-07 11:07:42,350 INFO [jit_pretrained_streaming.py:229] 120000/130368
2024-03-07 11:07:42,351 INFO [jit_pretrained_streaming.py:229] 124000/130368
2024-03-07 11:07:42,373 INFO [jit_pretrained_streaming.py:229] 128000/130368
2024-03-07 11:07:42,374 INFO [jit_pretrained_streaming.py:259] ./tmp/zipformer-streaming-robust-sw/test_waves/sample1.wav
2024-03-07 11:07:42,374 INFO [jit_pretrained_streaming.py:260] ʃiɑ|ɑᵐɓɑɔ|wɑnɑiʃi|hɑsɑ|kɑtikɑ|ɛnɛɔ|lɑ|mɑʃɑɾiki|kɑtikɑ|ufɑlmɛ|huɔ|wɛnjɛ|utɑʄiɾi|wɑ|mɑfutɑ
2024-03-07 11:07:42,374 INFO [jit_pretrained_streaming.py:262] Decoding Done
```

</details>

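In the decoding output above, `|` is a regular entry in `data/lang_phone/tokens.txt` that marks word boundaries, so the hypothesis string can be split back into per-word phoneme strings. A minimal sketch, using a shortened version of the decoded output:

```python
# Split a phoneme-level hypothesis into words at the "|" boundary token.
# hyp is a shortened excerpt of the decoder output shown above.

hyp = "ʃiɑ|ɑᵐɓɑɔ|wɑnɑiʃi|hɑsɑ|kɑtikɑ|ɛnɛɔ|lɑ|mɑʃɑɾiki"

words = hyp.split("|")
print(len(words))  # -> 8
print(words[0])    # -> ʃiɑ
```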
## Training procedure

### Install icefall

```sh
git clone https://github.com/bookbot-hive/icefall
cd icefall
export PYTHONPATH=`pwd`:$PYTHONPATH
```

### Prepare Data

```sh
cd egs/bookbot_sw/ASR
./prepare.sh
```

### Train

```sh
export CUDA_VISIBLE_DEVICES="0"
./zipformer/train.py \
  --num-epochs 40 \
  --use-fp16 1 \
  --exp-dir zipformer/exp-causal \
  --causal 1 \
  --max-duration 800 \
  --use-transducer True --use-ctc True
```

## Frameworks

- [k2](https://github.com/k2-fsa/k2)
- [icefall](https://github.com/bookbot-hive/icefall)
- [lhotse](https://github.com/bookbot-hive/lhotse)
data/lang_phone/L.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:521562864ec9620dcf30c713f16614a861b4570d6f633e1c5a006b8743a3a304
size 1679

data/lang_phone/L_disambig.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a2fb6bfaace3c1d9b8c0472e64a5621422eb0222ec4917875bde509e5ace233a
size 1715

data/lang_phone/Linv.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:29794d6988b503cfcec0bd6e7dcbe1f0450442c31e162820214429accafaaa3d
size 1691
data/lang_phone/lexicon.txt
ADDED
@@ -0,0 +1,37 @@
f f
h h
i i
j j
k k
l l
m m
n n
p p
s s
t t
t͡ʃ t͡ʃ
u u
v v
w w
x x
z z
| |
ð ð
ɑ ɑ
ɓ ɓ
ɔ ɔ
ɗ ɗ
ɛ ɛ
ɠ ɠ
ɣ ɣ
ɾ ɾ
ʃ ʃ
ʄ ʄ
θ θ
ᵐɓ ᵐɓ
ᵑg ᵑg
ᶬv ᶬv
ⁿz ⁿz
ⁿɗ ⁿɗ
ⁿɗ͡ʒ ⁿɗ͡ʒ
<UNK> <UNK>

data/lang_phone/lexicon_disambig.txt
ADDED
@@ -0,0 +1,37 @@
f f
h h
i i
j j
k k
l l
m m
n n
p p
s s
t t
t͡ʃ t͡ʃ
u u
v v
w w
x x
z z
| |
ð ð
ɑ ɑ
ɓ ɓ
ɔ ɔ
ɗ ɗ
ɛ ɛ
ɠ ɠ
ɣ ɣ
ɾ ɾ
ʃ ʃ
ʄ ʄ
θ θ
ᵐɓ ᵐɓ
ᵑg ᵑg
ᶬv ᶬv
ⁿz ⁿz
ⁿɗ ⁿɗ
ⁿɗ͡ʒ ⁿɗ͡ʒ
<UNK> <UNK>
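Both lexicon files use icefall's plain-text lexicon format: one entry per line, a word followed by its pronunciation. In this phone-level lexicon each "word" maps to itself. A minimal parser sketch; `SAMPLE` is a small hypothetical excerpt:

```python
# Parse an icefall-style lexicon.txt: "word phone1 phone2 ..." per line.
# SAMPLE is an excerpt for illustration; in practice read
# data/lang_phone/lexicon.txt.

SAMPLE = """f f
t͡ʃ t͡ʃ
<UNK> <UNK>"""

def load_lexicon(text: str) -> dict:
    lexicon = {}
    for line in text.splitlines():
        word, *phones = line.split()
        lexicon[word] = phones
    return lexicon

lex = load_lexicon(SAMPLE)
print(lex["t͡ʃ"])  # -> ['t͡ʃ']
```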
data/lang_phone/tokens.txt
ADDED
@@ -0,0 +1,39 @@
<eps> 0
s 1
ð 2
ᵑg 3
ᶬv 4
ʃ 5
ɔ 6
x 7
t 8
ɛ 9
v 10
ⁿɗ͡ʒ 11
f 12
n 13
| 14
ⁿz 15
k 16
h 17
t͡ʃ 18
<UNK> 19
ɗ 20
z 21
m 22
ʄ 23
ɠ 24
θ 25
j 26
ᵐɓ 27
u 28
ɣ 29
ɓ 30
i 31
l 32
ɾ 33
ⁿɗ 34
w 35
p 36
ɑ 37
#0 38
data/lang_phone/words.txt
ADDED
@@ -0,0 +1,41 @@
<eps> 0
<UNK> 1
f 2
h 3
i 4
j 5
k 6
l 7
m 8
n 9
p 10
s 11
t 12
t͡ʃ 13
u 14
v 15
w 16
x 17
z 18
| 19
ð 20
ɑ 21
ɓ 22
ɔ 23
ɗ 24
ɛ 25
ɠ 26
ɣ 27
ɾ 28
ʃ 29
ʄ 30
θ 31
ᵐɓ 32
ᵑg 33
ᶬv 34
ⁿz 35
ⁿɗ 36
ⁿɗ͡ʒ 37
#0 38
<s> 39
</s> 40
exp-causal/ctc-decoding/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt
ADDED
The diff for this file is too large to render. See raw diff

exp-causal/ctc-decoding/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt
ADDED
The diff for this file is too large to render. See raw diff
exp-causal/ctc-decoding/log-decode-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model-2024-03-07-10-57-38
ADDED
@@ -0,0 +1,7 @@
2024-03-07 10:57:38,784 INFO [ctc_decode.py:631] Decoding started
2024-03-07 10:57:38,784 INFO [ctc_decode.py:637] Device: cuda:0
2024-03-07 10:57:38,784 INFO [ctc_decode.py:638] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'f6919c0ddb311bea7b53a50f3afdcb3c18b8ccc8', 'k2-git-date': 'Sat Feb 10 09:23:09 2024', 'lhotse-version': '1.22.0.dev+git.9355bd72.clean', 'torch-version': '2.0.0+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'b35406b0-dirty', 'icefall-git-date': 'Thu Mar 7 06:20:34 2024', 'icefall-path': '/root/icefall', 'k2-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/k2/__init__.py', 'lhotse-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'bookbot-h100', 'IP address': '127.0.0.1'}, 'frame_shift_ms': 10, 'search_beam': 20, 'output_beam': 8, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'epoch': 40, 'iter': 0, 'avg': 7, 'use_averaged_model': True, 'exp_dir': PosixPath('zipformer/exp-causal'), 'lang_dir': PosixPath('data/lang_phone'), 'context_size': 2, 'decoding_method': 'ctc-decoding', 'num_paths': 100, 'nbest_scale': 1.0, 'hlg_scale': 0.6, 'lm_dir': PosixPath('data/lm'), 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '32', 'left_context_frames': '128', 'use_transducer': True, 'use_ctc': True, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 200.0, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'res_dir': PosixPath('zipformer/exp-causal/ctc-decoding'), 'suffix': 'epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model'}
2024-03-07 10:57:38,784 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
2024-03-07 10:57:38,948 INFO [ctc_decode.py:713] About to create model
2024-03-07 10:57:39,170 INFO [ctc_decode.py:780] Calculating the averaged model over epoch range from 33 (excluded) to 40
2024-03-07 10:57:39,809 INFO [ctc_decode.py:797] Number of model parameters: 65182863
exp-causal/ctc-decoding/log-decode-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model-2024-03-07-10-59-38
ADDED
@@ -0,0 +1,37 @@
2024-03-07 10:59:38,354 INFO [ctc_decode.py:621] Decoding started
2024-03-07 10:59:38,354 INFO [ctc_decode.py:627] Device: cuda:0
2024-03-07 10:59:38,354 INFO [ctc_decode.py:628] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'f6919c0ddb311bea7b53a50f3afdcb3c18b8ccc8', 'k2-git-date': 'Sat Feb 10 09:23:09 2024', 'lhotse-version': '1.22.0.dev+git.9355bd72.clean', 'torch-version': '2.0.0+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'b35406b0-dirty', 'icefall-git-date': 'Thu Mar 7 06:20:34 2024', 'icefall-path': '/root/icefall', 'k2-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/k2/__init__.py', 'lhotse-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'bookbot-h100', 'IP address': '127.0.0.1'}, 'frame_shift_ms': 10, 'search_beam': 20, 'output_beam': 8, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'epoch': 40, 'iter': 0, 'avg': 7, 'use_averaged_model': True, 'exp_dir': PosixPath('zipformer/exp-causal'), 'lang_dir': PosixPath('data/lang_phone'), 'context_size': 2, 'decoding_method': 'ctc-decoding', 'num_paths': 100, 'nbest_scale': 1.0, 'hlg_scale': 0.6, 'lm_dir': PosixPath('data/lm'), 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '32', 'left_context_frames': '128', 'use_transducer': True, 'use_ctc': True, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 200.0, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'res_dir': PosixPath('zipformer/exp-causal/ctc-decoding'), 'suffix': 'epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model'}
2024-03-07 10:59:38,355 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
2024-03-07 10:59:38,522 INFO [ctc_decode.py:701] About to create model
2024-03-07 10:59:38,744 INFO [ctc_decode.py:756] Calculating the averaged model over epoch range from 33 (excluded) to 40
2024-03-07 10:59:39,391 INFO [ctc_decode.py:772] Number of model parameters: 65182863
2024-03-07 10:59:39,392 INFO [multidataset.py:81] About to get FLEURS test cuts
2024-03-07 10:59:39,392 INFO [multidataset.py:83] Loading FLEURS in lazy mode
2024-03-07 10:59:39,392 INFO [multidataset.py:90] About to get Common Voice test cuts
2024-03-07 10:59:39,392 INFO [multidataset.py:92] Loading Common Voice in lazy mode
2024-03-07 10:59:39,992 INFO [ctc_decode.py:542] batch 0/?, cuts processed until now is 11
2024-03-07 10:59:44,584 INFO [ctc_decode.py:556] The transcripts are stored in zipformer/exp-causal/ctc-decoding/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt
2024-03-07 10:59:44,625 INFO [utils.py:656] [test-fleurs-ctc-decoding] %WER 6.72% [4137 / 61587, 1757 ins, 1036 del, 1344 sub ]
2024-03-07 10:59:44,719 INFO [ctc_decode.py:565] Wrote detailed error stats to zipformer/exp-causal/ctc-decoding/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt
2024-03-07 10:59:44,719 INFO [ctc_decode.py:579]
For test-fleurs, WER of different settings are:
ctc-decoding	6.72	best for test-fleurs

2024-03-07 10:59:45,379 INFO [ctc_decode.py:542] batch 0/?, cuts processed until now is 28
2024-03-07 10:59:52,644 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([1.4930, 1.6467, 1.4691, 1.4399, 1.6736, 1.6094, 1.5041, 1.7942],
       device='cuda:0')
2024-03-07 10:59:53,068 INFO [ctc_decode.py:542] batch 100/?, cuts processed until now is 3210
2024-03-07 10:59:57,567 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([1.6248, 2.7026, 2.6528, 1.9603], device='cuda:0')
2024-03-07 10:59:58,159 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([3.1807, 3.0239, 2.3377, 2.7052], device='cuda:0')
2024-03-07 10:59:58,511 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([1.7430, 1.7438, 1.6404, 1.8630, 1.9495, 1.9508, 2.0193, 1.6035],
       device='cuda:0')
2024-03-07 11:00:00,641 INFO [ctc_decode.py:542] batch 200/?, cuts processed until now is 6582
2024-03-07 11:00:08,194 INFO [ctc_decode.py:542] batch 300/?, cuts processed until now is 9972
2024-03-07 11:00:13,903 INFO [ctc_decode.py:556] The transcripts are stored in zipformer/exp-causal/ctc-decoding/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt
2024-03-07 11:00:14,287 INFO [utils.py:656] [test-commonvoice-ctc-decoding] %WER 7.78% [48615 / 624874, 12396 ins, 20104 del, 16115 sub ]
2024-03-07 11:00:15,129 INFO [ctc_decode.py:565] Wrote detailed error stats to zipformer/exp-causal/ctc-decoding/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt
2024-03-07 11:00:15,130 INFO [ctc_decode.py:579]
For test-commonvoice, WER of different settings are:
ctc-decoding	7.78	best for test-commonvoice

2024-03-07 11:00:15,130 INFO [ctc_decode.py:806] Done!
exp-causal/ctc-decoding/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt
ADDED
The diff for this file is too large to render. See raw diff

exp-causal/ctc-decoding/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt
ADDED
The diff for this file is too large to render. See raw diff
exp-causal/ctc-decoding/wer-summary-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt
ADDED
@@ -0,0 +1,2 @@
settings	WER
ctc-decoding	7.78
exp-causal/ctc-decoding/wer-summary-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt
ADDED
@@ -0,0 +1,2 @@
settings	WER
ctc-decoding	6.72
exp-causal/fast_beam_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
ADDED
The diff for this file is too large to render. See raw diff

exp-causal/fast_beam_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
ADDED
The diff for this file is too large to render. See raw diff
exp-causal/fast_beam_search/log-decode-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model-2024-03-07-08-39-05
ADDED
@@ -0,0 +1,66 @@
2024-03-07 08:39:05,727 INFO [decode.py:764] Decoding started
2024-03-07 08:39:05,727 INFO [decode.py:770] Device: cuda:0
2024-03-07 08:39:05,727 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
2024-03-07 08:39:05,728 INFO [decode.py:778] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'f6919c0ddb311bea7b53a50f3afdcb3c18b8ccc8', 'k2-git-date': 'Sat Feb 10 09:23:09 2024', 'lhotse-version': '1.22.0.dev+git.9355bd72.clean', 'torch-version': '2.0.0+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'b35406b0-clean', 'icefall-git-date': 'Thu Mar 7 06:20:34 2024', 'icefall-path': '/root/icefall', 'k2-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/k2/__init__.py', 'lhotse-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'bookbot-h100', 'IP address': '127.0.0.1'}, 'epoch': 40, 'iter': 0, 'avg': 7, 'use_averaged_model': True, 'exp_dir': PosixPath('zipformer/exp-causal'), 'lang_dir': PosixPath('data/lang_phone'), 'decoding_method': 'fast_beam_search', 'beam_size': 4, 'beam': 20.0, 'ngram_lm_scale': 0.01, 'max_contexts': 8, 'max_states': 64, 'context_size': 2, 'max_sym_per_frame': 1, 'num_paths': 200, 'nbest_scale': 0.5, 'use_shallow_fusion': False, 'lm_type': 'rnn', 'lm_scale': 0.3, 'tokens_ngram': 3, 'backoff_id': 500, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '32', 'left_context_frames': '128', 'use_transducer': 
2024-03-07 08:39:05,727 INFO [decode.py:764] Decoding started
2024-03-07 08:39:05,727 INFO [decode.py:770] Device: cuda:0
2024-03-07 08:39:05,727 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
2024-03-07 08:39:05,728 INFO [decode.py:778] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'f6919c0ddb311bea7b53a50f3afdcb3c18b8ccc8', 'k2-git-date': 'Sat Feb 10 09:23:09 2024', 'lhotse-version': '1.22.0.dev+git.9355bd72.clean', 'torch-version': '2.0.0+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'b35406b0-clean', 'icefall-git-date': 'Thu Mar 7 06:20:34 2024', 'icefall-path': '/root/icefall', 'k2-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/k2/__init__.py', 'lhotse-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'bookbot-h100', 'IP address': '127.0.0.1'}, 'epoch': 40, 'iter': 0, 'avg': 7, 'use_averaged_model': True, 'exp_dir': PosixPath('zipformer/exp-causal'), 'lang_dir': PosixPath('data/lang_phone'), 'decoding_method': 'fast_beam_search', 'beam_size': 4, 'beam': 20.0, 'ngram_lm_scale': 0.01, 'max_contexts': 8, 'max_states': 64, 'context_size': 2, 'max_sym_per_frame': 1, 'num_paths': 200, 'nbest_scale': 0.5, 'use_shallow_fusion': False, 'lm_type': 'rnn', 'lm_scale': 0.3, 'tokens_ngram': 3, 'backoff_id': 500, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '32', 'left_context_frames': '128', 'use_transducer': 
True, 'use_ctc': True, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 200.0, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'lm_vocab_size': 500, 'lm_epoch': 7, 'lm_avg': 1, 'lm_exp_dir': None, 'rnn_lm_embedding_dim': 2048, 'rnn_lm_hidden_dim': 2048, 'rnn_lm_num_layers': 3, 'rnn_lm_tie_weights': True, 'transformer_lm_exp_dir': None, 'transformer_lm_dim_feedforward': 2048, 'transformer_lm_encoder_dim': 768, 'transformer_lm_embedding_dim': 768, 'transformer_lm_nhead': 8, 'transformer_lm_num_layers': 16, 'transformer_lm_tie_weights': True, 'res_dir': PosixPath('zipformer/exp-causal/fast_beam_search'), 'suffix': 'epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model', 'blank_id': 0, 'unk_id': 19, 'vocab_size': 38}
2024-03-07 08:39:05,728 INFO [decode.py:780] About to create model
2024-03-07 08:39:05,976 INFO [decode.py:847] Calculating the averaged model over epoch range from 33 (excluded) to 40
2024-03-07 08:39:06,839 INFO [decode.py:908] Number of model parameters: 65182863
2024-03-07 08:39:06,839 INFO [multidataset.py:81] About to get FLEURS test cuts
2024-03-07 08:39:06,839 INFO [multidataset.py:83] Loading FLEURS in lazy mode
2024-03-07 08:39:06,839 INFO [multidataset.py:90] About to get Common Voice test cuts
2024-03-07 08:39:06,839 INFO [multidataset.py:92] Loading Common Voice in lazy mode
2024-03-07 08:39:07,885 INFO [decode.py:651] batch 0/?, cuts processed until now is 11
2024-03-07 08:39:09,245 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([3.2843, 3.2412, 2.7568, 2.6566], device='cuda:0')
2024-03-07 08:39:17,182 INFO [decode.py:651] batch 20/?, cuts processed until now is 270
2024-03-07 08:39:26,436 INFO [decode.py:651] batch 40/?, cuts processed until now is 487
2024-03-07 08:39:26,499 INFO [decode.py:665] The transcripts are stored in zipformer/exp-causal/fast_beam_search/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
2024-03-07 08:39:26,539 INFO [utils.py:656] [test-fleurs-beam_20.0_max_contexts_8_max_states_64] %WER 6.61% [4072 / 61587, 1548 ins, 1229 del, 1295 sub ]
2024-03-07 08:39:26,632 INFO [decode.py:676] Wrote detailed error stats to zipformer/exp-causal/fast_beam_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
2024-03-07 08:39:26,633 INFO [decode.py:690]
For test-fleurs, WER of different settings are:
beam_20.0_max_contexts_8_max_states_64	6.61	best for test-fleurs

2024-03-07 08:39:27,522 INFO [decode.py:651] batch 0/?, cuts processed until now is 28
2024-03-07 08:39:31,747 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([2.3396, 2.5719, 2.6157, 2.8669], device='cuda:0')
2024-03-07 08:39:32,952 INFO [decode.py:651] batch 20/?, cuts processed until now is 628
2024-03-07 08:39:38,196 INFO [decode.py:651] batch 40/?, cuts processed until now is 1253
2024-03-07 08:39:43,193 INFO [decode.py:651] batch 60/?, cuts processed until now is 1940
2024-03-07 08:39:45,893 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([3.1773, 2.6102, 2.7401, 2.4829], device='cuda:0')
2024-03-07 08:39:46,485 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([1.7060, 2.8868, 2.8695, 2.0508], device='cuda:0')
2024-03-07 08:39:48,212 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([4.4578, 4.5392, 4.1744, 4.1141], device='cuda:0')
2024-03-07 08:39:48,733 INFO [decode.py:651] batch 80/?, cuts processed until now is 2513
2024-03-07 08:39:49,794 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([3.1706, 2.6006, 2.7836, 2.5051], device='cuda:0')
2024-03-07 08:39:51,031 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([4.5083, 4.5611, 4.1912, 4.1362], device='cuda:0')
2024-03-07 08:39:52,617 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([2.1998, 2.3700, 2.7924, 2.7577], device='cuda:0')
2024-03-07 08:39:53,711 INFO [decode.py:651] batch 100/?, cuts processed until now is 3210
2024-03-07 08:39:59,086 INFO [decode.py:651] batch 120/?, cuts processed until now is 3814
2024-03-07 08:40:04,040 INFO [decode.py:651] batch 140/?, cuts processed until now is 4529
2024-03-07 08:40:08,988 INFO [decode.py:651] batch 160/?, cuts processed until now is 5256
2024-03-07 08:40:09,916 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([4.5227, 4.0025, 4.1148, 4.3108], device='cuda:0')
2024-03-07 08:40:11,126 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([1.4430, 1.6413, 1.6686, 1.8403, 1.5950, 1.9201, 2.0724, 1.6969],
device='cuda:0')
2024-03-07 08:40:14,084 INFO [decode.py:651] batch 180/?, cuts processed until now is 5927
2024-03-07 08:40:18,458 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([3.8823, 3.2936, 3.0880, 3.2756], device='cuda:0')
2024-03-07 08:40:19,214 INFO [decode.py:651] batch 200/?, cuts processed until now is 6582
2024-03-07 08:40:22,724 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([1.5198, 1.6440, 1.5930, 1.8840, 1.5783, 1.8650, 2.0609, 1.6564],
device='cuda:0')
2024-03-07 08:40:24,453 INFO [decode.py:651] batch 220/?, cuts processed until now is 7221
2024-03-07 08:40:29,626 INFO [decode.py:651] batch 240/?, cuts processed until now is 7878
2024-03-07 08:40:33,319 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([2.3668, 2.7808, 2.7429, 2.0601], device='cuda:0')
2024-03-07 08:40:34,832 INFO [decode.py:651] batch 260/?, cuts processed until now is 8528
2024-03-07 08:40:39,681 INFO [decode.py:651] batch 280/?, cuts processed until now is 9263
2024-03-07 08:40:44,602 INFO [decode.py:651] batch 300/?, cuts processed until now is 9972
2024-03-07 08:40:49,952 INFO [decode.py:651] batch 320/?, cuts processed until now is 10574
2024-03-07 08:40:51,376 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([3.0097, 2.9823, 2.5350, 2.4735], device='cuda:0')
2024-03-07 08:40:54,529 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([1.4511, 1.6635, 1.4021, 1.3549, 1.6671, 1.8209, 1.7242, 1.7077],
device='cuda:0')
2024-03-07 08:40:54,975 INFO [decode.py:651] batch 340/?, cuts processed until now is 11255
2024-03-07 08:41:00,108 INFO [decode.py:651] batch 360/?, cuts processed until now is 11900
2024-03-07 08:41:03,743 INFO [decode.py:665] The transcripts are stored in zipformer/exp-causal/fast_beam_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
2024-03-07 08:41:04,136 INFO [utils.py:656] [test-commonvoice-beam_20.0_max_contexts_8_max_states_64] %WER 7.73% [48322 / 624874, 9678 ins, 23552 del, 15092 sub ]
2024-03-07 08:41:04,996 INFO [decode.py:676] Wrote detailed error stats to zipformer/exp-causal/fast_beam_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
2024-03-07 08:41:04,997 INFO [decode.py:690]
For test-commonvoice, WER of different settings are:
beam_20.0_max_contexts_8_max_states_64	7.73	best for test-commonvoice

2024-03-07 08:41:04,997 INFO [decode.py:944] Done!

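Each `%WER` line in these logs reports total errors over reference words, broken down into insertion, deletion, and substitution counts. A minimal sketch (not the icefall implementation; function name is illustrative) that reproduces the headline figure from the bracketed counts:

```python
def wer_percent(ins: int, dels: int, subs: int, ref_words: int) -> float:
    """Word error rate in percent: (insertions + deletions + substitutions) / reference words."""
    errors = ins + dels + subs
    return round(100.0 * errors / ref_words, 2)

# Counts from the test-fleurs fast_beam_search line:
# %WER 6.61% [4072 / 61587, 1548 ins, 1229 del, 1295 sub ]
print(wer_percent(1548, 1229, 1295, 61587))  # → 6.61
```

Note that WER can exceed 100% when insertions are numerous, since errors are not bounded by the reference length.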
exp-causal/fast_beam_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
ADDED
The diff for this file is too large to render.

exp-causal/fast_beam_search/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
ADDED
The diff for this file is too large to render.

exp-causal/fast_beam_search/wer-summary-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
ADDED
@@ -0,0 +1,2 @@
settings	WER
beam_20.0_max_contexts_8_max_states_64	7.73

exp-causal/fast_beam_search/wer-summary-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
ADDED
@@ -0,0 +1,2 @@
settings	WER
beam_20.0_max_contexts_8_max_states_64	6.61

exp-causal/greedy_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt
ADDED
The diff for this file is too large to render.

exp-causal/greedy_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt
ADDED
The diff for this file is too large to render.

exp-causal/greedy_search/log-decode-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model-2024-03-07-08-38-15
ADDED
@@ -0,0 +1,46 @@
2024-03-07 08:38:15,365 INFO [decode.py:764] Decoding started
2024-03-07 08:38:15,365 INFO [decode.py:770] Device: cuda:0
2024-03-07 08:38:15,366 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
2024-03-07 08:38:15,369 INFO [decode.py:778] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'f6919c0ddb311bea7b53a50f3afdcb3c18b8ccc8', 'k2-git-date': 'Sat Feb 10 09:23:09 2024', 'lhotse-version': '1.22.0.dev+git.9355bd72.clean', 'torch-version': '2.0.0+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'b35406b0-clean', 'icefall-git-date': 'Thu Mar 7 06:20:34 2024', 'icefall-path': '/root/icefall', 'k2-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/k2/__init__.py', 'lhotse-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'bookbot-h100', 'IP address': '127.0.0.1'}, 'epoch': 40, 'iter': 0, 'avg': 7, 'use_averaged_model': True, 'exp_dir': PosixPath('zipformer/exp-causal'), 'lang_dir': PosixPath('data/lang_phone'), 'decoding_method': 'greedy_search', 'beam_size': 4, 'beam': 20.0, 'ngram_lm_scale': 0.01, 'max_contexts': 8, 'max_states': 64, 'context_size': 2, 'max_sym_per_frame': 1, 'num_paths': 200, 'nbest_scale': 0.5, 'use_shallow_fusion': False, 'lm_type': 'rnn', 'lm_scale': 0.3, 'tokens_ngram': 3, 'backoff_id': 500, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '32', 'left_context_frames': '128', 'use_transducer': 
True, 'use_ctc': True, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 200.0, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'lm_vocab_size': 500, 'lm_epoch': 7, 'lm_avg': 1, 'lm_exp_dir': None, 'rnn_lm_embedding_dim': 2048, 'rnn_lm_hidden_dim': 2048, 'rnn_lm_num_layers': 3, 'rnn_lm_tie_weights': True, 'transformer_lm_exp_dir': None, 'transformer_lm_dim_feedforward': 2048, 'transformer_lm_encoder_dim': 768, 'transformer_lm_embedding_dim': 768, 'transformer_lm_nhead': 8, 'transformer_lm_num_layers': 16, 'transformer_lm_tie_weights': True, 'res_dir': PosixPath('zipformer/exp-causal/greedy_search'), 'suffix': 'epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model', 'blank_id': 0, 'unk_id': 19, 'vocab_size': 38}
2024-03-07 08:38:15,369 INFO [decode.py:780] About to create model
2024-03-07 08:38:15,616 INFO [decode.py:847] Calculating the averaged model over epoch range from 33 (excluded) to 40
2024-03-07 08:38:16,521 INFO [decode.py:908] Number of model parameters: 65182863
2024-03-07 08:38:16,521 INFO [multidataset.py:81] About to get FLEURS test cuts
2024-03-07 08:38:16,521 INFO [multidataset.py:83] Loading FLEURS in lazy mode
2024-03-07 08:38:16,522 INFO [multidataset.py:90] About to get Common Voice test cuts
2024-03-07 08:38:16,522 INFO [multidataset.py:92] Loading Common Voice in lazy mode
2024-03-07 08:38:17,254 INFO [decode.py:651] batch 0/?, cuts processed until now is 11
2024-03-07 08:38:21,746 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([1.6919, 1.8132, 1.6442, 1.9278, 2.0388, 1.9352, 2.1089, 1.7078],
device='cuda:0')
2024-03-07 08:38:23,390 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([4.4804, 4.7386, 4.4316, 4.4934], device='cuda:0')
2024-03-07 08:38:23,574 INFO [decode.py:665] The transcripts are stored in zipformer/exp-causal/greedy_search/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt
2024-03-07 08:38:23,615 INFO [utils.py:656] [test-fleurs-greedy_search] %WER 6.58% [4054 / 61587, 1612 ins, 1154 del, 1288 sub ]
2024-03-07 08:38:23,708 INFO [decode.py:676] Wrote detailed error stats to zipformer/exp-causal/greedy_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt
2024-03-07 08:38:23,708 INFO [decode.py:690]
For test-fleurs, WER of different settings are:
greedy_search	6.58	best for test-fleurs

2024-03-07 08:38:24,424 INFO [decode.py:651] batch 0/?, cuts processed until now is 28
2024-03-07 08:38:26,567 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([3.7743, 3.4609, 3.1923, 3.4857], device='cuda:0')
2024-03-07 08:38:29,582 INFO [decode.py:651] batch 50/?, cuts processed until now is 1611
2024-03-07 08:38:34,671 INFO [decode.py:651] batch 100/?, cuts processed until now is 3210
2024-03-07 08:38:38,604 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([1.8230, 2.9017, 2.8725, 2.2266], device='cuda:0')
2024-03-07 08:38:39,685 INFO [decode.py:651] batch 150/?, cuts processed until now is 4896
2024-03-07 08:38:44,671 INFO [decode.py:651] batch 200/?, cuts processed until now is 6582
2024-03-07 08:38:49,727 INFO [decode.py:651] batch 250/?, cuts processed until now is 8173
2024-03-07 08:38:50,536 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([3.2378, 3.0733, 2.4014, 2.7650], device='cuda:0')
2024-03-07 08:38:50,823 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([1.5433, 1.8018, 1.6968, 1.5210, 1.7601, 1.7079, 1.5636, 1.8637],
device='cuda:0')
2024-03-07 08:38:51,210 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([3.0157, 2.9472, 2.5216, 2.4810], device='cuda:0')
2024-03-07 08:38:54,561 INFO [decode.py:651] batch 300/?, cuts processed until now is 9972
2024-03-07 08:38:55,938 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([4.0921, 4.2424, 3.6664, 3.7351], device='cuda:0')
2024-03-07 08:38:59,584 INFO [decode.py:651] batch 350/?, cuts processed until now is 11592
2024-03-07 08:39:00,118 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([4.5326, 4.0145, 4.0918, 4.3280], device='cuda:0')
2024-03-07 08:39:02,056 INFO [decode.py:665] The transcripts are stored in zipformer/exp-causal/greedy_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt
2024-03-07 08:39:02,445 INFO [utils.py:656] [test-commonvoice-greedy_search] %WER 7.71% [48192 / 624874, 10414 ins, 21811 del, 15967 sub ]
2024-03-07 08:39:03,298 INFO [decode.py:676] Wrote detailed error stats to zipformer/exp-causal/greedy_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt
2024-03-07 08:39:03,298 INFO [decode.py:690]
For test-commonvoice, WER of different settings are:
greedy_search	7.71	best for test-commonvoice

2024-03-07 08:39:03,299 INFO [decode.py:944] Done!

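The "Calculating the averaged model over epoch range from 33 (excluded) to 40" lines reflect `--epoch 40 --avg 7 --use-averaged-model`: the decoded model is an average of the parameters seen over epochs 34 through 40. icefall derives this from running parameter averages stored in the checkpoints, but the net effect can be sketched as a plain uniform average (toy sketch only, with plain Python lists standing in for tensors; names are illustrative):

```python
def average_checkpoints(state_dicts):
    """Uniformly average matching parameter vectors across checkpoints."""
    n = len(state_dicts)
    return {
        key: [sum(vals) / n for vals in zip(*(sd[key] for sd in state_dicts))]
        for key in state_dicts[0]
    }

# Toy "checkpoints" standing in for the epoch-34..40 models (just 2 here):
ckpt_a = {"encoder.weight": [1.0, 3.0]}
ckpt_b = {"encoder.weight": [3.0, 5.0]}
print(average_checkpoints([ckpt_a, ckpt_b]))  # → {'encoder.weight': [2.0, 4.0]}
```

Averaging the last several epochs typically smooths out per-epoch noise and gives a small WER improvement over any single checkpoint.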
exp-causal/greedy_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt
ADDED
The diff for this file is too large to render.

exp-causal/greedy_search/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt
ADDED
The diff for this file is too large to render.

exp-causal/greedy_search/wer-summary-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt
ADDED
@@ -0,0 +1,2 @@
settings	WER
greedy_search	7.71

exp-causal/greedy_search/wer-summary-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-context-2-max-sym-per-frame-1-use-averaged-model.txt
ADDED
@@ -0,0 +1,2 @@
settings	WER
greedy_search	6.58

exp-causal/jit_script_chunk_32_left_128.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1f7af6590fca59f1026cad0a0a2abee9332f146142d8e5c6d0b4975b6ce35f97
size 263594678

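The `.pt` checkpoints in this commit are stored via Git LFS, so the repo itself only holds a three-line pointer file (`version`, `oid`, `size`, each a space-separated key/value pair). A small sketch of reading such a pointer (the parsing helper is illustrative, not part of Git LFS tooling):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file: one 'key value' pair per line."""
    out = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        out[key] = value
    return out

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:1f7af6590fca59f1026cad0a0a2abee9332f146142d8e5c6d0b4975b6ce35f97
size 263594678"""

info = parse_lfs_pointer(pointer)
print(info["size"])  # → 263594678
```

The `oid` is the SHA-256 of the real file content, which Git LFS fetches from the LFS store on checkout; `size` is its byte count (~263 MB for the jit-scripted streaming model here).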
exp-causal/modified_beam_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt
ADDED
The diff for this file is too large to render.

exp-causal/modified_beam_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt
ADDED
The diff for this file is too large to render.

exp-causal/modified_beam_search/log-decode-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model-2024-03-07-08-41-07
ADDED
@@ -0,0 +1,56 @@
2024-03-07 08:41:07,436 INFO [decode.py:764] Decoding started
2024-03-07 08:41:07,436 INFO [decode.py:770] Device: cuda:0
2024-03-07 08:41:07,437 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
2024-03-07 08:41:07,437 INFO [decode.py:778] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'f6919c0ddb311bea7b53a50f3afdcb3c18b8ccc8', 'k2-git-date': 'Sat Feb 10 09:23:09 2024', 'lhotse-version': '1.22.0.dev+git.9355bd72.clean', 'torch-version': '2.0.0+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'b35406b0-clean', 'icefall-git-date': 'Thu Mar 7 06:20:34 2024', 'icefall-path': '/root/icefall', 'k2-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/k2/__init__.py', 'lhotse-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'bookbot-h100', 'IP address': '127.0.0.1'}, 'epoch': 40, 'iter': 0, 'avg': 7, 'use_averaged_model': True, 'exp_dir': PosixPath('zipformer/exp-causal'), 'lang_dir': PosixPath('data/lang_phone'), 'decoding_method': 'modified_beam_search', 'beam_size': 4, 'beam': 20.0, 'ngram_lm_scale': 0.01, 'max_contexts': 8, 'max_states': 64, 'context_size': 2, 'max_sym_per_frame': 1, 'num_paths': 200, 'nbest_scale': 0.5, 'use_shallow_fusion': False, 'lm_type': 'rnn', 'lm_scale': 0.3, 'tokens_ngram': 3, 'backoff_id': 500, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '32', 'left_context_frames': '128', 
'use_transducer': True, 'use_ctc': True, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 200.0, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'lm_vocab_size': 500, 'lm_epoch': 7, 'lm_avg': 1, 'lm_exp_dir': None, 'rnn_lm_embedding_dim': 2048, 'rnn_lm_hidden_dim': 2048, 'rnn_lm_num_layers': 3, 'rnn_lm_tie_weights': True, 'transformer_lm_exp_dir': None, 'transformer_lm_dim_feedforward': 2048, 'transformer_lm_encoder_dim': 768, 'transformer_lm_embedding_dim': 768, 'transformer_lm_nhead': 8, 'transformer_lm_num_layers': 16, 'transformer_lm_tie_weights': True, 'res_dir': PosixPath('zipformer/exp-causal/modified_beam_search'), 'suffix': 'epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model', 'blank_id': 0, 'unk_id': 19, 'vocab_size': 38}
2024-03-07 08:41:07,438 INFO [decode.py:780] About to create model
2024-03-07 08:41:07,691 INFO [decode.py:847] Calculating the averaged model over epoch range from 33 (excluded) to 40
2024-03-07 08:41:08,529 INFO [decode.py:908] Number of model parameters: 65182863
2024-03-07 08:41:08,529 INFO [multidataset.py:81] About to get FLEURS test cuts
2024-03-07 08:41:08,529 INFO [multidataset.py:83] Loading FLEURS in lazy mode
2024-03-07 08:41:08,530 INFO [multidataset.py:90] About to get Common Voice test cuts
2024-03-07 08:41:08,530 INFO [multidataset.py:92] Loading Common Voice in lazy mode
2024-03-07 08:41:10,007 INFO [decode.py:651] batch 0/?, cuts processed until now is 11
2024-03-07 08:41:24,636 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([3.9650, 3.6353, 3.4530, 3.6656], device='cuda:0')
2024-03-07 08:41:27,015 INFO [decode.py:651] batch 20/?, cuts processed until now is 270
2024-03-07 08:41:41,955 INFO [decode.py:651] batch 40/?, cuts processed until now is 487
2024-03-07 08:41:42,017 INFO [decode.py:665] The transcripts are stored in zipformer/exp-causal/modified_beam_search/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt
2024-03-07 08:41:42,058 INFO [utils.py:656] [test-fleurs-beam_size_4] %WER 6.40% [3942 / 61587, 1687 ins, 950 del, 1305 sub ]
2024-03-07 08:41:42,153 INFO [decode.py:676] Wrote detailed error stats to zipformer/exp-causal/modified_beam_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt
2024-03-07 08:41:42,153 INFO [decode.py:690]
For test-fleurs, WER of different settings are:
beam_size_4	6.4	best for test-fleurs

2024-03-07 08:41:43,500 INFO [decode.py:651] batch 0/?, cuts processed until now is 28
2024-03-07 08:41:57,852 INFO [decode.py:651] batch 20/?, cuts processed until now is 628
2024-03-07 08:42:07,812 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([2.1801, 2.3482, 2.7994, 2.7716], device='cuda:0')
2024-03-07 08:42:12,022 INFO [decode.py:651] batch 40/?, cuts processed until now is 1253
2024-03-07 08:42:21,068 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([1.6309, 1.6829, 1.5267, 1.7630, 1.8531, 1.8195, 1.8756, 1.6031],
device='cuda:0')
2024-03-07 08:42:25,949 INFO [decode.py:651] batch 60/?, cuts processed until now is 1940
2024-03-07 08:42:36,123 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([3.2243, 3.0776, 2.3565, 2.7223], device='cuda:0')
2024-03-07 08:42:40,387 INFO [decode.py:651] batch 80/?, cuts processed until now is 2513
2024-03-07 08:42:54,535 INFO [decode.py:651] batch 100/?, cuts processed until now is 3210
2024-03-07 08:43:08,702 INFO [decode.py:651] batch 120/?, cuts processed until now is 3814
2024-03-07 08:43:22,478 INFO [decode.py:651] batch 140/?, cuts processed until now is 4529
2024-03-07 08:43:30,734 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([3.2739, 2.6799, 2.8831, 2.6336], device='cuda:0')
2024-03-07 08:43:36,235 INFO [decode.py:651] batch 160/?, cuts processed until now is 5256
2024-03-07 08:43:38,339 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([3.0916, 2.5364, 2.6418, 2.3536], device='cuda:0')
2024-03-07 08:43:50,472 INFO [decode.py:651] batch 180/?, cuts processed until now is 5927
2024-03-07 08:44:04,600 INFO [decode.py:651] batch 200/?, cuts processed until now is 6582
2024-03-07 08:44:18,698 INFO [decode.py:651] batch 220/?, cuts processed until now is 7221
2024-03-07 08:44:32,720 INFO [decode.py:651] batch 240/?, cuts processed until now is 7878
2024-03-07 08:44:46,812 INFO [decode.py:651] batch 260/?, cuts processed until now is 8528
2024-03-07 08:45:00,678 INFO [decode.py:651] batch 280/?, cuts processed until now is 9263
2024-03-07 08:45:02,883 INFO [zipformer.py:1858] name=None, attn_weights_entropy = tensor([2.2417, 2.2166, 2.4622, 2.6655], device='cuda:0')
2024-03-07 08:45:14,478 INFO [decode.py:651] batch 300/?, cuts processed until now is 9972
2024-03-07 08:45:28,719 INFO [decode.py:651] batch 320/?, cuts processed until now is 10574
2024-03-07 08:45:42,779 INFO [decode.py:651] batch 340/?, cuts processed until now is 11255
2024-03-07 08:45:56,922 INFO [decode.py:651] batch 360/?, cuts processed until now is 11900
2024-03-07 08:46:05,025 INFO [decode.py:665] The transcripts are stored in zipformer/exp-causal/modified_beam_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt
2024-03-07 08:46:05,434 INFO [utils.py:656] [test-commonvoice-beam_size_4] %WER 7.53% [47039 / 624874, 11050 ins, 20152 del, 15837 sub ]
2024-03-07 08:46:06,304 INFO [decode.py:676] Wrote detailed error stats to zipformer/exp-causal/modified_beam_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt
2024-03-07 08:46:06,304 INFO [decode.py:690]
For test-commonvoice, WER of different settings are:
beam_size_4	7.53	best for test-commonvoice

2024-03-07 08:46:06,304 INFO [decode.py:944] Done!

exp-causal/modified_beam_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt
ADDED
The diff for this file is too large to render.

exp-causal/modified_beam_search/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt
ADDED
The diff for this file is too large to render.

exp-causal/modified_beam_search/wer-summary-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt
ADDED
@@ -0,0 +1,2 @@
settings	WER
beam_size_4	7.53

exp-causal/modified_beam_search/wer-summary-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-modified_beam_search-beam-size-4-use-averaged-model.txt
ADDED
@@ -0,0 +1,2 @@
settings	WER
beam_size_4	6.4

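Collecting the wer-summary files above into one place, a short sketch comparing the four offline decoding methods on both test sets (the numbers are taken directly from the logs in this commit; the helper is illustrative):

```python
# WERs (%) from the wer-summary files under exp-causal/
results = {
    "ctc-decoding":         {"test-commonvoice": 7.78, "test-fleurs": 6.72},
    "fast_beam_search":     {"test-commonvoice": 7.73, "test-fleurs": 6.61},
    "greedy_search":        {"test-commonvoice": 7.71, "test-fleurs": 6.58},
    "modified_beam_search": {"test-commonvoice": 7.53, "test-fleurs": 6.40},
}

def best_method(results: dict, test_set: str) -> str:
    """Return the decoding method with the lowest WER on the given test set."""
    return min(results, key=lambda m: results[m][test_set])

print(best_method(results, "test-fleurs"))  # → modified_beam_search
```

On these runs modified_beam_search gives the lowest WER on both sets, at the cost of noticeably longer decoding time than greedy_search (compare the batch timestamps in the two logs).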
exp-causal/pretrained.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1a57d5529d997fe029a18bccae562257f58bdc66d4eb2648dc4526c66618e8d8
size 261184016
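`exp-causal/pretrained.pt` is tracked with Git LFS, so the commit stores only the three-line pointer above (spec version, SHA-256 oid, byte size) rather than the ~261 MB checkpoint itself. A small sketch of reading such a pointer (`parse_lfs_pointer` is a hypothetical helper, not part of git-lfs):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:1a57d5529d997fe029a18bccae562257f58bdc66d4eb2648dc4526c66618e8d8
size 261184016"""

info = parse_lfs_pointer(pointer)
print(info["size"])  # 261184016
```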
exp-causal/streaming/fast_beam_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
ADDED
The diff for this file is too large to render. See raw diff.

exp-causal/streaming/fast_beam_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
ADDED
The diff for this file is too large to render. See raw diff.
exp-causal/streaming/fast_beam_search/log-decode-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model-2024-03-07-08-56-46
ADDED
@@ -0,0 +1,154 @@
2024-03-07 08:56:46,168 INFO [streaming_decode.py:723] Decoding started
2024-03-07 08:56:46,168 INFO [streaming_decode.py:729] Device: cuda:0
2024-03-07 08:56:46,168 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
2024-03-07 08:56:46,170 INFO [streaming_decode.py:737] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'f6919c0ddb311bea7b53a50f3afdcb3c18b8ccc8', 'k2-git-date': 'Sat Feb 10 09:23:09 2024', 'lhotse-version': '1.22.0.dev+git.9355bd72.clean', 'torch-version': '2.0.0+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'b35406b0-dirty', 'icefall-git-date': 'Thu Mar 7 06:20:34 2024', 'icefall-path': '/root/icefall', 'k2-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/k2/__init__.py', 'lhotse-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'bookbot-h100', 'IP address': '127.0.0.1'}, 'epoch': 40, 'iter': 0, 'avg': 7, 'use_averaged_model': True, 'exp_dir': PosixPath('zipformer/exp-causal'), 'lang_dir': PosixPath('data/lang_phone'), 'decoding_method': 'fast_beam_search', 'num_active_paths': 4, 'beam': 4, 'max_contexts': 4, 'max_states': 32, 'context_size': 2, 'num_decode_streams': 1000, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '32', 'left_context_frames': '128', 'use_transducer': True, 'use_ctc': True, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 200.0, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'res_dir': PosixPath('zipformer/exp-causal/streaming/fast_beam_search'), 'suffix': 'epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model', 'blank_id': 0, 'unk_id': 19, 'vocab_size': 38}
2024-03-07 08:56:46,170 INFO [streaming_decode.py:739] About to create model
2024-03-07 08:56:46,400 INFO [streaming_decode.py:806] Calculating the averaged model over epoch range from 33 (excluded) to 40
2024-03-07 08:56:47,216 INFO [streaming_decode.py:828] Number of model parameters: 65182863
2024-03-07 08:56:47,216 INFO [multidataset.py:81] About to get FLEURS test cuts
2024-03-07 08:56:47,216 INFO [multidataset.py:83] Loading FLEURS in lazy mode
2024-03-07 08:56:47,217 INFO [multidataset.py:90] About to get Common Voice test cuts
2024-03-07 08:56:47,217 INFO [multidataset.py:92] Loading Common Voice in lazy mode
2024-03-07 08:56:47,250 INFO [streaming_decode.py:615] Cuts processed until now is 0.
2024-03-07 08:56:47,505 INFO [streaming_decode.py:615] Cuts processed until now is 100.
2024-03-07 08:56:47,761 INFO [streaming_decode.py:615] Cuts processed until now is 200.
2024-03-07 08:56:48,125 INFO [streaming_decode.py:615] Cuts processed until now is 300.
2024-03-07 08:56:48,388 INFO [streaming_decode.py:615] Cuts processed until now is 400.
2024-03-07 08:56:59,328 INFO [streaming_decode.py:660] The transcripts are stored in zipformer/exp-causal/streaming/fast_beam_search/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
2024-03-07 08:56:59,374 INFO [utils.py:656] [test-fleurs-beam_4_max_contexts_4_max_states_32] %WER 6.44% [3966 / 61587, 1562 ins, 1114 del, 1290 sub ]
2024-03-07 08:56:59,473 INFO [streaming_decode.py:671] Wrote detailed error stats to zipformer/exp-causal/streaming/fast_beam_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
2024-03-07 08:56:59,473 INFO [streaming_decode.py:685]
For test-fleurs, WER of different settings are:
beam_4_max_contexts_4_max_states_32	6.44	best for test-fleurs

2024-03-07 08:56:59,533 INFO [streaming_decode.py:615] Cuts processed until now is 0.
2024-03-07 08:56:59,987 INFO [streaming_decode.py:615] Cuts processed until now is 100.
2024-03-07 08:57:00,402 INFO [streaming_decode.py:615] Cuts processed until now is 200.
2024-03-07 08:57:00,805 INFO [streaming_decode.py:615] Cuts processed until now is 300.
2024-03-07 08:57:01,205 INFO [streaming_decode.py:615] Cuts processed until now is 400.
2024-03-07 08:57:01,610 INFO [streaming_decode.py:615] Cuts processed until now is 500.
2024-03-07 08:57:02,035 INFO [streaming_decode.py:615] Cuts processed until now is 600.
2024-03-07 08:57:02,444 INFO [streaming_decode.py:615] Cuts processed until now is 700.
2024-03-07 08:57:02,848 INFO [streaming_decode.py:615] Cuts processed until now is 800.
2024-03-07 08:57:03,267 INFO [streaming_decode.py:615] Cuts processed until now is 900.
2024-03-07 08:57:06,208 INFO [streaming_decode.py:615] Cuts processed until now is 1000.
2024-03-07 08:57:07,867 INFO [streaming_decode.py:615] Cuts processed until now is 1100.
2024-03-07 08:57:09,022 INFO [streaming_decode.py:615] Cuts processed until now is 1200.
2024-03-07 08:57:10,227 INFO [streaming_decode.py:615] Cuts processed until now is 1300.
2024-03-07 08:57:11,297 INFO [streaming_decode.py:615] Cuts processed until now is 1400.
2024-03-07 08:57:12,537 INFO [streaming_decode.py:615] Cuts processed until now is 1500.
2024-03-07 08:57:13,834 INFO [streaming_decode.py:615] Cuts processed until now is 1600.
2024-03-07 08:57:14,242 INFO [streaming_decode.py:615] Cuts processed until now is 1700.
2024-03-07 08:57:15,348 INFO [streaming_decode.py:615] Cuts processed until now is 1800.
2024-03-07 08:57:16,629 INFO [streaming_decode.py:615] Cuts processed until now is 1900.
2024-03-07 08:57:17,927 INFO [streaming_decode.py:615] Cuts processed until now is 2000.
2024-03-07 08:57:19,033 INFO [streaming_decode.py:615] Cuts processed until now is 2100.
2024-03-07 08:57:20,293 INFO [streaming_decode.py:615] Cuts processed until now is 2200.
2024-03-07 08:57:21,399 INFO [streaming_decode.py:615] Cuts processed until now is 2300.
2024-03-07 08:57:23,520 INFO [streaming_decode.py:615] Cuts processed until now is 2400.
2024-03-07 08:57:24,627 INFO [streaming_decode.py:615] Cuts processed until now is 2500.
2024-03-07 08:57:25,903 INFO [streaming_decode.py:615] Cuts processed until now is 2600.
2024-03-07 08:57:27,193 INFO [streaming_decode.py:615] Cuts processed until now is 2700.
2024-03-07 08:57:28,345 INFO [streaming_decode.py:615] Cuts processed until now is 2800.
2024-03-07 08:57:28,729 INFO [streaming_decode.py:615] Cuts processed until now is 2900.
2024-03-07 08:57:30,011 INFO [streaming_decode.py:615] Cuts processed until now is 3000.
2024-03-07 08:57:31,313 INFO [streaming_decode.py:615] Cuts processed until now is 3100.
2024-03-07 08:57:32,611 INFO [streaming_decode.py:615] Cuts processed until now is 3200.
2024-03-07 08:57:33,723 INFO [streaming_decode.py:615] Cuts processed until now is 3300.
2024-03-07 08:57:35,022 INFO [streaming_decode.py:615] Cuts processed until now is 3400.
2024-03-07 08:57:36,346 INFO [streaming_decode.py:615] Cuts processed until now is 3500.
2024-03-07 08:57:37,464 INFO [streaming_decode.py:615] Cuts processed until now is 3600.
2024-03-07 08:57:38,782 INFO [streaming_decode.py:615] Cuts processed until now is 3700.
2024-03-07 08:57:40,089 INFO [streaming_decode.py:615] Cuts processed until now is 3800.
2024-03-07 08:57:41,219 INFO [streaming_decode.py:615] Cuts processed until now is 3900.
2024-03-07 08:57:42,516 INFO [streaming_decode.py:615] Cuts processed until now is 4000.
2024-03-07 08:57:43,861 INFO [streaming_decode.py:615] Cuts processed until now is 4100.
2024-03-07 08:57:44,985 INFO [streaming_decode.py:615] Cuts processed until now is 4200.
2024-03-07 08:57:46,301 INFO [streaming_decode.py:615] Cuts processed until now is 4300.
2024-03-07 08:57:47,589 INFO [streaming_decode.py:615] Cuts processed until now is 4400.
2024-03-07 08:57:48,724 INFO [streaming_decode.py:615] Cuts processed until now is 4500.
2024-03-07 08:57:50,031 INFO [streaming_decode.py:615] Cuts processed until now is 4600.
2024-03-07 08:57:51,375 INFO [streaming_decode.py:615] Cuts processed until now is 4700.
2024-03-07 08:57:52,513 INFO [streaming_decode.py:615] Cuts processed until now is 4800.
2024-03-07 08:57:53,839 INFO [streaming_decode.py:615] Cuts processed until now is 4900.
2024-03-07 08:57:54,958 INFO [streaming_decode.py:615] Cuts processed until now is 5000.
2024-03-07 08:57:56,282 INFO [streaming_decode.py:615] Cuts processed until now is 5100.
2024-03-07 08:57:57,610 INFO [streaming_decode.py:615] Cuts processed until now is 5200.
2024-03-07 08:57:58,744 INFO [streaming_decode.py:615] Cuts processed until now is 5300.
2024-03-07 08:58:00,031 INFO [streaming_decode.py:615] Cuts processed until now is 5400.
2024-03-07 08:58:01,344 INFO [streaming_decode.py:615] Cuts processed until now is 5500.
2024-03-07 08:58:02,494 INFO [streaming_decode.py:615] Cuts processed until now is 5600.
2024-03-07 08:58:03,800 INFO [streaming_decode.py:615] Cuts processed until now is 5700.
2024-03-07 08:58:05,143 INFO [streaming_decode.py:615] Cuts processed until now is 5800.
2024-03-07 08:58:06,296 INFO [streaming_decode.py:615] Cuts processed until now is 5900.
2024-03-07 08:58:07,598 INFO [streaming_decode.py:615] Cuts processed until now is 6000.
2024-03-07 08:58:08,934 INFO [streaming_decode.py:615] Cuts processed until now is 6100.
2024-03-07 08:58:10,061 INFO [streaming_decode.py:615] Cuts processed until now is 6200.
2024-03-07 08:58:11,363 INFO [streaming_decode.py:615] Cuts processed until now is 6300.
2024-03-07 08:58:12,697 INFO [streaming_decode.py:615] Cuts processed until now is 6400.
2024-03-07 08:58:13,841 INFO [streaming_decode.py:615] Cuts processed until now is 6500.
2024-03-07 08:58:15,142 INFO [streaming_decode.py:615] Cuts processed until now is 6600.
2024-03-07 08:58:16,473 INFO [streaming_decode.py:615] Cuts processed until now is 6700.
2024-03-07 08:58:17,624 INFO [streaming_decode.py:615] Cuts processed until now is 6800.
2024-03-07 08:58:18,925 INFO [streaming_decode.py:615] Cuts processed until now is 6900.
2024-03-07 08:58:20,264 INFO [streaming_decode.py:615] Cuts processed until now is 7000.
2024-03-07 08:58:21,416 INFO [streaming_decode.py:615] Cuts processed until now is 7100.
2024-03-07 08:58:22,705 INFO [streaming_decode.py:615] Cuts processed until now is 7200.
2024-03-07 08:58:24,056 INFO [streaming_decode.py:615] Cuts processed until now is 7300.
2024-03-07 08:58:25,222 INFO [streaming_decode.py:615] Cuts processed until now is 7400.
2024-03-07 08:58:26,532 INFO [streaming_decode.py:615] Cuts processed until now is 7500.
2024-03-07 08:58:27,880 INFO [streaming_decode.py:615] Cuts processed until now is 7600.
2024-03-07 08:58:29,014 INFO [streaming_decode.py:615] Cuts processed until now is 7700.
2024-03-07 08:58:30,320 INFO [streaming_decode.py:615] Cuts processed until now is 7800.
2024-03-07 08:58:31,679 INFO [streaming_decode.py:615] Cuts processed until now is 7900.
2024-03-07 08:58:32,790 INFO [streaming_decode.py:615] Cuts processed until now is 8000.
2024-03-07 08:58:34,120 INFO [streaming_decode.py:615] Cuts processed until now is 8100.
2024-03-07 08:58:35,488 INFO [streaming_decode.py:615] Cuts processed until now is 8200.
2024-03-07 08:58:36,616 INFO [streaming_decode.py:615] Cuts processed until now is 8300.
2024-03-07 08:58:37,953 INFO [streaming_decode.py:615] Cuts processed until now is 8400.
2024-03-07 08:58:39,086 INFO [streaming_decode.py:615] Cuts processed until now is 8500.
2024-03-07 08:58:40,391 INFO [streaming_decode.py:615] Cuts processed until now is 8600.
2024-03-07 08:58:41,731 INFO [streaming_decode.py:615] Cuts processed until now is 8700.
2024-03-07 08:58:42,864 INFO [streaming_decode.py:615] Cuts processed until now is 8800.
2024-03-07 08:58:44,169 INFO [streaming_decode.py:615] Cuts processed until now is 8900.
2024-03-07 08:58:45,542 INFO [streaming_decode.py:615] Cuts processed until now is 9000.
2024-03-07 08:58:46,679 INFO [streaming_decode.py:615] Cuts processed until now is 9100.
2024-03-07 08:58:47,998 INFO [streaming_decode.py:615] Cuts processed until now is 9200.
2024-03-07 08:58:49,366 INFO [streaming_decode.py:615] Cuts processed until now is 9300.
2024-03-07 08:58:50,494 INFO [streaming_decode.py:615] Cuts processed until now is 9400.
2024-03-07 08:58:51,824 INFO [streaming_decode.py:615] Cuts processed until now is 9500.
2024-03-07 08:58:53,228 INFO [streaming_decode.py:615] Cuts processed until now is 9600.
2024-03-07 08:58:54,385 INFO [streaming_decode.py:615] Cuts processed until now is 9700.
2024-03-07 08:58:55,761 INFO [streaming_decode.py:615] Cuts processed until now is 9800.
2024-03-07 08:58:56,178 INFO [streaming_decode.py:615] Cuts processed until now is 9900.
2024-03-07 08:58:57,326 INFO [streaming_decode.py:615] Cuts processed until now is 10000.
2024-03-07 08:58:59,584 INFO [streaming_decode.py:615] Cuts processed until now is 10100.
2024-03-07 08:59:00,723 INFO [streaming_decode.py:615] Cuts processed until now is 10200.
2024-03-07 08:59:02,083 INFO [streaming_decode.py:615] Cuts processed until now is 10300.
2024-03-07 08:59:03,461 INFO [streaming_decode.py:615] Cuts processed until now is 10400.
2024-03-07 08:59:04,588 INFO [streaming_decode.py:615] Cuts processed until now is 10500.
2024-03-07 08:59:05,955 INFO [streaming_decode.py:615] Cuts processed until now is 10600.
2024-03-07 08:59:07,086 INFO [streaming_decode.py:615] Cuts processed until now is 10700.
2024-03-07 08:59:08,428 INFO [streaming_decode.py:615] Cuts processed until now is 10800.
2024-03-07 08:59:09,807 INFO [streaming_decode.py:615] Cuts processed until now is 10900.
2024-03-07 08:59:10,955 INFO [streaming_decode.py:615] Cuts processed until now is 11000.
2024-03-07 08:59:12,305 INFO [streaming_decode.py:615] Cuts processed until now is 11100.
2024-03-07 08:59:13,713 INFO [streaming_decode.py:615] Cuts processed until now is 11200.
2024-03-07 08:59:14,126 INFO [streaming_decode.py:615] Cuts processed until now is 11300.
2024-03-07 08:59:16,212 INFO [streaming_decode.py:615] Cuts processed until now is 11400.
2024-03-07 08:59:17,370 INFO [streaming_decode.py:615] Cuts processed until now is 11500.
2024-03-07 08:59:18,717 INFO [streaming_decode.py:615] Cuts processed until now is 11600.
2024-03-07 08:59:19,119 INFO [streaming_decode.py:615] Cuts processed until now is 11700.
2024-03-07 08:59:20,499 INFO [streaming_decode.py:615] Cuts processed until now is 11800.
2024-03-07 08:59:21,634 INFO [streaming_decode.py:615] Cuts processed until now is 11900.
2024-03-07 08:59:22,973 INFO [streaming_decode.py:615] Cuts processed until now is 12000.
2024-03-07 08:59:25,070 INFO [streaming_decode.py:615] Cuts processed until now is 12100.
2024-03-07 08:59:26,456 INFO [streaming_decode.py:615] Cuts processed until now is 12200.
2024-03-07 08:59:32,292 INFO [streaming_decode.py:660] The transcripts are stored in zipformer/exp-causal/streaming/fast_beam_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
2024-03-07 08:59:32,774 INFO [utils.py:656] [test-commonvoice-beam_4_max_contexts_4_max_states_32] %WER 7.72% [48248 / 624874, 9842 ins, 22929 del, 15477 sub ]
2024-03-07 08:59:33,754 INFO [streaming_decode.py:671] Wrote detailed error stats to zipformer/exp-causal/streaming/fast_beam_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
2024-03-07 08:59:33,754 INFO [streaming_decode.py:685]
For test-commonvoice, WER of different settings are:
beam_4_max_contexts_4_max_states_32	7.72	best for test-commonvoice

2024-03-07 08:59:33,754 INFO [streaming_decode.py:853] Done!
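The log line "Calculating the averaged model over epoch range from 33 (excluded) to 40" corresponds to decoding with parameters averaged over the last `--avg 7` checkpoints (`use_averaged_model=True`). As a rough illustration of the idea, the sketch below averages the entries of several state dicts elementwise; it uses plain Python lists as a stand-in for tensors, and icefall's actual implementation computes the average from running statistics stored in the checkpoints, so this is only an approximation of the technique, not the recipe's code:

```python
def average_state_dicts(state_dicts):
    """Elementwise average of per-key parameter lists across checkpoints."""
    n = len(state_dicts)
    return {
        key: [sum(vals) / n for vals in zip(*(sd[key] for sd in state_dicts))]
        for key in state_dicts[0]
    }

# Toy example: two "checkpoints", each with a single weight vector.
ckpts = [{"w": [1.0, 3.0]}, {"w": [3.0, 5.0]}]
print(average_state_dicts(ckpts)["w"])  # [2.0, 4.0]
```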
exp-causal/streaming/fast_beam_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
ADDED
The diff for this file is too large to render. See raw diff.

exp-causal/streaming/fast_beam_search/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
ADDED
The diff for this file is too large to render. See raw diff.
exp-causal/streaming/fast_beam_search/wer-summary-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
ADDED
@@ -0,0 +1,2 @@
settings	WER
beam_4_max_contexts_4_max_states_32	7.72
exp-causal/streaming/fast_beam_search/wer-summary-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
ADDED
@@ -0,0 +1,2 @@
settings	WER
beam_4_max_contexts_4_max_states_32	6.44
exp-causal/streaming/greedy_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt
ADDED
The diff for this file is too large to render. See raw diff.

exp-causal/streaming/greedy_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt
ADDED
The diff for this file is too large to render. See raw diff.
exp-causal/streaming/greedy_search/log-decode-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model-2024-03-07-08-54-56
ADDED
@@ -0,0 +1,154 @@
2024-03-07 08:54:56,454 INFO [streaming_decode.py:723] Decoding started
2024-03-07 08:54:56,455 INFO [streaming_decode.py:729] Device: cuda:0
2024-03-07 08:54:56,455 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
2024-03-07 08:54:56,457 INFO [streaming_decode.py:737] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'f6919c0ddb311bea7b53a50f3afdcb3c18b8ccc8', 'k2-git-date': 'Sat Feb 10 09:23:09 2024', 'lhotse-version': '1.22.0.dev+git.9355bd72.clean', 'torch-version': '2.0.0+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'b35406b0-dirty', 'icefall-git-date': 'Thu Mar 7 06:20:34 2024', 'icefall-path': '/root/icefall', 'k2-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/k2/__init__.py', 'lhotse-path': '/root/miniconda3/envs/icefall/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'bookbot-h100', 'IP address': '127.0.0.1'}, 'epoch': 40, 'iter': 0, 'avg': 7, 'use_averaged_model': True, 'exp_dir': PosixPath('zipformer/exp-causal'), 'lang_dir': PosixPath('data/lang_phone'), 'decoding_method': 'greedy_search', 'num_active_paths': 4, 'beam': 4, 'max_contexts': 4, 'max_states': 32, 'context_size': 2, 'num_decode_streams': 1000, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '32', 'left_context_frames': '128', 'use_transducer': True, 'use_ctc': True, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 200.0, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'res_dir': PosixPath('zipformer/exp-causal/streaming/greedy_search'), 'suffix': 'epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model', 'blank_id': 0, 'unk_id': 19, 'vocab_size': 38}
2024-03-07 08:54:56,457 INFO [streaming_decode.py:739] About to create model
2024-03-07 08:54:56,690 INFO [streaming_decode.py:806] Calculating the averaged model over epoch range from 33 (excluded) to 40
2024-03-07 08:54:57,485 INFO [streaming_decode.py:828] Number of model parameters: 65182863
2024-03-07 08:54:57,485 INFO [multidataset.py:81] About to get FLEURS test cuts
2024-03-07 08:54:57,485 INFO [multidataset.py:83] Loading FLEURS in lazy mode
2024-03-07 08:54:57,486 INFO [multidataset.py:90] About to get Common Voice test cuts
2024-03-07 08:54:57,486 INFO [multidataset.py:92] Loading Common Voice in lazy mode
2024-03-07 08:54:57,520 INFO [streaming_decode.py:615] Cuts processed until now is 0.
2024-03-07 08:54:57,771 INFO [streaming_decode.py:615] Cuts processed until now is 100.
2024-03-07 08:54:58,025 INFO [streaming_decode.py:615] Cuts processed until now is 200.
2024-03-07 08:54:58,389 INFO [streaming_decode.py:615] Cuts processed until now is 300.
2024-03-07 08:54:58,649 INFO [streaming_decode.py:615] Cuts processed until now is 400.
2024-03-07 08:55:03,961 INFO [streaming_decode.py:660] The transcripts are stored in zipformer/exp-causal/streaming/greedy_search/recogs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt
2024-03-07 08:55:04,004 INFO [utils.py:656] [test-fleurs-greedy_search] %WER 6.59% [4058 / 61587, 1608 ins, 1149 del, 1301 sub ]
2024-03-07 08:55:04,100 INFO [streaming_decode.py:671] Wrote detailed error stats to zipformer/exp-causal/streaming/greedy_search/errs-test-fleurs-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt
2024-03-07 08:55:04,101 INFO [streaming_decode.py:685]
For test-fleurs, WER of different settings are:
greedy_search	6.59	best for test-fleurs

2024-03-07 08:55:04,160 INFO [streaming_decode.py:615] Cuts processed until now is 0.
2024-03-07 08:55:04,586 INFO [streaming_decode.py:615] Cuts processed until now is 100.
2024-03-07 08:55:04,991 INFO [streaming_decode.py:615] Cuts processed until now is 200.
2024-03-07 08:55:05,379 INFO [streaming_decode.py:615] Cuts processed until now is 300.
2024-03-07 08:55:05,766 INFO [streaming_decode.py:615] Cuts processed until now is 400.
2024-03-07 08:55:06,160 INFO [streaming_decode.py:615] Cuts processed until now is 500.
2024-03-07 08:55:06,568 INFO [streaming_decode.py:615] Cuts processed until now is 600.
2024-03-07 08:55:06,968 INFO [streaming_decode.py:615] Cuts processed until now is 700.
2024-03-07 08:55:07,371 INFO [streaming_decode.py:615] Cuts processed until now is 800.
2024-03-07 08:55:07,784 INFO [streaming_decode.py:615] Cuts processed until now is 900.
2024-03-07 08:55:09,627 INFO [streaming_decode.py:615] Cuts processed until now is 1000.
2024-03-07 08:55:10,682 INFO [streaming_decode.py:615] Cuts processed until now is 1100.
2024-03-07 08:55:11,495 INFO [streaming_decode.py:615] Cuts processed until now is 1200.
2024-03-07 08:55:12,174 INFO [streaming_decode.py:615] Cuts processed until now is 1300.
2024-03-07 08:55:12,992 INFO [streaming_decode.py:615] Cuts processed until now is 1400.
2024-03-07 08:55:13,818 INFO [streaming_decode.py:615] Cuts processed until now is 1500.
2024-03-07 08:55:14,510 INFO [streaming_decode.py:615] Cuts processed until now is 1600.
2024-03-07 08:55:15,045 INFO [streaming_decode.py:615] Cuts processed until now is 1700.
2024-03-07 08:55:15,725 INFO [streaming_decode.py:615] Cuts processed until now is 1800.
2024-03-07 08:55:16,557 INFO [streaming_decode.py:615] Cuts processed until now is 1900.
2024-03-07 08:55:17,411 INFO [streaming_decode.py:615] Cuts processed until now is 2000.
2024-03-07 08:55:18,087 INFO [streaming_decode.py:615] Cuts processed until now is 2100.
2024-03-07 08:55:18,919 INFO [streaming_decode.py:615] Cuts processed until now is 2200.
2024-03-07 08:55:19,591 INFO [streaming_decode.py:615] Cuts processed until now is 2300.
2024-03-07 08:55:20,850 INFO [streaming_decode.py:615] Cuts processed until now is 2400.
2024-03-07 08:55:21,527 INFO [streaming_decode.py:615] Cuts processed until now is 2500.
2024-03-07 08:55:22,359 INFO [streaming_decode.py:615] Cuts processed until now is 2600.
2024-03-07 08:55:23,201 INFO [streaming_decode.py:615] Cuts processed until now is 2700.
2024-03-07 08:55:23,888 INFO [streaming_decode.py:615] Cuts processed until now is 2800.
2024-03-07 08:55:24,270 INFO [streaming_decode.py:615] Cuts processed until now is 2900.
2024-03-07 08:55:25,106 INFO [streaming_decode.py:615] Cuts processed until now is 3000.
2024-03-07 08:55:25,958 INFO [streaming_decode.py:615] Cuts processed until now is 3100.
2024-03-07 08:55:26,804 INFO [streaming_decode.py:615] Cuts processed until now is 3200.
2024-03-07 08:55:27,480 INFO [streaming_decode.py:615] Cuts processed until now is 3300.
2024-03-07 08:55:28,327 INFO [streaming_decode.py:615] Cuts processed until now is 3400.
2024-03-07 08:55:29,199 INFO [streaming_decode.py:615] Cuts processed until now is 3500.
2024-03-07 08:55:29,882 INFO [streaming_decode.py:615] Cuts processed until now is 3600.
2024-03-07 08:55:30,749 INFO [streaming_decode.py:615] Cuts processed until now is 3700.
2024-03-07 08:55:31,445 INFO [streaming_decode.py:615] Cuts processed until now is 3800.
2024-03-07 08:55:32,285 INFO [streaming_decode.py:615] Cuts processed until now is 3900.
2024-03-07 08:55:33,146 INFO [streaming_decode.py:615] Cuts processed until now is 4000.
2024-03-07 08:55:33,839 INFO [streaming_decode.py:615] Cuts processed until now is 4100.
2024-03-07 08:55:34,687 INFO [streaming_decode.py:615] Cuts processed until now is 4200.
2024-03-07 08:55:35,556 INFO [streaming_decode.py:615] Cuts processed until now is 4300.
2024-03-07 08:55:36,242 INFO [streaming_decode.py:615] Cuts processed until now is 4400.
2024-03-07 08:55:37,118 INFO [streaming_decode.py:615] Cuts processed until now is 4500.
2024-03-07 08:55:37,989 INFO [streaming_decode.py:615] Cuts processed until now is 4600.
2024-03-07 08:55:38,683 INFO [streaming_decode.py:615] Cuts processed until now is 4700.
2024-03-07 08:55:39,534 INFO [streaming_decode.py:615] Cuts processed until now is 4800.
2024-03-07 08:55:40,399 INFO [streaming_decode.py:615] Cuts processed until now is 4900.
2024-03-07 08:55:41,089 INFO [streaming_decode.py:615] Cuts processed until now is 5000.
2024-03-07 08:55:41,951 INFO [streaming_decode.py:615] Cuts processed until now is 5100.
2024-03-07 08:55:42,815 INFO [streaming_decode.py:615] Cuts processed until now is 5200.
2024-03-07 08:55:43,506 INFO [streaming_decode.py:615] Cuts processed until now is 5300.
2024-03-07 08:55:44,345 INFO [streaming_decode.py:615] Cuts processed until now is 5400.
2024-03-07 08:55:45,218 INFO [streaming_decode.py:615] Cuts processed until now is 5500.
2024-03-07 08:55:45,911 INFO [streaming_decode.py:615] Cuts processed until now is 5600.
2024-03-07 08:55:46,761 INFO [streaming_decode.py:615] Cuts processed until now is 5700.
2024-03-07 08:55:47,641 INFO [streaming_decode.py:615] Cuts processed until now is 5800.
2024-03-07 08:55:48,336 INFO [streaming_decode.py:615] Cuts processed until now is 5900.
2024-03-07 08:55:49,182 INFO [streaming_decode.py:615] Cuts processed until now is 6000.
2024-03-07 08:55:50,051 INFO [streaming_decode.py:615] Cuts processed until now is 6100.
2024-03-07 08:55:50,744 INFO [streaming_decode.py:615] Cuts processed until now is 6200.
2024-03-07 08:55:51,596 INFO [streaming_decode.py:615] Cuts processed until now is 6300.
2024-03-07 08:55:52,469 INFO [streaming_decode.py:615] Cuts processed until now is 6400.
2024-03-07 08:55:53,170 INFO [streaming_decode.py:615] Cuts processed until now is 6500.
2024-03-07 08:55:54,019 INFO [streaming_decode.py:615] Cuts processed until now is 6600.
2024-03-07 08:55:54,889 INFO [streaming_decode.py:615] Cuts processed until now is 6700.
2024-03-07 08:55:55,582 INFO [streaming_decode.py:615] Cuts processed until now is 6800.
2024-03-07 08:55:56,430 INFO [streaming_decode.py:615] Cuts processed until now is 6900.
2024-03-07 08:55:57,304 INFO [streaming_decode.py:615] Cuts processed until now is 7000.
2024-03-07 08:55:58,003 INFO [streaming_decode.py:615] Cuts processed until now is 7100.
2024-03-07 08:55:58,859 INFO [streaming_decode.py:615] Cuts processed until now is 7200.
2024-03-07 08:55:59,737 INFO [streaming_decode.py:615] Cuts processed until now is 7300.
2024-03-07 08:56:00,438 INFO [streaming_decode.py:615] Cuts processed until now is 7400.
2024-03-07 08:56:01,288 INFO [streaming_decode.py:615] Cuts processed until now is 7500.
2024-03-07 08:56:02,163 INFO [streaming_decode.py:615] Cuts processed until now is 7600.
2024-03-07 08:56:02,862 INFO [streaming_decode.py:615] Cuts processed until now is 7700.
2024-03-07 08:56:03,730 INFO [streaming_decode.py:615] Cuts processed until now is 7800.
2024-03-07 08:56:04,615 INFO [streaming_decode.py:615] Cuts processed until now is 7900.
2024-03-07 08:56:05,298 INFO [streaming_decode.py:615] Cuts processed until now is 8000.
2024-03-07 08:56:06,159 INFO [streaming_decode.py:615] Cuts processed until now is 8100.
2024-03-07 08:56:07,036 INFO [streaming_decode.py:615] Cuts processed until now is 8200.
|
107 |
+
2024-03-07 08:56:07,727 INFO [streaming_decode.py:615] Cuts processed until now is 8300.
|
108 |
+
2024-03-07 08:56:08,586 INFO [streaming_decode.py:615] Cuts processed until now is 8400.
|
109 |
+
2024-03-07 08:56:09,472 INFO [streaming_decode.py:615] Cuts processed until now is 8500.
|
110 |
+
2024-03-07 08:56:10,163 INFO [streaming_decode.py:615] Cuts processed until now is 8600.
|
111 |
+
2024-03-07 08:56:11,024 INFO [streaming_decode.py:615] Cuts processed until now is 8700.
|
112 |
+
2024-03-07 08:56:11,913 INFO [streaming_decode.py:615] Cuts processed until now is 8800.
|
113 |
+
2024-03-07 08:56:12,600 INFO [streaming_decode.py:615] Cuts processed until now is 8900.
|
114 |
+
2024-03-07 08:56:13,467 INFO [streaming_decode.py:615] Cuts processed until now is 9000.
|
115 |
+
2024-03-07 08:56:14,370 INFO [streaming_decode.py:615] Cuts processed until now is 9100.
|
116 |
+
2024-03-07 08:56:15,069 INFO [streaming_decode.py:615] Cuts processed until now is 9200.
|
117 |
+
2024-03-07 08:56:15,959 INFO [streaming_decode.py:615] Cuts processed until now is 9300.
|
118 |
+
2024-03-07 08:56:16,866 INFO [streaming_decode.py:615] Cuts processed until now is 9400.
|
119 |
+
2024-03-07 08:56:17,556 INFO [streaming_decode.py:615] Cuts processed until now is 9500.
|
120 |
+
2024-03-07 08:56:18,460 INFO [streaming_decode.py:615] Cuts processed until now is 9600.
|
121 |
+
2024-03-07 08:56:19,157 INFO [streaming_decode.py:615] Cuts processed until now is 9700.
|
122 |
+
2024-03-07 08:56:20,036 INFO [streaming_decode.py:615] Cuts processed until now is 9800.
|
123 |
+
2024-03-07 08:56:20,439 INFO [streaming_decode.py:615] Cuts processed until now is 9900.
|
124 |
+
2024-03-07 08:56:21,347 INFO [streaming_decode.py:615] Cuts processed until now is 10000.
|
125 |
+
2024-03-07 08:56:22,519 INFO [streaming_decode.py:615] Cuts processed until now is 10100.
|
126 |
+
2024-03-07 08:56:23,219 INFO [streaming_decode.py:615] Cuts processed until now is 10200.
|
127 |
+
2024-03-07 08:56:24,092 INFO [streaming_decode.py:615] Cuts processed until now is 10300.
|
128 |
+
2024-03-07 08:56:24,984 INFO [streaming_decode.py:615] Cuts processed until now is 10400.
|
129 |
+
2024-03-07 08:56:25,673 INFO [streaming_decode.py:615] Cuts processed until now is 10500.
|
130 |
+
2024-03-07 08:56:26,535 INFO [streaming_decode.py:615] Cuts processed until now is 10600.
|
131 |
+
2024-03-07 08:56:27,422 INFO [streaming_decode.py:615] Cuts processed until now is 10700.
|
132 |
+
2024-03-07 08:56:28,110 INFO [streaming_decode.py:615] Cuts processed until now is 10800.
|
133 |
+
2024-03-07 08:56:28,983 INFO [streaming_decode.py:615] Cuts processed until now is 10900.
|
134 |
+
2024-03-07 08:56:29,887 INFO [streaming_decode.py:615] Cuts processed until now is 11000.
|
135 |
+
2024-03-07 08:56:30,582 INFO [streaming_decode.py:615] Cuts processed until now is 11100.
|
136 |
+
2024-03-07 08:56:31,468 INFO [streaming_decode.py:615] Cuts processed until now is 11200.
|
137 |
+
2024-03-07 08:56:31,870 INFO [streaming_decode.py:615] Cuts processed until now is 11300.
|
138 |
+
2024-03-07 08:56:33,076 INFO [streaming_decode.py:615] Cuts processed until now is 11400.
|
139 |
+
2024-03-07 08:56:33,986 INFO [streaming_decode.py:615] Cuts processed until now is 11500.
|
140 |
+
2024-03-07 08:56:34,684 INFO [streaming_decode.py:615] Cuts processed until now is 11600.
|
141 |
+
2024-03-07 08:56:35,077 INFO [streaming_decode.py:615] Cuts processed until now is 11700.
|
142 |
+
2024-03-07 08:56:35,970 INFO [streaming_decode.py:615] Cuts processed until now is 11800.
|
143 |
+
2024-03-07 08:56:36,873 INFO [streaming_decode.py:615] Cuts processed until now is 11900.
|
144 |
+
2024-03-07 08:56:37,568 INFO [streaming_decode.py:615] Cuts processed until now is 12000.
|
145 |
+
2024-03-07 08:56:38,745 INFO [streaming_decode.py:615] Cuts processed until now is 12100.
|
146 |
+
2024-03-07 08:56:39,618 INFO [streaming_decode.py:615] Cuts processed until now is 12200.
|
147 |
+
2024-03-07 08:56:42,452 INFO [streaming_decode.py:660] The transcripts are stored in zipformer/exp-causal/streaming/greedy_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt
|
148 |
+
2024-03-07 08:56:42,883 INFO [utils.py:656] [test-commonvoice-greedy_search] %WER 7.75% [48447 / 624874, 10530 ins, 21899 del, 16018 sub ]
|
149 |
+
2024-03-07 08:56:43,788 INFO [streaming_decode.py:671] Wrote detailed error stats to zipformer/exp-causal/streaming/greedy_search/errs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt
|
150 |
+
2024-03-07 08:56:43,788 INFO [streaming_decode.py:685]
|
151 |
+
For test-commonvoice, WER of different settings are:
|
152 |
+
greedy_search 7.75 best for test-commonvoice
|
153 |
+
|
154 |
+
2024-03-07 08:56:43,788 INFO [streaming_decode.py:853] Done!
|
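The final WER line above is self-consistent and can be sanity-checked: word error rate is (insertions + deletions + substitutions) divided by the number of reference words. A minimal sketch, using the counts taken directly from the `utils.py:656` log line:

```python
# Counts copied from the log line:
# %WER 7.75% [48447 / 624874, 10530 ins, 21899 del, 16018 sub]
ins, dels, subs = 10530, 21899, 16018
ref_words = 624874

errors = ins + dels + subs          # total edit operations
wer = 100.0 * errors / ref_words    # WER as a percentage

print(f"{errors} errors, WER = {wer:.2f}%")  # 48447 errors, WER = 7.75%
```

This confirms the 48447-error count and the reported 7.75% for the greedy_search streaming decode.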
exp-causal/streaming/greedy_search/recogs-test-commonvoice-epoch-40-avg-7-chunk-32-left-context-128-use-averaged-model.txt
ADDED