Emrys365 committed on
Commit 3872dcc · 1 Parent(s): 102e803

Update model

Files changed (45)
  1. README.md +377 -3
  2. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/64epoch.pth +3 -0
  3. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/config.yaml +236 -0
  4. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/enhanced_test_16k/RESULTS.md +22 -0
  5. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/enhanced_test_48k/RESULTS.md +18 -0
  6. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/backward_time.png +0 -0
  7. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/clip.png +0 -0
  8. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/forward_time.png +0 -0
  9. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/gpu_max_cached_mem_GB.png +0 -0
  10. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/grad_norm.png +0 -0
  11. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/iter_time.png +0 -0
  12. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/l1_timedomain+magspec_loss_1ch_16k.png +0 -0
  13. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/l1_timedomain+magspec_loss_1ch_16k_r.png +0 -0
  14. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/l1_timedomain+magspec_loss_1ch_24k.png +0 -0
  15. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/l1_timedomain+magspec_loss_1ch_48k.png +0 -0
  16. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/l1_timedomain+magspec_loss_1ch_8k.png +0 -0
  17. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/l1_timedomain+magspec_loss_1ch_8k_r.png +0 -0
  18. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/l1_timedomain+magspec_loss_2ch_16k.png +0 -0
  19. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/l1_timedomain+magspec_loss_2ch_16k_r.png +0 -0
  20. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/l1_timedomain+magspec_loss_2ch_8k.png +0 -0
  21. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/l1_timedomain+magspec_loss_2ch_8k_r.png +0 -0
  22. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/l1_timedomain+magspec_loss_5ch_16k.png +0 -0
  23. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/l1_timedomain+magspec_loss_5ch_8k.png +0 -0
  24. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/l1_timedomain+magspec_loss_8ch_16k_r.png +0 -0
  25. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/l1_timedomain+magspec_loss_8ch_8k_r.png +0 -0
  26. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/loss.png +0 -0
  27. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/loss_scale.png +0 -0
  28. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/optim0_lr0.png +0 -0
  29. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/optim_step_time.png +0 -0
  30. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/si_snr_loss_1ch_16k.png +0 -0
  31. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/si_snr_loss_1ch_16k_r.png +0 -0
  32. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/si_snr_loss_1ch_24k.png +0 -0
  33. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/si_snr_loss_1ch_48k.png +0 -0
  34. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/si_snr_loss_1ch_8k.png +0 -0
  35. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/si_snr_loss_1ch_8k_r.png +0 -0
  36. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/si_snr_loss_2ch_16k.png +0 -0
  37. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/si_snr_loss_2ch_16k_r.png +0 -0
  38. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/si_snr_loss_2ch_8k.png +0 -0
  39. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/si_snr_loss_2ch_8k_r.png +0 -0
  40. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/si_snr_loss_5ch_16k.png +0 -0
  41. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/si_snr_loss_5ch_8k.png +0 -0
  42. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/si_snr_loss_8ch_16k_r.png +0 -0
  43. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/si_snr_loss_8ch_8k_r.png +0 -0
  44. exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/train_time.png +0 -0
  45. meta.yaml +8 -0
README.md CHANGED
@@ -1,3 +1,377 @@
- ---
- license: apache-2.0
- ---
+ ---
+ tags:
+ - espnet
+ - audio
+ - audio-to-audio
+ language: en
+ datasets:
+ - universal_se
+ license: cc-by-4.0
+ ---
+
+ ## ESPnet2 ENH model
+
+ ### `wyz/vctk_dns2020_whamr_conv_tasnet_large`
+
+ This model was trained by wyz using the universal_se recipe in [espnet](https://github.com/espnet/espnet/).
+
+ ### Demo: How to use in ESPnet2
+
+ Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
+ if you haven't done that already.
+
+ To use the model in the Python interface, you could use the following code:
+
+ ```python
+ import soundfile as sf
+ from espnet2.bin.enh_inference import SeparateSpeech
+
+ # For model downloading + loading
+ model = SeparateSpeech.from_pretrained(
+     model_tag="wyz/vctk_dns2020_whamr_conv_tasnet_large",
+     normalize_output_wav=True,
+     device="cuda",
+ )
+ # For loading a downloaded model
+ # model = SeparateSpeech(
+ #     train_config="exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/config.yaml",
+ #     model_file="exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/64epoch.pth",
+ #     normalize_output_wav=True,
+ #     device="cuda",
+ # )
+
+ audio, fs = sf.read("/path/to/noisy/utt1.flac")
+ enhanced = model(audio[None, :], fs=fs)[0]
+ ```
+
+ <!-- Generated by ./scripts/utils/show_enh_score.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Tue Mar 12 17:35:06 EDT 2024`
+ - python version: `3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]`
+ - espnet version: `espnet 202304`
+ - pytorch version: `pytorch 2.0.1+cu118`
+ - Git hash: `443028662106472c60fe8bd892cb277e5b488651`
+ - Commit date: `Thu May 11 03:32:59 2023 +0000`
+
+
+ ## enhanced_test_16k
+
+
+ |dataset|PESQ_WB|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
+ |---|---|---|---|---|---|---|---|---|---|---|
+ |chime4_et05_real_isolated_6ch_track|1.26|54.36|-2.47|-2.47|0.00|-30.98|2.91|3.34|3.61|3.32|
+ |chime4_et05_simu_isolated_6ch_track|1.41|83.38|8.00|8.00|0.00|0.97|2.88|3.23|3.78|3.09|
+ |dns20_tt_synthetic_no_reverb|2.74|96.49|17.49|17.49|0.00|17.43|3.26|3.53|4.04|3.82|
+ |reverb_et_simu_8ch_multich|1.84|89.99|9.29|9.29|0.00|-9.40|2.99|3.37|3.77|3.64|
+ |whamr_tt_mix_single_reverb_max_16k|1.93|91.74|9.63|9.63|0.00|7.28|3.17|3.44|4.03|3.50|
+
+ <!-- Generated by ./scripts/utils/show_enh_score.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Tue Mar 12 17:01:58 EDT 2024`
+ - python version: `3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]`
+ - espnet version: `espnet 202304`
+ - pytorch version: `pytorch 2.0.1+cu118`
+ - Git hash: `443028662106472c60fe8bd892cb277e5b488651`
+ - Commit date: `Thu May 11 03:32:59 2023 +0000`
+
+
+ ## enhanced_test_48k
+
+
+ |dataset|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
+ |---|---|---|---|---|---|---|---|---|---|
+ |vctk_noisy_tt_2spk|94.63|19.47|19.47|0.00|18.60|3.09|3.45|3.85|3.41|
+
+ ## ENH config
+
+ <details><summary>expand</summary>
+
+ ```
+ config: conf/tuning/train_enh_conv_tasnet_large.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: chunk
+ output_dir: exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw
+ ngpu: 1
+ seed: 0
+ num_workers: 4
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: 2
+ dist_rank: 0
+ local_rank: 0
+ dist_master_addr: localhost
+ dist_master_port: 47395
+ dist_launcher: null
+ multiprocessing_distributed: true
+ unused_parameters: true
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 100
+ patience: 10
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - loss
+     - min
+ keep_nbest_models: 1
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ save_interval: 1000
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: 8000
+ num_iters_valid: null
+ batch_size: 4
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp_vctk_dns20_whamr/enh_stats_16k/train/speech_mix_shape
+ - exp_vctk_dns20_whamr/enh_stats_16k/train/speech_ref1_shape
+ valid_shape_file:
+ - exp_vctk_dns20_whamr/enh_stats_16k/valid/speech_mix_shape
+ - exp_vctk_dns20_whamr/enh_stats_16k/valid/speech_ref1_shape
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 80000
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 32000
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ chunk_excluded_key_prefixes: []
+ chunk_discard_short_samples: false
+ train_data_path_and_name_and_type:
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/wav.scp
+     - speech_mix
+     - sound
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/spk1.scp
+     - speech_ref1
+     - sound
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/utt2category
+     - category
+     - text
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/utt2fs
+     - fs
+     - text_int
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/wav.scp
+     - speech_mix
+     - sound
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/spk1.scp
+     - speech_ref1
+     - sound
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/utt2category
+     - category
+     - text
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/utt2fs
+     - fs
+     - text_int
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ allow_multi_rates: true
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: adam
+ optim_conf:
+     lr: 0.001
+     eps: 1.0e-08
+     weight_decay: 1.0e-05
+ scheduler: steplr
+ scheduler_conf:
+     step_size: 2
+     gamma: 0.99
+ init: null
+ model_conf:
+     normalize_variance_per_ch: true
+     always_forward_in_48k: true
+     categories:
+     - 1ch_8k
+     - 1ch_8k_r
+     - 1ch_16k_r
+     - 1ch_48k
+     - 1ch_24k
+     - 1ch_16k
+     - 2ch_8k
+     - 2ch_8k_r
+     - 2ch_16k
+     - 2ch_16k_r
+     - 5ch_8k
+     - 5ch_16k
+     - 8ch_8k_r
+     - 8ch_16k_r
+ criterions:
+ -   name: mr_l1_tfd
+     conf:
+         window_sz:
+         - 256
+         - 512
+         - 768
+         - 1024
+         hop_sz: null
+         eps: 1.0e-08
+         time_domain_weight: 0.5
+         normalize_variance: true
+         use_builtin_complex: true
+     wrapper: fixed_order
+     wrapper_conf:
+         weight: 1.0
+ -   name: si_snr
+     conf:
+         eps: 1.0e-07
+     wrapper: fixed_order
+     wrapper_conf:
+         weight: 0.0
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ short_noise_thres: 0.5
+ use_reverberant_ref: false
+ num_spk: 1
+ num_noise_type: 1
+ sample_rate: 8000
+ force_single_channel: true
+ channel_reordering: true
+ categories:
+ - 1ch_8k
+ - 1ch_8k_r
+ - 1ch_16k_r
+ - 1ch_48k
+ - 1ch_24k
+ - 1ch_16k
+ - 2ch_8k
+ - 2ch_8k_r
+ - 2ch_16k
+ - 2ch_16k_r
+ - 5ch_8k
+ - 5ch_16k
+ - 8ch_8k_r
+ - 8ch_16k_r
+ speech_segment: null
+ avoid_allzero_segment: true
+ flexible_numspk: false
+ dynamic_mixing: false
+ utt2spk: null
+ dynamic_mixing_gain_db: 0.0
+ encoder: conv
+ encoder_conf:
+     channel: 1536
+     kernel_size: 120
+     stride: 60
+ separator: tcn
+ separator_conf:
+     num_spk: 1
+     layer: 8
+     stack: 6
+     bottleneck_dim: 512
+     hidden_dim: 1024
+     kernel: 3
+     causal: false
+     norm_type: gLN
+     nonlinear: relu
+ decoder: conv
+ decoder_conf:
+     channel: 1536
+     kernel_size: 120
+     stride: 60
+ mask_module: multi_mask
+ mask_module_conf: {}
+ preprocessor: enh
+ preprocessor_conf: {}
+ required:
+ - output_dir
+ version: '202304'
+ distributed: true
+ ```
+
+ </details>
+
+
+
+ ### Citing ESPnet
+
+ ```BibTex
+ @inproceedings{watanabe2018espnet,
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   title={{ESPnet}: End-to-End Speech Processing Toolkit},
+   year={2018},
+   booktitle={Proceedings of Interspeech},
+   pages={2207--2211},
+   doi={10.21437/Interspeech.2018-1456},
+   url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+
+
+ @inproceedings{ESPnet-SE,
+   author = {Chenda Li and Jing Shi and Wangyou Zhang and Aswin Shanmugam Subramanian and Xuankai Chang and
+     Naoyuki Kamo and Moto Hira and Tomoki Hayashi and Christoph B{\"{o}}ddeker and Zhuo Chen and Shinji Watanabe},
+   title = {ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for {ASR} Integration},
+   booktitle = {{IEEE} Spoken Language Technology Workshop, {SLT} 2021, Shenzhen, China, January 19-22, 2021},
+   pages = {785--792},
+   publisher = {{IEEE}},
+   year = {2021},
+   url = {https://doi.org/10.1109/SLT48900.2021.9383615},
+   doi = {10.1109/SLT48900.2021.9383615},
+   timestamp = {Mon, 12 Apr 2021 17:08:59 +0200},
+   biburl = {https://dblp.org/rec/conf/slt/Li0ZSCKHHBC021.bib},
+   bibsource = {dblp computer science bibliography, https://dblp.org}
+ }
+
+
+ ```
+
+ or arXiv:
+
+ ```bibtex
+ @misc{watanabe2018espnet,
+   title={ESPnet: End-to-End Speech Processing Toolkit},
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   year={2018},
+   eprint={1804.00015},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
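The `mr_l1_tfd` criterion in the ENH config (plotted as `l1_timedomain+magspec_loss_*`) combines an L1 loss on the waveform with L1 losses on STFT magnitudes at the four window sizes 256, 512, 768 and 1024, mixed by `time_domain_weight: 0.5`. The NumPy sketch below illustrates that idea only. It assumes a Hann-windowed STFT with hop = window // 2 (standing in for `hop_sz: null`), reads `normalize_variance: true` as dividing each signal by its standard deviation, and averages the spectral terms over resolutions. `stft_mag` and `mr_l1_tfd_sketch` are hypothetical helpers; ESPnet's own loss may differ in scaling, padding and reduction details.

```python
import numpy as np


def stft_mag(x, win):
    """Magnitude STFT with a Hann window and hop = win // 2 (assumed default)."""
    hop = win // 2
    window = np.hanning(win)
    # Zero-pad short signals so that at least one full frame exists.
    if len(x) < win:
        x = np.pad(x, (0, win - len(x)))
    n_frames = 1 + (len(x) - win) // hop
    frames = np.stack([x[i * hop : i * hop + win] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))


def mr_l1_tfd_sketch(ref, est, windows=(256, 512, 768, 1024),
                     time_domain_weight=0.5, eps=1e-8):
    """Hypothetical multi-resolution L1 time-domain + magnitude-spectrum loss."""
    # Assumed meaning of normalize_variance: scale each signal to unit std.
    ref = ref / (np.std(ref) + eps)
    est = est / (np.std(est) + eps)
    time_loss = np.mean(np.abs(est - ref))
    # One L1 magnitude term per STFT resolution, averaged over resolutions.
    spec_loss = np.mean([
        np.mean(np.abs(stft_mag(est, w) - stft_mag(ref, w))) for w in windows
    ])
    return time_domain_weight * time_loss + (1 - time_domain_weight) * spec_loss


rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noisy = clean + 0.3 * rng.standard_normal(16000)
print(mr_l1_tfd_sketch(clean, clean))  # 0.0 for a perfect estimate
print(mr_l1_tfd_sketch(clean, noisy))  # positive for a degraded estimate
```

Because of the variance normalization, a rescaled copy of the reference scores (almost) zero, so in this sketch the loss is driven by waveform shape rather than level.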
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/64epoch.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:19e244ff15cb24c091868eafa10bba131e58ecf04832bd94cd9033618b747d3a
+ size 210659925
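The checkpoint is stored through Git LFS, so this diff shows only the pointer file: a `version` line, the SHA-256 `oid`, and the `size` in bytes (210,659,925 bytes, roughly 211 MB). A minimal standard-library sketch, assuming you have already downloaded the real `64epoch.pth`, that parses such a pointer and checks a local file against it; `parse_lfs_pointer` and `verify_against_pointer` are hypothetical helper names, not part of Git LFS or ESPnet.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer ("key value" per line) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}


def verify_against_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` matches the pointer's size and hash."""
    # Cheap size check first, then hash the file in 1 MiB chunks.
    if os.path.getsize(path) != pointer["size"]:
        return False
    h = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:19e244ff15cb24c091868eafa10bba131e58ecf04832bd94cd9033618b747d3a
size 210659925
"""

info = parse_lfs_pointer(POINTER)
print(info["algo"], info["size"])  # sha256 210659925
# verify_against_pointer(
#     "exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/64epoch.pth", info)
```

Checking the size before hashing avoids reading 211 MB when a download was obviously truncated.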
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/config.yaml ADDED
@@ -0,0 +1,236 @@
+ config: conf/tuning/train_enh_conv_tasnet_large.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: chunk
+ output_dir: exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw
+ ngpu: 1
+ seed: 0
+ num_workers: 4
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: 2
+ dist_rank: 0
+ local_rank: 0
+ dist_master_addr: localhost
+ dist_master_port: 47395
+ dist_launcher: null
+ multiprocessing_distributed: true
+ unused_parameters: true
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 100
+ patience: 10
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - loss
+     - min
+ keep_nbest_models: 1
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ save_interval: 1000
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: 8000
+ num_iters_valid: null
+ batch_size: 4
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp_vctk_dns20_whamr/enh_stats_16k/train/speech_mix_shape
+ - exp_vctk_dns20_whamr/enh_stats_16k/train/speech_ref1_shape
+ valid_shape_file:
+ - exp_vctk_dns20_whamr/enh_stats_16k/valid/speech_mix_shape
+ - exp_vctk_dns20_whamr/enh_stats_16k/valid/speech_ref1_shape
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 80000
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 32000
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ chunk_excluded_key_prefixes: []
+ chunk_discard_short_samples: false
+ train_data_path_and_name_and_type:
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/wav.scp
+     - speech_mix
+     - sound
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/spk1.scp
+     - speech_ref1
+     - sound
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/utt2category
+     - category
+     - text
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/utt2fs
+     - fs
+     - text_int
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/wav.scp
+     - speech_mix
+     - sound
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/spk1.scp
+     - speech_ref1
+     - sound
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/utt2category
+     - category
+     - text
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/utt2fs
+     - fs
+     - text_int
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ allow_multi_rates: true
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: adam
+ optim_conf:
+     lr: 0.001
+     eps: 1.0e-08
+     weight_decay: 1.0e-05
+ scheduler: steplr
+ scheduler_conf:
+     step_size: 2
+     gamma: 0.99
+ init: null
+ model_conf:
+     normalize_variance_per_ch: true
+     always_forward_in_48k: true
+     categories:
+     - 1ch_8k
+     - 1ch_8k_r
+     - 1ch_16k_r
+     - 1ch_48k
+     - 1ch_24k
+     - 1ch_16k
+     - 2ch_8k
+     - 2ch_8k_r
+     - 2ch_16k
+     - 2ch_16k_r
+     - 5ch_8k
+     - 5ch_16k
+     - 8ch_8k_r
+     - 8ch_16k_r
+ criterions:
+ -   name: mr_l1_tfd
+     conf:
+         window_sz:
+         - 256
+         - 512
+         - 768
+         - 1024
+         hop_sz: null
+         eps: 1.0e-08
+         time_domain_weight: 0.5
+         normalize_variance: true
+         use_builtin_complex: true
+     wrapper: fixed_order
+     wrapper_conf:
+         weight: 1.0
+ -   name: si_snr
+     conf:
+         eps: 1.0e-07
+     wrapper: fixed_order
+     wrapper_conf:
+         weight: 0.0
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ short_noise_thres: 0.5
+ use_reverberant_ref: false
+ num_spk: 1
+ num_noise_type: 1
+ sample_rate: 8000
+ force_single_channel: true
+ channel_reordering: true
+ categories:
+ - 1ch_8k
+ - 1ch_8k_r
+ - 1ch_16k_r
+ - 1ch_48k
+ - 1ch_24k
+ - 1ch_16k
+ - 2ch_8k
+ - 2ch_8k_r
+ - 2ch_16k
+ - 2ch_16k_r
+ - 5ch_8k
+ - 5ch_16k
+ - 8ch_8k_r
+ - 8ch_16k_r
+ speech_segment: null
+ avoid_allzero_segment: true
+ flexible_numspk: false
+ dynamic_mixing: false
+ utt2spk: null
+ dynamic_mixing_gain_db: 0.0
+ encoder: conv
+ encoder_conf:
+     channel: 1536
+     kernel_size: 120
+     stride: 60
+ separator: tcn
+ separator_conf:
+     num_spk: 1
+     layer: 8
+     stack: 6
+     bottleneck_dim: 512
+     hidden_dim: 1024
+     kernel: 3
+     causal: false
+     norm_type: gLN
+     nonlinear: relu
+ decoder: conv
+ decoder_conf:
+     channel: 1536
+     kernel_size: 120
+     stride: 60
+ mask_module: multi_mask
+ mask_module_conf: {}
+ preprocessor: enh
+ preprocessor_conf: {}
+ required:
+ - output_dir
+ version: '202304'
+ distributed: true
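A few numbers implied by this config can be checked by hand. The sketch below rests on stated assumptions: (1) the conv encoder behaves like an unpadded 1-D convolution, so frames = (T - kernel_size) // stride + 1; (2) the TCN uses dilations 1, 2, ..., 2^(layer - 1) in each of the `stack` repeats, as in the original Conv-TasNet design; (3) `always_forward_in_48k: true` is read as "the network runs at 48 kHz"; and (4) training chunks are cut with a fixed shift of `chunk_length * chunk_shift_ratio` samples. Under these assumptions the 120/60-sample encoder gives 2.5 ms windows with a 1.25 ms hop, and the TCN sees about 3.8 s of context. All function names are illustrative, not ESPnet APIs.

```python
def encoder_frames(num_samples, kernel_size=120, stride=60):
    """Frames from an unpadded conv encoder (assumption 1)."""
    return (num_samples - kernel_size) // stride + 1


def tcn_receptive_field(layer=8, stack=6, kernel=3):
    """Receptive field in encoder frames for dilations 2**0 .. 2**(layer-1),
    repeated `stack` times (assumption 2)."""
    return 1 + stack * sum((kernel - 1) * 2**i for i in range(layer))


def num_chunks(num_samples, chunk_length=32000, shift_ratio=0.5):
    """Full chunks cut with a fixed shift (assumption 4); 0 if too short."""
    shift = int(chunk_length * shift_ratio)
    if num_samples < chunk_length:
        return 0
    return (num_samples - chunk_length) // shift + 1


fs = 48000  # assumption 3: always_forward_in_48k
print(1000 * 120 / fs, 1000 * 60 / fs)      # window / hop in ms: 2.5 1.25
rf_frames = tcn_receptive_field()
rf_samples = (rf_frames - 1) * 60 + 120     # frames spanned -> samples
print(rf_frames, rf_samples, rf_samples / fs)  # 3061 183720 3.8275
print(encoder_frames(fs))                   # frames for 1 s at 48 kHz: 799
print(num_chunks(80000))                    # chunks in an 80000-sample utterance: 4
```

The receptive-field figure is per output frame; with a non-causal TCN it is split roughly evenly between past and future context.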
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/enhanced_test_16k/RESULTS.md ADDED
@@ -0,0 +1,22 @@
+ <!-- Generated by ./scripts/utils/show_enh_score.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Tue Mar 12 17:35:06 EDT 2024`
+ - python version: `3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]`
+ - espnet version: `espnet 202304`
+ - pytorch version: `pytorch 2.0.1+cu118`
+ - Git hash: `443028662106472c60fe8bd892cb277e5b488651`
+ - Commit date: `Thu May 11 03:32:59 2023 +0000`
+
+
+ ## enhanced_test_16k
+
+
+ |dataset|PESQ_WB|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
+ |---|---|---|---|---|---|---|---|---|---|---|
+ |chime4_et05_real_isolated_6ch_track|1.26|54.36|-2.47|-2.47|0.00|-30.98|2.91|3.34|3.61|3.32|
+ |chime4_et05_simu_isolated_6ch_track|1.41|83.38|8.00|8.00|0.00|0.97|2.88|3.23|3.78|3.09|
+ |dns20_tt_synthetic_no_reverb|2.74|96.49|17.49|17.49|0.00|17.43|3.26|3.53|4.04|3.82|
+ |reverb_et_simu_8ch_multich|1.84|89.99|9.29|9.29|0.00|-9.40|2.99|3.37|3.77|3.64|
+ |whamr_tt_mix_single_reverb_max_16k|1.93|91.74|9.63|9.63|0.00|7.28|3.17|3.44|4.03|3.50|
+
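The `SI_SNR` column in these tables is scale-invariant SNR in dB; the config also lists an `si_snr` criterion with `weight: 0.0`, so it is logged (see the `si_snr_loss_*` plots) but contributes nothing to the training loss. A minimal pure-Python sketch of the common definition (zero-mean signals, project the estimate onto the reference, compare target and residual energies), assuming single-channel signals of equal length; ESPnet's scoring code may differ in details such as where `eps` is added.

```python
import math


def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR (dB) of an estimate against a reference."""
    if len(est) != len(ref):
        raise ValueError("signals must have the same length")
    # Remove the mean so that a DC offset does not affect the score.
    m_est = sum(est) / len(est)
    m_ref = sum(ref) / len(ref)
    est = [e - m_est for e in est]
    ref = [r - m_ref for r in ref]
    # s_target = (<est, ref> / ||ref||^2) * ref ; e_noise = est - s_target
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    s_target = [dot / ref_energy * r for r in ref]
    e_noise = [e - t for e, t in zip(est, s_target)]
    target_energy = sum(t * t for t in s_target)
    noise_energy = sum(n * n for n in e_noise)
    return 10 * math.log10((target_energy + eps) / (noise_energy + eps))


ref = [math.sin(0.05 * n) for n in range(1000)]
noise = [0.1 * math.cos(0.31 * n + 1.0) for n in range(1000)]
noisy = [r + v for r, v in zip(ref, noise)]
print(si_snr([2 * r for r in ref], ref))  # very large: rescaling is not penalized
print(si_snr(noisy, ref))                 # roughly 20 dB for this 10:1 amplitude ratio
```

The projection step is what makes the metric insensitive to output gain, which matters here because the demo enables `normalize_output_wav=True`.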
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/enhanced_test_48k/RESULTS.md ADDED
@@ -0,0 +1,18 @@
+ <!-- Generated by ./scripts/utils/show_enh_score.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Tue Mar 12 17:01:58 EDT 2024`
+ - python version: `3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]`
+ - espnet version: `espnet 202304`
+ - pytorch version: `pytorch 2.0.1+cu118`
+ - Git hash: `443028662106472c60fe8bd892cb277e5b488651`
+ - Commit date: `Thu May 11 03:32:59 2023 +0000`
+
+
+ ## enhanced_test_48k
+
+
+ |dataset|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
+ |---|---|---|---|---|---|---|---|---|---|
+ |vctk_noisy_tt_2spk|94.63|19.47|19.47|0.00|18.60|3.09|3.45|3.85|3.41|
+
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/backward_time.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/clip.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/forward_time.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/gpu_max_cached_mem_GB.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/grad_norm.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/iter_time.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/l1_timedomain+magspec_loss_1ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/l1_timedomain+magspec_loss_1ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/l1_timedomain+magspec_loss_1ch_24k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/l1_timedomain+magspec_loss_1ch_48k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/l1_timedomain+magspec_loss_1ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/l1_timedomain+magspec_loss_1ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/l1_timedomain+magspec_loss_2ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/l1_timedomain+magspec_loss_2ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/l1_timedomain+magspec_loss_2ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/l1_timedomain+magspec_loss_2ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/l1_timedomain+magspec_loss_5ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/l1_timedomain+magspec_loss_5ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/l1_timedomain+magspec_loss_8ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/l1_timedomain+magspec_loss_8ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/loss.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/loss_scale.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/optim0_lr0.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/optim_step_time.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/si_snr_loss_1ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/si_snr_loss_1ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/si_snr_loss_1ch_24k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/si_snr_loss_1ch_48k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/si_snr_loss_1ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/si_snr_loss_1ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/si_snr_loss_2ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/si_snr_loss_2ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/si_snr_loss_2ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/si_snr_loss_2ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/si_snr_loss_5ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/si_snr_loss_5ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/si_snr_loss_8ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/si_snr_loss_8ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/images/train_time.png ADDED
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: '202304'
+ files:
+   model_file: exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/64epoch.pth
+ python: "3.8.16 (default, Mar 2 2023, 03:21:46) \n[GCC 11.2.0]"
+ timestamp: 1723016941.726699
+ torch: 2.0.1+cu118
+ yaml_files:
+   train_config: exp_vctk_dns20_whamr/enh_train_enh_conv_tasnet_large_raw/config.yaml