yangwang825 committed on
Commit 76fcc9f · 1 Parent(s): e585064

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -31,7 +31,7 @@ This repository provides a pretrained E-TDNN model (x-vector) using SpeechBrain.
 
 This system is composed of an E-TDNN model (x-vector). It is a combination of convolutional and residual blocks. The embeddings are extracted using temporal statistical pooling. The system is trained with Additive Margin Softmax Loss.
 
-We use FBank (16kHz, 25ms frame length, 10ms hop length, 80 filter-bank channels) as the input features. It was trained using initial learning rate of 0.001 and batch size of 512 with linear scheduler for 30 epochs on 4 A100 GPUs. We employ additive noises and reverberation from [MUSAN](http://www.openslr.org/17/) and [RIR](http://www.openslr.org/28/) datasets to enrich the supervised information. The pre-training progress takes approximately seven days for the E-TDNN model.
+We use FBank (16kHz, 25ms frame length, 10ms hop length, 80 filter-bank channels) as the input features. It was trained using initial learning rate of 0.001 and batch size of 512 with linear scheduler for 40 epochs on 4 A100 GPUs. We employ additive noises and reverberation from [MUSAN](http://www.openslr.org/17/) and [RIR](http://www.openslr.org/28/) datasets to enrich the supervised information. The pre-training progress takes approximately seven days for the E-TDNN model.
 
 # Performance
 
@@ -39,7 +39,7 @@ We use FBank (16kHz, 25ms frame length, 10ms hop length, 80 filter-bank channels
 
 | Splits | Backend | S-norm | EER(%) | minDCF(0.01) |
 |:-------------:|:--------------:|:--------------:|:--------------:|:--------------:|
-| VoxCeleb1-O | cosine | no | 2.27 | 0.21 |
+| VoxCeleb1-O | cosine | no | 1.91 | 0.20 |
 | VoxCeleb1-E | cosine | no | TBD | TBD |
 | VoxCeleb1-H | cosine | no | TBD | TBD |
 
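The README's feature settings (16kHz, 25ms frame length, 10ms hop length, 80 filter-bank channels) translate into sample counts with simple arithmetic. The sketch below is a minimal illustration that assumes no edge padding (Kaldi-style "snip edges" framing, where every frame lies fully inside the signal); toolkits that pad or center frames will report a few more frames. The helper names are hypothetical.

```python
# Hypothetical helper names; only the numeric settings come from the README.
SAMPLE_RATE_HZ = 16_000   # 16kHz input audio
FRAME_LENGTH_MS = 25.0    # 25ms analysis window
HOP_LENGTH_MS = 10.0      # 10ms hop between frames
N_FILTERBANKS = 80        # 80 filter-bank channels per frame


def ms_to_samples(duration_ms: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(sample_rate_hz * duration_ms / 1000))


def num_frames(num_samples: int, win: int, hop: int) -> int:
    """Frame count assuming no edge padding (each frame fits inside the signal)."""
    if num_samples < win:
        return 0
    return 1 + (num_samples - win) // hop


win = ms_to_samples(FRAME_LENGTH_MS)
hop = ms_to_samples(HOP_LENGTH_MS)
frames = num_frames(3 * SAMPLE_RATE_HZ, win, hop)  # a 3-second utterance
print(f"window={win} samples, hop={hop} samples, features={frames}x{N_FILTERBANKS}")
# window=400 samples, hop=160 samples, features=298x80
```

Under these assumptions a 3-second clip becomes a 298 x 80 feature matrix, which is what the E-TDNN consumes frame by frame.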
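The architecture paragraph names temporal statistical pooling and Additive Margin Softmax Loss without giving formulas or hyperparameters. As a minimal pure-Python sketch: pooling concatenates the per-dimension mean and standard deviation of the frame-level features over time, and AM-Softmax subtracts a margin m from the target-class cosine before scaling all cosines by s and applying cross-entropy. The values s=30 and m=0.2, the use of the population standard deviation, and the function names are illustrative assumptions, not settings taken from this model.

```python
import math


def stats_pooling(frames: list[list[float]]) -> list[float]:
    """Concatenate per-dimension mean and standard deviation over time.

    `frames` is a T x D list of frame-level features; the result has length 2*D.
    Uses the population standard deviation (divides by T); an assumption.
    """
    t = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / t for d in range(dim)]
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / t) for d in range(dim)]
    return means + stds


def _l2_normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def am_softmax_loss(embedding: list[float], class_weights: list[list[float]],
                    target: int, scale: float = 30.0, margin: float = 0.2) -> float:
    """Cross-entropy over logits s * (cos(theta_j) - m * [j == target]).

    scale=30.0 and margin=0.2 are hypothetical defaults for illustration.
    """
    e = _l2_normalize(embedding)
    cosines = [sum(a * b for a, b in zip(e, _l2_normalize(w))) for w in class_weights]
    logits = [scale * (c - margin) if j == target else scale * c
              for j, c in enumerate(cosines)]
    # Numerically stable log-sum-exp for the softmax denominator.
    top = max(logits)
    log_z = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_z - logits[target]


pooled = stats_pooling([[1.0, 2.0], [3.0, 4.0]])
print(pooled)  # [2.0, 3.0, 1.0, 1.0]
print(am_softmax_loss([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], target=0))
```

The margin makes the target class harder to satisfy during training, so the same correctly classified embedding incurs a slightly larger loss with m > 0 than with m = 0; this pushes speakers apart on the unit hypersphere, which is what the cosine backend later relies on.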
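The Performance table reports EER(%) and minDCF(0.01) for a cosine backend without S-norm. A minimal sketch of how such numbers can be computed from scored trials follows. It assumes minDCF(0.01) means a target prior P_target = 0.01 with unit costs C_miss = C_fa = 1, and the EER is the common nearest-crossing approximation rather than an interpolated value. Function names are hypothetical and the trial scores are toy data, not results from this model.

```python
import math


def cosine_score(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two speaker embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def det_points(scores: list[float], labels: list[bool]) -> list[tuple[float, float]]:
    """(miss rate, false-alarm rate) for each threshold between distinct scores.

    A trial is accepted when its score exceeds the threshold; labels are True
    for target (same-speaker) trials.
    """
    pairs = sorted(zip(scores, labels))
    n_tar = sum(1 for is_target in labels if is_target)
    n_non = len(labels) - n_tar
    misses, false_alarms = 0, n_non
    points = [(0.0, 1.0)]  # threshold below every score: accept all trials
    for i, (score, is_target) in enumerate(pairs):
        if is_target:
            misses += 1
        else:
            false_alarms -= 1
        # Emit a point only once all trials sharing this score are rejected (ties).
        if i + 1 == len(pairs) or pairs[i + 1][0] != score:
            points.append((misses / n_tar, false_alarms / n_non))
    return points


def eer(points: list[tuple[float, float]]) -> float:
    """Approximate EER: mean of the two rates where they are closest."""
    miss, fa = min(points, key=lambda p: abs(p[0] - p[1]))
    return (miss + fa) / 2


def min_dcf(points: list[tuple[float, float]], p_target: float = 0.01,
            c_miss: float = 1.0, c_fa: float = 1.0) -> float:
    """Minimum detection cost, normalized by the cost of the best trivial system."""
    norm = min(c_miss * p_target, c_fa * (1 - p_target))
    return min((c_miss * p_target * miss + c_fa * (1 - p_target) * fa) / norm
               for miss, fa in points)


# Toy trials: two same-speaker (True) and two different-speaker (False) scores.
scores = [0.9, 0.4, 0.5, 0.1]
labels = [True, True, False, False]
pts = det_points(scores, labels)
print(f"EER = {100 * eer(pts):.2f}%, minDCF(0.01) = {min_dcf(pts):.3f}")
```

In a real evaluation each score would come from `cosine_score` applied to the enrollment and test embeddings of a trial pair listed in the VoxCeleb1-O, -E or -H trial lists.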