Update README.md
README.md
@@ -3,6 +3,13 @@ license: apache-2.0
|
|
3 |
language:
|
4 |
- en
|
5 |
- ca
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
6 |
---
|
7 |
|
8 |
# Wavenext-encodec
|
@@ -84,7 +91,7 @@ The model was trained on 4 speech datasets
|
|
84 |
### Training Procedure
|
85 |
|
86 |
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
|
87 |
-
The model was trained for 1M steps and
|
88 |
|
89 |
|
90 |
#### Training Hyperparameters
|
@@ -101,15 +108,15 @@ The model was trained for 1M steps and 183 epochs with a batch size of 16 for st
|
|
101 |
|
102 |
<!-- This section describes the evaluation protocols and provides the results. -->
|
103 |
|
104 |
-
Evaluation was done using the metrics on the original repo, after
|
105 |
|
106 |
-
* val_loss:
|
107 |
-
* f1_score: 0.
|
108 |
-
* mel_loss: 0.
|
109 |
-
* periodicity_loss:0.
|
110 |
-
* pesq_score:
|
111 |
-
* pitch_loss:
|
112 |
-
* utmos_score:
|
113 |
|
114 |
|
115 |
## Citation
|
|
|
3 |
language:
|
4 |
- en
|
5 |
- ca
|
6 |
+
datasets:
|
7 |
+
- mythicinfinity/libritts_r
|
8 |
+
- projecte-aina/festcat_trimmed_denoised
|
9 |
+
- projecte-aina/openslr-slr69-ca-trimmed-denoised
|
10 |
+
- keithito/lj_speech
|
11 |
+
base_model:
|
12 |
+
- facebook/encodec_24khz
|
13 |
---
|
14 |
|
15 |
# Wavenext-encodec
|
|
|
91 |
### Training Procedure
|
92 |
|
93 |
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
|
94 |
+
The model was trained for 1M steps (99 epochs) with a batch size of 16 for stability. We used a cosine learning-rate scheduler with an initial learning rate of 1e-4.
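As a rough illustration of the schedule described above, the sketch below computes the learning rate at a given step under plain cosine annealing. The function name `cosine_lr`, the absence of warmup, and the decay to a minimum of 0 are assumptions made for the example; the README only states a cosine scheduler, 1M steps, and an initial learning rate of 1e-4.

```python
import math


def cosine_lr(step, total_steps=1_000_000, base_lr=1e-4, min_lr=0.0):
    """Learning rate at `step` under plain cosine annealing.

    Assumptions (not stated in the README): no warmup phase, and the
    rate decays to `min_lr` = 0 at the final step.
    """
    # Clamp progress to [0, 1] so out-of-range steps stay well defined.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 250_000, 500_000, 750_000, 1_000_000):
        print(f"step {s:>9,d}: lr = {cosine_lr(s):.3e}")
```

Under these assumptions the rate starts at 1e-4, reaches half that value at the midpoint, and ends at 0.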
|
95 |
|
96 |
|
97 |
#### Training Hyperparameters
|
|
|
108 |
|
109 |
<!-- This section describes the evaluation protocols and provides the results. -->
|
110 |
|
111 |
+
Evaluation was done using the metrics from the original Vocos repo. Note that these metrics are calculated using the codes corresponding to a bandwidth of 1.5 kbps. After 99 epochs we achieve:
|
112 |
|
113 |
+
* val_loss: 5.52
|
114 |
+
* f1_score: 0.93
|
115 |
+
* mel_loss: 0.53
|
116 |
+
* periodicity_loss: 0.14
|
117 |
+
* pesq_score: 2.12
|
118 |
+
* pitch_loss: 47.73
|
119 |
+
* utmos_score: 2.89
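To put the 1.5 kbps setting in context, the sketch below works out how many residual codebooks that bandwidth corresponds to. The frame rate (75 Hz) and codebook size (1024 entries) are the values published for the EnCodec 24 kHz model, not figures given in this README, and `num_codebooks` is a hypothetical helper written for this example.

```python
import math


def num_codebooks(bandwidth_kbps, frame_rate_hz=75, codebook_size=1024):
    """Number of residual codebooks needed for a target bandwidth.

    Defaults are assumed from the published EnCodec 24 kHz setup:
    75 frames per second and 1024 entries per codebook.
    """
    bits_per_code = math.log2(codebook_size)            # 10 bits per code
    bps_per_codebook = frame_rate_hz * bits_per_code    # 750 bits/s per codebook
    return int(bandwidth_kbps * 1000 // bps_per_codebook)


if __name__ == "__main__":
    print(num_codebooks(1.5))
```

With these values each codebook contributes 750 bits per second, so 1.5 kbps corresponds to 2 codebooks per frame.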
|
120 |
|
121 |
|
122 |
## Citation
|