Create README.md
README.md
ADDED
@@ -0,0 +1,31 @@
1 |
+
---
|
2 |
+
license: cc-by-nc-4.0
|
3 |
+
language:
|
4 |
+
- id
|
5 |
+
base_model:
|
6 |
+
- SWivid/F5-TTS
|
7 |
+
pipeline_tag: text-to-speech
|
8 |
+
---
|
9 |
+
|
10 |
+
## Overview
|
11 |
+
This Indonesian fine-tune of [F5-TTS](https://github.com/SWivid/F5-TTS) adds Indonesian speech capabilities to the model.
|
12 |
+
|
13 |
+
## Dataset
|
14 |
+
Length: 43.35 hours \
|
15 |
+
Audio samples: 43999
|
16 |
+
|
17 |
+
Dataset sources: \
|
18 |
+
• [data-indsp-news-lvcsr](https://github.com/s-sakti/data_indsp_news_lvcsr)
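
The two dataset figures above imply a fairly short average clip. The sketch below only does that arithmetic; the function name `average_clip_seconds` is illustrative and is not part of F5-TTS or the dataset tooling.

```python
# Figures taken from the Dataset section above.
TOTAL_HOURS = 43.35
NUM_SAMPLES = 43999


def average_clip_seconds(total_hours: float, num_samples: int) -> float:
    """Return the mean clip duration in seconds."""
    return total_hours * 3600 / num_samples


avg = average_clip_seconds(TOTAL_HOURS, NUM_SAMPLES)
print(f"Average clip length: {avg:.2f} s")  # prints "Average clip length: 3.55 s"
```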
|
19 |
+
|
20 |
+
## Results
|
21 |
+
The model has some difficulty accurately matching the voice and emotion of the reference audio in zero-shot use. It also hallucinates on long texts.
|
22 |
+
|
23 |
+
Reference text: "Tidak ada yang menakutiku, bahkan kematian sekalipun." \
|
24 |
+
Reference audio: [Zilong.ogg](https://huggingface.co/Eempostor/F5-TTS-IND-FINETUNE/resolve/main/Zilong.ogg?download=true) \
|
25 |
+
Input text: "Halo. Model faintun ini adalah sebuah percobaan. Masih terdapat beberapa kekurangan jadi tolong dimaklumkan." \
|
26 |
+
Generated audio: [Zilong_generated.wav](https://huggingface.co/Eempostor/F5-TTS-IND-FINETUNE/resolve/main/Zilong_generated.wav?download=true)
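
Since the model hallucinates on long texts, one possible workaround is to split the input into short, sentence-aligned chunks and synthesize each chunk separately. The following is a minimal sketch of that idea, not something this model card prescribes: `split_into_chunks` and the `max_chars` limit are hypothetical, the limit has not been tuned against this model, and the synthesis call itself is left out.

```python
import re


def split_into_chunks(text: str, max_chars: int = 120) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# The input text from the example above, split with an arbitrary 60-character limit.
text = (
    "Halo. Model faintun ini adalah sebuah percobaan. "
    "Masih terdapat beberapa kekurangan jadi tolong dimaklumkan."
)
chunks = split_into_chunks(text, max_chars=60)
for chunk in chunks:
    print(chunk)
```

Each chunk would then be passed to the model as a separate input text, and the resulting audio segments concatenated afterwards.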
|
27 |
+
|
28 |
+
## License
|
29 |
+
The pre-trained models are licensed under the CC-BY-NC license because the training data, Emilia, is an in-the-wild dataset. Sorry for any inconvenience this may cause.
|
30 |
+
|
31 |
+
---
|