---
license: cc-by-nc-4.0
language:
- id
base_model:
- SWivid/F5-TTS
pipeline_tag: text-to-speech
---

## Overview

This Indonesian finetune of [F5-TTS](https://github.com/SWivid/F5-TTS) adds Indonesian speech capabilities to the model.

## Dataset

Length: 43.35 hours \
Audio samples: 43999

Dataset sources: \
• [data-indsp-news-lvcsr](https://github.com/s-sakti/data_indsp_news_lvcsr)
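
As a rough sanity check on the figures above, the average clip length is the total duration divided by the number of samples. This sketch treats both figures as exact, but the hour total is itself rounded to two decimals, so the result is approximate:

```python
# Dataset figures as stated in this model card.
TOTAL_HOURS = 43.35
NUM_SAMPLES = 43999

# Convert the total duration to seconds, then divide by the number of clips.
total_seconds = TOTAL_HOURS * 3600
avg_seconds = total_seconds / NUM_SAMPLES

print(f"Average clip length: {avg_seconds:.2f} s")  # → Average clip length: 3.55 s
```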

## Results

In zero-shot voice cloning, the model sometimes fails to accurately match the voice and emotion of the reference audio. It also tends to hallucinate on long input texts.

Reference text: "Tidak ada yang menakutiku, bahkan kematian sekalipun." ("Nothing scares me, not even death.") \
Reference audio: [Zilong.ogg](https://huggingface.co/Eempostor/F5-TTS-IND-FINETUNE/resolve/main/Zilong.ogg?download=true) \
Input text: "Halo. Model faintun ini adalah sebuah percobaan. Masih terdapat beberapa kekurangan jadi tolong dimaklumkan." ("Hello. This finetuned model is an experiment. It still has some shortcomings, so please bear with it.") \
Generated audio: [Zilong_generated.wav](https://huggingface.co/Eempostor/F5-TTS-IND-FINETUNE/resolve/main/Zilong_generated.wav?download=true)

## License

Like the base F5-TTS pre-trained models, this model is licensed under CC-BY-NC, because the base model's training data, Emilia, is an in-the-wild dataset. Sorry for any inconvenience this may cause.

---