metadata

license: cc-by-nc-4.0
language:
  - id
base_model:
  - SWivid/F5-TTS
pipeline_tag: text-to-speech

Overview

This Indonesian fine-tune of F5-TTS adds Indonesian speech capability to the model.

Dataset

Length: 43.35 hours
Audio samples: 43999

Dataset sources:
• data-indsp-news-lvcsr
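
The two figures above imply that the clips are short, roughly 3.5 seconds on average. A minimal sketch of that arithmetic follows; the variable names are illustrative, and the result is only a mean, since the model card does not give the distribution of clip lengths.

```python
# Figures taken from the Dataset section of this model card.
TOTAL_HOURS = 43.35
NUM_SAMPLES = 43999

# Convert hours to seconds, then divide by the number of clips.
mean_clip_seconds = TOTAL_HOURS * 3600 / NUM_SAMPLES

print(f"Mean clip length: {mean_clip_seconds:.2f} s")
```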

Results

The model has difficulty accurately reproducing the voice and emotion of the reference audio in zero-shot cloning. It also hallucinates on long input texts.

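Reference text: "Tidak ada yang menakutiku, bahkan kematian sekalipun." (English: "Nothing frightens me, not even death.")
Reference audio: Zilong.ogg
Input text: "Halo. Model faintun ini adalah sebuah percobaan. Masih terdapat beberapa kekurangan jadi tolong dimaklumkan." (English: "Hello. This fine-tuned model is an experiment. It still has some shortcomings, so please bear with it.")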
Generated audio: Zilong_generated.ogg
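
Because the model hallucinates on long texts, one possible workaround is to split the input into short, sentence-aligned chunks and synthesize each chunk separately. The sketch below shows only the text-splitting step. The function name `split_into_chunks` and the `max_chars` limit of 80 are hypothetical choices, not part of F5-TTS or this repository, and whether chunking actually reduces hallucination for this checkpoint has not been tested.

```python
import re


def split_into_chunks(text: str, max_chars: int = 80) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut
    mid-sentence. Hypothetical helper for illustration only.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


input_text = (
    "Halo. Model faintun ini adalah sebuah percobaan. "
    "Masih terdapat beberapa kekurangan jadi tolong dimaklumkan."
)
chunks = split_into_chunks(input_text)
for chunk in chunks:
    print(chunk)
```

Each chunk could then be passed to the model as a separate input text with the same reference audio, and the resulting audio segments concatenated.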

License

The pre-trained models are licensed under CC-BY-NC because the training data, Emilia, is an in-the-wild dataset. Sorry for any inconvenience this may cause.