Overview
This Indonesian fine-tune of F5-TTS adds Indonesian speech capabilities to the model.
Dataset
Length: 43.35 hours
Audio samples: 43999
Dataset sources:
• data-indsp-news-lvcsr
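As a quick sanity check on the figures above, the average clip length can be derived from the total duration and the sample count. This is only arithmetic on the two stated numbers; the variable names are illustrative and not part of any training script.

```python
# Figures taken from the Dataset section above.
total_hours = 43.35
num_samples = 43999

# Convert hours to seconds, then divide by the number of clips.
total_seconds = total_hours * 3600
avg_clip_seconds = total_seconds / num_samples

print(f"Average clip length: {avg_clip_seconds:.2f} s")  # prints 3.55 s
```

An average of roughly 3.5 seconds per clip suggests the training data is mostly short utterances.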
Results
The model has difficulty accurately reproducing the reference voice and emotion in zero-shot cloning. It also hallucinates on long input texts.
Reference text: "Tidak ada yang menakutiku, bahkan kematian sekalipun." (English: "Nothing frightens me, not even death.")
Reference audio: Zilong.ogg
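Input text: "Halo. Model faintun ini adalah sebuah percobaan. Masih terdapat beberapa kekurangan jadi tolong dimaklumkan." (English: "Hello. This fine-tuned model is an experiment. There are still some shortcomings, so please bear with it.")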
Generated audio: Zilong_generated.ogg
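Since hallucination shows up on long texts, one possible workaround is to split the input into short, sentence-aligned chunks and synthesize each chunk separately. The sketch below covers only the text-splitting step. The function name `split_into_chunks` and the `max_chars` limit are hypothetical choices for illustration, not part of F5-TTS or this model, and whether chunking actually reduces hallucination here has not been measured.

```python
import re


def split_into_chunks(text: str, max_chars: int = 100) -> list[str]:
    """Greedily group sentences into chunks of at most max_chars characters.

    Hypothetical helper for illustration only. A single sentence longer
    than max_chars is kept whole as its own chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# The input text from the example above, split with an assumed 60-character limit.
text = (
    "Halo. Model faintun ini adalah sebuah percobaan. "
    "Masih terdapat beberapa kekurangan jadi tolong dimaklumkan."
)
chunks = split_into_chunks(text, max_chars=60)
for chunk in chunks:
    print(chunk)
```

Each chunk would then be passed to the model with the same reference audio and reference text, and the resulting audio segments concatenated.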
License
The pre-trained models are released under the CC-BY-NC license because the training data, Emilia, is an in-the-wild dataset. Sorry for any inconvenience this may cause.
Model tree for Eempostor/F5-TTS-IND-FINETUNE
Base model: SWivid/F5-TTS