New Open Source TTS Model (OuteTTS)
the model may frequently alter, insert, or omit wrong words, leading to variations in output quality.
IMHO, this makes it an invalid TTS.
Yeah, I agree, but I just want to document open-source TTS models.
I found a working Space, ameerazam08/OuteTTS-0.2-500M-Demo. It needs ZeroGPU; otherwise it will be very slow.
I also noticed that you have to provide a voice sample from an English speaker. Otherwise it will sound weird, especially when pronouncing numbers. For example, with a Chinese voice (one of the provided voices), the model will say "10" or "ten" as the same word, but in Chinese.
the model may frequently alter, insert, or omit wrong words, leading to variations in output quality.
IMHO, this makes it an invalid TTS.
Still, I went ahead and added the v0.2 500M and v0.3 1B TTS Spaces to the forked Arena Space.
https://huggingface.co/spaces/Pendrokar/TTS-Spaces-Arena
It is probably not faring very well against the others due to the poor audio quality of the default voice and the early cutoff of the audio sample. People only start to care about the delivery once those basics are right. This was similar with WhisperSpeech, whose line delivery was great overall.