TTS Distribution Score

non-profit

http://ttsdsbenchmark.com

ttsds

AI & ML interests

speech synthesis, evaluation

Recent Activity

cdminix new activity about 2 months ago

ttsds/benchmark:The .txt file when submitting dataset

cdminix updated a dataset 2 months ago

ttsds/results

ttsds's activity

cdminix

in ttsds/benchmark about 2 months ago

The .txt file when submitting dataset

#4 opened 2 months ago by

ecyht2

cdminix

updated a dataset 2 months ago

ttsds/results

Viewer • Updated Nov 19, 2024 • 2 • 46 • 1

cdminix

posted an update 2 months ago

Post

987

As part of some ongoing work, I'm releasing what is currently the largest collection of Docker containers for state-of-the-art voice cloning TTS systems.
https://github.com/ttsds/datasets

Alongside it, there is also an overview of all the systems.

cdminix

updated a dataset 2 months ago

ttsds/requests

Updated Nov 19, 2024 • 395

cdminix

updated a dataset 4 months ago

ttsds/noise-reference

Updated Sep 16, 2024 • 12

cdminix

updated a Space 5 months ago

Running

🥇

TTSDS Benchmark and Leaderboard

Text-To-Speech (TTS) Evaluation using objective metrics.

cdminix

updated a dataset 5 months ago

ttsds/reference

Updated Aug 31, 2024 • 33

cdminix

updated a Space 6 months ago

Running

🦀

README

cdminix

posted an update 6 months ago

Post

514

I just added 5 more models to my open-source TTS model benchmark, ttsds/benchmark.
Let's talk about the results!

Over the last couple of days, I added jbetker/tortoise-tts-v2, metavoiceio/metavoice-1B-v0.1, audo/HierSpeechpp, and the unofficial implementations of amphion/NaturalSpeech2 and amphion/valle by https://huggingface.co/amphion

Takeaways:
- TorToiSe does very well, falling into second place after StyleTTS 2, which is also ranked first in the human evaluation at TTS-AGI/TTS-Arena.
- MetaVoice-1B's overall score is dragged down by its Intelligibility Score (probably due to utterances being cut short), but it achieves #3 in Speaker Score, which indicates good voice cloning ability.
- HierSpeech++ lands in the middle of the pack in terms of overall performance but excels at the Environment Score, achieving #2. This means the model is especially good at modeling recording conditions such as microphone and background noise.
- The Amphion models achieve relatively low scores, possibly because they were not trained for as long as in the papers. However, they seem to struggle for different reasons: the autoregressive VALL-E models have low Intelligibility Scores (possibly due to "babbling" or early stop tokens), while NaturalSpeech2 has low Speaker and Prosody scores.
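
To make the difference between a per-factor rank and an overall rank concrete, here is a minimal sketch. The system names and all numbers are invented for illustration and are not leaderboard results, and taking the overall score as an unweighted mean of the factor scores is an assumption made for this example rather than something stated in the post.

```python
from statistics import mean

# Hypothetical factor scores (0-100), invented purely for illustration.
# These are NOT results from the ttsds/benchmark leaderboard.
scores = {
    "system_a": {"intelligibility": 92.0, "speaker": 85.0, "prosody": 88.0, "environment": 80.0},
    "system_b": {"intelligibility": 60.0, "speaker": 90.0, "prosody": 84.0, "environment": 78.0},
    "system_c": {"intelligibility": 88.0, "speaker": 70.0, "prosody": 72.0, "environment": 90.0},
}


def overall(factors):
    """Unweighted mean of the factor scores (an assumed aggregation)."""
    return mean(factors.values())


def rank_by(all_scores, key):
    """Return system names sorted from best to worst according to `key`."""
    return sorted(all_scores, key=lambda name: key(all_scores[name]), reverse=True)


overall_ranking = rank_by(scores, overall)
speaker_ranking = rank_by(scores, lambda f: f["speaker"])

print("overall:", overall_ranking)
print("speaker:", speaker_ranking)
```

In this toy data, system_b has the best speaker score but the lowest overall score because of its low intelligibility score, which is the same pattern described for MetaVoice-1B above.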

What's next?
I'm planning to add more open-source TTS models like suno/bark, CAMB-AI/MARS5-TTS and fishaudio/fish-speech-1.2. I'll also write an article on these and all the other results soon, since our paper, TTSDS -- Text-to-Speech Distribution Score (2407.12707), mostly focused on establishing the benchmark itself rather than the individual TTS systems.

3 replies

cdminix

updated a dataset 6 months ago

ttsds/speaker_text_pairs

Updated Aug 4, 2024 • 15

cdminix

authored 2 papers 6 months ago

TTSDS -- Text-to-Speech Distribution Score

Paper • 2407.12707 • Published Jul 17, 2024

Evaluating and reducing the distance between synthetic and real speech distributions

Paper • 2211.16049 • Published Nov 29, 2022 • 1

cdminix

posted an update 6 months ago

Post

2239

Since new TTS (Text-to-Speech) systems are coming out what feels like every day, and it's currently hard to compare them, my latest project has focused on doing just that.

I was inspired by the TTS-AGI/TTS-Arena (definitely check it out if you haven't), which compares recent TTS systems using crowdsourced A/B testing.

I wanted to see whether a similar evaluation is possible with objective metrics, and it's now available here:
ttsds/benchmark
Anyone can submit a new TTS model, and I hope this provides some information on the areas in which models perform well or poorly.

The paper with all the details is available here: https://arxiv.org/abs/2407.12707

Team members 1
