Update README.md
README.md
CHANGED
@@ -31,17 +31,17 @@ datasets:
 <img src="https://huggingface.co/datasets/parler-tts/images/resolve/main/thumbnail.png" alt="Parler Logo" width="800" style="margin-left:auto; margin-right:auto; display:block"/>

-# Indic Parler-TTS

 <a target="_blank" href="https://huggingface.co/spaces/PHBJT/multi_parler_tts">
 <img src="https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg" alt="Open in HuggingFace"/>
 </a>

-**Indic Parler-TTS** is a multilingual Indic extension of [Parler-TTS Mini](https://huggingface.co/parler-tts/parler-tts-mini-v1.1).

 It is a fine-tuned version of [Parler-TTS Mini v1.1](https://huggingface.co/parler-tts/parler-tts-mini-v1.1), trained on an **8,385-hour** multilingual Indic and English dataset.

-**Indic Parler-TTS Mini** can officially speak 20 Indic languages as well as English, making it well suited to regional language technologies. The **21 languages** supported are: Assamese, Bengali, Bodo, Dogri, English, Gujarati, Hindi, Kannada, Konkani, Maithili, Malayalam, Manipuri, Marathi, Nepali, Odia, Sanskrit, Santali, Sindhi, Tamil, Telugu, and Urdu.

 Thanks to its **better prompt tokenizer**, it can easily be extended to other languages. This tokenizer has a larger vocabulary and handles byte fallback, which simplifies multilingual training.

@@ -93,7 +93,7 @@ The model accepts two primary inputs:
 - For other accents, the model allows customization by specifying accent details, such as "A male British speaker" or "A female American speaker," using style transfer for more dynamic and personalized outputs.

 5. **Customizable Output**
-Indic Parler-TTS offers precise control over various speech characteristics using the **caption** input:

 - **Background Noise**: Adjust the noise level in the audio, from clear to slightly noisy environments.
 - **Reverberation**: Control the perceived distance of the voice, from close-sounding to distant-sounding speech.

@@ -107,7 +107,7 @@ The model accepts two primary inputs:
 🚨 Unlike previous versions of Parler-TTS, here we use two tokenizers: one for the prompt and one for the description. 🚨

-**Indic Parler-TTS** has been trained to generate speech with features that can be controlled with a simple text description, for example:

 ```py
 import torch
@@ -132,7 +132,7 @@ audio_arr = generation.cpu().numpy().squeeze()
 sf.write("indic_tts_out.wav", audio_arr, model.config.sampling_rate)
 ```

-Indic Parler-TTS provides control over key aspects of speech synthesis through descriptive captions. Below is a summary of what each control parameter can achieve:

 | **Control Type** | **Capabilities** |
 |--------------------------|----------------------------------------------------------------------------------|

@@ -266,7 +266,7 @@ Here is the table based on the provided data:
 ## Evaluation

-Indic Parler-TTS has been evaluated by native and non-native speakers using a MOS-like (mean opinion score) framework. The results show that it generates natural and intelligible speech, with particularly strong ratings from native speakers of Indian languages.

 | **Language** | **Native Speaker Score (%)** | **Highlights** |
 |--------------|-------------------------------|--------------------------------------------------------------------------------------------------|

@@ -309,7 +309,7 @@ Parler-TTS was released alongside:
 ## Training dataset

 - **Description**:
-The model was trained on an internal **Indic-Parler-Dataset**, a large-scale multilingual speech corpus designed to train the **Indic Parler-TTS** model. It covers 24 languages (all 22 official languages of India, plus Chhattisgarhi and English), making it an invaluable resource for speech technologies focused on the subcontinent.

 - **Key Statistics**:

@@ -31,17 +31,17 @@ datasets:
 <img src="https://huggingface.co/datasets/parler-tts/images/resolve/main/thumbnail.png" alt="Parler Logo" width="800" style="margin-left:auto; margin-right:auto; display:block"/>

+# Indic Parler-TTS Pretrained

 <a target="_blank" href="https://huggingface.co/spaces/PHBJT/multi_parler_tts">
 <img src="https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg" alt="Open in HuggingFace"/>
 </a>

+**Indic Parler-TTS Pretrained** is a multilingual Indic extension of [Parler-TTS Mini](https://huggingface.co/parler-tts/parler-tts-mini-v1.1).

 It is a fine-tuned version of [Parler-TTS Mini v1.1](https://huggingface.co/parler-tts/parler-tts-mini-v1.1), trained on an **8,385-hour** multilingual Indic and English dataset.

+**Indic Parler-TTS Pretrained Mini** can officially speak 20 Indic languages as well as English, making it well suited to regional language technologies. The **21 languages** supported are: Assamese, Bengali, Bodo, Dogri, English, Gujarati, Hindi, Kannada, Konkani, Maithili, Malayalam, Manipuri, Marathi, Nepali, Odia, Sanskrit, Santali, Sindhi, Tamil, Telugu, and Urdu.

 Thanks to its **better prompt tokenizer**, it can easily be extended to other languages. This tokenizer has a larger vocabulary and handles byte fallback, which simplifies multilingual training.

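The hunk above credits the prompt tokenizer's byte fallback with making multilingual extension easier. As a rough, self-contained illustration of the general idea (not the model's actual tokenizer, whose vocabulary and matching rules are not shown in this diff), the toy tokenizer below greedily matches vocabulary pieces and, when a character is not covered, emits its UTF-8 bytes as `<0xNN>` pieces, the notation SentencePiece-style vocabularies use for byte-fallback tokens. The function name and the tiny vocabulary are hypothetical.

```python
from typing import List, Set


def tokenize_with_byte_fallback(text: str, vocab: Set[str]) -> List[str]:
    """Hypothetical toy tokenizer: greedy longest match against `vocab`,
    falling back to UTF-8 byte tokens for characters it cannot cover."""
    max_len = max((len(piece) for piece in vocab), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary piece that starts at position i.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i : i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            # No piece matched: emit the character's raw UTF-8 bytes
            # instead of a single unknown token, so no information is lost.
            tokens.extend(f"<0x{b:02X}>" for b in text[i].encode("utf-8"))
            i += 1
    return tokens


if __name__ == "__main__":
    vocab = {"नम", "स्ते", "hello"}
    print(tokenize_with_byte_fallback("नम" + "स्ते", vocab))  # both pieces are in vocab
    print(tokenize_with_byte_fallback("क", vocab))  # ['<0xE0>', '<0xA4>', '<0x95>']
```

Because every string can be spelled out in bytes, new scripts never produce unknown tokens; they are simply encoded less compactly until the vocabulary learns pieces for them.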
@@ -93,7 +93,7 @@ The model accepts two primary inputs:
 - For other accents, the model allows customization by specifying accent details, such as "A male British speaker" or "A female American speaker," using style transfer for more dynamic and personalized outputs.

 5. **Customizable Output**
+Indic Parler-TTS Pretrained offers precise control over various speech characteristics using the **caption** input:

 - **Background Noise**: Adjust the noise level in the audio, from clear to slightly noisy environments.
 - **Reverberation**: Control the perceived distance of the voice, from close-sounding to distant-sounding speech.

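The hunk above lists background noise and reverberation among the characteristics controlled through the caption (description) input. Because the caption is free text, one way to keep descriptions consistent across many generations is to assemble them from a few named attributes. The sketch below is hypothetical: the `Caption` class, its fields, and the exact wording are illustrative assumptions, not phrasings documented for this model.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    """Hypothetical helper that renders a free-text description (caption).

    Each field is a plain phrase; the model only ever sees the final string.
    """

    speaker: str = "A female speaker"
    pace: str = "moderate"
    noise: str = "very clear"  # background noise, e.g. "very clear" ... "slightly noisy"
    distance: str = "very close up"  # reverberation, e.g. "very close up" ... "distant"

    def render(self) -> str:
        # Join the attributes into one natural-language description.
        return (
            f"{self.speaker} speaks at a {self.pace} pace. "
            f"The recording is {self.noise}, and the voice sounds {self.distance}."
        )


if __name__ == "__main__":
    print(Caption().render())
    print(Caption(speaker="A male speaker", noise="slightly noisy", distance="distant").render())
```

The rendered string is what would go to the description tokenizer; the text to be spoken is handled separately by the prompt tokenizer.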
@@ -107,7 +107,7 @@ The model accepts two primary inputs:
 🚨 Unlike previous versions of Parler-TTS, here we use two tokenizers: one for the prompt and one for the description. 🚨

+**Indic Parler-TTS Pretrained** has been trained to generate speech with features that can be controlled with a simple text description, for example:

 ```py
 import torch
@@ -132,7 +132,7 @@ audio_arr = generation.cpu().numpy().squeeze()
 sf.write("indic_tts_out.wav", audio_arr, model.config.sampling_rate)
 ```

+Indic Parler-TTS Pretrained provides control over key aspects of speech synthesis through descriptive captions. Below is a summary of what each control parameter can achieve:

 | **Control Type** | **Capabilities** |
 |--------------------------|----------------------------------------------------------------------------------|

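The 🚨 note in the hunks above says this model uses two tokenizers, one for the prompt (the text to be spoken) and one for the description. The sketch below isolates that step: a hypothetical helper tokenizes each input with its own tokenizer and returns a dictionary whose keys mirror the keyword arguments the upstream Parler-TTS usage example passes to `model.generate` (`input_ids`, `attention_mask`, `prompt_input_ids`, `prompt_attention_mask`). That mapping is an assumption here, since the full usage snippet is elided from this diff. The tokenizers are passed in as plain callables so the function can be dry-run with stand-ins; Hugging Face tokenizers return dict-like `BatchEncoding` objects, so the same indexing applies to them.

```python
from typing import Any, Callable, Dict

# Any callable mapping text to a dict-like object with "input_ids" and
# "attention_mask" entries (Hugging Face tokenizers return such objects).
Tokenizer = Callable[..., Any]


def build_generate_kwargs(
    prompt: str,
    description: str,
    prompt_tokenizer: Tokenizer,
    description_tokenizer: Tokenizer,
) -> Dict[str, Any]:
    """Hypothetical helper: tokenize each input with its own tokenizer."""
    # The description (style caption) is tokenized for the text encoder...
    desc = description_tokenizer(description, return_tensors="pt")
    # ...while the text to be spoken uses the separate prompt tokenizer.
    prm = prompt_tokenizer(prompt, return_tensors="pt")
    return {
        "input_ids": desc["input_ids"],
        "attention_mask": desc["attention_mask"],
        "prompt_input_ids": prm["input_ids"],
        "prompt_attention_mask": prm["attention_mask"],
    }


if __name__ == "__main__":
    # Stand-in tokenizer for a dry run; it only tags the text it receives.
    def fake_tokenizer(tag: str) -> Tokenizer:
        def tokenize(text: str, return_tensors: str = "pt") -> Dict[str, str]:
            return {"input_ids": f"{tag}:{text}", "attention_mask": f"{tag}-mask"}

        return tokenize

    kwargs = build_generate_kwargs(
        "नमस्ते", "A calm female voice.", fake_tokenizer("prompt"), fake_tokenizer("desc")
    )
    print(sorted(kwargs))  # ['attention_mask', 'input_ids', 'prompt_attention_mask', 'prompt_input_ids']
```

With real tokenizers, the returned tensors would still need to be moved to the model's device (for example with `.to(device)`) before calling `model.generate(**kwargs)`. Swapping the two tokenizers is the easy mistake this separation guards against.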
@@ -266,7 +266,7 @@ Here is the table based on the provided data:
 ## Evaluation

+Indic Parler-TTS Pretrained has been evaluated by native and non-native speakers using a MOS-like (mean opinion score) framework. The results show that it generates natural and intelligible speech, with particularly strong ratings from native speakers of Indian languages.

 | **Language** | **Native Speaker Score (%)** | **Highlights** |
 |--------------|-------------------------------|--------------------------------------------------------------------------------------------------|

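The evaluation hunk mentions a MOS-like (mean opinion score) framework but does not spell out the procedure or how the percentage scores in the table are derived. For reference, a conventional MOS is the mean of listeners' ratings on a 1 to 5 scale, often reported with a confidence interval. The sketch below computes only that; the normal-approximation 95% interval (z = 1.96) and the sample ratings are illustrative assumptions, not the authors' method, and no conversion to the table's percentages is attempted.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_opinion_score(ratings: Sequence[float], z: float = 1.96) -> Tuple[float, float]:
    """Return (MOS, half-width of an approximate confidence interval).

    Uses the sample standard deviation and a normal approximation, a common
    convention that is only reasonable with a decent number of ratings.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mos = statistics.mean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


if __name__ == "__main__":
    # Hypothetical 1-5 ratings from five listeners for one utterance.
    mos, hw = mean_opinion_score([4, 5, 3, 4, 4])
    print(f"MOS = {mos:.2f} ± {hw:.2f}")  # MOS = 4.00 ± 0.62
```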
@@ -309,7 +309,7 @@ Parler-TTS was released alongside:
 ## Training dataset

 - **Description**:
+The model was trained on an internal **Indic-Parler-Dataset**, a large-scale multilingual speech corpus designed to train the **Indic Parler-TTS Pretrained** model. It covers 24 languages (all 22 official languages of India, plus Chhattisgarhi and English), making it an invaluable resource for speech technologies focused on the subcontinent.

 - **Key Statistics**: