Commit d391b40 by ylacombe (verified) · 1 Parent(s): 793b78f

Update README.md

Files changed (1)
  1. README.md +8 -8
README.md CHANGED
@@ -31,17 +31,17 @@ datasets:
 
  <img src="https://huggingface.co/datasets/parler-tts/images/resolve/main/thumbnail.png" alt="Parler Logo" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
 
- # Indic Parler-TTS
+ # Indic Parler-TTS Pretrained
 
  <a target="_blank" href="https://huggingface.co/spaces/PHBJT/multi_parler_tts">
  <img src="https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg" alt="Open in HuggingFace"/>
  </a>
 
- **Indic Parler-TTS** is a multilingual Indic extension of [Parler-TTS Mini](https://huggingface.co/parler-tts/parler-tts-mini-v1.1).
+ **Indic Parler-TTS Pretrained** is a multilingual Indic extension of [Parler-TTS Mini](https://huggingface.co/parler-tts/parler-tts-mini-v1.1).
 
  It is a fine-tuned version of [Parler-TTS Mini v1.1](https://huggingface.co/parler-tts/parler-tts-mini-v1.1), trained on a **8,385 hours** multilingual Indic and English dataset.
 
- **Indic Parler-TTS Mini** can officially speak in 20 Indic languages, making it comprehensive for regional language technologies, and in English. The **21 languages** supported are: Assamese, Bengali, Bodo, Dogri, English, Gujarati, Hindi, Kannada, Konkani, Maithili, Malayalam, Manipuri, Marathi, Nepali, Odia, Sanskrit, Santali, Sindhi, Tamil, Telugu, and Urdu.
+ **Indic Parler-TTS Pretrained Mini** can officially speak in 20 Indic languages, making it comprehensive for regional language technologies, and in English. The **21 languages** supported are: Assamese, Bengali, Bodo, Dogri, English, Gujarati, Hindi, Kannada, Konkani, Maithili, Malayalam, Manipuri, Marathi, Nepali, Odia, Sanskrit, Santali, Sindhi, Tamil, Telugu, and Urdu.
 
  Thanks to its **better prompt tokenizer**, it can easily be extended to other languages. This tokenizer has a larger vocabulary and handles byte fallback, which simplifies multilingual training.
 
@@ -93,7 +93,7 @@ The model accepts two primary inputs:
  - For other accents, the model allows customization by specifying accent details, such as "A male British speaker" or "A female American speaker," using style transfer for more dynamic and personalized outputs.
 
  5. **Customizable Output**
- Indic Parler-TTS offers precise control over various speech characteristics using the **caption** input:
+ Indic Parler-TTS Pretrained offers precise control over various speech characteristics using the **caption** input:
 
  - **Background Noise**: Adjust the noise level in the audio, from clear to slightly noisy environments.
  - **Reverberation**: Control the perceived distance of the voice, from close-sounding to distant-sounding speech.
@@ -107,7 +107,7 @@ The model accepts two primary inputs:
 
  🚨 Unlike previous versions of Parler-TTS, here we use two tokenizers - one for the prompt and one for the description. 🚨
 
- **Indic Parler-TTS** has been trained to generate speech with features that can be controlled with a simple text prompt, for example:
+ **Indic Parler-TTS Pretrained** has been trained to generate speech with features that can be controlled with a simple text prompt, for example:
 
  ```py
  import torch
@@ -132,7 +132,7 @@ audio_arr = generation.cpu().numpy().squeeze()
  sf.write("indic_tts_out.wav", audio_arr, model.config.sampling_rate)
  ```
 
- Indic Parler-TTS provides highly effective control over key aspects of speech synthesis using descriptive captions. Below is a summary of what each control parameter can achieve:
+ Indic Parler-TTS Pretrained provides highly effective control over key aspects of speech synthesis using descriptive captions. Below is a summary of what each control parameter can achieve:
 
  | **Control Type** | **Capabilities** |
  |--------------------------|----------------------------------------------------------------------------------|
@@ -266,7 +266,7 @@ Here is the table based on the provided data:
 
  ## Evaluation
 
- Indic Parler-TTS has been evaluated using a MOS-like framework by native and non-native speakers. The results highlight its exceptional performance in generating natural and intelligible speech, especially for native speakers of Indian languages.
+ Indic Parler-TTS Pretrained has been evaluated using a MOS-like framework by native and non-native speakers. The results highlight its exceptional performance in generating natural and intelligible speech, especially for native speakers of Indian languages.
 
  | **Language** | **Native Speaker Score (%)** | **Highlights** |
  |--------------|-------------------------------|--------------------------------------------------------------------------------------------------|
@@ -309,7 +309,7 @@ Parler-TTS was released alongside:
  ## Training dataset
 
  - **Description**:
- The model was trained on an internal **Indic-Parler-Dataset**, a large-scale multilingual speech corpus designed to train the **Indic Parler-TTS** model. It provides comprehensive coverage of 24 languages, which includes all the 22 official languages of India along with Chattisgarhi and English, making it an invaluable resource for speech technologies focused on the subcontinent.
+ The model was trained on an internal **Indic-Parler-Dataset**, a large-scale multilingual speech corpus designed to train the **Indic Parler-TTS Pretrained** model. It provides comprehensive coverage of 24 languages, which includes all the 22 official languages of India along with Chattisgarhi and English, making it an invaluable resource for speech technologies focused on the subcontinent.
 
  - **Key Statistics**:
 