---
license: other
license_name: coqui-public-model-license
license_link: https://coqui.ai/cpml
library_name: coqui
pipeline_tag: text-to-speech
widget:
- text: "Once when I was six years old I saw a magnificent picture"
---

# XTTS v2 Fine-Tuned on Hindi Datasets

**Model Name**: XTTS v2 Fine-Tuned on Hindi Datasets

**Model Description**: This is a version of the XTTS v2 (Cross-lingual Text-to-Speech) model developed by Coqui-AI, fine-tuned on Hindi speech datasets to generate more natural and accurate Hindi speech. The model supports voice cloning and multilingual speech generation.

### Colab Notebook

The [Colab notebook](https://colab.research.google.com/drive/1VwNltFIcqhB7Ydt4NVaPnYegl-qHoUSO#scrollTo=KKj-kq7iCG3d) used to fine-tune XTTS v2 on the Hindi datasets is available for viewing and for replicating the process.

### Features

- **Languages**: Supports 17 languages, including Hindi (hi).
- **Voice Cloning**: Clone voices with just a 6-second audio clip.
- **Emotion and Style Transfer**: Achieve emotion and style transfer by cloning.
- **Cross-Language Voice Cloning**: Supports voice cloning across different languages.
- **Sampling Rate**: 24 kHz sampling rate for high-quality audio.

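Since voice cloning depends on a short reference clip, it can help to check the clip's length before synthesis. The following is a minimal sketch using only the Python standard library. It treats the 6-second figure from the feature list as a minimum, which is an assumption, and the helper names `wav_duration_seconds` and `is_long_enough` are hypothetical, not part of 🐸TTS.

```python
import wave

# Taken from the feature list; using it as a lower bound is an assumption.
MIN_REFERENCE_SECONDS = 6.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_long_enough(path: str, min_seconds: float = MIN_REFERENCE_SECONDS) -> bool:
    """Hypothetical helper: True if the reference clip lasts at least `min_seconds`."""
    return wav_duration_seconds(path) >= min_seconds
```

The standard `wave` module only reads uncompressed PCM WAV files, so clips in other formats would need to be converted first.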
### Updates over XTTS-v1

- **New Languages**: Added support for Hungarian and Korean.
- **Architectural Improvements**: Enhanced speaker conditioning and interpolation.
- **Stability Improvements**: Better overall stability and performance.
- **Audio Quality**: Improved prosody and audio quality.

### Languages

The XTTS-v2 model supports 17 languages:

- **English (en)**
- **Spanish (es)**
- **French (fr)**
- **German (de)**
- **Italian (it)**
- **Portuguese (pt)**
- **Polish (pl)**
- **Turkish (tr)**
- **Russian (ru)**
- **Dutch (nl)**
- **Czech (cs)**
- **Arabic (ar)**
- **Chinese (zh-cn)**
- **Japanese (ja)**
- **Hungarian (hu)**
- **Korean (ko)**
- **Hindi (hi)**

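The language codes in the list above are the values passed as the `language` argument at synthesis time. As a small sketch, the codes can be kept in a lookup table and validated before calling the model. The names `SUPPORTED_LANGUAGES` and `validate_language` are hypothetical and not part of 🐸TTS.

```python
# Language codes copied from the list above.
SUPPORTED_LANGUAGES = {
    "en": "English", "es": "Spanish", "fr": "French", "de": "German",
    "it": "Italian", "pt": "Portuguese", "pl": "Polish", "tr": "Turkish",
    "ru": "Russian", "nl": "Dutch", "cs": "Czech", "ar": "Arabic",
    "zh-cn": "Chinese", "ja": "Japanese", "hu": "Hungarian",
    "ko": "Korean", "hi": "Hindi",
}


def validate_language(code: str) -> str:
    """Hypothetical helper: normalize a language code or raise ValueError."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language {code!r}; expected one of: {supported}")
    return normalized
```

Validating up front gives a readable error message instead of a failure deep inside the synthesis call.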
### Training Data

The model was fine-tuned on the following Hindi datasets:

- **Mozilla CommonVoice 18**: A diverse dataset of Hindi speech.
- **IndicTTS Hindi Dataset**: Hindi speech data for text-to-speech training.

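This card does not describe how the datasets were preprocessed. As one possible preparation step, and purely as an assumption, the sketch below converts a Common Voice TSV file (which has `path` and `sentence` columns) into an LJSpeech-style `metadata.csv` with pipe-separated `id|text|normalized text` rows. The function name `common_voice_to_ljspeech` is hypothetical, and audio conversion (Common Voice ships MP3 clips) is not shown.

```python
import csv
from pathlib import Path


def common_voice_to_ljspeech(tsv_path: str, out_path: str) -> int:
    """Write `clip_id|sentence|sentence` rows and return how many were written."""
    written = 0
    with open(tsv_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        reader = csv.DictReader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            sentence = (row.get("sentence") or "").strip()
            clip = (row.get("path") or "").strip()
            # Skip rows with missing fields or text that would break the "|" format.
            if not sentence or not clip or "|" in sentence:
                continue
            clip_id = Path(clip).stem  # file name without its extension
            dst.write(f"{clip_id}|{sentence}|{sentence}\n")
            written += 1
    return written
```

The same sentence is written to both text columns because no separate normalization step is assumed here.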
### Code

The [code-base](https://github.com/coqui-ai/TTS) supports both inference and [fine-tuning](https://tts.readthedocs.io/en/latest/models/xtts.html#training).

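For inference with a fine-tuned checkpoint stored locally, the 🐸TTS code-base exposes the `XttsConfig` and `Xtts` classes. The sketch below is a hedged example, not the author's confirmed workflow: it assumes the checkpoint directory contains `config.json` together with the model and vocabulary files that `load_checkpoint` expects, and the wrapper name `load_finetuned_xtts` is hypothetical.

```python
from pathlib import Path


def load_finetuned_xtts(checkpoint_dir: str):
    """Load an XTTS model from a local directory (assumed layout, see above)."""
    directory = Path(checkpoint_dir)
    config_path = directory / "config.json"
    if not config_path.is_file():
        raise FileNotFoundError(f"No config.json found in {directory}")

    # Imported lazily so the path check above fails fast without loading 🐸TTS.
    from TTS.tts.configs.xtts_config import XttsConfig
    from TTS.tts.models.xtts import Xtts

    config = XttsConfig()
    config.load_json(str(config_path))
    model = Xtts.init_from_config(config)
    model.load_checkpoint(config, checkpoint_dir=str(directory), eval=True)
    return model, config


# Example usage (paths are placeholders):
# model, config = load_finetuned_xtts("/path/to/xtts/")
# outputs = model.synthesize(
#     "नमस्ते, आप कैसे हैं?",
#     config,
#     speaker_wav="/path/to/target/speaker.wav",
#     language="hi",
# )
```

Calling `model.cuda()` after loading moves the model to a GPU if one is available.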
### Demo Spaces

- [XTTS Space](https://huggingface.co/spaces/coqui/xtts): Explore the model's performance on supported languages and try it with your own reference or microphone input.
- [XTTS Voice Chat with Mistral or Zephyr](https://huggingface.co/spaces/coqui/voice-chat-with-mistral): Experience streaming voice chat with Mistral 7B Instruct or Zephyr 7B Beta.

### License

This model is licensed under the [Coqui Public Model License](https://coqui.ai/cpml). Read more about the [origin story of CPML here](https://coqui.ai/blog/tts/cpml).

### Contact

Join our 🐸 Community on [Discord](https://discord.gg/fBC58unbKE) and follow us on [Twitter](https://twitter.com/coqui_ai). For inquiries, you can also email us at [email protected].

### Usage

#### Using 🐸TTS API

```python
from TTS.api import TTS

# Load XTTS v2 by its 🐸TTS model name (this downloads the base Coqui checkpoint)
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2", gpu=True)

# Generate speech by cloning a voice using default settings.
# The text should be written in the language named by `language`;
# swap in Hindi text when synthesizing with language="hi".
tts.tts_to_file(
    text="It took me quite a long time to develop a voice, and now that I have it I'm not going to be silent.",
    file_path="output.wav",
    speaker_wav="/path/to/target/speaker.wav",
    language="hi"
)