---
license: other
license_name: coqui-public-model-license
license_link: https://coqui.ai/cpml
library_name: coqui
pipeline_tag: text-to-speech
widget:
- text: "Once when I was six years old I saw a magnificent picture"
---
# XTTS v2 Fine-Tuned on Hindi Datasets
**Model Name**: XTTS v2 Fine-Tuned on Hindi Datasets

**Model Description**: This is a fine-tuned version of Coqui's XTTS v2 (Cross-lingual Text-to-Speech) model, trained further on Hindi speech datasets to produce more natural and accurate Hindi speech. Like the base model, it supports voice cloning and multilingual speech generation.
### Colab Notebook
The Colab notebook used to fine-tune XTTS v2 on the Hindi datasets is available at this [Colab Notebook Link](https://colab.research.google.com/drive/1VwNltFIcqhB7Ydt4NVaPnYegl-qHoUSO#scrollTo=KKj-kq7iCG3d). You can follow it to reproduce the fine-tuning process.
### Features
- **Languages**: Supports 17 languages, including Hindi (hi).
- **Voice Cloning**: Clones a voice from a reference audio clip as short as 6 seconds.
- **Emotion and Style Transfer**: Carries the emotion and speaking style of the reference clip over to the cloned voice.
- **Cross-Language Voice Cloning**: Supports voice cloning across different languages.
- **Sampling Rate**: Generates audio at a 24 kHz sampling rate.
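
Voice cloning depends on the reference clip passed as `speaker_wav`. The sketch below shows a minimal pre-flight check that uses only Python's standard `wave` module. It assumes the clip is an uncompressed PCM WAV file. The helper names and the 6-second default, taken from the feature list above, are illustrative and not part of 🐸TTS.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds (frames / frame rate)."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_reference_clip(path: str, min_seconds: float = 6.0) -> bool:
    """Return True if the reference clip is at least `min_seconds` long.

    Hypothetical helper: the 6-second default mirrors the feature list and is
    a guideline, not a limit enforced by XTTS. The check only reads the
    duration; it does not resample or otherwise modify the audio.
    """
    return wav_duration_seconds(path) >= min_seconds
```

For example, `check_reference_clip("/path/to/target/speaker.wav")` returns `False` for a 3-second clip and `True` for a 7-second one.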
### Updates over XTTS-v1
- **New Languages**: Added support for Hungarian and Korean.
- **Architectural Improvements**: Improved speaker conditioning, including support for multiple speaker references and interpolation between speakers.
- **Stability Improvements**: Better overall stability and performance.
- **Audio Quality**: Improved prosody and audio quality.
### Languages
The XTTS-v2 model supports the following 17 languages:
- **English (en)**
- **Spanish (es)**
- **French (fr)**
- **German (de)**
- **Italian (it)**
- **Portuguese (pt)**
- **Polish (pl)**
- **Turkish (tr)**
- **Russian (ru)**
- **Dutch (nl)**
- **Czech (cs)**
- **Arabic (ar)**
- **Chinese (zh-cn)**
- **Japanese (ja)**
- **Hungarian (hu)**
- **Korean (ko)**
- **Hindi (hi)**
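
The `language` argument of `tts_to_file` expects one of the codes above. The sketch below catches an unsupported code before the model is loaded. `XTTS_V2_LANGUAGES` and `validate_language` are hypothetical helpers built from the list in this card. They are not part of the 🐸TTS API, and the library's own list may change between releases.

```python
# Hypothetical lookup table copied from the language list in this model card.
XTTS_V2_LANGUAGES = {
    "en": "English", "es": "Spanish", "fr": "French", "de": "German",
    "it": "Italian", "pt": "Portuguese", "pl": "Polish", "tr": "Turkish",
    "ru": "Russian", "nl": "Dutch", "cs": "Czech", "ar": "Arabic",
    "zh-cn": "Chinese", "ja": "Japanese", "hu": "Hungarian",
    "ko": "Korean", "hi": "Hindi",
}


def validate_language(code: str) -> str:
    """Normalize a language code and raise ValueError if it is not listed."""
    normalized = code.strip().lower()
    if normalized not in XTTS_V2_LANGUAGES:
        supported = ", ".join(sorted(XTTS_V2_LANGUAGES))
        raise ValueError(f"Unsupported language {code!r}; expected one of: {supported}")
    return normalized
```

For example, `validate_language(" HI ")` returns `"hi"`, which can then be passed as `language=`.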
### Training Data
The model was fine-tuned on the following Hindi datasets:
- **Mozilla Common Voice 18**: The Hindi subset of Mozilla's crowd-sourced speech corpus, covering a diverse range of speakers.
- **IndicTTS Hindi Dataset**: Hindi speech data for text-to-speech training.
### Code
The [code-base](https://github.com/coqui-ai/TTS) supports both inference and [fine-tuning](https://tts.readthedocs.io/en/latest/models/xtts.html#training).
### Demo Spaces
- [XTTS Space](https://huggingface.co/spaces/coqui/xtts): Explore the model's performance on supported languages and try it with your own reference or microphone input.
- [XTTS Voice Chat with Mistral or Zephyr](https://huggingface.co/spaces/coqui/voice-chat-with-mistral): Experience streaming voice chat with Mistral 7B Instruct or Zephyr 7B Beta.
### License
This model is licensed under the [Coqui Public Model License](https://coqui.ai/cpml). Read more about the [origin story of CPML here](https://coqui.ai/blog/tts/cpml).
### Contact
Join our 🐸 Community on [Discord](https://discord.gg/fBC58unbKE) and follow us on [Twitter](https://twitter.com/coqui_ai). For inquiries, you can also email us at [email protected].
### Usage
#### Using 🐸TTS API
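```python
import torch
from TTS.api import TTS

# Use the GPU when one is available; inference on CPU is much slower
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load Coqui's base XTTS v2 checkpoint (not the Hindi fine-tuned weights)
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

# Generate Hindi speech by cloning the voice in the reference clip
tts.tts_to_file(
    text="मुझे अपनी आवाज़ विकसित करने में काफ़ी समय लगा, और अब जब वह मेरे पास है, तो मैं चुप नहीं रहूँगा।",
    file_path="output.wav",
    speaker_wav="/path/to/target/speaker.wav",
    language="hi",
)
```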