---
language:
- zh
library_name: transformers.js
---

# VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

VITS is an end-to-end speech synthesis model that predicts a speech waveform conditional on an input text sequence. It is a conditional variational autoencoder (VAE) composed of a posterior encoder, a decoder, and a conditional prior.
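
In the Transformers implementation, these pieces are exposed as submodules of `VitsModel`. A minimal sketch for inspecting them, assuming the submodule names used by the `transformers` VITS port (`text_encoder`, `posterior_encoder`, `decoder`):

```py
from transformers import VitsModel

model = VitsModel.from_pretrained("BricksDisplay/vits-cmn")

# The conditional-VAE components map onto named submodules:
print(type(model.text_encoder).__name__)       # text encoder parameterizing the conditional prior
print(type(model.posterior_encoder).__name__)  # posterior encoder (used during training)
print(type(model.decoder).__name__)            # decoder that turns latents into a waveform
```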

## Model Details

- **Language:** Chinese (Mandarin)
- **Dataset:** THCHS-30
- **Speakers:** 44
- **Training hours:** 48

## Usage

You can use this checkpoint with Hugging Face Transformers. Because the tokenizer expects pinyin rather than raw Chinese characters, the input text is first romanized with `pypinyin`:
```py
from transformers import VitsModel, VitsTokenizer
from pypinyin import lazy_pinyin, Style
import torch

model = VitsModel.from_pretrained("BricksDisplay/vits-cmn")
tokenizer = VitsTokenizer.from_pretrained("BricksDisplay/vits-cmn")

# Romanize the input, e.g. "中文" -> "zhōngwén", then tokenize it.
text = "中文"
payload = "".join(lazy_pinyin(text, style=Style.TONE, tone_sandhi=True))
inputs = tokenizer(payload, return_tensors="pt")

# Synthesize speech for speaker 0 (valid speaker ids: 0-43).
with torch.no_grad():
    output = model(**inputs, speaker_id=0)

# Play the generated waveform in a notebook; this checkpoint outputs 16 kHz audio.
from IPython.display import Audio
Audio(output.waveform[0].numpy(), rate=16000)
```
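
To save the synthesized speech to a WAV file instead of playing it inline, one option is `scipy` (a minimal sketch, not part of the original example; it reuses `model` and `output` from above):

```py
import scipy.io.wavfile

# `output.waveform` has shape (batch, num_samples).
scipy.io.wavfile.write(
    "vits_cmn_output.wav",
    rate=model.config.sampling_rate,  # 16 kHz for this checkpoint
    data=output.waveform[0].float().numpy(),
)
```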