Update README.md
README.md
CHANGED
@@ -18,7 +18,7 @@ tags:

Speechless is a compact, open-source text-to-semantics (1B parameters) model, designed to generate direct semantic representations of audio as discrete tokens, bypassing the need for a text-to-speech (TTS) model. Unlike traditional pipelines that rely on generating and processing audio (TTS → ASR), Speechless eliminates this complexity by directly converting text into semantic speech tokens, simplifying training, saving resources, and enabling scalability, especially for low-resource languages.

-Trained on over

For more details, check out our official [blog post]().

@@ -49,7 +49,21 @@ For more details, check out our official [blog post]().
You can use given example code to load the model.

```{python}

```

@@ -57,14 +71,15 @@ You can use given example code to load the model.

| **Parameter** | **Value** |
|----------------------------|-------------------------|
-| **Epochs** |
-| **Global Batch Size** |
-| **Learning Rate** |
| **Learning Scheduler** | Cosine |
| **Optimizer** | AdamW |
-| **Warmup Ratio** |
-| **Weight Decay** |
-| **Max Sequence Length** |

## Evaluation

@@ -18,7 +18,7 @@ tags:

Speechless is a compact, open-source text-to-semantics (1B parameters) model, designed to generate direct semantic representations of audio as discrete tokens, bypassing the need for a text-to-speech (TTS) model. Unlike traditional pipelines that rely on generating and processing audio (TTS → ASR), Speechless eliminates this complexity by directly converting text into semantic speech tokens, simplifying training, saving resources, and enabling scalability, especially for low-resource languages.

+Trained on over ~400 hours of English and ~1000 hours of Vietnamese data, Speechless is a core component of the Ichigo v0.5 family.

For more details, check out our official [blog post]().

@@ -49,7 +49,21 @@ For more details, check out our official [blog post]().
You can use given example code to load the model.

```{python}
+import torch
+from transformers import pipeline

+model_id = "homebrewltd/Speechless-llama3.2-v0.1"
+
+pipe = pipeline(
+    "text-generation",
+    model=model_id,
+    torch_dtype=torch.bfloat16,
+    device_map="auto"
+)
+
+pipe("<|reserved_special_token_69|>I’m Speechless – A Model Developed by Homebrew Research")
+
+>>> [{'generated_text': '<|reserved_special_token_69|>I’m Speechless – A Model Developed by Homebrew Research.assistant\n\n<|sound_1968|><|sound_0464|><|sound_0642|><|duration_02|><|sound_0634|><|sound_0105|><|duration_02|><|sound_1745|><|duration_02|><|sound_1345|><|sound_0210|><|sound_1312|><|sound_1312|>'}]
```
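
The sample output above interleaves `<|sound_NNNN|>` and `<|duration_NN|>` tokens after the echoed prompt. Below is a minimal sketch for extracting those tokens from the generated string. It assumes the token format is exactly as printed in the sample. `parse_semantic_tokens` is a hypothetical helper, not part of the model's API, and the README does not say how a duration token should be interpreted, so it is kept as a separate event here.

```python
import re

# Matches <|sound_1968|> or <|duration_02|>.
# The pattern is inferred from the sample output, not from a published spec.
TOKEN_RE = re.compile(r"<\|(sound|duration)_(\d+)\|>")


def parse_semantic_tokens(generated_text: str) -> list[tuple[str, int]]:
    """Return (kind, value) pairs in order of appearance (hypothetical helper)."""
    return [(kind, int(value)) for kind, value in TOKEN_RE.findall(generated_text)]


# Abbreviated stand-in for the 'generated_text' field shown above.
sample = (
    "<|reserved_special_token_69|>I'm Speechless.assistant\n\n"
    "<|sound_1968|><|sound_0464|><|sound_0642|><|duration_02|>"
)
tokens = parse_semantic_tokens(sample)
print(tokens)
```

The prompt marker `<|reserved_special_token_69|>` is skipped because it matches neither the `sound` nor the `duration` prefix.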
@@ -57,14 +71,15 @@ You can use given example code to load the model.

| **Parameter** | **Value** |
|----------------------------|-------------------------|
+| **Epochs** | 2 |
+| **Global Batch Size** | 144 |
+| **Learning Rate** | 3e-4 |
| **Learning Scheduler** | Cosine |
| **Optimizer** | AdamW |
+| **Warmup Ratio** | 0.05 |
+| **Weight Decay** | 0.01 |
+| **Max Sequence Length** | 512 |
+| **Clip Grad Norm** | 1.0 |
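
To make the scheduler settings concrete, here is a sketch of a cosine learning-rate schedule with warmup, using the peak rate (3e-4) and warmup ratio (0.05) from the table. It assumes linear warmup from zero and cosine decay to zero. That is a common default, but the README does not state it. `total_steps = 1000` is a placeholder, since the real step count depends on dataset size, and `lr_at` is a hypothetical helper.

```python
import math

PEAK_LR = 3e-4       # "Learning Rate" from the table
WARMUP_RATIO = 0.05  # "Warmup Ratio" from the table


def lr_at(step: int, total_steps: int) -> float:
    """Learning rate at a given step (assumed: linear warmup, cosine decay to 0)."""
    warmup_steps = int(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak rate.
        return PEAK_LR * step / warmup_steps
    # Cosine decay from the peak rate down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical value, for illustration only
for step in (0, 25, 50, 525, 1000):
    print(step, f"{lr_at(step, total_steps):.2e}")
```

With these placeholder numbers, warmup covers the first 50 steps, the rate peaks at step 50, and it falls to half the peak midway through the decay phase.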
## Evaluation