prithivMLmods
commited on
Update README.md
Browse files
README.md
CHANGED
@@ -17,75 +17,93 @@ tags:
|
|
17 |
---
|
18 |
### **Llama-Song-Stream-3B-Instruct Model Card**
|
19 |
|
20 |
-
The **Llama-Song-Stream-3B-Instruct** is a fine-tuned language model
|
21 |
-
|
22 |
-
| **File Name** | **Size** | **Description** | **Upload Status** |
|
23 |
-
|----------------------------------------|--------------------|--------------------------------------------------|--------------------|
|
24 |
-
| `.gitattributes` | 1.57 kB | LFS tracking configuration. | Uploaded |
|
25 |
-
| `README.md` | 282 Bytes | Updated documentation with project details. | Uploaded |
|
26 |
-
| `config.json` | 1.03 kB | Configuration settings for model initialization. | Uploaded |
|
27 |
-
| `generation_config.json` | 248 Bytes | Model generation settings. | Uploaded |
|
28 |
-
| `pytorch_model-00001-of-00002.bin` | 4.97 GB | Primary model weights (part 1 of 2). | Uploaded (LFS) |
|
29 |
-
| `pytorch_model-00002-of-00002.bin` | 1.46 GB | Primary model weights (part 2 of 2). | Uploaded (LFS) |
|
30 |
-
| `pytorch_model.bin.index.json` | 21.2 kB | Index file for model weight mapping. | Uploaded |
|
31 |
-
| `special_tokens_map.json` | 477 Bytes | Special tokens used by the tokenizer. | Uploaded |
|
32 |
-
| `tokenizer.json` | 17.2 MB | Tokenizer file (large LFS model tokenizer data). | Uploaded (LFS) |
|
33 |
-
| `tokenizer_config.json` | 57.4 kB | Tokenizer configuration settings. | Uploaded |
|
34 |
|
35 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
36 |
|
37 |
-
|
|
|
38 |
|
39 |
-
|
40 |
-
-
|
41 |
-
- **Model Parameters:** 3B (billion parameters).
|
42 |
-
- **Fine-tuned dataset focus:** Song generation and lyric-based chain-of-thought reasoning.
|
43 |
|
44 |
---
|
|
|
45 |
|
46 |
-
|
47 |
-
|
48 |
-
|
49 |
-
- `pytorch_model-00001-of-00002.bin` - **4.97 GB**
|
50 |
-
- `pytorch_model-00002-of-00002.bin` - **1.46 GB**
|
51 |
-
|
52 |
-
2. **Tokenizer Data:**
|
53 |
-
- Tokenizer includes LFS model configuration:
|
54 |
-
- `tokenizer.json` - **17.2 MB**
|
55 |
-
- `special_tokens_map.json` - **477 Bytes**
|
56 |
-
- `tokenizer_config.json` - **57.4 KB**
|
57 |
-
|
58 |
-
3. **Configuration Files:**
|
59 |
-
- `config.json` - Model settings (**1.03 KB**).
|
60 |
-
- `generation_config.json` - Inference task parameters (**248 Bytes**).
|
61 |
|
62 |
---
|
|
|
63 |
|
64 |
-
|
65 |
-
-
|
66 |
-
|
67 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
68 |
|
69 |
---
|
70 |
|
71 |
-
### **
|
72 |
-
|
73 |
-
|
74 |
-
|
75 |
-
|
76 |
-
|
77 |
-
|
78 |
-
|
79 |
-
|
|
|
|
|
80 |
|
81 |
---
|
82 |
|
83 |
-
|
84 |
-
|
85 |
-
|
86 |
-
|
87 |
-
|
88 |
-
|
89 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
90 |
|
91 |
---
|
|
|
17 |
---
|
18 |
### **Llama-Song-Stream-3B-Instruct Model Card**
|
19 |
|
20 |
+
The **Llama-Song-Stream-3B-Instruct** is a fine-tuned language model specializing in generating music-related text, such as song lyrics, compositions, and musical thoughts. Built upon the **meta-llama/Llama-3.2-3B-Instruct** base, it has been trained with a custom dataset focused on song lyrics and music compositions to produce context-aware, creative, and stylized music output.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
21 |
|
22 |
+
| **File Name** | **Size** | **Description** |
|
23 |
+
|---------------------------------|------------|-------------------------------------------------|
|
24 |
+
| `.gitattributes` | 1.57 kB | LFS tracking file to manage large model files. |
|
25 |
+
| `README.md` | 282 Bytes | Documentation with model details and usage. |
|
26 |
+
| `config.json` | 1.03 kB | Model configuration settings. |
|
27 |
+
| `generation_config.json` | 248 Bytes | Generation parameters like max sequence length. |
|
28 |
+
| `pytorch_model-00001-of-00002.bin` | 4.97 GB | Primary weights (part 1 of 2). |
|
29 |
+
| `pytorch_model-00002-of-00002.bin` | 1.46 GB | Primary weights (part 2 of 2). |
|
30 |
+
| `pytorch_model.bin.index.json` | 21.2 kB | Index file mapping the checkpoint layers. |
|
31 |
+
| `special_tokens_map.json` | 477 Bytes | Defines special tokens for tokenization. |
|
32 |
+
| `tokenizer.json` | 17.2 MB | Tokenizer data for text generation. |
|
33 |
+
| `tokenizer_config.json` | 57.4 kB | Configuration settings for tokenization. |
|
34 |
+
|
35 |
+
### **Key Features**
|
36 |
+
|
37 |
+
1. **Song Generation:**
|
38 |
+
- Generates full song lyrics based on user input, maintaining rhyme, meter, and thematic consistency.
|
39 |
+
|
40 |
+
2. **Music Context Understanding:**
|
41 |
+
- Trained on lyrics and song patterns to mimic and generate song-like content.
|
42 |
|
43 |
+
3. **Fine-tuned Creativity:**
|
44 |
+
- Fine-tuned using *Song-Catalogue-Long-Thought* for coherent lyric generation over extended prompts.
|
45 |
|
46 |
+
4. **Interactive Text Generation:**
|
47 |
+
- Designed for use cases like generating lyrical ideas, creating drafts for songwriters, or exploring themes musically.
|
|
|
|
|
48 |
|
49 |
---
|
50 |
+
### **Training Details**
|
51 |
|
52 |
+
- **Base Model:** [meta-llama/Llama-3.2-3B-Instruct](#)
|
53 |
+
- **Finetuning Dataset:** [prithivMLmods/Song-Catalogue-Long-Thought](#)
|
54 |
+
- This dataset comprises 57.7k examples of lyrical patterns, song fragments, and themes.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
55 |
|
56 |
---
|
57 |
+
### **Applications**
|
58 |
|
59 |
+
1. **Songwriting AI Tools:**
|
60 |
+
- Generate lyrics for genres like pop, rock, rap, classical, and others.
|
61 |
+
|
62 |
+
2. **Creative Writing Assistance:**
|
63 |
+
- Assist songwriters by suggesting lyric variations and song drafts.
|
64 |
+
|
65 |
+
3. **Storytelling via Music:**
|
66 |
+
- Create song narratives using custom themes and moods.
|
67 |
+
|
68 |
+
4. **Entertainment AI Integration:**
|
69 |
+
- Build virtual musicians or interactive lyric-based content generators.
|
70 |
|
71 |
---
|
72 |
|
73 |
+
### **Example Usage**
|
74 |
+
|
75 |
+
#### **Setup**
|
76 |
+
First, load the Llama-Song-Stream model:
|
77 |
+
```python
|
78 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
79 |
+
|
80 |
+
model_name = "prithivMLmods/Llama-Song-Stream-3B-Instruct"
|
81 |
+
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
82 |
+
model = AutoModelForCausalLM.from_pretrained(model_name)
|
83 |
+
```
|
84 |
|
85 |
---
|
86 |
|
87 |
+
#### **Generate Lyrics Example**
|
88 |
+
```python
|
89 |
+
prompt = "Write a song about freedom and the open sky"
|
90 |
+
inputs = tokenizer(prompt, return_tensors="pt")
|
91 |
+
outputs = model.generate(**inputs, max_length=100, temperature=0.7, num_return_sequences=1)
|
92 |
+
|
93 |
+
generated_lyrics = tokenizer.decode(outputs[0], skip_special_tokens=True)
|
94 |
+
print(generated_lyrics)
|
95 |
+
```
|
96 |
+
|
97 |
+
---
|
98 |
+
|
99 |
+
### **Deployment Notes**
|
100 |
+
|
101 |
+
1. **Serverless vs. Dedicated Endpoints:**
|
102 |
+
The model currently does not have enough usage for a serverless endpoint. Options include:
|
103 |
+
- **Dedicated inference endpoints** for faster responses.
|
104 |
+
- **Custom integrations via Hugging Face inference tools.**
|
105 |
+
|
106 |
+
2. **Resource Requirements:**
|
107 |
+
Ensure sufficient GPU memory and compute for large PyTorch model weights.
|
108 |
|
109 |
---
|