Triangle104 committed on
Commit 59463ba · verified · 1 Parent(s): 5df9747

Update README.md

  1. README.md +261 -0
README.md CHANGED
@@ -20,6 +20,267 @@ pipeline_tag: text-generation
  This model was converted to GGUF format from [`prithivMLmods/Llama-Thinker-3B-Preview2`](https://huggingface.co/prithivMLmods/Llama-Thinker-3B-Preview2) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
  Refer to the [original model card](https://huggingface.co/prithivMLmods/Llama-Thinker-3B-Preview2) for more details on the model.
 
+ ---
+ Model details:
+ -
+ Llama-Thinker-3B-Preview2 is a pretrained and instruction-tuned
+ generative model designed for multilingual applications. It is
+ trained on synthetic datasets built from long chains of thought,
+ which enables it to perform complex reasoning tasks.
+
+ Model Architecture: Llama-Thinker-3B-Preview2 is based on Llama 3.2, an
+ autoregressive language model that uses an optimized transformer
+ architecture. The tuned versions undergo supervised fine-tuning (SFT)
+ and reinforcement learning with human feedback (RLHF) to align with
+ human preferences for helpfulness and safety.
+
+ Use with transformers
+
+ With transformers >= 4.43.0, you can run conversational inference using the Transformers pipeline abstraction or the Auto classes with the generate() function.
+
+ Make sure to update your transformers installation via pip install --upgrade transformers.
+
+ import torch
+ from transformers import pipeline
+
+ model_id = "prithivMLmods/Llama-Thinker-3B-Preview2"
+
+ # Build a text-generation pipeline; bfloat16 halves memory versus float32,
+ # and device_map="auto" places the weights on the available device(s).
+ pipe = pipeline(
+     "text-generation",
+     model=model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+
+ # Chat-style input: a system prompt followed by the user's message.
+ messages = [
+     {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
+     {"role": "user", "content": "Who are you?"},
+ ]
+
+ outputs = pipe(
+     messages,
+     max_new_tokens=256,
+ )
+
+ # The last entry of generated_text is the assistant's reply message.
+ print(outputs[0]["generated_text"][-1])
+
+ Note: Detailed recipes for running the model locally, including torch.compile(), assisted generation, and quantization, are available at huggingface-llama-recipes.
+
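The text above mentions the Auto classes with generate() as an alternative to the pipeline. The following is a minimal sketch of that route, not the model author's reference code. The helper names `build_messages` and `generate_reply` and the default system prompt are assumptions made for illustration; the model ID is the one used in the pipeline example.

```python
def build_messages(user_prompt, system_prompt="You are a helpful assistant."):
    """Return a chat-format message list (the default system prompt is an assumption)."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def generate_reply(user_prompt,
                   model_id="prithivMLmods/Llama-Thinker-3B-Preview2",
                   max_new_tokens=256):
    """Generate one assistant reply using AutoTokenizer, AutoModelForCausalLM and generate()."""
    # Imported lazily so build_messages() stays usable without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    # Render the chat with the model's own template; add_generation_prompt
    # appends the assistant header so generation starts a new reply.
    inputs = tokenizer.apply_chat_template(
        build_messages(user_prompt),
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_reply("Who are you?")` downloads the weights on first use, so it needs network access and enough memory for the bfloat16 model.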
+ Use with llama
+
+ Please follow the instructions in the repository.
+
+ To download the original checkpoints, use huggingface-cli as in the example command below:
+
+ huggingface-cli download prithivMLmods/Llama-Thinker-3B-Preview2 --include "original/*" --local-dir Llama-Thinker-3B-Preview2
+
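The same download can also be scripted from Python. This is a hedged sketch that mirrors the command above using `snapshot_download` from the `huggingface_hub` package; the helper names `original_checkpoint_args` and `download_original_checkpoints` are hypothetical, and it assumes the repository actually contains an `original/` folder, as the command implies.

```python
def original_checkpoint_args(repo_id="prithivMLmods/Llama-Thinker-3B-Preview2"):
    """Keyword arguments that mirror the huggingface-cli command."""
    return {
        "repo_id": repo_id,
        "allow_patterns": ["original/*"],      # same filter as --include
        "local_dir": repo_id.split("/")[-1],   # same target as --local-dir
    }


def download_original_checkpoints(repo_id="prithivMLmods/Llama-Thinker-3B-Preview2"):
    """Download the matching files and return the local directory path."""
    # Imported lazily so the argument helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**original_checkpoint_args(repo_id))
```

Keeping the arguments in a separate helper makes it easy to inspect what will be fetched before starting a large download.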
+ Here's a version tailored for the Llama-Thinker-3B-Preview2-GGUF model:
+
+ How to Run Llama-Thinker-3B-Preview2 on Ollama Locally
+
+ This guide demonstrates how to run the Llama-Thinker-3B-Preview2-GGUF
+ model locally using Ollama. The model is instruction-tuned for
+ multilingual tasks and complex reasoning. By the end, you will be able
+ to run this and other open-source models locally.
+
+ Example 1: How to Run the Llama-Thinker-3B-Preview2 Model
+
+ The Llama-Thinker-3B-Preview2 model is a pretrained
+ and instruction-tuned LLM designed for complex reasoning tasks across
+ multiple languages. In this example, we interact with its quantized
+ GGUF build locally using Ollama.
+
+ Step 1: Download the Model
+
+ First, download and start the Llama-Thinker-3B-Preview2-GGUF model with the following command (ollama run pulls the model on first use and then opens an interactive prompt):
+
+ ollama run llama-thinker-3b-preview2.gguf
+
+ Step 2: Model Initialization and Download
+
+ Once the command is executed, Ollama will initialize and download the
+ necessary model files. You should see output similar to this:
+
+ pulling manifest
+ pulling a12cd3456efg... 100% ▕████████████████▏ 3.2 GB
+ pulling 9f87ghijklmn... 100% ▕████████████████▏ 6.5 KB
+ verifying sha256 digest
+ writing manifest
+ removing any unused layers
+ success
+ >>> Send a message (/? for help)
+
+ Step 3: Interact with the Model
+
+ Once the model is fully loaded, you can interact with it by sending prompts. For example:
+
+ >>> How can you assist me today?
+
+ A sample response might look like this (your output may not be identical):
+
+ I am Llama-Thinker-3B-Preview2, an advanced AI language model designed to assist with complex reasoning, multilingual tasks, and general-purpose queries. Here are a few things I can help you with:
+
+ 1. Answering complex questions in multiple languages.
+ 2. Assisting with creative writing, content generation, and problem-solving.
+ 3. Providing detailed summaries and explanations.
+ 4. Translating text across different languages.
+ 5. Generating ideas for personal or professional use.
+ 6. Offering insights on technical topics.
+
+ Feel free to ask me anything you'd like assistance with!
+
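Besides the interactive prompt, a running Ollama instance can be queried programmatically through its local REST API. The sketch below sends the same question to the `/api/chat` endpoint using only the Python standard library. It assumes Ollama is listening on its default address (`http://localhost:11434`) and that the model is available under the name used in Step 1; the helper names `build_chat_payload` and `chat` are illustrative.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"
# Assumption: the model is registered under the name used in Step 1.
DEFAULT_MODEL = "llama-thinker-3b-preview2.gguf"


def build_chat_payload(prompt, model=DEFAULT_MODEL):
    """Build a non-streaming chat request body for a single user prompt."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # request one JSON object instead of a stream of chunks
    }


def chat(prompt, model=DEFAULT_MODEL, url=OLLAMA_CHAT_URL):
    """Send the prompt to the Ollama server and return the reply text."""
    body = json.dumps(build_chat_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    # With streaming disabled, the reply carries one assistant message.
    return reply["message"]["content"]
```

With the server running, `print(chat("How can you assist me today?"))` should print a reply comparable to the sample response above.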
+ Step 4: Exit the Program
+
+ To exit the program, simply type:
+
+ /exit
+
+ Example 2: Using Multi-Modal Models (Future Use)
+
+ In the future, Ollama may support multi-modal models where you can
+ input both text and images for advanced interactions. This section will
+ be updated as new capabilities become available.
+
+ Notes on Using Quantized Models
+
+ Quantized models like llama-thinker-3b-preview2.gguf
+ are optimized to run efficiently on local systems with limited
+ resources. Keep the following points in mind:
+
+ - VRAM/CPU Requirements: Ensure your system has adequate VRAM, or CPU and RAM resources, to handle model inference.
+ - Model Format: Use the .gguf model format for compatibility with Ollama.
+
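To get a rough sense of the memory requirement mentioned above, the weight footprint can be approximated as parameter count times bits per weight. The following back-of-the-envelope sketch assumes about 3.2 billion parameters for a 3B-class model (an assumption; check the model card for the exact figure) and nominal bit widths. Actual GGUF files store extra scaling data, and the KV cache and runtime buffers add further overhead, so treat the results as lower bounds.

```python
def estimate_weight_memory_gb(num_params, bits_per_weight):
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumed parameter count for a 3B-class model (not taken from the model card).
ASSUMED_PARAMS = 3.2e9

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    size_gb = estimate_weight_memory_gb(ASSUMED_PARAMS, bits)
    print(f"{label}: ~{size_gb:.1f} GB")
```

Under these assumptions, moving from 16-bit to 4-bit weights cuts the weight footprint to a quarter, which is the main reason quantized builds fit on machines with limited VRAM or RAM.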
+ Conclusion
+
+ Running the Llama-Thinker-3B-Preview2 model locally
+ using Ollama lets you apply an open-source LLM to complex reasoning
+ and multilingual tasks on your own hardware. The same steps apply to
+ other models as they become available.
+
+ ---
  ## Use with llama.cpp
  Install llama.cpp through brew (works on Mac and Linux)