prithivMLmods committed
Commit a23ca38 · verified · 1 Parent(s): 9c53466

Update README.md

README.md CHANGED (+138 -1)
- prev_2
- self_reasoning
---
![lp2.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/n6K9uTYIG6LK_GHYKx0yX.png)

# **Llama-Thinker-3B-Preview2**

Llama-Thinker-3B-Preview2 is a pretrained and instruction-tuned generative model designed for multilingual applications. It is trained on synthetic datasets built around long chains of thought, enabling it to perform complex reasoning tasks effectively.

Model Architecture: Based on Llama 3.2, Llama-Thinker-3B-Preview2 is an autoregressive language model that uses an optimized transformer architecture. The tuned versions undergo supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

# **Use with transformers**

Starting with `transformers >= 4.43.0`, you can run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function.

Make sure to update your transformers installation via `pip install --upgrade transformers`.

```python
import torch
from transformers import pipeline

model_id = "prithivMLmods/Llama-Thinker-3B-Preview2"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
```
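
For the Auto-classes route mentioned above, a minimal sketch using `AutoModelForCausalLM` and `generate()` might look like this (the prompt and generation settings are illustrative, not prescribed by this repository):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "prithivMLmods/Llama-Thinker-3B-Preview2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Who are you?"},
]
# Format the conversation with the model's chat template and move it to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```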

Note: You can also find detailed recipes on how to use the model locally with `torch.compile()`, assisted generation, quantization, and more at [`huggingface-llama-recipes`](https://github.com/huggingface/huggingface-llama-recipes).
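
As one example of what those recipes cover, a 4-bit quantized load via `bitsandbytes` might look like the following sketch (this assumes the `bitsandbytes` package is installed; the settings are illustrative, not the repository's official recipe):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative 4-bit quantization settings; tune them for your hardware.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "prithivMLmods/Llama-Thinker-3B-Preview2",
    quantization_config=bnb_config,
    device_map="auto",
)
```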

# **Use with `llama`**

Please follow the instructions in the [repository](https://github.com/meta-llama/llama).

To download the original checkpoints, see the example command below leveraging `huggingface-cli`:

```bash
huggingface-cli download prithivMLmods/Llama-Thinker-3B-Preview2 --include "original/*" --local-dir Llama-Thinker-3B-Preview2
```
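
The same download can also be scripted through the `huggingface_hub` Python API; a minimal sketch mirroring the CLI call above:

```python
from huggingface_hub import snapshot_download

# Fetch only the original checkpoint files, mirroring the CLI call above.
snapshot_download(
    repo_id="prithivMLmods/Llama-Thinker-3B-Preview2",
    allow_patterns=["original/*"],
    local_dir="Llama-Thinker-3B-Preview2",
)
```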

The remainder of this README covers the GGUF build, **Llama-Thinker-3B-Preview2-GGUF**.

---

# **How to Run Llama-Thinker-3B-Preview2 on Ollama Locally**

This guide demonstrates how to run the **Llama-Thinker-3B-Preview2-GGUF** model locally using Ollama. The model is instruction-tuned for multilingual tasks and complex reasoning, making it highly versatile for a wide range of use cases. By the end, you'll be equipped to run this and other open-source models with ease.
67
+
68
+ ---
69
+
70
+ ## Example 1: How to Run the Llama-Thinker-3B-Preview2 Model
71
+
72
+ The **Llama-Thinker-3B-Preview2** model is a pretrained and instruction-tuned LLM, designed for complex reasoning tasks across multiple languages. In this guide, we'll interact with it locally using Ollama, with support for quantized models.
73
+
74
+ ### Step 1: Download the Model
75
+
76
+ First, download the **Llama-Thinker-3B-Preview2-GGUF** model using the following command:
77
+
78
+ ```bash
79
+ ollama run llama-thinker-3b-preview2.gguf
80
+ ```
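
If that model name is not available from your Ollama registry, or you have already downloaded the GGUF file manually, you can import the local file through a `Modelfile` instead. A minimal sketch, assuming a hypothetical local filename (substitute the file you actually downloaded):

```bash
# The filename below is a placeholder; point FROM at your downloaded GGUF file.
echo 'FROM ./llama-thinker-3b-preview2.gguf' > Modelfile
ollama create llama-thinker-3b-preview2 -f Modelfile
ollama run llama-thinker-3b-preview2
```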
81
+
82
+ ### Step 2: Model Initialization and Download
83
+
84
+ Once the command is executed, Ollama will initialize and download the necessary model files. You should see output similar to this:
85
+
86
+ ```plaintext
87
+ pulling manifest
88
+ pulling a12cd3456efg... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– 3.2 GB
89
+ pulling 9f87ghijklmn... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– 6.5 KB
90
+ verifying sha256 digest
91
+ writing manifest
92
+ removing any unused layers
93
+ success
94
+ >>> Send a message (/? for help)
95
+ ```
96
+
97
+ ### Step 3: Interact with the Model
98
+
99
+ Once the model is fully loaded, you can interact with it by sending prompts. For example, let's ask:
100
+
101
+ ```plaintext
102
+ >>> How can you assist me today?
103
+ ```
104
+
105
+ A sample response might look like this [may / maynot be identical]:
106
+
```plaintext
I am Llama-Thinker-3B-Preview2, an advanced AI language model designed to assist with complex reasoning, multilingual tasks, and general-purpose queries. Here are a few things I can help you with:

1. Answering complex questions in multiple languages.
2. Assisting with creative writing, content generation, and problem-solving.
3. Providing detailed summaries and explanations.
4. Translating text across different languages.
5. Generating ideas for personal or professional use.
6. Offering insights on technical topics.

Feel free to ask me anything you'd like assistance with!
```

### Step 4: Exit the Program

To exit the program, simply type:

```plaintext
/bye
```

---

## Example 2: Using Multi-Modal Models (Future Use)

In the future, Ollama may support multi-modal models where you can input both text and images for advanced interactions. This section will be updated as new capabilities become available.

---

## Notes on Using Quantized Models

Quantized models like **llama-thinker-3b-preview2.gguf** are optimized for efficient performance on local systems with limited resources. Here are some key points to ensure smooth operation:

1. **VRAM/CPU Requirements**: Ensure your system has adequate VRAM or CPU resources to handle model inference.
2. **Model Format**: Use the `.gguf` model format for compatibility with Ollama (a download sketch follows this list).
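
To fetch one specific quantization from the Hugging Face Hub before importing it, you could use `huggingface-cli`; in this sketch, the repository ID and the `Q4_K_M` filename pattern are assumptions, so check the repository's file list for the actual GGUF names:

```bash
# Download only the Q4_K_M quantization (pattern is an assumption; verify actual filenames).
huggingface-cli download prithivMLmods/Llama-Thinker-3B-Preview2-GGUF \
  --include "*Q4_K_M*" --local-dir ./gguf
```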

---

# **Conclusion**

Running the **Llama-Thinker-3B-Preview2** model locally using Ollama provides a powerful way to leverage open-source LLMs for complex reasoning and multilingual tasks. By following this guide, you can explore other models and expand your use cases as new models become available.

---