---
Quantizations of https://huggingface.co/microsoft/Phi-3-mini-128k-instruct

**Requantized and reuploaded!** GGUFs for the latest Phi-3 mini model (July 2024 update), which includes:

* Significantly increased code understanding in Python, C++, Rust, and TypeScript.
* Enhanced post-training for better-structured output.
* Improved multi-turn instruction following.
* Support for the `<|system|>` tag.
* Improved reasoning and long-context understanding.

### Inference Clients/UIs

* [llama.cpp](https://github.com/ggerganov/llama.cpp)
* [JanAI](https://github.com/janhq/jan)
* [KoboldCPP](https://github.com/LostRuins/koboldcpp)
* [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
* [ollama](https://github.com/ollama/ollama)
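The GGUFs can also be used straight from Python via [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), which wraps the same llama.cpp backend listed above. A minimal sketch; the file name below is only a placeholder for whichever quantization you actually download from this repo:

```python
from llama_cpp import Llama

# Placeholder file name: substitute the GGUF quantization you downloaded.
llm = Llama(
    model_path="Phi-3-mini-128k-instruct.Q4_K_M.gguf",
    n_ctx=4096,        # raise to use more of the model's long context
    n_gpu_layers=-1,   # offload all layers if llama-cpp-python was built with GPU support
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "How to explain Internet for a medieval knight?"},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```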
---

# From original readme

## How to Use

Phi-3 Mini-4K-Instruct has been integrated in the `4.41.2` version of `transformers`. The current `transformers` version can be verified with: `pip list | grep transformers`.
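Equivalently, the installed version can be checked from within Python; this is just a convenience check, the `4.41.2` floor comes from the note above:

```python
import transformers

# Phi-3 support is built in from transformers 4.41.2 onwards.
print(transformers.__version__)
```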
Examples of required packages:

```
flash_attn==2.5.8
torch==2.3.1
accelerate==0.31.0
transformers==4.41.2
```

Phi-3 Mini-4K-Instruct is also available in [Azure AI Studio](https://aka.ms/try-phi3).
### Tokenizer

Phi-3 Mini-4K-Instruct supports a vocabulary size of up to `32064` tokens. The [tokenizer files](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/blob/main/added_tokens.json) already provide placeholder tokens that can be used for downstream fine-tuning, but they can also be extended up to the model's vocabulary size.
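A minimal sketch of what extending the vocabulary for fine-tuning might look like; the token strings below are hypothetical examples, not tokens defined by the model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Hypothetical domain-specific tokens added for downstream fine-tuning.
num_added = tokenizer.add_tokens(["<|tool_call|>", "<|tool_result|>"], special_tokens=True)

# Only grow the embedding matrix if the tokenizer now exceeds the
# model's existing (already padded) vocabulary size.
if len(tokenizer) > model.get_input_embeddings().num_embeddings:
    model.resize_token_embeddings(len(tokenizer))

print(f"Added {num_added} tokens; tokenizer size is now {len(tokenizer)}")
```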
### Chat Format

Given the nature of the training data, the Phi-3 Mini-4K-Instruct model is best suited for prompts using the chat format as follows.
You can provide the prompt as a question with a generic template as follows:

```markdown
<|system|>
You are a helpful assistant.<|end|>
<|user|>
Question?<|end|>
<|assistant|>
```

For example:

```markdown
<|system|>
You are a helpful assistant.<|end|>
<|user|>
How to explain Internet for a medieval knight?<|end|>
<|assistant|>
```

where the model generates the text after `<|assistant|>`. For a few-shot prompt, the prompt can be formatted as follows:

```markdown
<|system|>
You are a helpful travel assistant.<|end|>
<|user|>
I am going to Paris, what should I see?<|end|>
<|assistant|>
```
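When driving the model through `transformers` (as in the snippet below), this layout does not have to be assembled by hand: the chat template bundled with the tokenizer is meant to produce it. A quick way to inspect the rendered prompt; exact whitespace, and how the system turn is handled, depend on the chat template shipped with the checkpoint:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How to explain Internet for a medieval knight?"},
]

# Render the template to a string (not token ids) and append the
# generation prompt (<|assistant|>) that the model completes.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```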
This code snippet shows how to quickly get started with running the model on a GPU:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

torch.random.manual_seed(0)
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

output = pipe(messages, **generation_args)
print(output[0]['generated_text'])
```