duyntnet committed on
Commit 3c27748 · verified · 1 Parent(s): f1589aa

Upload README.md

Files changed (1):
  1. README.md +75 -48
README.md CHANGED
@@ -12,39 +12,67 @@ tags:
  ---
  Quantizations of https://huggingface.co/microsoft/Phi-3-mini-128k-instruct

  # From original readme

  ## How to Use

- Phi-3 Mini-128K-Instruct has been integrated in the development version (4.41.0.dev0) of `transformers`. Until the official version is released through `pip`, ensure that you are doing one of the following:
-
- * When loading the model, ensure that `trust_remote_code=True` is passed as an argument of the `from_pretrained()` function.

- * Update your local `transformers` to the development version: `pip uninstall -y transformers && pip install git+https://github.com/huggingface/transformers`. The previous command is an alternative to cloning and installing from the source.

- The current `transformers` version can be verified with: `pip list | grep transformers`.

  ### Tokenizer

- Phi-3 Mini-128K-Instruct supports a vocabulary size of up to `32064` tokens. The [tokenizer files](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/blob/main/added_tokens.json) already provide placeholder tokens that can be used for downstream fine-tuning, but they can also be extended up to the model's vocabulary size.

  ### Chat Format

- Given the nature of the training data, the Phi-3 Mini-128K-Instruct model is best suited for prompts using the chat format as follows.
  You can provide the prompt as a question with a generic template as follows:
  ```markdown
- <|user|>\nQuestion<|end|>\n<|assistant|>
  ```
  For example:
  ```markdown
  <|user|>
  How to explain Internet for a medieval knight?<|end|>
- <|assistant|>
  ```
-
- where the model generates the text after `<|assistant|>`. In case of few-shots prompt, the prompt can be formatted as the following:

  ```markdown
  <|user|>
  I am going to Paris, what should I see?<|end|>
  <|assistant|>
@@ -59,40 +87,39 @@ What is so great about #1?<|end|>
  This code snippet shows how to quickly get started with running the model on a GPU:

  ```python
- import torch
- from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
-
- torch.random.manual_seed(0)
-
- model = AutoModelForCausalLM.from_pretrained(
- "microsoft/Phi-3-mini-128k-instruct",
- device_map="cuda",
- torch_dtype="auto",
- trust_remote_code=True,
- )
- tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")
-
- messages = [
- {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
- {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
- {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
- ]
-
- pipe = pipeline(
- "text-generation",
- model=model,
- tokenizer=tokenizer,
- )
-
- generation_args = {
- "max_new_tokens": 500,
- "return_full_text": False,
- "temperature": 0.0,
- "do_sample": False,
- }
-
- output = pipe(messages, **generation_args)
- print(output[0]['generated_text'])
- ```
-
- *Some applications/frameworks might not include a BOS token (`<s>`) at the start of the conversation. Please ensure that it is included since it provides more reliable results.*

  ---
  Quantizations of https://huggingface.co/microsoft/Phi-3-mini-128k-instruct

+ **Requantized and reuploaded!** GGUFs for the latest Phi-3 mini model (July 2024 update), which includes:
+ * Significantly increased code understanding in Python, C++, Rust, and TypeScript.
+ * Enhanced post-training for better-structured output.
+ * Improved multi-turn instruction following.
+ * Support for the `<|system|>` tag.
+ * Improved reasoning and long-context understanding.
+
+ ### Inference Clients/UIs
+ * [llama.cpp](https://github.com/ggerganov/llama.cpp)
+ * [JanAI](https://github.com/janhq/jan)
+ * [KoboldCPP](https://github.com/LostRuins/koboldcpp)
+ * [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
+ * [ollama](https://github.com/ollama/ollama)
+
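+ Since this repository provides GGUF quantizations, one option for running them from Python is the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) bindings for llama.cpp. A minimal sketch; the GGUF file name and `n_ctx` below are placeholders for whichever quantization and context length you actually use:
+
+ ```python
+ from llama_cpp import Llama
+
+ # Placeholder path: point this at the GGUF file you downloaded from this repo.
+ llm = Llama(
+     model_path="Phi-3-mini-128k-instruct.Q4_K_M.gguf",
+     n_ctx=4096,  # increase toward 128k only if you have the memory for it
+ )
+
+ output = llm.create_chat_completion(
+     messages=[
+         {"role": "system", "content": "You are a helpful assistant."},
+         {"role": "user", "content": "How to explain Internet for a medieval knight?"},
+     ],
+     max_tokens=256,
+ )
+ print(output["choices"][0]["message"]["content"])
+ ```
+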
+ ---
+
  # From original readme

  ## How to Use

+ Phi-3 Mini-128K-Instruct has been integrated in the `4.41.2` version of `transformers`. The current `transformers` version can be verified with: `pip list | grep transformers`.

+ Examples of required packages:
+ ```
+ flash_attn==2.5.8
+ torch==2.3.1
+ accelerate==0.31.0
+ transformers==4.41.2
+ ```
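+
+ If you prefer to check the installed versions from Python rather than with `pip list`, a minimal standard-library sketch (covering the example packages above, except `flash_attn`):
+
+ ```python
+ from importlib.metadata import PackageNotFoundError, version
+
+ # Programmatic equivalent of `pip list | grep transformers`, extended to the
+ # other example packages listed above.
+ for pkg in ("transformers", "torch", "accelerate"):
+     try:
+         print(f"{pkg}=={version(pkg)}")
+     except PackageNotFoundError:
+         print(f"{pkg} is not installed")
+ ```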

+ Phi-3 Mini-128K-Instruct is also available in [Azure AI Studio](https://aka.ms/try-phi3)

  ### Tokenizer

+ Phi-3 Mini-128K-Instruct supports a vocabulary size of up to `32064` tokens. The [tokenizer files](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/blob/main/added_tokens.json) already provide placeholder tokens that can be used for downstream fine-tuning, but they can also be extended up to the model's vocabulary size.
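+
+ As a rough sketch of that last point (adding tokens for a downstream fine-tune), something like the following could be used; the added token names are hypothetical, and the embedding matrix only needs resizing if the tokenizer grows past the reserved `32064` rows:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "microsoft/Phi-3-mini-128k-instruct"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
+
+ # Hypothetical special tokens for a downstream task.
+ num_added = tokenizer.add_tokens(["<|tool_call|>", "<|tool_result|>"], special_tokens=True)
+ print(f"added {num_added} tokens, tokenizer length is now {len(tokenizer)}")
+
+ # The checkpoint already reserves embedding rows up to the 32064 vocabulary size,
+ # so only grow the embedding matrix if the tokenizer actually outgrows it.
+ if len(tokenizer) > model.get_input_embeddings().weight.shape[0]:
+     model.resize_token_embeddings(len(tokenizer))
+ ```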

  ### Chat Format

+ Given the nature of the training data, the Phi-3 Mini-128K-Instruct model is best suited for prompts using the chat format below.
  You can provide the prompt as a question with a generic template as follows:
  ```markdown
+ <|system|>
+ You are a helpful assistant.<|end|>
+ <|user|>
+ Question?<|end|>
+ <|assistant|>
  ```
+
  For example:
  ```markdown
+ <|system|>
+ You are a helpful assistant.<|end|>
  <|user|>
  How to explain Internet for a medieval knight?<|end|>
+ <|assistant|>
  ```
+ where the model generates the text after `<|assistant|>`. In the case of a few-shot prompt, the prompt can be formatted as follows:

  ```markdown
+ <|system|>
+ You are a helpful travel assistant.<|end|>
  <|user|>
  I am going to Paris, what should I see?<|end|>
  <|assistant|>

  This code snippet shows how to quickly get started with running the model on a GPU:

  ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
+
+ torch.random.manual_seed(0)
+ model = AutoModelForCausalLM.from_pretrained(
+     "microsoft/Phi-3-mini-128k-instruct",
+     device_map="cuda",
+     torch_dtype="auto",
+     trust_remote_code=True,
+ )
+
+ tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")
+
+ messages = [
+     {"role": "system", "content": "You are a helpful AI assistant."},
+     {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
+     {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
+     {"role": "user", "content": "What about solving a 2x + 3 = 7 equation?"},
+ ]
+
+ pipe = pipeline(
+     "text-generation",
+     model=model,
+     tokenizer=tokenizer,
+ )
+
+ generation_args = {
+     "max_new_tokens": 500,
+     "return_full_text": False,
+     "temperature": 0.0,
+     "do_sample": False,
+ }
+
+ output = pipe(messages, **generation_args)
+ print(output[0]['generated_text'])
+ ```
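+
+ To inspect the exact prompt string the pipeline builds from `messages` (it should follow the Chat Format section above), a minimal sketch reusing the `tokenizer` and `messages` defined in the snippet:
+
+ ```python
+ # Render the chat template to text instead of running generation.
+ # add_generation_prompt=True appends the trailing <|assistant|> tag that the model completes.
+ prompt = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True,
+ )
+ print(prompt)
+ ```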