Update README.md
README.md
CHANGED
@@ -1,6 +1,6 @@
|
|
1 |
---
|
2 |
license: mit
|
3 |
-
license_link: https://huggingface.co/
|
4 |
language:
|
5 |
- multilingual
|
6 |
pipeline_tag: text-generation
|
@@ -11,7 +11,7 @@ widget:
|
|
11 |
- messages:
|
12 |
- role: user
|
13 |
content: Can you provide ways to eat combinations of bananas and dragonfruits?
|
14 |
-
library_name:
|
15 |
---
|
16 |
|
17 |
## Model Summary
|
@@ -50,18 +50,7 @@ Our models are not specifically designed or evaluated for all downstream purpose
|
|
50 |
## Usage
|
51 |
|
52 |
### Requirements
|
53 |
-
Phi-3.5-MoE-instruct is integrated in the official version of
|
54 |
-
The current `transformers` version can be verified with: `pip list | grep transformers`.
|
55 |
-
|
56 |
-
Examples of required packages:
|
57 |
-
```
|
58 |
-
flash_attn==2.5.8
|
59 |
-
torch==2.3.1
|
60 |
-
accelerate==0.31.0
|
61 |
-
transformers==4.46.0
|
62 |
-
```
|
63 |
-
|
64 |
-
Phi-3.5-MoE-instruct is also available in [Azure AI Studio](https://aka.ms/try-phi3.5moe)
|
65 |
|
66 |
### Tokenizer
|
67 |
|
@@ -81,43 +70,8 @@ How to explain Internet for a medieval knight?<|end|>
|
|
81 |
### Loading the model locally
|
82 |
After obtaining the Phi-3.5-MoE-instruct model checkpoints, users can use this sample code for inference.
|
83 |
|
84 |
-
```
|
85 |
-
import torch
|
86 |
-
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
|
87 |
-
|
88 |
-
torch.random.manual_seed(0)
|
89 |
-
|
90 |
-
model = AutoModelForCausalLM.from_pretrained(
|
91 |
-
"microsoft/Phi-3.5-MoE-instruct",
|
92 |
-
device_map="cuda",
|
93 |
-
torch_dtype="auto",
|
94 |
-
trust_remote_code=False,
|
95 |
-
)
|
96 |
-
|
97 |
-
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3.5-MoE-instruct")
|
98 |
-
|
99 |
-
messages = [
|
100 |
-
{"role": "system", "content": "You are a helpful AI assistant."},
|
101 |
-
{"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
|
102 |
-
{"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
|
103 |
-
{"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
|
104 |
-
]
|
105 |
-
|
106 |
-
pipe = pipeline(
|
107 |
-
"text-generation",
|
108 |
-
model=model,
|
109 |
-
tokenizer=tokenizer,
|
110 |
-
)
|
111 |
-
|
112 |
-
generation_args = {
|
113 |
-
"max_new_tokens": 500,
|
114 |
-
"return_full_text": False,
|
115 |
-
"temperature": 0.0,
|
116 |
-
"do_sample": False,
|
117 |
-
}
|
118 |
-
|
119 |
-
output = pipe(messages, **generation_args)
|
120 |
-
print(output[0]['generated_text'])
|
121 |
```
|
122 |
|
123 |
## Benchmarks
|
@@ -263,116 +217,8 @@ highlight the need for industry-wide investment in the development of high-quali
|
|
263 |
and risk areas that account for cultural nuances where those languages are spoken.
|
264 |
|
265 |
## Software
|
266 |
-
* [
|
267 |
-
|
268 |
-
* [Flash-Attention](https://github.com/HazyResearch/flash-attention)
|
269 |
-
|
270 |
-
## Hardware
|
271 |
-
Note that by default, the Phi-3.5-MoE-instruct model uses flash attention, which requires certain types of GPU hardware to run. We have tested on the following GPU types:
|
272 |
-
* NVIDIA A100
|
273 |
-
* NVIDIA A6000
|
274 |
-
* NVIDIA H100
|
275 |
|
276 |
## License
|
277 |
The model is licensed under the [MIT license](./LICENSE).
|
278 |
-
|
279 |
-
## Trademarks
|
280 |
-
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft’s Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party’s policies.
|
281 |
-
|
282 |
-
|
283 |
-
## Appendix A: Korean benchmarks
|
284 |
-
|
285 |
-
The prompt is the same as the [CLIcK paper](https://arxiv.org/abs/2403.06412) prompt. The experimental results below were given with max_tokens=512 (zero-shot), max_tokens=1024 (5-shot), temperature=0.01. No system prompt used.
|
286 |
-
|
287 |
-
- GPT-4o: 2024-05-13 version
|
288 |
-
- GPT-4o-mini: 2024-07-18 version
|
289 |
-
- GPT-4-turbo: 2024-04-09 version
|
290 |
-
- GPT-3.5-turbo: 2023-06-13 version
|
291 |
-
|
292 |
-
Overall, the Phi-3.5 MoE model with just 6.6B active params outperforms GPT-3.5-Turbo.
|
293 |
-
|
294 |
-
| Benchmarks | Phi-3.5-MoE-Instruct | Phi-3.0-Mini-128k-Instruct (June2024) | Llama-3.1-8B-Instruct | GPT-4o | GPT-4o-mini | GPT-4-turbo | GPT-3.5-turbo |
|
295 |
-
|:-------------------------|-----------------------:|--------------------------------:|------------------------:|---------:|--------------:|--------------:|----------------:|
|
296 |
-
| CLIcK | 56.44 | 29.12 | 47.82 | 80.46 | 68.5 | 72.82 | 50.98 |
|
297 |
-
| HAERAE 1.0 | 61.83 | 36.41 | 53.9 | 85.7 | 76.4 | 77.76 | 52.67 |
|
298 |
-
| KMMLU (0-shot, CoT) | 47.43 | 30.82 | 38.54 | 64.26 | 52.63 | 58.75 | 40.3 |
|
299 |
-
| KMMLU (5-shot) | 47.92 | 29.98 | 20.21 | 64.28 | 51.62 | 59.29 | 42.28 |
|
300 |
-
| KMMLU-HARD (0-shot, CoT) | 25.34 | 25.68 | 24.03 | 39.62 | 24.56 | 30.56 | 20.97 |
|
301 |
-
| KMMLU-HARD (5-shot) | 25.66 | 25.73 | 15.81 | 40.94 | 24.63 | 31.12 | 21.19 |
|
302 |
-
| **Average** | **45.82** | **29.99** | **29.29** | **62.54** | **50.08** | **56.74** | **39.61** |
|
303 |
-
|
304 |
-
#### CLIcK (Cultural and Linguistic Intelligence in Korean)
|
305 |
-
|
306 |
-
##### Accuracy by supercategory
|
307 |
-
| supercategory | Phi-3.5-MoE-Instruct | Phi-3.0-Mini-128k-Instruct (June2024) | Llama-3.1-8B-Instruct | GPT-4o | GPT-4o-mini | GPT-4-turbo | GPT-3.5-turbo |
|
308 |
-
|:----------------|-----------------------:|--------------------------------:|------------------------:|---------:|--------------:|--------------:|----------------:|
|
309 |
-
| Culture | 58.44 | 29.74 | 51.15 | 81.89 | 70.95 | 73.61 | 53.38 |
|
310 |
-
| Language | 52.31 | 27.85 | 40.92 | 77.54 | 63.54 | 71.23 | 46 |
|
311 |
-
| **Overall** | 56.44 | 29.12 | 47.82 | 80.46 | 68.5 | 72.82 | 50.98 |
|
312 |
-
|
313 |
-
##### Accuracy by category
|
314 |
-
| supercategory | category | Phi-3.5-MoE-Instruct | Phi-3.0-Mini-128k-Instruct (June2024) | Llama-3.1-8B-Instruct | GPT-4o | GPT-4o-mini | GPT-4-turbo | GPT-3.5-turbo |
|
315 |
-
|:----------------|:------------|-----------------------:|--------------------------------:|------------------------:|---------:|--------------:|--------------:|----------------:|
|
316 |
-
| Culture | Economy | 77.97 | 28.81 | 66.1 | 94.92 | 83.05 | 89.83 | 64.41 |
|
317 |
-
| Culture | Geography | 60.31 | 29.01 | 54.2 | 80.15 | 77.86 | 82.44 | 53.44 |
|
318 |
-
| Culture | History | 33.93 | 30 | 29.64 | 66.92 | 48.4 | 46.4 | 31.79 |
|
319 |
-
| Culture | Law | 52.51 | 22.83 | 44.29 | 70.78 | 57.53 | 61.19 | 41.55 |
|
320 |
-
| Culture | Politics | 70.24 | 33.33 | 59.52 | 88.1 | 83.33 | 89.29 | 65.48 |
|
321 |
-
| Culture | Pop Culture | 80.49 | 34.15 | 60.98 | 97.56 | 85.37 | 92.68 | 75.61 |
|
322 |
-
| Culture | Society | 74.43 | 31.72 | 65.05 | 92.88 | 85.44 | 86.73 | 71.2 |
|
323 |
-
| Culture | Tradition | 58.11 | 31.98 | 54.95 | 87.39 | 74.77 | 79.28 | 55.86 |
|
324 |
-
| Language | Functional | 48 | 24 | 32.8 | 84.8 | 64.8 | 80 | 40 |
|
325 |
-
| Language | Grammar | 29.58 | 23.33 | 22.92 | 57.08 | 42.5 | 47.5 | 30 |
|
326 |
-
| Language | Textual | 73.33 | 33.33 | 59.65 | 91.58 | 80.7 | 87.37 | 62.11 |
|
327 |
-
|
328 |
-
#### HAERAE 1.0
|
329 |
-
|
330 |
-
| category | Phi-3.5-MoE-Instruct | Phi-3.0-Mini-128k-Instruct (June2024) | Llama-3.1-8B-Instruct | GPT-4o | GPT-4o-mini | GPT-4-turbo | GPT-3.5-turbo |
|
331 |
-
|:----------------------|-----------------------:|--------------------------------:|------------------------:|---------:|--------------:|--------------:|----------------:|
|
332 |
-
| General Knowledge | 39.77 | 28.41 | 34.66 | 77.27 | 53.41 | 66.48 | 40.91 |
|
333 |
-
| History | 60.64 | 22.34 | 44.15 | 92.02 | 84.57 | 78.72 | 30.32 |
|
334 |
-
| Loan Words | 70.41 | 35.5 | 63.31 | 79.88 | 76.33 | 78.11 | 59.17 |
|
335 |
-
| Rare Words | 63.95 | 42.96 | 63.21 | 87.9 | 81.98 | 79.01 | 61.23 |
|
336 |
-
| Reading Comprehension | 64.43 | 41.16 | 51.9 | 85.46 | 77.18 | 80.09 | 56.15 |
|
337 |
-
| Standard Nomenclature | 66.01 | 32.68 | 58.82 | 88.89 | 75.82 | 79.08 | 53.59 |
|
338 |
-
| **Overall** | 61.83 | 36.41 | 53.9 | 85.7 | 76.4 | 77.76 | 52.67 |
|
339 |
-
|
340 |
-
#### KMMLU (0-shot, CoT)
|
341 |
-
|
342 |
-
| supercategory | Phi-3.5-MoE-Instruct | Phi-3.0-Mini-128k-Instruct (June2024) | Llama-3.1-8B-Instruct | GPT-4o | GPT-4o-mini | GPT-4-turbo | GPT-3.5-turbo |
|
343 |
-
|:----------------|-----------------------:|--------------------------------:|------------------------:|---------:|--------------:|--------------:|----------------:|
|
344 |
-
| Applied Science | 45.15 | 31.68 | 37.03 | 61.52 | 49.29 | 55.98 | 38.47 |
|
345 |
-
| HUMSS | 49.75 | 26.47 | 37.29 | 69.45 | 56.59 | 63 | 40.9 |
|
346 |
-
| Other | 47.24 | 31.01 | 39.15 | 63.79 | 52.35 | 57.53 | 40.19 |
|
347 |
-
| STEM | 49.08 | 31.9 | 40.42 | 65.16 | 54.74 | 60.84 | 42.24 |
|
348 |
-
| **Overall** | 47.43 | 30.82 | 38.54 | 64.26 | 52.63 | 58.75 | 40.3 |
|
349 |
-
|
350 |
-
#### KMMLU (5-shot)
|
351 |
-
|
352 |
-
| supercategory | Phi-3.5-MoE-Instruct | Phi-3.0-Mini-128k-Instruct (June2024) | Llama-3.1-8B-Instruct | GPT-4o | GPT-4o-mini | GPT-4-turbo | GPT-3.5-turbo |
|
353 |
-
|:----------------|-----------------------:|--------------------------------:|------------------------:|---------:|--------------:|--------------:|----------------:|
|
354 |
-
| Applied Science | 45.9 | 29.98 | 19.24 | 61.47 | 48.66 | 56.85 | 40.22 |
|
355 |
-
| HUMSS | 49.18 | 27.27 | 22.5 | 68.79 | 55.95 | 63.68 | 43.35 |
|
356 |
-
| Other | 48.43 | 30.76 | 20.95 | 64.21 | 51.1 | 57.85 | 41.92 |
|
357 |
-
| STEM | 49.21 | 30.73 | 19.55 | 65.28 | 53.29 | 61.08 | 44.43 |
|
358 |
-
| **Overall** | 47.92 | 29.98 | 20.21 | 64.28 | 51.62 | 59.29 | 42.28 |
|
359 |
-
|
360 |
-
#### KMMLU-HARD (0-shot, CoT)
|
361 |
-
|
362 |
-
| supercategory | Phi-3.5-MoE-Instruct | Phi-3.0-Mini-128k-Instruct (June2024)| Llama-3.1-8B-Instruct | GPT-4o | GPT-4o-mini | GPT-4-turbo | GPT-3.5-turbo |
|
363 |
-
|:----------------|-----------------------:|--------------------------------:|------------------------:|---------:|--------------:|--------------:|----------------:|
|
364 |
-
| Applied Science | 25.83 | 26.17 | 26.25 | 37.12 | 22.25 | 29.17 | 21.07 |
|
365 |
-
| HUMSS | 21.52 | 24.38 | 20.21 | 41.97 | 23.31 | 31.51 | 19.44 |
|
366 |
-
| Other | 24.82 | 24.82 | 23.88 | 40.39 | 26.48 | 29.59 | 22.22 |
|
367 |
-
| STEM | 28.18 | 26.91 | 24.64 | 39.82 | 26.36 | 32.18 | 20.91 |
|
368 |
-
| **Overall** | 25.34 | 25.68 | 24.03 | 39.62 | 24.56 | 30.56 | 20.97 |
|
369 |
-
|
370 |
-
#### KMMLU-HARD (5-shot)
|
371 |
-
|
372 |
-
| supercategory | Phi-3.5-MoE-Instruct | Phi-3.0-Mini-128k-Instruct (June2024) | Llama-3.1-8B-Instruct | GPT-4o | GPT-4o-mini | GPT-4-turbo | GPT-3.5-turbo |
|
373 |
-
|:----------------|-----------------------:|--------------------------------:|------------------------:|---------:|--------------:|--------------:|----------------:|
|
374 |
-
| Applied Science | 21 | 29 | 12 | 31 | 21 | 25 | 20 |
|
375 |
-
| HUMSS | 22.88 | 19.92 | 14 | 43.98 | 23.47 | 33.53 | 19.53 |
|
376 |
-
| Other | 25.13 | 27.27 | 12.83 | 39.84 | 28.34 | 29.68 | 23.22 |
|
377 |
-
| STEM | 21.75 | 25.25 | 12.75 | 40.25 | 23.25 | 27.25 | 19.75 |
|
378 |
-
| **Overall** | 25.66 | 25.73 | 15.81 | 40.94 | 24.63 | 31.12 | 21.19 |
|
|
|
1 |
---
|
2 |
license: mit
|
3 |
+
license_link: https://huggingface.co/phymbert/Phi-3.5-MoE-instruct-GGUF/resolve/main/LICENSE
|
4 |
language:
|
5 |
- multilingual
|
6 |
pipeline_tag: text-generation
|
|
|
11 |
- messages:
|
12 |
- role: user
|
13 |
content: Can you provide ways to eat combinations of bananas and dragonfruits?
|
14 |
+
library_name: llama.cpp
|
15 |
---
|
16 |
|
17 |
## Model Summary
|
|
|
50 |
## Usage
|
51 |
|
52 |
### Requirements
|
53 |
+
Phi-3.5-MoE-instruct is integrated in the official version of llama.cpp.
|
54 |
|
55 |
### Tokenizer
|
56 |
|
|
|
70 |
### Loading the model locally
|
71 |
After obtaining the Phi-3.5-MoE-instruct model checkpoints, users can use this sample code for inference.
|
72 |
|
73 |
+
```shell
|
74 |
+
llama-cli --model phi-3.5-moe-instruct-q3_k_s.gguf -p "I believe the meaning of life is"
|
75 |
```
|
76 |
|
77 |
## Benchmarks
|
|
|
217 |
and risk areas that account for cultural nuances where those languages are spoken.
|
218 |
|
219 |
## Software
|
220 |
+
* [LlamaCPP](https://github.com/ggerganov/llama.cpp)
|
221 |
+
|
222 |
|
223 |
## License
|
224 |
The model is licensed under the [MIT license](./LICENSE).
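
The `llama-cli` invocation that this change adds under "Loading the model locally" can also be assembled from a script. The following is a minimal sketch, not part of the commit: the helper name `build_llama_cli_command` and the `max_tokens` parameter are hypothetical, and the `-n` flag (limit on generated tokens) is assumed to be available in the installed llama.cpp build. The model file name and prompt are the ones shown in the diff.

```python
import shlex


def build_llama_cli_command(model_path, prompt, max_tokens=None):
    """Return the argument list for a llama-cli run (hypothetical helper)."""
    # --model and -p are the two flags used in the README example.
    cmd = ["llama-cli", "--model", model_path, "-p", prompt]
    if max_tokens is not None:
        # Assumption: -n caps the number of tokens to generate.
        cmd += ["-n", str(max_tokens)]
    return cmd


if __name__ == "__main__":
    cmd = build_llama_cli_command(
        "phi-3.5-moe-instruct-q3_k_s.gguf",
        "I believe the meaning of life is",
        max_tokens=128,
    )
    # Print a shell-safe version of the command instead of running it,
    # since the binary and the GGUF file may not be present.
    print(shlex.join(cmd))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would execute it, provided `llama-cli` is on `PATH` and the GGUF file exists locally. Building the command as a list avoids shell quoting issues with prompts that contain spaces or quotes.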