isaacmac committed on
Commit 7b2e446 · verified · 1 Parent(s): 30d209a

Update README.md

  1. README.md +55 -19
README.md CHANGED
@@ -9,7 +9,7 @@ language:
9
 
10
  ## Model Details
11
 
12
- This model is an int4 model with group_size 128 of [microsoft/Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) generated by [intel/auto-round](https://github.com/intel/auto-round).
13
  Inference of this model is compatible with AutoGPTQ's Kernel.
14
 
15
 
@@ -18,16 +18,14 @@ Inference of this model is compatible with AutoGPTQ's Kernel.
18
 
19
 
20
 
21
- ### Reproduce the model
22
 
23
  Here is the sample command to reproduce the model
24
 
25
  ```bash
26
- git clone https://github.com/intel/auto-round
27
- cd auto-round/examples/language-modeling
28
- pip install -r requirements.txt
29
- python3 main.py \
30
- --model_name microsoft/Phi-3-mini-128k-instruct \
31
  --device 0 \
32
  --group_size 128 \
33
  --bits 4 \
@@ -35,9 +33,9 @@ python3 main.py \
35
  --nsamples 512 \
36
  --seqlen 4096 \
37
  --minmax_lr 0.01 \
38
- --deployment_device 'gpu' \
39
  --gradient_accumulate_steps 2 \
40
- --train_bs 4 \
41
  --output_dir "./tmp_autoround" \
42
 
43
  ```
@@ -46,15 +44,59 @@ python3 main.py \
46
 
47
 
48
 
49
- ### Evaluate the model
50
 
51
- Install [lm-eval-harness 0.4.2](https://github.com/EleutherAI/lm-evaluation-harness.git) from source.
 
52
 
53
  ```bash
54
- lm_eval --model hf --model_args pretrained="Intel/Phi-3-mini-128k-instruct-int4-inc",autogptq=True,gptq_use_triton=True --device cuda:0 --tasks lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,mmlu --batch_size 32
 
 
55
  ```
56
 
57
- | Metric | FP16 | INT4 |
58
  | -------------- | ------ | ------ |
59
  | Avg. | 0.6364 | 0.6300 |
60
  | mmlu | 0.6215 | 0.6237 |
@@ -68,11 +110,6 @@ lm_eval --model hf --model_args pretrained="Intel/Phi-3-mini-128k-instruct-int4-
68
  | arc_easy | 0.8119 | 0.8199 |
69
  | arc_challenge | 0.5418 | 0.5350 |
70
 
71
-
72
-
73
-
74
-
75
-
76
  ## Caveats and Recommendations
77
 
78
  Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
@@ -80,7 +117,6 @@ Users (both direct and downstream) should be made aware of the risks, biases and
80
  Here are a couple of useful links to learn more about Intel's AI software:
81
 
82
  * Intel Neural Compressor [link](https://github.com/intel/neural-compressor)
83
- * Intel Extension for Transformers [link](https://github.com/intel/intel-extension-for-transformers)
84
 
85
 
86
 
 
9
 
10
  ## Model Details
11
 
12
+ This model is an int4 model recipe with group_size 128 of [microsoft/Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) generated by [intel/auto-round](https://github.com/intel/auto-round).
13
  Inference of this model is compatible with AutoGPTQ's Kernel.
14
 
15
 
 
18
 
19
 
20
 
21
+ ### Quantize the model
22
 
23
  Here is the sample command to reproduce the model
24
 
25
  ```bash
26
+ pip install auto-round
27
+ auto-round \
28
+ --model microsoft/Phi-3-mini-128k-instruct \
 
 
29
  --device 0 \
30
  --group_size 128 \
31
  --bits 4 \
 
33
  --nsamples 512 \
34
  --seqlen 4096 \
35
  --minmax_lr 0.01 \
36
+ --format 'auto_gptq' \
37
  --gradient_accumulate_steps 2 \
38
+ --batch_size 4 \
39
  --output_dir "./tmp_autoround" \
40
 
41
  ```
 
44
 
45
 
46
 
47
+ ## How to use
48
 
49
+ ### INT4 Inference with IPEX on Intel CPU
50
+ Install the latest [Intel Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch) and [Intel Neural Compressor](https://github.com/intel/neural-compressor).
51
 
52
  ```bash
53
+ pip install torch --index-url https://download.pytorch.org/whl/cpu
54
+ pip install intel_extension_for_pytorch
55
+ pip install neural_compressor_pt
56
  ```
57
 
58
+ ```python
59
+ from transformers import AutoTokenizer
60
+ from neural_compressor.transformers import AutoModelForCausalLM
61
+
62
+ ## note: use quantized model directory name below
63
+ model_name_or_path="./tmp_autoround/<model directory name>"
64
+ q_model = AutoModelForCausalLM.from_pretrained(model_name_or_path)
65
+
66
+ prompt = "Once upon a time, a little girl"
67
+
68
+ tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
69
+ print(tokenizer.decode(q_model.generate(**tokenizer(prompt, return_tensors="pt").to(q_model.device),max_new_tokens=50)[0]))
70
+ ##Once upon a time, a little girl named Lily was playing in her backyard. She loved to explore and discover new things. One day, she stumbled upon a beautiful garden filled with colorful flowers andugh the garden, she noticed a
71
+ ```
72
+
73
+ ### INT4 Inference on Intel Gaudi Accelerator
74
+ A Docker image with the Gaudi Software Stack is recommended. More details can be found in the [Gaudi Guide](https://docs.habana.ai/en/latest/).
75
+
76
+ ```python
77
+ import habana_frameworks.torch.core as htcore
78
+ from neural_compressor.torch.quantization import load
79
+ from transformers import AutoTokenizer, AutoModelForCausalLM
80
+
81
+ ## note: use quantized model directory name below
82
+ model_name_or_path="./tmp_autoround/<model directory name>"
83
+
84
+ model = load(
85
+ model_name_or_path=model_name_or_path,
86
+ format="huggingface",
87
+ device="hpu"
88
+ )
89
+
90
+ prompt = "Once upon a time, a little girl"
91
+ tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
92
+ print(tokenizer.decode(model.generate(**tokenizer(prompt, return_tensors="pt").to("hpu"),max_new_tokens=50)[0]))
93
+
94
+ ```
95
+
96
+ ## Accuracy Result
97
+
98
+
99
+ | Metric <img width=200> | FP16 <img width=200> | INT4 <img width=200> |
100
  | -------------- | ------ | ------ |
101
  | Avg. | 0.6364 | 0.6300 |
102
  | mmlu | 0.6215 | 0.6237 |
 
110
  | arc_easy | 0.8119 | 0.8199 |
111
  | arc_challenge | 0.5418 | 0.5350 |
112
 
 
 
 
 
 
113
  ## Caveats and Recommendations
114
 
115
  Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
 
117
  Here are a couple of useful links to learn more about Intel's AI software:
118
 
119
  * Intel Neural Compressor [link](https://github.com/intel/neural-compressor)
 
120
 
121
 
122
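
The quantization command in this diff uses `--bits 4` and `--group_size 128`. As a rough illustration of what those two settings mean, the sketch below applies plain round-to-nearest quantization to a flat list of weights, storing one scale and one minimum per group of 128 values. This is an assumption-laden toy, not AutoRound's algorithm (AutoRound tunes the rounding rather than rounding to nearest), and the function names `quantize_groups` and `dequantize_groups` are hypothetical.

```python
import random


def quantize_groups(weights, bits=4, group_size=128):
    """Round-to-nearest quantization with one (scale, minimum) pair per group.

    Illustrative only: this is NOT the AutoRound method.
    """
    qmax = (1 << bits) - 1  # 15 for 4-bit codes
    groups = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        # Guard against a constant group, where hi == lo.
        scale = (hi - lo) / qmax if hi > lo else 1.0
        # Map each weight to an integer code in [0, qmax].
        codes = [round((w - lo) / scale) for w in group]
        groups.append((codes, scale, lo))
    return groups


def dequantize_groups(groups):
    """Rebuild approximate float weights from the stored codes."""
    restored = []
    for codes, scale, lo in groups:
        restored.extend(code * scale + lo for code in codes)
    return restored


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(512)]  # made-up sample data
groups = quantize_groups(weights)
restored = dequantize_groups(groups)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"groups: {len(groups)}, max abs error: {max_err:.6f}")
```

Each group keeps its own scale, so an outlier only widens the quantization step for the 128 values around it. In this sketch the reconstruction error of any weight is bounded by half of its group's scale.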