zamroni111 committed
Update README.md

README.md CHANGED

---
# Model Card for Model ID

## Model Details
meta-llama/Meta-Llama-3.1-8B-Instruct quantized to ONNX GenAI INT4 with Microsoft DirectML optimization

INT4 accuracy level: FP32 (float32)<br>
8-bit quantization for MoE layers
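
For reference, an INT4 DirectML build with these settings can typically be produced with the onnxruntime-genai model builder. The command below is a hedged sketch, not the author's recorded command: the output folder name is made up, and the exact flags (in particular the int4_accuracy_level extra option, where 1 corresponds to FP32) may differ between onnxruntime-genai releases.

```
python -m onnxruntime_genai.models.builder ^
    -m meta-llama/Meta-Llama-3.1-8B-Instruct ^
    -o llama31-8b-instruct-int4-dml ^
    -p int4 -e dml ^
    --extra_options int4_accuracy_level=1
```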

- **Developed by:** Mochamad Aris Zamroni

### Model Sources [optional]
https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct

### Direct Use
This is a Microsoft Windows DirectML optimized model.<br>
It might not work with ONNX execution providers other than DmlExecutionProvider.<br>
The required Python scripts are included in this repository.
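
Before running anything, it can help to confirm that the DirectML execution provider is visible to ONNX Runtime. This check is only illustrative and assumes the onnxruntime-directml wheel is present in the environment (it is not necessarily pulled in by onnxruntime-genai-directml itself):

```python
# Illustrative sanity check: DirectML should appear in the provider list.
# Assumes: pip install onnxruntime-directml (separate from onnxruntime-genai-directml).
import onnxruntime as ort

print(ort.get_available_providers())  # expect 'DmlExecutionProvider' in the output
```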

Prerequisites:<br>
1. Install Python 3.10 from the Windows Store:<br>
https://apps.microsoft.com/detail/9pjpw5ldxlz5?hl=en-us&gl=US

2. Open the command line (cmd.exe).

3. Create a Python virtual environment, activate it, then install onnxruntime-genai-directml:<br>
mkdir c:\temp<br>
cd c:\temp<br>
python -m venv dmlgenai<br>
dmlgenai\Scripts\activate.bat<br>
pip install onnxruntime-genai-directml

4. Use onnxgenairun.py to get a chat interface (a minimal sketch of the generation loop it wraps is shown after this list).<br>
It is a modified version of "https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/phi3-qa.py".<br>
The modification starts a new line after ".", ":" and ";" to make the output easier to read.

python onnxgenairun.py --help<br>
python onnxgenairun.py -m . -v -g

5. (Optional) Device-specific optimization.<br>
a. Open "dml-device-specific-optim.py" in a text editor and change the file paths accordingly.<br>
b. Run the script: python dml-device-specific-optim.py<br>
c. Rename the original model.onnx to another name, then rename the optimized ONNX file from step 5.b to model.onnx.<br>
d. Rerun step 4.
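
The sketch below illustrates the kind of generation loop that onnxgenairun.py wraps, including the extra line break after ".", ":" and ";". It is a minimal example in the style of the upstream phi3-qa.py, not a copy of the repository script, and the API shown matches onnxruntime-genai around version 0.4 (newer releases change this part of the API, e.g. generator.append_tokens).

```python
# Minimal chat-style generation sketch (illustrative; not the repository's onnxgenairun.py).
# Assumes the quantized model files (model.onnx, genai_config.json, tokenizer files)
# are in the current directory and onnxruntime-genai-directml ~0.4 is installed.
import onnxruntime_genai as og

model = og.Model(".")
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

# Llama 3.1 Instruct prompt format; adjust if your chat template differs.
question = "What is DirectML?"
prompt = ("<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
          f"{question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n")

params = og.GeneratorParams(model)
params.set_search_options(max_length=1024)
params.input_ids = tokenizer.encode(prompt)  # newer API: generator.append_tokens(...)

generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    piece = stream.decode(generator.get_next_tokens()[0])
    print(piece, end="", flush=True)
    # Readability tweak in the spirit of onnxgenairun.py: new line after ".", ":", ";".
    if piece.rstrip().endswith((".", ":", ";")):
        print()
```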

#### Speeds, Sizes, Times [optional]
15 token/s on a Radeon 780M with 8GB dedicated VRAM.<br>
This increases to 16 token/s with the device-specific optimized model.onnx.<br>
For comparison, LM Studio running a GGUF INT4 model with Vulkan GPU acceleration reaches 13 token/s.
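
These figures presumably come from timing the decode loop; a rough, hypothetical way to check such a number on your own hardware (same API and file-layout assumptions as the sketch above) is:

```python
# Rough tokens/s measurement (illustrative only; prompt and max_length are arbitrary).
import time
import onnxruntime_genai as og

model = og.Model(".")
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)
params.input_ids = tokenizer.encode("Explain DirectML in one short paragraph.")

generator = og.Generator(model, params)
start, new_tokens = time.time(), 0
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    new_tokens += 1
print(f"~{new_tokens / (time.time() - start):.1f} tokens/s")
```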

#### Hardware
AMD Ryzen 7840U with integrated Radeon 780M GPU<br>
RAM 32GB<br>
8GB pre-allocated iGPU VRAM

#### Software
Microsoft DirectML on Windows 10

## Model Card Authors [optional]
Mochamad Aris Zamroni

## Model Card Contact
https://www.linkedin.com/in/zamroni/