pstan committed
Commit 8225e59 · verified · 1 Parent(s): cf6d9ca

Update README.md

Files changed (1)
  1. README.md +11 -12
README.md CHANGED
@@ -19,13 +19,11 @@ inference: false
 This model is an ONNX-optimized version of [Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3), designed to provide accelerated inference on a variety of hardware using ONNX Runtime (CPU and DirectML).
 DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning, providing GPU acceleration for a wide range of supported hardware and drivers, including AMD, Intel, NVIDIA, and Qualcomm GPUs.
 
-## Model Description
+## ONNX Models
 
-- **Developed by:** Mistral AI
-- **Model type:** ONNX
-- **Language(s) (NLP):** Python, C, C++
-- **License:** Apache License Version 2.0
-- **Model Description:** This model is a conversion of the Mistral-7B-Instruct-v0.3 for ONNX Runtime inference, optimized for CPU and DirectML.
+Here are some of the optimized configurations we have added:
+- **ONNX model for int4 DML:** ONNX model for AMD, Intel, and NVIDIA GPUs on Windows, quantized to int4 using AWQ.
+- **ONNX model for int4 CPU and Mobile:** ONNX model for CPU and mobile using int4 quantization via RTN. There are two versions uploaded to balance latency vs. accuracy. Acc=1 is targeted at improved accuracy, while Acc=4 is for improved performance. For mobile devices, we recommend using the model with acc-level-4.
 
 ## Usage
 
@@ -85,10 +83,11 @@ python phi3-qa.py -m .\mistral-7b-instruct-v0.3
 - **Tested Configurations:**
   - **GPU:** AMD Ryzen 8000 Series iGPU (DirectML)
   - **CPU:** AMD Ryzen CPU
+
+## Model Description
 
-## Optimized Configurations
-
-The following optimized configurations are available:
-
-1. **ONNX model for int4 DML:** Optimized for AMD, Intel, and NVIDIA GPUs on Windows, quantized to int4.
-2. **ONNX model for int4 CPU:** Optimized for CPU, using int4 quantization.
+- **Developed by:** Mistral AI
+- **Model type:** ONNX
+- **Language(s) (NLP):** Python, C, C++
+- **License:** Apache License Version 2.0
+- **Model Description:** This model is a conversion of the Mistral-7B-Instruct-v0.3 for ONNX Runtime inference, optimized for CPU and DirectML.
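
For context on the Usage section these hunks surround: the second hunk header references the `phi3-qa.py` example script from ONNX Runtime GenAI. Below is a minimal sketch of the kind of inference loop that script runs for this model. It assumes the `onnxruntime-genai` package (or `onnxruntime-genai-directml` for the DML variant); the model path and prompt are illustrative placeholders, and exact API names vary between onnxruntime-genai releases, so treat this as a sketch rather than the repository's official sample.

```python
# Minimal sketch of an ONNX Runtime GenAI inference loop for this model,
# modeled on the phi3-qa.py example referenced in the hunk header above.
# Assumes `pip install onnxruntime-genai` (or onnxruntime-genai-directml
# for the DML build); API details can differ across releases.
import onnxruntime_genai as og

# Path to the downloaded ONNX model folder (illustrative).
model = og.Model(r".\mistral-7b-instruct-v0.3")
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

# Mistral-Instruct expects the [INST] ... [/INST] chat template.
prompt = "[INST] What is DirectML? [/INST]"

params = og.GeneratorParams(model)
params.set_search_options(max_length=512)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode(prompt))

# Decode and print token by token until generation finishes.
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```

The int4 DML and int4 CPU variants described in the new `## ONNX Models` section load through this same API; only the model folder passed to `og.Model` (and the installed package flavor) changes.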