Update README.md
README.md CHANGED
@@ -19,13 +19,11 @@ inference: false
This model is an ONNX-optimized version of [Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3), designed to provide accelerated inference on a variety of hardware using ONNX Runtime (CPU and DirectML).

DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning, providing GPU acceleration for a wide range of supported hardware and drivers, including AMD, Intel, NVIDIA, and Qualcomm GPUs.

-## Model Description
+## ONNX Models

-- **Developed by:** Mistral AI
-- **Model type:** ONNX
-- **Language(s) (NLP):** Python, C, C++
-- **License:** Apache License Version 2.0
-- **Model Description:** This model is a conversion of the Mistral-7B-Instruct-v0.3 for ONNX Runtime inference, optimized for CPU and DirectML.
+Here are some of the optimized configurations we have added:
+- **ONNX model for int4 DML:** ONNX model for AMD, Intel, and NVIDIA GPUs on Windows, quantized to int4 using AWQ.
+- **ONNX model for int4 CPU and Mobile:** ONNX model for CPU and mobile using int4 quantization via RTN. There are two versions uploaded to balance latency vs. accuracy. Acc=1 is targeted at improved accuracy, while Acc=4 is for improved performance. For mobile devices, we recommend using the model with acc-level-4.

## Usage
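The int4 variants listed in the hunk above map onto the accuracy levels of the model builder that ships with the onnxruntime-genai package. The command below is a minimal sketch of how an RTN CPU variant of this kind is produced; the output path is illustrative, and the AWQ-quantized DML model comes from a separate AWQ pipeline, so this approximates only the CPU/mobile builds.

```
# Sketch: build an int4 RTN model for CPU with the onnxruntime-genai model builder.
# int4_accuracy_level=4 corresponds to the faster acc-level-4 variant; 1 gives the
# accuracy-oriented Acc=1 build. Paths are illustrative.
python -m onnxruntime_genai.models.builder -m mistralai/Mistral-7B-Instruct-v0.3 -o ./mistral-7b-instruct-v0.3-int4-cpu -p int4 -e cpu --extra_options int4_accuracy_level=4
```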
@@ -85,10 +83,11 @@ python phi3-qa.py -m .\mistral-7b-instruct-v0.3
**Tested Configurations:**
- **GPU:** AMD Ryzen 8000 Series iGPU (DirectML)
- **CPU:** AMD Ryzen CPU

-
-
-
-
-
-2. **ONNX model for int4 CPU:** Optimized for CPU, using int4 quantization.
+
+## Model Description
+
+- **Developed by:** Mistral AI
+- **Model type:** ONNX
+- **Language(s) (NLP):** Python, C, C++
+- **License:** Apache License Version 2.0
+- **Model Description:** This model is a conversion of the Mistral-7B-Instruct-v0.3 for ONNX Runtime inference, optimized for CPU and DirectML.
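For reference, the generation loop that the `phi3-qa.py` script from the hunk header wraps looks roughly like the sketch below. This assumes the `onnxruntime-genai` Python package with its 0.3-era API (later releases reworked the generator loop, e.g. removing `compute_logits`); the model path and prompt are illustrative.

```python
# Minimal token-streaming sketch with onnxruntime-genai (0.3-era API assumed).
import onnxruntime_genai as og

model = og.Model(r".\mistral-7b-instruct-v0.3")  # folder with the ONNX weights + genai_config.json
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()  # incremental detokenizer for streamed output

# Mistral instruct models expect the [INST] ... [/INST] chat template.
prompt = "[INST] What is DirectML? [/INST]"

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)
params.input_ids = tokenizer.encode(prompt)

generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()        # one forward pass through the ONNX graph
    generator.generate_next_token()   # pick the next token from the logits
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```

The same script should run unchanged on the DirectML build of onnxruntime-genai for GPU execution, or on the CPU package against the int4 CPU models.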