pstan committed
Commit 8225e59 · verified · 1 Parent(s): cf6d9ca

Update README.md

Files changed (1)
  1. README.md +11 -12
README.md CHANGED
@@ -19,13 +19,11 @@ inference: false
 This model is an ONNX-optimized version of [Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3), designed to provide accelerated inference on a variety of hardware using ONNX Runtime (CPU and DirectML).
 DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning, providing GPU acceleration for a wide range of supported hardware and drivers, including AMD, Intel, NVIDIA, and Qualcomm GPUs.
 
-## Model Description
+## ONNX Models
 
-- **Developed by:** Mistral AI
-- **Model type:** ONNX
-- **Language(s) (NLP):** Python, C, C++
-- **License:** Apache License Version 2.0
-- **Model Description:** This model is a conversion of the Mistral-7B-Instruct-v0.3 for ONNX Runtime inference, optimized for CPU and DirectML.
+Here are some of the optimized configurations we have added:
+- **ONNX model for int4 DML:** ONNX model for AMD, Intel, and NVIDIA GPUs on Windows, quantized to int4 using AWQ.
+- **ONNX model for int4 CPU and Mobile:** ONNX model for CPU and mobile using int4 quantization via RTN. There are two versions uploaded to balance latency vs. accuracy. Acc=1 is targeted at improved accuracy, while Acc=4 is for improved performance. For mobile devices, we recommend using the model with acc-level-4.
 
 ## Usage
 
@@ -85,10 +83,11 @@ python phi3-qa.py -m .\mistral-7b-instruct-v0.3
 - **Tested Configurations:**
   - **GPU:** AMD Ryzen 8000 Series iGPU (DirectML)
   - **CPU:** AMD Ryzen CPU
+
+## Model Description
 
-## Optimized Configurations
-
-The following optimized configurations are available:
-
-1. **ONNX model for int4 DML:** Optimized for AMD, Intel, and NVIDIA GPUs on Windows, quantized to int4.
-2. **ONNX model for int4 CPU:** Optimized for CPU, using int4 quantization.
+- **Developed by:** Mistral AI
+- **Model type:** ONNX
+- **Language(s) (NLP):** Python, C, C++
+- **License:** Apache License Version 2.0
+- **Model Description:** This model is a conversion of the Mistral-7B-Instruct-v0.3 for ONNX Runtime inference, optimized for CPU and DirectML.
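
For context on the Usage section these hunks surround: the second hunk header references the `phi3-qa.py` example script from ONNX Runtime GenAI. Below is a minimal sketch of the kind of inference loop that script runs for this model. It assumes the `onnxruntime-genai` package (or `onnxruntime-genai-directml` for the DML variant); the model path and prompt are illustrative placeholders, and exact API names vary between onnxruntime-genai releases, so treat this as a sketch rather than the repository's official sample.

```python
# Minimal sketch of an ONNX Runtime GenAI inference loop for this model,
# modeled on the phi3-qa.py example referenced in the hunk header above.
# Assumes `pip install onnxruntime-genai` (or onnxruntime-genai-directml
# for the DML build); API details can differ across releases.
import onnxruntime_genai as og

# Path to the downloaded ONNX model folder (illustrative).
model = og.Model(r".\mistral-7b-instruct-v0.3")
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

# Mistral-Instruct expects the [INST] ... [/INST] chat template.
prompt = "[INST] What is DirectML? [/INST]"

params = og.GeneratorParams(model)
params.set_search_options(max_length=512)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode(prompt))

# Decode and print token by token until generation finishes.
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```

The int4 DML and int4 CPU variants described in the new `## ONNX Models` section load through this same API; only the model folder passed to `og.Model` (and the installed package flavor) changes.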