Update README.md
meta-llama/Meta-Llama-3.1-8B-Instruct quantized to ONNX GenAI INT4 with Microsoft DirectML optimization

### Model Description
meta-llama/Meta-Llama-3.1-8B-Instruct quantized to ONNX GenAI INT4 with Microsoft DirectML optimization<br>
https://onnxruntime.ai/docs/genai/howto/install.html#directml

Created using ONNX Runtime GenAI's builder.py<br>
https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/src/python/py/models/builder.py

INT4 accuracy level: FP32 (float32)<br>
8-bit quantization for MoE layers
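For reference, an export along these lines can be produced with builder.py. The command below is a sketch, not the exact invocation used for this model: the `-p`/`-e` flag spellings follow the builder.py help text, the output folder name is hypothetical, and `int4_accuracy_level=1` is assumed to correspond to the FP32 accuracy level stated above.

```shell
# Sketch of a builder.py invocation (hypothetical output path;
# int4_accuracy_level=1 assumed to mean FP32 accumulation).
python builder.py \
    -m meta-llama/Meta-Llama-3.1-8B-Instruct \
    -o ./llama31-8b-int4-dml \
    -p int4 \
    -e dml \
    --extra_options int4_accuracy_level=1
```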

- **Developed by:** Mochamad Aris Zamroni
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]
https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct

- **Repository:** [More Information Needed]

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

This is a Windows DirectML-optimized model.

Prerequisites:
1. Install Python 3.10 from the Windows Store:<br>
https://apps.microsoft.com/detail/9pjpw5ldxlz5?hl=en-us&gl=US
2. Open a command prompt (cmd.exe).
3. Create a Python virtual environment and install onnxruntime-genai-directml:<br>
mkdir c:\temp<br>
cd c:\temp<br>
python -m venv dmlgenai<br>
dmlgenai\Scripts\activate.bat<br>
pip install onnxruntime-genai-directml

## How to Get Started with the Model

Use the code below to get started with the model.
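A minimal streaming-generation sketch using the onnxruntime-genai Python API is shown below. Assumptions: the model folder path is a placeholder for wherever this repository was downloaded, the prompt format follows the Llama 3.1 Instruct chat template from the upstream model card, and `Generator.append_tokens` reflects recent onnxruntime-genai releases (older releases assign `params.input_ids` instead).

```python
def build_llama31_prompt(user_message: str) -> str:
    """Single-turn prompt in the Llama 3.1 Instruct chat format
    (assumption: template taken from the upstream Meta-Llama-3.1 model card)."""
    return (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


def generate(model_dir: str, user_message: str, max_length: int = 512) -> None:
    """Stream a completion from the quantized ONNX model via onnxruntime-genai."""
    import onnxruntime_genai as og  # installed by onnxruntime-genai-directml

    model = og.Model(model_dir)          # folder containing the exported ONNX files
    tokenizer = og.Tokenizer(model)
    stream = tokenizer.create_stream()   # incremental detokenizer for streaming

    params = og.GeneratorParams(model)
    params.set_search_options(max_length=max_length)

    generator = og.Generator(model, params)
    generator.append_tokens(tokenizer.encode(build_llama31_prompt(user_message)))

    # Emit tokens one at a time as they are generated.
    while not generator.is_done():
        generator.generate_next_token()
        print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
    print()


# Example call (path is a placeholder):
# generate(r"path\to\downloaded\model\folder", "What is DirectML?")
```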

#### Speeds, Sizes, Times [optional]

15 tokens/s on a Radeon 780M with 8 GB dedicated RAM

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective