language:
- en
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
pipeline_tag: text-generation
Model Card for Model ID
This modelcard aims to be a base template for new models. It has been generated using this raw template.
Model Details
meta-llama/Meta-Llama-3.1-8B-Instruct quantized to ONNX GenAI INT4 with Microsoft DirectML optimization
Model Description
meta-llama/Meta-Llama-3.1-8B-Instruct quantized to ONNX GenAI INT4 with Microsoft DirectML optimization
https://onnxruntime.ai/docs/genai/howto/install.html#directml
Created using ONNX Runtime GenAI's builder.py
https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/src/python/py/models/builder.py
INT4 accuracy level: FP32 (float32)
8-bit quantization for MoE layers
- Developed by: Mochamad Aris Zamroni
- Model type: [More Information Needed]
- Language(s) (NLP): [More Information Needed]
- License: [More Information Needed]
- Finetuned from model [optional]: [More Information Needed]
Model Sources [optional]
https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct
- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]
Uses
Direct Use
This is Windows DirectML optimized model.
Prerequisites:
Install Python 3.10 from Windows Store:
https://apps.microsoft.com/detail/9pjpw5ldxlz5?hl=en-us&gl=USOpen command line cmd.exe
Create python virtual environment and install onnxruntime-genai-directml
mkdir c:\temp
cd c:\temp
python -m venv dmlgenai
dmlgenai\Scripts\activate.bat
pip install onnxruntime-genai-directml
How to Get Started with the Model
Use the code below to get started with the model.
[More Information Needed]
Preprocessing [optional]
[More Information Needed]
Speeds, Sizes, Times [optional]
15 token/s in Radeon 780M with 8GB dedicated RAM
Metrics
[More Information Needed]
Results
[More Information Needed]
Summary
Model Examination [optional]
[More Information Needed]
Technical Specifications [optional]
Model Architecture and Objective
[More Information Needed]
Compute Infrastructure
Microsoft Windows DirectML
Hardware
AMD Ryzen 7840U with integrated Radeon 780M GPU RAM 32GB shared VRAM 8GB
Software
Microsoft Windows DirectML
Model Card Authors [optional]
Mochamad Aris Zamroni