metadata

language:
  - en
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
pipeline_tag: text-generation

Model Card for Model ID

This modelcard aims to be a base template for new models. It has been generated using this raw template.

Model Details

meta-llama/Meta-Llama-3.1-8B-Instruct quantized to ONNX GenAI INT4 with Microsoft DirectML optimization

meta-llama/Meta-Llama-3.1-8B-Instruct quantized to ONNX GenAI INT4 with Microsoft DirectML optimization
https://onnxruntime.ai/docs/genai/howto/install.html#directml

INT4 accuracy level: FP32 (float32)
8-bit quantization for MoE layers

This is Windows DirectML optimized model.

Prerequisites:

Install Python 3.10 from Windows Store:
https://apps.microsoft.com/detail/9pjpw5ldxlz5?hl=en-us&gl=US
Open command line cmd.exe
Create python virtual environment and install onnxruntime-genai-directml
mkdir c:\temp
cd c:\temp
python -m venv dmlgenai
dmlgenai\Scripts\activate.bat
pip install onnxruntime-genai-directml

Use the code below to get started with the model.

[More Information Needed]

[More Information Needed]

15 token/s in Radeon 780M with 8GB dedicated RAM

[More Information Needed]

[More Information Needed]

[More Information Needed]

[More Information Needed]

Microsoft Windows DirectML

AMD Ryzen 7840U with integrated Radeon 780M GPU RAM 32GB shared VRAM 8GB

Microsoft Windows DirectML

Mochamad Aris Zamroni