--- language: - en base_model: meta-llama/Meta-Llama-3.1-8B-Instruct pipeline_tag: text-generation --- # Model Card for Model ID This modelcard aims to be a base template for new models. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1). ## Model Details meta-llama/Meta-Llama-3.1-8B-Instruct quantized to ONNX GenAI INT4 with Microsoft DirectML optimization ### Model Description meta-llama/Meta-Llama-3.1-8B-Instruct quantized to ONNX GenAI INT4 with Microsoft DirectML optimization
https://onnxruntime.ai/docs/genai/howto/install.html#directml Created using ONNX Runtime GenAI's builder.py
https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/src/python/py/models/builder.py INT4 accuracy level: FP32 (float32)
8-bit quantization for MoE layers - **Developed by:** Mochamad Aris Zamroni - **Model type:** [More Information Needed] - **Language(s) (NLP):** [More Information Needed] - **License:** [More Information Needed] - **Finetuned from model [optional]:** [More Information Needed] ### Model Sources [optional] https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct - **Repository:** [More Information Needed] - **Paper [optional]:** [More Information Needed] - **Demo [optional]:** [More Information Needed] ## Uses ### Direct Use This is Windows DirectML optimized model. Prerequisites:
1. Install Python 3.10 from Windows Store:
https://apps.microsoft.com/detail/9pjpw5ldxlz5?hl=en-us&gl=US 2. Open command line cmd.exe 3. Create python virtual environment and install onnxruntime-genai-directml
mkdir c:\temp
cd c:\temp
python -m venv dmlgenai
dmlgenai\Scripts\activate.bat
pip install onnxruntime-genai-directml ## How to Get Started with the Model Use the code below to get started with the model. [More Information Needed] #### Preprocessing [optional] [More Information Needed] #### Speeds, Sizes, Times [optional] 15 token/s in Radeon 780M with 8GB dedicated RAM #### Metrics [More Information Needed] ### Results [More Information Needed] #### Summary ## Model Examination [optional] [More Information Needed] ## Technical Specifications [optional] ### Model Architecture and Objective [More Information Needed] ### Compute Infrastructure Microsoft Windows DirectML #### Hardware AMD Ryzen 7840U with integrated Radeon 780M GPU RAM 32GB shared VRAM 8GB #### Software Microsoft Windows DirectML ## Model Card Authors [optional] Mochamad Aris Zamroni ## Model Card Contact https://www.linkedin.com/in/zamroni/