zamroni111's picture
Update README.md
bc78057 verified
|
raw
history blame
3.01 kB
metadata
language:
  - en
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
pipeline_tag: text-generation

Model Card for Model ID

This modelcard aims to be a base template for new models. It has been generated using this raw template.

Model Details

meta-llama/Meta-Llama-3.1-8B-Instruct quantized to ONNX GenAI INT4 with Microsoft DirectML optimization

Model Description

meta-llama/Meta-Llama-3.1-8B-Instruct quantized to ONNX GenAI INT4 with Microsoft DirectML optimization
https://onnxruntime.ai/docs/genai/howto/install.html#directml

Created using ONNX Runtime GenAI's builder.py
https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/src/python/py/models/builder.py

INT4 accuracy level: FP32 (float32)
8-bit quantization for MoE layers

  • Developed by: Mochamad Aris Zamroni
  • Model type: [More Information Needed]
  • Language(s) (NLP): [More Information Needed]
  • License: [More Information Needed]
  • Finetuned from model [optional]: [More Information Needed]

Model Sources [optional]

https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct

  • Repository: [More Information Needed]
  • Paper [optional]: [More Information Needed]
  • Demo [optional]: [More Information Needed]

Uses

Direct Use

This is Windows DirectML optimized model.

Prerequisites:

  1. Install Python 3.10 from Windows Store:
    https://apps.microsoft.com/detail/9pjpw5ldxlz5?hl=en-us&gl=US

  2. Open command line cmd.exe

  3. Create python virtual environment and install onnxruntime-genai-directml
    mkdir c:\temp
    cd c:\temp
    python -m venv dmlgenai
    dmlgenai\Scripts\activate.bat
    pip install onnxruntime-genai-directml

How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

Preprocessing [optional]

[More Information Needed]

Speeds, Sizes, Times [optional]

15 token/s in Radeon 780M with 8GB dedicated RAM

Metrics

[More Information Needed]

Results

[More Information Needed]

Summary

Model Examination [optional]

[More Information Needed]

Technical Specifications [optional]

Model Architecture and Objective

[More Information Needed]

Compute Infrastructure

Microsoft Windows DirectML

Hardware

AMD Ryzen 7840U with integrated Radeon 780M GPU RAM 32GB shared VRAM 8GB

Software

Microsoft Windows DirectML

Model Card Authors [optional]

Mochamad Aris Zamroni

Model Card Contact

https://www.linkedin.com/in/zamroni/