DirectML INT4 and INT8 AWQ model versions
Hello Phi Team
As in the subject: any chance of Phi 3.5 ONNX DirectML INT4 and INT8 AWQ model versions?
Cheers
Peter
+1. Can the team upload the DirectML versions of this model as well?
The model card says "The ONNX models are tested on: GPU SKU: RTX 4090 (DirectML)", but the ONNX model optimized for DirectML is missing from the repo.
Can we expect the DirectML version to be uploaded soon?
Hi Phi Team,
Any chance of Phi 3.5 ONNX DirectML INT4 and INT8 AWQ model versions? It would be very helpful if they could be released soon.
Thanks.
Best regards
With the newly uploaded INT4 AWQ models, there is now one optimized ONNX model for CPU and one optimized ONNX model for GPU (e.g. CUDA, DirectML). Here is a tutorial you can follow to create your own INT4 AWQ ONNX models.
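For reference, here is a minimal sketch of producing an INT4 model for DirectML with the model builder from the `onnxruntime-genai` package. The model id, output folder, and cache directory below are placeholders, and the builder's default INT4 scheme may differ from AWQ, so follow the tutorial for the exact AWQ flow:

```python
# Minimal sketch: generate an INT4 ONNX model targeting DirectML with the
# ONNX Runtime GenAI model builder. Paths and model id are placeholders.
import subprocess
import sys

subprocess.run(
    [
        sys.executable, "-m", "onnxruntime_genai.models.builder",
        "-m", "microsoft/Phi-3.5-mini-instruct",  # assumed Hugging Face model id
        "-o", "phi35-mini-int4-dml",              # output folder (placeholder)
        "-p", "int4",                             # INT4 precision
        "-e", "dml",                              # DirectML execution provider
        "-c", "hf_cache",                         # download cache dir (placeholder)
    ],
    check=True,
)
```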
For INT8 precision, you can create the FP32 ONNX model using ONNX Runtime GenAI's model builder and then use ONNX Runtime's INT8 quantization tools.
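A minimal sketch of that INT8 path, assuming the builder writes `model.onnx` into the output folder (model id and paths are placeholders):

```python
# Minimal sketch: build an FP32 ONNX model with the ONNX Runtime GenAI
# model builder, then quantize it to INT8 with ONNX Runtime's dynamic
# quantization. Paths and model id are placeholder assumptions.
import subprocess
import sys

from onnxruntime.quantization import QuantType, quantize_dynamic

# Step 1: export the FP32 ONNX model
subprocess.run(
    [
        sys.executable, "-m", "onnxruntime_genai.models.builder",
        "-m", "microsoft/Phi-3.5-mini-instruct",
        "-o", "phi35-mini-fp32",
        "-p", "fp32",
        "-e", "cpu",
    ],
    check=True,
)

# Step 2: INT8 dynamic quantization of the exported graph
quantize_dynamic(
    model_input="phi35-mini-fp32/model.onnx",   # assumed output file name
    model_output="phi35-mini-int8/model.onnx",
    weight_type=QuantType.QInt8,                # 8-bit weights
    use_external_data_format=True,              # weights exceed the 2 GB protobuf limit
)
```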