DirectML INT4 and INT8 AWQ model versions
Hello Phi Team
As in the subject: any chance of Phi 3.5 ONNX DirectML INT4 and INT8 AWQ model versions?
Cheers
Peter
+1. Can the team upload the DirectML versions of this model as well?
The model card says "The ONNX models are tested on: GPU SKU: RTX 4090 (DirectML)", but the ONNX model optimized for DirectML is missing from the repo.
Can we expect the DirectML version to be uploaded soon?
Hi Phi Team,
Any chance of Phi 3.5 ONNX DirectML INT4 and INT8 AWQ model versions? It would be very helpful if they could be released soon.
Thanks.
Best regards
With the newly uploaded INT4 AWQ models, there is now one optimized ONNX model for CPU and one optimized ONNX model for GPU (e.g. CUDA, DirectML). Here is a tutorial you can follow to create your own INT4 AWQ ONNX models.
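For reference, here is a minimal sketch of producing an INT4 model for DirectML with the model builder from the `onnxruntime-genai` package. The model id, output folder, and cache directory below are placeholders, and the builder's default INT4 scheme may differ from AWQ, so follow the tutorial for the exact AWQ flow:

```python
# Minimal sketch: generate an INT4 ONNX model targeting DirectML with the
# ONNX Runtime GenAI model builder. Paths and model id are placeholders.
import subprocess
import sys

subprocess.run(
    [
        sys.executable, "-m", "onnxruntime_genai.models.builder",
        "-m", "microsoft/Phi-3.5-mini-instruct",  # assumed Hugging Face model id
        "-o", "phi35-mini-int4-dml",              # output folder (placeholder)
        "-p", "int4",                             # INT4 precision
        "-e", "dml",                              # DirectML execution provider
        "-c", "hf_cache",                         # download cache dir (placeholder)
    ],
    check=True,
)
```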
For INT8 precision, you can create the FP32 ONNX model using ONNX Runtime GenAI's model builder and then use ONNX Runtime's INT8 quantization tools.
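A minimal sketch of that INT8 path, assuming the builder writes `model.onnx` into the output folder (model id and paths are placeholders):

```python
# Minimal sketch: build an FP32 ONNX model with the ONNX Runtime GenAI
# model builder, then quantize it to INT8 with ONNX Runtime's dynamic
# quantization. Paths and model id are placeholder assumptions.
import subprocess
import sys

from onnxruntime.quantization import QuantType, quantize_dynamic

# Step 1: export the FP32 ONNX model
subprocess.run(
    [
        sys.executable, "-m", "onnxruntime_genai.models.builder",
        "-m", "microsoft/Phi-3.5-mini-instruct",
        "-o", "phi35-mini-fp32",
        "-p", "fp32",
        "-e", "cpu",
    ],
    check=True,
)

# Step 2: INT8 dynamic quantization of the exported graph
quantize_dynamic(
    model_input="phi35-mini-fp32/model.onnx",   # assumed output file name
    model_output="phi35-mini-int8/model.onnx",
    weight_type=QuantType.QInt8,                # 8-bit weights
    use_external_data_format=True,              # weights exceed the 2 GB protobuf limit
)
```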