Regarding quantized .gguf models

#1
by Haleshot - opened

Which quantized version gives outputs closest to the original 7B TIR model hosted on Numina (safetensors)? I tried the q6_k and it seems to perform decently.

Which of the available versions is the best?
f16
q5_k
q6_k
q8_0
q8_p

I will use the suggested type to further fine-tune the model on various other datasets.

From best to worst quality, they are in this order:
f16
q8_0
q6_k
q5_k

That said, q6_k is not degraded as far as I can tell.
Try them all and decide. There is obviously a trade-off with size, but all these versions keep the output and embed tensors at f16, which makes them way better than the normal quantizations.
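
To put rough numbers on the size trade-off, here is a minimal sketch that estimates file size from the nominal bits per weight of each llama.cpp format. The 7e9 parameter count is a round-number assumption, and `estimate_size_gb` is a hypothetical helper, not part of any library. The estimate ignores metadata and mixed tensor types, so the files in this repo (which keep the output and embed tensors at f16) will come out somewhat larger than these figures.

```python
# Nominal bits per weight, derived from the llama.cpp block layouts.
BITS_PER_WEIGHT = {
    "f16": 16.0,
    "q8_0": 8.5,     # 34 bytes per block of 32 weights
    "q6_k": 6.5625,  # 210 bytes per super-block of 256 weights
    "q5_k": 5.5,     # 176 bytes per super-block of 256 weights
}


def estimate_size_gb(n_params, quant):
    """Rough size in GB (10^9 bytes), assuming every tensor uses `quant`."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


N_PARAMS = 7e9  # assumption: a round figure for a "7B" model

for quant in BITS_PER_WEIGHT:
    print(f"{quant:5s} ~{estimate_size_gb(N_PARAMS, quant):.1f} GB")
```

The ordering by size matches the ordering listed above, so picking a version comes down to the largest file your hardware can run comfortably.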

I tried the q6_k and it seems to work well (at least for the initial prompt inferences). I had downloaded a q8_0 from another uploader (https://huggingface.co/reach-vb/NuminaMath-7B-TIR-Q8_0-GGUF) and the outputs were completely off, with continual hallucinations. I will try your q8_0; hopefully that is something I can work with. I think that other q8_0 was the best (and largest) model I could get working with open-webui, so I can now compare your q8_0 against your q6_k and see which one to keep for the rest of my project.
As always, thanks for the prompt reply!
