K quants should not contain IQ4_NL types inside

#4
by concedo - opened

That breaks support for backends that don't support I-quants.

https://github.com/LostRuins/koboldcpp/discussions/976

Oh, I don't know why I didn't get a notification from your GitHub ping...

I think it has something to do with the shape of the tensor: its row length isn't divisible by 256, the super-block size that K-quants require.
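As a rough illustration of the divisibility issue, here is a minimal Python sketch of the fallback logic. It assumes the fallback table llama.cpp's quantizer used around the time of this thread, as I understand it: Q2_K and Q3_K fall back to IQ4_NL, Q4_K to Q5_0, Q5_K to Q5_1, and Q6_K to Q8_0. The helper `effective_type` and the 1408 row length are hypothetical and only for illustration. Check the current llama.cpp source, since the fix discussed later in this thread changes this behaviour.

```python
# Hedged sketch (not llama.cpp code): how a K-quant request could end up stored
# as a different type when the tensor's row length is not a multiple of 256.

QK_K = 256  # super-block size used by the K-quants

# Assumed fallback table (mid-2024 behaviour as understood, may have changed since).
FALLBACK = {
    "Q2_K": "IQ4_NL",
    "Q3_K": "IQ4_NL",
    "Q4_K": "Q5_0",
    "Q5_K": "Q5_1",
    "Q6_K": "Q8_0",
}


def effective_type(requested: str, row_len: int) -> str:
    """Return the type a tensor would actually be stored as (hypothetical helper)."""
    if requested not in FALLBACK:
        # Types outside the K-quant table are passed through unchanged in this sketch.
        return requested
    if row_len % QK_K == 0:
        # Row length fits whole 256-element super-blocks, so the K-quant is kept.
        return requested
    # Incompatible shape: substitute the fallback type.
    return FALLBACK[requested]


if __name__ == "__main__":
    for t in ("Q2_K", "Q3_K", "Q4_K", "Q6_K"):
        # 2048 is a multiple of 256; 1408 (5.5 * 256) is not.
        print(t, effective_type(t, 2048), effective_type(t, 1408))
```

The point of the sketch is that only the low-bit K-quants land on an I-quant type; the higher-bit ones fall back to legacy types that most backends already handle.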

The same thing happens with other people's quants too:

https://huggingface.co/mradermacher/DeepSeek-Coder-V2-Lite-Instruct-GGUF/tree/main?show_file_info=DeepSeek-Coder-V2-Lite-Instruct.Q2_K.gguf

Here's a comment from Slaren explaining a similar case that happened with Qwen2 on my P100:

https://github.com/ggerganov/llama.cpp/issues/7805#issuecomment-2166507695

Hmm, okay.

I wonder if it would be prudent to label such quants as non-K quants.

Having an I-quant tensor breaks every backend that doesn't support I-quants, while people assume it's a regular K-quant and wonder why Q3_K_M works but Q3_K_S doesn't.
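To label such quants, one could scan each tensor's storage type for I-quants. A minimal sketch, assuming you already have (tensor name, type name) pairs; `iquant_tensors`, `label_quant`, the labeling scheme and the tensor names in the test data are all hypothetical.

```python
# Hypothetical helpers: flag quants whose tensors include I-quant types
# (e.g. IQ4_NL) that a K-quant-only backend may not be able to load.
from typing import Iterable, List, Tuple


def iquant_tensors(tensors: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the names of tensors whose type name starts with 'IQ'."""
    return [name for name, type_name in tensors if type_name.upper().startswith("IQ")]


def label_quant(base_label: str, tensors: Iterable[Tuple[str, str]]) -> str:
    """Append a marker to the quant label if any tensor is I-quantized."""
    offenders = iquant_tensors(tensors)
    if offenders:
        return f"{base_label} (contains {len(offenders)} I-quant tensor(s))"
    return base_label
```

With the `gguf` Python package that ships with llama.cpp, the pairs could likely be built from `GGUFReader(path).tensors`, using each entry's `name` and `tensor_type.name`; treat that as an assumption to verify against the installed version.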

Never mind, ggerganov has an upcoming fix: https://github.com/ggerganov/llama.cpp/pull/8489

Simply re-quantizing once it lands should solve the issue.

Yeah, it seems an odd choice to fall back to something that may be unsupported lol

Glad a fix is coming.
