bartowski posted an update Dec 2, 2024
Old Mixtral model quants may be broken!

Recently Slaren over on llama.cpp refactored the model loader - in a way that's super awesome and very powerful - but it broke support for "split tensor MoE models", which affects older Mixtral models

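For context, as I understand it the old quants store each expert as its own tensor, while quants made after the refactor fuse all experts into a single 3D tensor per projection. Here's a minimal sketch for checking which layout a file uses, assuming the gguf Python package that ships in the llama.cpp repo (the filename and the exact tensor-name patterns are my own guesses, so treat it as a starting point):

```python
# Inspect a GGUF's tensor names to spot the old split-expert MoE layout.
from gguf import GGUFReader

reader = GGUFReader("bagel-dpo-8x7b-v0.2.Q4_K_M.gguf")  # placeholder filename
names = [t.name for t in reader.tensors]

# Old quants: one tensor per expert, e.g. blk.0.ffn_gate.0.weight
# New quants: all experts fused, e.g. blk.0.ffn_gate_exps.weight
split = any(".ffn_gate.0.weight" in n for n in names)
fused = any(".ffn_gate_exps.weight" in n for n in names)

if split and not fused:
    print("Old split-expert layout - likely affected by the loader refactor")
else:
    print("Fused expert tensors - should be fine")
```
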
You may have seen my upload of one such older Mixtral model, jondurbin/bagel-dpo-8x7b-v0.2, and with the newest changes it seems to run without issue

If you happen to run into issues with any other old Mixtral models, drop a link here and I'll try to remake them with the new changes so we can keep enjoying them :)

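If you want a quick way to check a quant before (or after) a remake, here's a minimal load test, assuming the llama-cpp-python bindings built against a recent llama.cpp (the model path is a placeholder):

```python
# Quick sanity check: try to load a GGUF and generate a few tokens.
from llama_cpp import Llama

try:
    llm = Llama(model_path="old-mixtral-quant.Q4_K_M.gguf", n_ctx=512)  # placeholder path
    out = llm("Hello", max_tokens=8)
    print("Loaded and generated fine:", out["choices"][0]["text"])
except Exception as e:
    print("Load or generation failed:", e)
```

If the old quant fails here but a remade one loads, that points at the split-tensor issue rather than your setup.
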
Btw for anyone late to the game - old Mixtral quants should still work as expected in KoboldCpp, and the new ones should work as well, so you have multiple options.


Actually, it wasn't just Mixtral. Something got broken in the older Llama and Alpaca models as well.

I'm a big fan of Fimbulvetr 10.7B v1.0, and when offloading, my speeds dropped from 10-13 T/s to a whopping 3 T/s.

It seems to have possibly been fixed in newer versions (I'm using KoboldCpp as my backend, and they haven't done a build with the newest llama.cpp code yet, but BackyardAI has, and I'm now getting 10 T/s).

Sadly, Sao10K took down the unquantized Fimbulvetr v1.0 repo, so I'm not sure anyone would be able to re-quant it and test whether that does the trick.