F16 versions of some models
First I wanted to thank you for your tireless work! It's much appreciated. I was hoping to convince you to make F16 versions of at least some of the main models (llama-3, mistral, mixtral, phi3) available. Any chance?
I currently make f16 quants when the Q8_0 quant is <= 10GB (as a heuristic), so phi3 models should have one.
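Roughly, the check amounts to this (a minimal sketch; the helper name and file path are made up for illustration, not my actual scripts):

```python
import os

# ~10 GB cutoff from the heuristic above (treated as GiB here; the
# threshold is rough anyway).
F16_THRESHOLD = 10 * 1024**3

def should_make_f16(q8_0_path: str) -> bool:
    """Return True if the Q8_0 quant is small enough to also warrant an f16 quant."""
    return os.path.getsize(q8_0_path) <= F16_THRESHOLD

if __name__ == "__main__":
    path = "Phi-3-mini-4k-instruct.Q8_0.gguf"  # hypothetical file name
    if os.path.exists(path) and should_make_f16(path):
        print(f"{path}: under the cutoff, producing an f16 quant as well")
```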
As for the larger ones, I try to strike a balance between blasting huggingface with endless extra terabytes nobody uses and providing actually useful quants. That is, I am a bit reluctant to just indiscriminately upload big quants. The same goes for the source/unquantized ggufs.
Consider your comment noted - I am not set in stone, but at the moment I only do f16s for 10B and smaller models. I can add quants on request, but the problem then is that I have to download the source model again, which, again, seems wasteful of huggingface's resources, as they presumably pay for that.
Sigh.