Model fails to run on featherless.ai
I don't know if it's the same as in your benchmarks, but I suspect the model could be quantized without that being specified. Has anyone found out why?
I'm so sorry for the late response! This was an earlier version of my model, which is still being updated. I always make sure to release 4-bit medium and small GGUF quants for each model release, as these are the ones I use personally!
Thanks for your response. Having an fp8 version sounds like standard practice to me. If the model is quantized, that should be specified in the model name to avoid any confusion. Could you provide an fp8 version? People on featherless want to try your model :D
Sorry again for the late response! I've been very busy with work (I work a kitchen job right now lmao). Yeah, would you like a GGUF version of the model, or a different quant?
Also, in the meantime, check out the merge of this model with its base: https://huggingface.co/netcat420/MFANN-Phigments12-slerp
And here is the latest version of the 3b model: https://huggingface.co/netcat420/MFANN3bv0.21
q4_k_X quants: https://huggingface.co/netcat420/MFANN3bv0.21-GGUF
Also, here is the v0.6 GGUF quant: https://huggingface.co/netcat420/MFANN3bv0.6-GGUF/tree/main. At the time I was only doing q4_k_m quants, but I'm about to add q8_0 quants as requested. Since phi-2 is no longer supported by llama.cpp, though, I have to use my own custom fork to keep the 3b phi-2 models going, which is why the quants take longer to create.
OK, the q8_0 and q6_k quants are now live at this link: https://huggingface.co/netcat420/MFANN3bv0.6-GGUF/tree/main
Version 0.21 is having its q8_0 and q6_k quants uploaded as we speak! I'm going to start doing q4_k_m, q4_k_s, q6_k and q8_0 in my future releases! The new quants should be live at this link once uploaded: https://huggingface.co/netcat420/MFANN3bv0.21-GGUF
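For anyone choosing between the quants mentioned above, here is a minimal Python sketch of picking a file from a repo listing by preferred quant type. The helper names `quant_of` and `pick_quant` and the example filenames are hypothetical, made up for illustration; check the actual file list in the GGUF repo for the real names.

```python
# Quant types mentioned in this thread.
QUANT_TAGS = ["q4_k_s", "q4_k_m", "q6_k", "q8_0"]

def quant_of(filename):
    """Return the quant tag found in a filename (case-insensitive), or None."""
    lower = filename.lower()
    for tag in QUANT_TAGS:
        if tag in lower:
            return tag
    return None

def pick_quant(filenames, preferred):
    """Return the first .gguf file matching the earliest tag in `preferred`."""
    for tag in preferred:
        for name in filenames:
            if name.lower().endswith(".gguf") and quant_of(name) == tag:
                return name
    return None

# Hypothetical listing; the real filenames in the repo may differ.
files = ["README.md", "model.Q4_K_M.gguf", "model.Q8_0.gguf"]
print(pick_quant(files, ["q8_0", "q6_k", "q4_k_m"]))
```

Listing the preferred tags from highest to lowest precision gives you the largest quant that is actually uploaded, and falls back to the smaller ones if it is not there yet.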