Base model request
Hi!
Could you please do this for the base model?
Or, if you don't feel it's worth doing yourself, could you say specifically which convert_checkpoint.py you used, so I could (try to) do it? I'm having trouble replicating this.
Thanks!
The team just released a fix for some of the bugs I saw when converting to bf16 and back: (https://github.com/deepseek-ai/DeepSeek-V3/commit/8f1c9488b53068992f9525fab03b1868e6f7c8c1). I also left out the 3 other tp4 rank files for the int4 inference pipeline. With this fix and my fp8 module patch I'll redo the upload and add the base model.
*Yeah, I had to modify quite a few things to get this to work (nothing worked out of the box; some modules, related to the commit above, were left in fp8 format). I'd imagine things will get cleaned up as we get further from the initial release.
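For anyone else hitting the leftover-fp8 issue: the fp8 weights in the released checkpoint carry per-block inverse scales, so converting a module back to a higher precision means expanding each block's scale over its tile and multiplying it back in. Here's a toy numpy sketch of that blockwise dequantization step. The block size, names, and shapes are illustrative only (the real checkpoint uses much larger blocks and fp8 tensors), not the actual conversion script:

```python
import numpy as np

BLOCK = 2  # illustrative; the real checkpoint uses much larger square blocks

def dequantize(qweight, scale_inv, block=BLOCK):
    """Expand per-block inverse scales over each tile and multiply back.

    Each (block x block) tile of the quantized weight shares one
    scale_inv entry; slicing handles shapes not divisible by `block`.
    """
    rows = np.repeat(scale_inv, block, axis=0)[: qweight.shape[0]]
    full = np.repeat(rows, block, axis=1)[:, : qweight.shape[1]]
    return qweight.astype(np.float32) * full

# Round-trip demo: blockwise "quantize" (divide by scales), then dequantize.
w = np.arange(16, dtype=np.float32).reshape(4, 4) / 4.0
scale_inv = np.array([[0.5, 0.25], [1.0, 2.0]], dtype=np.float32)  # one per 2x2 tile
q = w / np.repeat(np.repeat(scale_inv, BLOCK, 0), BLOCK, 1)
w2 = dequantize(q, scale_inv)
assert np.allclose(w, w2)
```

The same expand-and-multiply idea is what the fp8→bf16 path in the repo's conversion fix performs per weight tensor, just with actual fp8 storage.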
Thanks! I've been hoping to run the base model but don't quite have enough to run it as-is; I probably could at 6-bit and lower.
And it seems like no one else is quantizing it! I dunno why, it looks interesting.
Any updates? Could you share which convert_checkpoint.py you used, and any missing steps?
Thanks again!