Fine-tuning roadmap

#18
opened by RonanMcGovern

Which fine-tuning library is likely to be the first to support DeepSeek-V3?

Transformers never had DeepSeek-V2 integrated.

MoE might take work to support, along with multi-head latent attention (MLA) and multi-token prediction (MTP). Then there's also supporting the FP8 checkpoint as a base model on which to train LoRAs…
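For concreteness, here is a minimal sketch of what LoRA training on top of it could look like once library support lands. Everything here is an assumption: Transformers has no native DeepSeek-V3 class, so this leans on `trust_remote_code`, the FP8 weights are assumed to have been dequantized to bf16 upstream (PEFT does not train directly on FP8), and the MLA projection names in `target_modules` would need to be checked against the actual remote-code implementation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "deepseek-ai/DeepSeek-V3"  # official repo; shipped weights are FP8

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumes an FP8 -> bf16 conversion was done upstream
    trust_remote_code=True,
    device_map="auto",
)

# Attach LoRA adapters to the attention projections. These module names are
# an assumption based on the DeepSeek MLA design, not a confirmed API.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "kv_a_proj_with_mqa", "kv_b_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Note this sketch says nothing about the harder parts: routing gradients through the MoE experts (likely needing expert parallelism at this scale) or handling the MTP heads during training.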

Thanks, and thanks for the model.

I have the same question: how do we fine-tune DeepSeek-V3? Could a guide be provided?

+1 for a fine-tuning script
