Regarding model fine-tuning

#23 opened by mylsz

This is amazing work—thank you for your contribution and for making it open source!
Can you provide some suggestions on how to continue fine-tuning the jina-clip-v2 model with local data? After I comment out @torch.inference_mode(), can I directly load and fine-tune this model?

The forward method would probably be more convenient: https://huggingface.co/jinaai/jina-clip-implementation/blob/main/modeling_clip.py#L637. You can of course fine-tune the model; I suggest you take a look at our technical report to understand our training recipe: https://arxiv.org/abs/2412.08802
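
For reference, a minimal sketch of loading the model for training and going through forward() rather than the encode helpers wrapped in @torch.inference_mode(). The argument names, dummy shapes, and image resolution below are assumptions for illustration; check the forward() signature in modeling_clip.py for the exact interface:

```python
import torch
from transformers import AutoModel

# Load the model with its custom implementation; trust_remote_code pulls in
# the code from jinaai/jina-clip-implementation.
model = AutoModel.from_pretrained("jinaai/jina-clip-v2", trust_remote_code=True)
model.train()  # training mode instead of inference

# Dummy pre-tokenized batch -- argument names, vocab range, sequence length
# and image resolution are placeholders (assumptions); build real inputs with
# the model's tokenizer/preprocessor.
input_ids = torch.randint(0, 1000, (4, 64))
pixel_values = torch.randn(4, 3, 512, 512)

# Calling forward() keeps the computation graph, so gradients flow for
# fine-tuning, unlike the @torch.inference_mode() encode methods.
outputs = model(input_ids=input_ids, pixel_values=pixel_values)
```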

Basically, if you care about text retrieval performance, you need to maintain it by adding text pairs (or even text triplets) alongside image-caption pairs. Otherwise, simple CLIP-like fine-tuning would be enough.
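
To illustrate the mixed-batch idea, here is a rough sketch of a CLIP-style InfoNCE objective applied to both image-caption pairs and text pairs in the same step. The embedding dimension, batch size, temperature, and equal weighting of the two losses are assumptions for illustration, not the exact recipe from the technical report:

```python
import torch
import torch.nn.functional as F

def info_nce(a, b, temperature=0.05):
    """Symmetric InfoNCE with in-batch negatives; a[i] and b[i] are positives."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

# Hypothetical embeddings for one mixed batch; in practice these would come
# from the model's image and text towers.
img_embeds   = torch.randn(32, 1024, requires_grad=True)  # images of image-caption pairs
cap_embeds   = torch.randn(32, 1024, requires_grad=True)  # matching captions
query_embeds = torch.randn(32, 1024, requires_grad=True)  # query side of text pairs
doc_embeds   = torch.randn(32, 1024, requires_grad=True)  # matching documents

# Combining an image-text loss with a text-text loss keeps text retrieval
# performance from degrading during multimodal fine-tuning.
loss = info_nce(img_embeds, cap_embeds) + info_nce(query_embeds, doc_embeds)
loss.backward()
```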

@gmastrapas thanks for your advice. Is the training code publicly available?
