Regarding model fine-tuning
#23 opened by mylsz
gmastrapas
The forward method is probably more convenient: https://huggingface.co/jinaai/jina-clip-implementation/blob/main/modeling_clip.py#L637. You can of course fine-tune the model; I suggest you take a look at our technical report to understand our training recipe: https://arxiv.org/abs/2412.08802
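For illustration, here is a minimal sketch of going through `forward` instead of the `encode_text` / `encode_image` helpers, so that the computation graph is kept and gradients can flow during fine-tuning. The model id (`jinaai/jina-clip-v2`), the processor usage, and the argument/output names (`input_ids`, `pixel_values`, `text_embeds`, `image_embeds`) are assumptions based on the standard Hugging Face CLIP convention; check the linked modeling_clip.py for the actual signature.

```python
# Rough sketch, not the official fine-tuning code. The forward signature is assumed
# to follow the Hugging Face CLIP convention; verify against modeling_clip.py.
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model = AutoModel.from_pretrained("jinaai/jina-clip-v2", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("jinaai/jina-clip-v2", trust_remote_code=True)
model.train()

texts = ["a photo of a cat", "a photo of a dog"]
images = [Image.open("cat.jpg"), Image.open("dog.jpg")]  # hypothetical local files

batch = processor(text=texts, images=images, return_tensors="pt", padding=True)
outputs = model(**batch)  # full forward pass, so gradients are available for fine-tuning
text_embeds, image_embeds = outputs.text_embeds, outputs.image_embeds
```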
Basically, if you care about text retrieval performance, you need to maintain it by including text pairs (or even text triplets) alongside image-caption pairs in your fine-tuning data. Otherwise, simple CLIP-like fine-tuning would be enough.
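As a rough sketch of what that mixed objective could look like: a symmetric InfoNCE contrastive loss applied both to image-caption batches and to text-pair batches. The function names, temperature, and loss weighting below are illustrative assumptions, not the exact recipe from the technical report.

```python
# Illustrative sketch of a mixed CLIP-style objective: image-caption pairs keep
# multimodal alignment, text pairs preserve text retrieval quality.
import torch
import torch.nn.functional as F

def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric contrastive loss between two batches of paired embeddings."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

def training_step(image_embeds, caption_embeds, query_embeds, passage_embeds, text_weight=1.0):
    # Hypothetical combination of the two losses; the real mixing ratio is not specified here.
    loss_img_txt = info_nce(image_embeds, caption_embeds)
    loss_txt_txt = info_nce(query_embeds, passage_embeds)
    return loss_img_txt + text_weight * loss_txt_txt
```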
@gmastrapas thanks for your advice. Is the training code publicly available?