Convert llava-v1.5-7b to liuhaotian/llava-v1.5-7b-hf format
Thank you for your outstanding work. I recently fine-tuned a LLaVA model starting from liuhaotian/llava-v1.5-7b, and I now want to serve it with vLLM to improve inference speed. vLLM expects checkpoints in the llava-v1.5-7b-hf format. If I load my fine-tuned llava-v1.5-7b model directly, vLLM fails with the error "Model architectures ['LlavaLlamaForCausalLM'] are not supported for now", so a conversion is required. How is the llava-v1.5-7b-hf format produced, and how can I convert my fine-tuned model to it?
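For readers hitting the same error: a quick way to tell which format a checkpoint is in is to look at the `architectures` field of its `config.json`. The sketch below assumes the HF-format checkpoint declares `LlavaForConditionalGeneration` (as llava-hf/llava-1.5-7b-hf does); the helper name `is_hf_llava_checkpoint` is hypothetical and not part of any library.

```python
import json
from pathlib import Path

# Assumption: HF-format LLaVA checkpoints declare this architecture, while
# checkpoints from the original repo declare "LlavaLlamaForCausalLM".
HF_ARCHITECTURE = "LlavaForConditionalGeneration"


def is_hf_llava_checkpoint(model_dir: str) -> bool:
    """Return True if config.json in model_dir lists the HF LLaVA architecture."""
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    # "architectures" is a list of class names; treat a missing key as empty.
    return HF_ARCHITECTURE in config.get("architectures", [])
```

If this returns False for a fine-tuned checkpoint, the checkpoint is most likely still in the original format and needs converting before vLLM can load it.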
Hi,
We recommend using the conversion script, found here: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llava/convert_llava_weights_to_hf.py.
However, I also recommend verifying the logits after conversion on the same inputs. I noticed that the original LLaVa model pads images, whereas the image processor in Transformers doesn't yet.
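A minimal sketch of what such a logit check could look like, assuming you have already run both models on the same inputs and flattened each logits tensor to a plain list of floats (for example with `.flatten().tolist()`). The function name `compare_logits` and the tolerance value are illustrative choices, not something the conversion script provides.

```python
from typing import Dict, Sequence, Union


def compare_logits(
    reference: Sequence[float],
    converted: Sequence[float],
    atol: float = 1e-3,  # assumed tolerance; tighten or loosen as needed
) -> Dict[str, Union[float, bool]]:
    """Compare two flattened logit vectors element by element."""
    if len(reference) != len(converted):
        raise ValueError(
            f"length mismatch: {len(reference)} vs {len(converted)}"
        )
    diffs = [abs(a - b) for a, b in zip(reference, converted)]
    max_diff = max(diffs, default=0.0)
    mean_diff = sum(diffs) / len(diffs) if diffs else 0.0
    return {
        "max_abs_diff": max_diff,
        "mean_abs_diff": mean_diff,
        "match": max_diff <= atol,
    }
```

A large `max_abs_diff` is not necessarily a conversion bug: it can also mean the two pipelines were fed different pixel values or input IDs, so compare those first.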
Thank you for your reply. I'll give it a try later. If successful, I'll update the instructions here.
Btw, I just uploaded a fine-tuning notebook for LLaVa with Transformers here: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LLaVa/Fine_tune_LLaVa_on_a_custom_dataset_(with_PyTorch_Lightning).ipynb
Hello, have you succeeded? If so, can you briefly tell me what to do? Thank you for your reply.
Yes, following the instructions at nielsr's link works. The steps outlined there are very detailed.
The link you provided works, thank you. Also, do you happen to know how the llava-next project is fine-tuned? The official documentation does not provide specific fine-tuning code (https://github.com/LLaVA-VL/LLaVA-NeXT/).
LLaVa-NeXT is very similar to LLaVa and can be fine-tuned with the same script after a few changes.
I adapted the provided notebook for LLaVa-NeXT: Colab Notebook
Great, thank you for your work. I am actually more interested in fine-tuning llava-next-video, though. Do you have any suggestions? Or could you create a similar Jupyter notebook for fine-tuning it?
We haven't added LLaVa-NeXT-Video to transformers yet.
Among video LLMs, Video-LLaVa is already available, and I am working on a fine-tuning script for it. I will let you know here when it's ready.
@Dengxiaoyu, I added a tutorial on tuning Video-LLaVa in this Colab notebook
Thank you for your enthusiastic help. If possible, I would also appreciate it if you could create fine-tuning code for llava-next-video.
It has not been added to transformers yet. We plan to add Llava-Next-Video and create notebooks for it next month.
May I ask if there are any plans for transformers to support Llava-Next-Video?
As per the last conversation with the authors, they want to release a better version before adding it to transformers. You can track the issue here
After conversion I found that the output logits are different. What might be the problem?
It might be because of the image preprocessing settings. Double-check that you are forwarding exactly the same pixel values and input IDs through both models.
The original implementation pads the images, and the Transformers image processor does not.
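To make the padding difference concrete, here is a rough pure-Python sketch of padding an image to a square with a fill color. My understanding is that the original repo does this with a helper named `expand2square`, filled with the processor's mean color, but treat that as an assumption and check it against the original code. The function `pad_to_square` below is illustrative only; it works on nested lists of RGB tuples so that it runs without dependencies, whereas in practice you would do the same thing on a PIL image before calling the processor.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels


def pad_to_square(image: Image, fill: Pixel) -> Image:
    """Center the image on a square canvas whose side is max(height, width)."""
    height = len(image)
    width = len(image[0]) if height else 0
    if height == width:
        return [row[:] for row in image]
    side = max(height, width)
    if width > height:
        # Wide image: add fill rows above and below.
        top = (side - height) // 2
        bottom = side - height - top
        blank_row = [fill] * side
        return (
            [blank_row[:] for _ in range(top)]
            + [row[:] for row in image]
            + [blank_row[:] for _ in range(bottom)]
        )
    # Tall image: add fill pixels to the left and right of every row.
    left = (side - width) // 2
    right = side - width - left
    return [[fill] * left + row[:] + [fill] * right for row in image]
```

If one pipeline pads like this and the other resizes the unpadded image, the pixel values differ for every non-square input, which is enough to explain mismatching logits.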
Yes, just confirmed that this is true - people who also face this problem should check this out.
Could you open an issue on the Transformers library? I had an implementation that matches it exactly, so we could update the image processor.
Done, please see https://github.com/huggingface/transformers/issues/33175.
Can you provide the unconverted Llama part weights used for qwen-interleave-0.5B, for single-image or multi-image input?
https://huggingface.co/llava-hf/llava-1.5-7b-hf/discussions/26#66436cdfbf8f506d97a36a41
@ha1772007 you mean the original weights? They can be found here (https://huggingface.co/collections/lmms-lab/llava-next-interleave-66763c55c411b340b35873d1)
@JackBAI
What should I do to get the same image preprocessing as the original LLaVA when using the Transformers library? It looks like a do_pad parameter was added to control how the image is processed, but I can't find the corresponding code in the main branch of the Transformers library.