|
--- |
|
license: cc-by-4.0 |
|
--- |
|
|
|
|
|
**Model Details** |
|
|
|
The VisMin-Idefics2 model is a fine-tuned version of the Idefics2 model, trained on the VisMin dataset for enhanced performance on multimodal tasks. The model excels at visual-text alignment and is designed for tasks where a model must differentiate between similar images based on textual descriptions. By employing the QLoRA technique and a rule-based selection of image-text pairs, VisMin-Idefics2 is optimized for fine-grained understanding and improved generalization across multimodal benchmarks.
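As an illustrative sketch of the QLoRA technique mentioned above (not the exact training configuration used for this model), QLoRA combines 4-bit NF4 quantization of the frozen base weights with trainable low-rank adapters. The `r`, `lora_alpha`, `lora_dropout`, and `target_modules` values below are assumptions for illustration only:

```python
import torch
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization of the frozen base weights (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Low-rank adapters trained on top; rank and target modules are illustrative.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceM4/idefics2-8b",
    quantization_config=bnb_config,
    device_map="auto",
)
model = get_peft_model(model, lora_config)  # only the adapter weights are trainable
```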
|
|
|
**Model Summary** |
|
|
|
- Model Date: July 2024 |
|
- Model type: Multi-modal model (image+text) |
|
- Parent Models: [google/siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384) and [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) |
|
|
|
**Usage** |
|
|
|
This section shows code snippets for generation with the fine-tuned idefics2-8b model. The snippets differ only in input formatting. Let's first define some common imports and inputs.
|
|
|
```python
import torch
from transformers import AutoModelForVision2Seq, AutoProcessor, BitsAndBytesConfig

model_name_or_path = "path/to/fine-tuned-model"

# FlashAttention 2 is supported on recent GPUs such as A100 and H100.
gpu_name = torch.cuda.get_device_name(0) if torch.cuda.is_available() else ""
if "A100" in gpu_name or "H100" in gpu_name:
    attn_implementation = "flash_attention_2"
else:
    attn_implementation = None

# 4-bit NF4 quantization, applied only when loading the original idefics2 checkpoints.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b", do_image_splitting=False)
model = AutoModelForVision2Seq.from_pretrained(
    model_name_or_path,
    low_cpu_mem_usage=True,
    device_map="auto",
    torch_dtype=torch.float16,
    _attn_implementation=attn_implementation,  # flash_attention_2 on A100/H100 GPUs only
    quantization_config=quantization_config
    if model_name_or_path in ["HuggingFaceM4/idefics2-8b", "HuggingFaceM4/idefics2-8b-base"]
    else None,
)
```
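With the processor and model loaded, a prompt for the image-text matching task can be built from a chat-style message list and rendered with `processor.apply_chat_template`. The sketch below builds such a message list and, for illustration only, renders it by hand in the idefics2 chat format (`User:`/`Assistant:` turns with `<image>` placeholders and `<end_of_utterance>` markers). The task wording is an assumption, not the exact prompt used for fine-tuning:

```python
def build_messages(caption: str, num_images: int = 2) -> list:
    """Chat-style messages asking which image matches the caption."""
    content = [{"type": "image"} for _ in range(num_images)]
    content.append({"type": "text",
                    "text": f"Which image matches the description: {caption}?"})
    return [{"role": "user", "content": content}]

def render_idefics2_prompt(messages: list) -> str:
    """Hand-rolled stand-in for processor.apply_chat_template (illustration only)."""
    parts = []
    for msg in messages:
        turn = "User:" if msg["role"] == "user" else "Assistant:"
        for item in msg["content"]:
            turn += "<image>" if item["type"] == "image" else item["text"]
        parts.append(turn + "<end_of_utterance>")
    parts.append("Assistant:")
    return "\n".join(parts)

messages = build_messages("a red mug to the left of a laptop")
prompt = render_idefics2_prompt(messages)
# In practice, use the processor instead of the manual renderer above:
# prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
# inputs = processor(text=prompt, images=[image_a, image_b], return_tensors="pt").to(model.device)
# generated_ids = model.generate(**inputs, max_new_tokens=32)
# print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```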
|
|
|
**Bibtex** |
|
``` |
|
@article{vismin2024, |
|
title={VisMin: Visual Minimal-Change Understanding}, |
|
author={Awal, Rabiul and Ahmadi, Saba and Zhang, Le and Agrawal, Aishwarya}, |
|
year={2024} |
|
} |
|
``` |
|
|