---
license: cc-by-4.0
---

**Model Details**

The VisMin-Idefics2 model is a fine-tuned version of the Idefics2 model, trained on the VisMin dataset to improve performance on multimodal tasks. It is tuned for strong visual-text alignment and is designed to handle tasks where a model must differentiate between similar images based on textual descriptions. By employing the QLoRA technique and a rule-based selection of image-text pairs, VisMin-Idefics2 is optimized for fine-grained understanding and improved generalization across a range of multimodal benchmarks.
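
The exact training configuration is not included in this card. As a rough illustration only, the sketch below shows what a QLoRA setup for Idefics2 could look like using `peft` and `bitsandbytes`; the rank, alpha, dropout, and target modules are placeholder values, not the recipe used for VisMin-Idefics2.

```python
# Illustrative QLoRA sketch: placeholder hyperparameters, not the VisMin training recipe.
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig

# Load the base Idefics2 checkpoint in 4-bit NF4 precision.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
base_model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceM4/idefics2-8b",
    torch_dtype=torch.float16,
    quantization_config=bnb_config,
)

# Attach low-rank adapters to the attention projections and train only those weights.
base_model = prepare_model_for_kbit_training(base_model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # placeholder module list
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```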

**Model Summary**

- Model date: July 2024
- Model type: Multi-modal model (image+text)
- Parent models: [google/siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384) and [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)

**Usage**

This section shows code snippets for running generation with the fine-tuned idefics2-8b model; the snippets differ only in how the inputs are formatted. First, define the common imports and the model-loading setup.

```python
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq, BitsAndBytesConfig

model_name_or_path = "path/to/fine-tuned-model"

# Use FlashAttention 2 only on GPUs that support it (e.g. A100, H100).
gpu_name = torch.cuda.get_device_name(0) if torch.cuda.is_available() else ""
if "A100" in gpu_name or "H100" in gpu_name:
    attn_implementation = "flash_attention_2"
else:
    attn_implementation = None

# 4-bit NF4 quantization, applied only when loading the base idefics2 checkpoints.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b", do_image_splitting=False)
model = AutoModelForVision2Seq.from_pretrained(
    model_name_or_path,
    low_cpu_mem_usage=True,
    device_map="auto",
    torch_dtype=torch.float16,
    _attn_implementation=attn_implementation,  # only A100, H100 GPUs
    quantization_config=quantization_config
    if model_name_or_path in ["HuggingFaceM4/idefics2-8b", "HuggingFaceM4/idefics2-8b-base"]
    else None,
)
```
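
The snippet above only loads the processor and model. Continuing from it, the sketch below shows a minimal generation step, assuming the standard Idefics2 chat-template interface; the image path and question are placeholders.

```python
from PIL import Image

# Placeholder inputs: replace with your own image and query.
image = Image.open("path/to/image.jpg")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Which of the two captions matches this image?"},
        ],
    }
]

# Build the prompt with the chat template, then encode text and image together.
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

# Generate and decode only the newly produced tokens.
generated_ids = model.generate(**inputs, max_new_tokens=64)
new_tokens = generated_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```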

**Bibtex**

```
@article{vismin2024,
  title={VisMin: Visual Minimal-Change Understanding},
  author={Awal, Rabiul and Ahmadi, Saba and Zhang, Le and Agrawal, Aishwarya},
  year={2024}
}
```