rabiulawal committed · Commit 6098bc6 · verified · Parent(s): c76fdec

Added model usage code snippets

Files changed (1): README.md (+54 -3)

---
license: cc-by-4.0
---

**Model Details**

VisMin-Idefics2 is a fine-tuned version of the Idefics2 model, trained on the VisMin dataset to improve performance on multimodal tasks. It excels at visual-text alignment and is designed for tasks where a model must differentiate between similar images based on textual descriptions. Fine-tuned with QLoRA on a rule-based selection of image-text pairs, VisMin-Idefics2 is optimized for fine-grained understanding and improved generalization across multimodal benchmarks.
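
The QLoRA fine-tuning setup is not spelled out in this card; as a rough sketch, it might look like the following with `peft`, where the rank, alpha, dropout, and target modules are illustrative assumptions rather than the values actually used:

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Hypothetical QLoRA configuration; the actual hyperparameters for
# VisMin-Idefics2 are not specified in this card.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
)

# `model` is an Idefics2 checkpoint loaded in 4-bit, as in the Usage section below.
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```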

**Model Summary**

- Model Date: July 2024
- Model type: Multi-modal model (image + text)
- Parent Models: [google/siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384) and [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)

**Usage**

This section shows code snippets for generation with the fine-tuned idefics2-8b; the snippets differ only in how the input is formatted. Let's first define some common imports and inputs.

```python
import torch
from transformers import AutoModelForVision2Seq, AutoProcessor, BitsAndBytesConfig

model_name_or_path = "path/to/fine-tuned-model"

# Flash Attention 2 is only used on A100/H100 GPUs; otherwise fall back to the default.
gpu_name = torch.cuda.get_device_name(0) if torch.cuda.is_available() else ""
if "A100" in gpu_name or "H100" in gpu_name:
    attn_implementation = "flash_attention_2"
else:
    attn_implementation = None

# 4-bit NF4 quantization config (bitsandbytes).
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b", do_image_splitting=False)
model = AutoModelForVision2Seq.from_pretrained(
    model_name_or_path,
    low_cpu_mem_usage=True,
    device_map="auto",
    torch_dtype=torch.float16,
    attn_implementation=attn_implementation,
    # Quantize only when loading the base Idefics2 checkpoints.
    quantization_config=quantization_config
    if model_name_or_path in ["HuggingFaceM4/idefics2-8b", "HuggingFaceM4/idefics2-8b-base"]
    else None,
)
```
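
With the model and processor loaded, generation follows the standard Idefics2 chat-template flow. A minimal sketch; the image URL and question below are placeholders, and the exact prompt formatting expected by the fine-tuned checkpoint may differ:

```python
import requests
from PIL import Image

# Placeholder inputs for illustration.
image = Image.open(requests.get("https://example.com/image.jpg", stream=True).raw)
question = "Which of the described changes is present in this image?"

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": question},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```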

**Bibtex**
```bibtex
@article{vismin2024,
  title={VisMin: Visual Minimal-Change Understanding},
  author={Awal, Rabiul and Ahmadi, Saba and Zhang, Le and Agrawal, Aishwarya},
  year={2024}
}
```