TryingHard committed: Update README.md

README.md (CHANGED)
Built upon Ovis1.5, **Ovis1.6** further enhances high-resolution image processing …
| Ovis MLLMs                    | ViT         | LLM                   | Model Weights                                                                | Demo                                                                |
|:------------------------------|:-----------:|:---------------------:|:----------------------------------------------------------------------------:|:-------------------------------------------------------------------:|
| Ovis1.6-Gemma2-9B             | Siglip-400M | Gemma2-9B-It          | [Huggingface](https://huggingface.co/AIDC-AI/Ovis1.6-Gemma2-9B)              | [Space](https://huggingface.co/spaces/AIDC-AI/Ovis1.6-Gemma2-9B)    |
| Ovis1.6-Llama3.2-3B           | Siglip-400M | Llama-3.2-3B-Instruct | [Huggingface](https://huggingface.co/AIDC-AI/Ovis1.6-Llama3.2-3B)            | [Space](https://huggingface.co/spaces/AIDC-AI/Ovis1.6-Llama3.2-3B)  |
| Ovis1.6-Gemma2-9B-GPTQ-Int4   | Siglip-400M | Gemma2-9B-It          | [Huggingface](https://huggingface.co/AIDC-AI/Ovis1.6-Gemma2-9B-GPTQ-Int4)    | -                                                                   |
| Ovis1.6-Llama3.2-3B-GPTQ-Int4 | Siglip-400M | Llama-3.2-3B-Instruct | [Huggingface](https://huggingface.co/AIDC-AI/Ovis1.6-Llama3.2-3B-GPTQ-Int4)  | -                                                                   |

## Quantized Model
We quantized Ovis1.6 with AutoGPTQ. Follow these steps to run it.

### Installation
1. Install the required Python packages:
   ```bash
   pip install numpy==1.24.3 transformers==4.44.2 pillow==10.3.0 gekko pandas
   ```
2. Build AutoGPTQ: we customized AutoGPTQ to support Ovis model quantization, so you need to build the customized version from source.
   ```bash
   git clone https://github.com/AIDC-AI/AutoGPTQ.git
   cd AutoGPTQ
   pip install -vvv --no-build-isolation -e .
   ```
   Check [this](https://github.com/AutoGPTQ/AutoGPTQ/issues/194) first if you are building inside a Docker container.
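
If the build succeeds, the customized classes are importable. As an optional sanity check (assuming the fork keeps the stock `auto_gptq` package name and `__version__` attribute):
```python
# The Ovis GPTQ wrapper exists only in the customized fork, so a stock
# AutoGPTQ install fails this import.
import auto_gptq
from auto_gptq.modeling import OvisGemma2GPTQForCausalLM

print(auto_gptq.__version__)  # should report the locally built package
```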

### Usage
Below is a code snippet to run **Ovis1.6-Gemma2-9B-GPTQ-Int4** with multimodal inputs. For additional usage instructions, including inference wrapper and Gradio UI, please refer to [Ovis GitHub](https://github.com/AIDC-AI/Ovis?tab=readme-ov-file#inference).
```python
import torch
from PIL import Image
from transformers import GenerationConfig
from auto_gptq.modeling import OvisGemma2GPTQForCausalLM

# load model
load_device = "cuda:0"  # customize load device
model = OvisGemma2GPTQForCausalLM.from_pretrained(
    "AIDC-AI/Ovis1.6-Gemma2-9B-GPTQ-Int4",
    device=load_device,
    trust_remote_code=True
)
model.model.generation_config = GenerationConfig.from_pretrained("AIDC-AI/Ovis1.6-Gemma2-9B-GPTQ-Int4")
text_tokenizer = model.get_text_tokenizer()
visual_tokenizer = model.get_visual_tokenizer()

# ... (batched input preprocessing omitted here) ...
for i in range(len(batch_input_ids)):
    ...  # per-sample generation and decoding omitted here
```
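
The omitted middle of the snippet builds the batched inputs (`batch_input_ids` and friends) and decodes each output. As a rough single-image sketch of that flow, modeled on the inference example in the [Ovis GitHub README](https://github.com/AIDC-AI/Ovis?tab=readme-ov-file#inference) rather than copied from this repository (whether the GPTQ wrapper forwards `preprocess_inputs` exactly like this is an assumption):
```python
# Hypothetical single-image inference sketch; "example.jpg" and the query
# text are placeholders, and the call pattern follows the public Ovis API.
image = Image.open("example.jpg")
query = "<image>\nDescribe this image."

prompt, input_ids, pixel_values = model.preprocess_inputs(query, [image])
attention_mask = torch.ne(input_ids, text_tokenizer.pad_token_id)
input_ids = input_ids.unsqueeze(0).to(device=load_device)
attention_mask = attention_mask.unsqueeze(0).to(device=load_device)
pixel_values = [pixel_values.to(dtype=visual_tokenizer.dtype, device=load_device)]

with torch.inference_mode():
    output_ids = model.generate(
        input_ids,
        pixel_values=pixel_values,
        attention_mask=attention_mask,
        max_new_tokens=1024,
        do_sample=False,
        eos_token_id=model.model.generation_config.eos_token_id,
        pad_token_id=text_tokenizer.pad_token_id,
        use_cache=True,
    )[0]
print(text_tokenizer.decode(output_ids, skip_special_tokens=True))
```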

## Quantize Your Own Ovis Model with AutoGPTQ
We provide a demonstration code snippet for you to quantize your own fine-tuned **Ovis1.6-Gemma2-9B** model. Before running it, **follow the installation steps above** to set up an environment for quantization.
```python
from typing import Dict, Sequence, Union, List
import copy
import logging

from auto_gptq import BaseQuantizeConfig
from auto_gptq.modeling import OvisGemma2GPTQForCausalLM
import torch
from torch.utils.data import Dataset, DataLoader
from PIL import Image
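
# Paths and quantization settings. The original snippet's values are
# omitted here; the arguments below are illustrative assumptions
# (Int4 with a common group size), not the authors' exact settings.
model_path = "path/to/your/finetuned/Ovis1.6-Gemma2-9B"  # placeholder
quantize_save_path = "path/to/save/quantized/model"      # placeholder
quantize_config = BaseQuantizeConfig(
    bits=4,          # quantize weights to Int4, matching the released checkpoints
    group_size=128,  # common GPTQ group size; tune for accuracy vs. size
)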

# Load model
model = OvisGemma2GPTQForCausalLM.from_pretrained(
    model_path,
    quantize_config,
    torch_dtype=torch.bfloat16,
    multimodal_max_length=8192,
    trust_remote_code=True
).cuda()
print(f"Model Loaded!")
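
# The original snippet builds a multimodal calibration DataLoader next.
# Sketch of the idea (hypothetical names; the exact sample format that
# model.quantize() expects is defined by the customized fork):
#
#     class CalibDataset(Dataset):
#         def __getitem__(self, idx):
#             ...  # tokenized conversation text plus preprocessed pixel_values
#
#     train_loader = DataLoader(CalibDataset(samples), batch_size=1)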
# ...
print(f"Dataloader Loaded!")

# start quantizing
model.quantize(train_loader, cache_examples_on_gpu=False)
print(f"Model Quantized! Now Saving...")

model.save_quantized(quantize_save_path, use_safetensors=True)
```
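
Once saved, the quantized checkpoint loads for inference the same way as the published Int4 weights in the Usage section above, with the local save directory in place of the Hub model ID:
```python
# Reload the freshly quantized model; mirrors the Usage section above.
model = OvisGemma2GPTQForCausalLM.from_pretrained(
    quantize_save_path,
    device="cuda:0",
    trust_remote_code=True
)
```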