yujiepan committed on
Commit
b4499b5
1 Parent(s): cf85772

Update README.md

Files changed (1)
  1. README.md +54 -1
README.md CHANGED
@@ -6,10 +6,63 @@ widget:
6
  - text: Hello!
7
  example_title: Hello world
8
  group: Python
9
  ---
10
 
11
  This model is for debugging. It is randomly initialized using the config from [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) but with smaller size.
12
 
13
  Codes:
14
  ```python
15
  import os
@@ -91,4 +144,4 @@ def try_inference():
91
 
92
 
93
  try_inference()
94
- ```
 
6
  - text: Hello!
7
  example_title: Hello world
8
  group: Python
9
+ base_model:
10
+ - Qwen/Qwen2-VL-7B-Instruct
11
  ---
12
 
13
  This model is for debugging. It is randomly initialized using the config from [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) but with smaller size.
14
 
15
+ Usage:
16
+ ```python
17
+ from PIL import Image
18
+ import requests
19
+ import torch
20
+ from torchvision import io
21
+ from typing import Dict
22
+ from transformers import Qwen2VLForConditionalGeneration, AutoTokenizer, AutoProcessor
23
+
24
+ model_id = "yujiepan/qwen2-vl-tiny-random"
25
+
26
+ # Load the model in the checkpoint's dtype (torch_dtype="auto") on the available device(s)
27
+ model = Qwen2VLForConditionalGeneration.from_pretrained(
28
+ model_id, torch_dtype="auto", device_map="auto"
29
+ )
30
+ processor = AutoProcessor.from_pretrained(model_id)
31
+
32
+ # Image
33
+ url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"
34
+ image = Image.open(requests.get(url, stream=True).raw)
35
+ conversation = [
36
+ {
37
+ "role": "user",
38
+ "content": [
39
+ {
40
+ "type": "image",
41
+ },
42
+ {"type": "text", "text": "Describe this image."},
43
+ ],
44
+ }
45
+ ]
46
+ text_prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
47
+ # Expected output: '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>Describe this image.<|im_end|>\n<|im_start|>assistant\n'
48
+
49
+ inputs = processor(
50
+ text=[text_prompt], images=[image], padding=True, return_tensors="pt"
51
+ )
52
+ inputs = inputs.to(model.device)
53
+
54
+ output_ids = model.generate(**inputs, max_new_tokens=128)
55
+ generated_ids = [
56
+ output_ids[len(input_ids) :]
57
+ for input_ids, output_ids in zip(inputs.input_ids, output_ids)
58
+ ]
59
+ output_text = processor.batch_decode(
60
+ generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True
61
+ )
62
+ print(output_text)
63
+ ```
64
+
65
+
66
  Codes:
67
  ```python
68
  import os
 
144
 
145
 
146
  try_inference()
147
+ ```
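
The usage snippet added in this commit drops the prompt tokens from each generated sequence before decoding, so only the newly generated text is printed. A minimal pure-Python sketch of that slicing step follows. The helper name `strip_prompt` and the token ids are hypothetical, chosen for illustration only; in the README the same operation is applied to `inputs.input_ids` and the result of `model.generate`.

```python
from typing import List, Sequence


def strip_prompt(
    prompt_ids: Sequence[Sequence[int]],
    output_ids: Sequence[Sequence[int]],
) -> List[List[int]]:
    """Return only the tokens generated after each prompt.

    Assumes each output sequence starts with its own prompt tokens,
    which is how decoder-only generation returns its results.
    """
    return [
        list(out[len(prompt):])
        for prompt, out in zip(prompt_ids, output_ids)
    ]


# Hypothetical token ids standing in for a batch of two prompts.
prompt_batch = [[101, 7, 8], [101, 9, 10]]
# Hypothetical generation results: prompt tokens followed by new tokens.
generated_batch = [[101, 7, 8, 42, 43], [101, 9, 10, 44]]

new_tokens = strip_prompt(prompt_batch, generated_batch)
print(new_tokens)
```

Using distinct names for the prompt and output sequences avoids the shadowing of `output_ids` that occurs inside the list comprehension in the README snippet. That snippet still runs correctly, because the comprehension has its own scope.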