prithivMLmods committed `489fbb9` (verified · parent: `d100fd5`)

Update README.md

Files changed (1): README.md (+302 −0)
*Omni-Reasoner-2B* is based on Qwen2VL and is designed for mathematical and content-based explanations. It excels at detailed reasoning about content and at solving math problems with properly formatted output. The model combines a conversational approach with visual and textual understanding to handle multi-modal tasks effectively.
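The inference code in this card builds Qwen2VL-style chat messages, where each user turn is a list of typed content entries mixing an image and a text question. A minimal sketch of that message layout (plain Python, no model required; the file path is a hypothetical placeholder):

```python
# Build a Qwen2VL-style multimodal chat message: each user turn carries a list
# of typed content entries (an image entry keyed by its type, plus a text entry).
def build_message(image_path, question):
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},  # image entry, keyed by its type
                {"type": "text", "text": question},      # the textual question
            ],
        }
    ]

messages = build_message("example.jpg", "Solve the equation shown in the image.")
```

This is the structure that `processor.apply_chat_template` and `process_vision_info` consume in the full demo below.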
# **Use it with Transformers**

*Before using, ensure that the required libraries are installed in the environment.*

```shell
pip install gradio spaces transformers accelerate numpy requests torch torchvision qwen-vl-utils av ipython reportlab fpdf python-docx pillow huggingface_hub
```

*Omni-Reasoner inference documentation. **Before running, make sure that `hf_token` is set to a valid Hugging Face token in the login call in the code below.***
```python
# Authenticate with Hugging Face
from huggingface_hub import login

# Log in to Hugging Face using the provided token
hf_token = '----xxxxx----'
login(hf_token)

# Demo
import gradio as gr
import spaces
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor, TextIteratorStreamer
from qwen_vl_utils import process_vision_info
import torch
from PIL import Image
import uuid
import io
from threading import Thread
from reportlab.lib.pagesizes import A4
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib.units import inch  # required for the PDF page margins below
from reportlab.platypus import SimpleDocTemplate, Image as RLImage, Paragraph, Spacer
import docx
from docx.enum.text import WD_ALIGN_PARAGRAPH

# Define model options
MODEL_OPTIONS = {
    "Omni-Reasoner": "prithivMLmods/Omni-Reasoner-2B",
}

# Preload models and processors into CUDA
models = {}
processors = {}
for name, model_id in MODEL_OPTIONS.items():
    print(f"Loading {name}...")
    models[name] = Qwen2VLForConditionalGeneration.from_pretrained(
        model_id,
        trust_remote_code=True,
        torch_dtype=torch.float16
    ).to("cuda").eval()
    processors[name] = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image_extensions = Image.registered_extensions()

def identify_and_save_blob(blob_path):
    """Identifies if the blob is an image and saves it."""
    try:
        with open(blob_path, 'rb') as file:
            blob_content = file.read()
        try:
            Image.open(io.BytesIO(blob_content)).verify()  # Check if it's a valid image
            extension = ".png"  # Default to PNG for saving
            media_type = "image"
        except (IOError, SyntaxError):
            raise ValueError("Unsupported media type. Please upload a valid image.")

        filename = f"temp_{uuid.uuid4()}_media{extension}"
        with open(filename, "wb") as f:
            f.write(blob_content)

        return filename, media_type

    except FileNotFoundError:
        raise ValueError(f"The file {blob_path} was not found.")
    except Exception as e:
        raise ValueError(f"An error occurred while processing the file: {e}")

@spaces.GPU
def qwen_inference(model_name, media_input, text_input=None):
    """Handles inference for the selected model."""
    model = models[model_name]
    processor = processors[model_name]

    if isinstance(media_input, str):
        media_path = media_input
        if media_path.endswith(tuple(image_extensions.keys())):
            media_type = "image"
        else:
            try:
                media_path, media_type = identify_and_save_blob(media_input)
            except Exception:
                raise ValueError("Unsupported media type. Please upload a valid image.")

    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": media_type,
                    media_type: media_path
                },
                {"type": "text", "text": text_input},
            ],
        }
    ]

    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, _ = process_vision_info(messages)
    inputs = processor(
        text=[text],
        images=image_inputs,
        padding=True,
        return_tensors="pt",
    ).to("cuda")

    streamer = TextIteratorStreamer(
        processor.tokenizer, skip_prompt=True, skip_special_tokens=True
    )
    generation_kwargs = dict(inputs, streamer=streamer, max_new_tokens=1024)

    thread = Thread(target=model.generate, kwargs=generation_kwargs)
    thread.start()

    buffer = ""
    for new_text in streamer:
        buffer += new_text
        # Remove <|im_end|> or similar tokens from the output
        buffer = buffer.replace("<|im_end|>", "")
        yield buffer

def format_plain_text(output_text):
    """Formats the output text as plain text without LaTeX delimiters."""
    # Remove LaTeX delimiters and convert to plain text
    plain_text = output_text.replace("\\(", "").replace("\\)", "").replace("\\[", "").replace("\\]", "")
    return plain_text

def generate_document(media_path, output_text, file_format, font_size, line_spacing, alignment, image_size):
    """Generates a document with the input image and plain text output."""
    plain_text = format_plain_text(output_text)
    if file_format == "pdf":
        return generate_pdf(media_path, plain_text, font_size, line_spacing, alignment, image_size)
    elif file_format == "docx":
        return generate_docx(media_path, plain_text, font_size, line_spacing, alignment, image_size)

def generate_pdf(media_path, plain_text, font_size, line_spacing, alignment, image_size):
    """Generates a PDF document."""
    filename = f"output_{uuid.uuid4()}.pdf"
    doc = SimpleDocTemplate(
        filename,
        pagesize=A4,
        rightMargin=inch,
        leftMargin=inch,
        topMargin=inch,
        bottomMargin=inch
    )
    styles = getSampleStyleSheet()
    styles["Normal"].fontSize = int(font_size)
    styles["Normal"].leading = int(font_size) * line_spacing
    styles["Normal"].alignment = {
        "Left": 0,
        "Center": 1,
        "Right": 2,
        "Justified": 4
    }[alignment]

    story = []

    # Add image with size adjustment
    image_sizes = {
        "Small": (200, 200),
        "Medium": (400, 400),
        "Large": (600, 600)
    }
    img = RLImage(media_path, width=image_sizes[image_size][0], height=image_sizes[image_size][1])
    story.append(img)
    story.append(Spacer(1, 12))

    # Add plain text output
    text = Paragraph(plain_text, styles["Normal"])
    story.append(text)

    doc.build(story)
    return filename

def generate_docx(media_path, plain_text, font_size, line_spacing, alignment, image_size):
    """Generates a DOCX document."""
    filename = f"output_{uuid.uuid4()}.docx"
    doc = docx.Document()

    # Add image with size adjustment
    image_sizes = {
        "Small": docx.shared.Inches(2),
        "Medium": docx.shared.Inches(4),
        "Large": docx.shared.Inches(6)
    }
    doc.add_picture(media_path, width=image_sizes[image_size])
    doc.add_paragraph()

    # Add plain text output
    paragraph = doc.add_paragraph()
    paragraph.paragraph_format.line_spacing = line_spacing
    paragraph.paragraph_format.alignment = {
        "Left": WD_ALIGN_PARAGRAPH.LEFT,
        "Center": WD_ALIGN_PARAGRAPH.CENTER,
        "Right": WD_ALIGN_PARAGRAPH.RIGHT,
        "Justified": WD_ALIGN_PARAGRAPH.JUSTIFY
    }[alignment]
    run = paragraph.add_run(plain_text)
    run.font.size = docx.shared.Pt(int(font_size))

    doc.save(filename)
    return filename

# CSS for output styling
css = """
#output {
    height: 500px;
    overflow: auto;
    border: 1px solid #ccc;
}
.submit-btn {
    background-color: #cf3434 !important;
    color: white !important;
}
.submit-btn:hover {
    background-color: #ff2323 !important;
}
.download-btn {
    background-color: #35a6d6 !important;
    color: white !important;
}
.download-btn:hover {
    background-color: #22bcff !important;
}
"""

# Gradio app setup
with gr.Blocks(css=css) as demo:
    gr.Markdown("# Omni-Reasoner-2B Demo")

    with gr.Tab(label="Image Input"):

        with gr.Row():
            with gr.Column():
                model_choice = gr.Dropdown(
                    label="Model Selection",
                    choices=list(MODEL_OPTIONS.keys()),
                    value="Omni-Reasoner"
                )
                input_media = gr.File(
                    label="Upload Image", type="filepath"
                )
                text_input = gr.Textbox(label="Question", placeholder="Ask a question about the image...")
                submit_btn = gr.Button(value="Submit", elem_classes="submit-btn")

            with gr.Column():
                output_text = gr.Textbox(label="Output Text", lines=10)
                plain_text_output = gr.Textbox(label="Standardized Plain Text", lines=10)

        submit_btn.click(
            qwen_inference, [model_choice, input_media, text_input], [output_text]
        ).then(
            format_plain_text, [output_text], [plain_text_output]
        )

        # Document export options
        with gr.Row():
            with gr.Column():
                line_spacing = gr.Dropdown(
                    choices=[0.5, 1.0, 1.15, 1.5, 2.0, 2.5, 3.0],
                    value=1.5,
                    label="Line Spacing"
                )
                font_size = gr.Dropdown(
                    choices=["8", "10", "12", "14", "16", "18", "20", "22", "24"],
                    value="18",
                    label="Font Size"
                )
                alignment = gr.Dropdown(
                    choices=["Left", "Center", "Right", "Justified"],
                    value="Justified",
                    label="Text Alignment"
                )
                image_size = gr.Dropdown(
                    choices=["Small", "Medium", "Large"],
                    value="Small",
                    label="Image Size"
                )
                file_format = gr.Radio(["pdf", "docx"], label="File Format", value="pdf")
                get_document_btn = gr.Button(value="Get Document", elem_classes="download-btn")

        get_document_btn.click(
            generate_document, [input_media, output_text, file_format, font_size, line_spacing, alignment, image_size], gr.File(label="Download Document")
        )

demo.launch(debug=True)
```
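`qwen_inference` above streams partial output by launching `model.generate` on a background thread and iterating a `TextIteratorStreamer`. The same producer/consumer shape can be sketched with stdlib primitives alone (the token source here is a hypothetical stand-in for the model):

```python
import queue
from threading import Thread

SENTINEL = None  # marks the end of the stream

def fake_generate(out_queue):
    """Stand-in for model.generate: pushes text chunks, then a sentinel."""
    for chunk in ["The answer", " is", " 42.", "<|im_end|>"]:
        out_queue.put(chunk)
    out_queue.put(SENTINEL)

def stream_response():
    """Yield a growing buffer, stripping end-of-turn tokens as they arrive."""
    q = queue.Queue()
    Thread(target=fake_generate, args=(q,)).start()
    buffer = ""
    while (chunk := q.get()) is not SENTINEL:
        buffer = (buffer + chunk).replace("<|im_end|>", "")
        yield buffer

final = list(stream_response())[-1]
print(final)  # -> The answer is 42.
```

Yielding the cumulative buffer (rather than each chunk) is what lets the Gradio textbox redraw the full answer so far on every update.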
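`format_plain_text` normalizes model output before the PDF/DOCX export by stripping inline (`\(...\)`) and display (`\[...\]`) LaTeX delimiters while leaving the expression text intact. A quick standalone check of that behavior:

```python
def format_plain_text(output_text):
    """Strip LaTeX math delimiters, leaving the bare expression text."""
    return (output_text
            .replace("\\(", "").replace("\\)", "")
            .replace("\\[", "").replace("\\]", ""))

raw = "The area is \\(\\pi r^2\\), so \\[A = 3.14\\cdot 4\\]."
print(format_plain_text(raw))  # -> The area is \pi r^2, so A = 3.14\cdot 4.
```

Note this removes only the delimiters; LaTeX commands such as `\pi` remain in the plain-text export.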
# **Key Enhancements**
1. **Advanced Reasoning Capabilities**:
   - Enhanced ability to perform long-form reasoning for complex mathematical and content-based queries.