clayton07 committed
Commit 4736f54 · verified · 1 Parent(s): 5e12612

Update README.md

Files changed (1):
1. README.md +17 -32

README.md CHANGED
@@ -12,18 +12,24 @@ pinned: false


This application demonstrates a Multimodal Retrieval-Augmented Generation (RAG) system using the Qwen2-VL model and a custom RAG implementation. It allows users to upload images and ask questions about them, combining visual and textual information to generate responses.
- It is deployed here on HuggingFace Spaces [https://huggingface.co/spaces/clayton07/qwen2-colpali-ocr]([url](https://huggingface.co/spaces/clayton07/qwen2-colpali-ocr))
+
+
+ It is deployed here on HuggingFace Spaces https://huggingface.co/spaces/clayton07/qwen2-colpali-ocr

## Prerequisites

- Python 3.8+
+ - Pytorch 2.4.1
+ - Torchvision 0.19.1
+ - Qwen V1
+ - Byaldi
- CUDA-compatible GPU (recommended for optimal performance)

## Installation

1. Clone the repository:
```
- git clone https://github.com/your-username/multimodal-rag-app.git
+ git clone https://github.com/Claytonn7/qwen2-colpali-ocr.git
cd multimodal-rag-app
```

@@ -50,38 +56,24 @@ It is deployed here on HuggingFace Spaces [https://huggingface.co/spaces/clayton
3. Open a web browser and navigate to the URL provided by Streamlit (usually `http://localhost:8501`).


- ## Features
-
- - Image upload or selection of an example image
- - Text-based querying of uploaded images
- - Multimodal RAG processing using custom RAG model and Qwen2-VL
- - Adjustable response length
-
-
## Usage

1. Choose to upload an image or use the example image.
2. If uploading, select an image file (PNG, JPG, or JPEG).
- 3. Enter a text query about the image in the provided input field.
+ 3. Enter a single keyword in the provided input field.
4. Adjust the maximum number of tokens for the response using the slider.
- 5. View the generated response based on the image and your query.
-
- ## Deployment
+ 5. View the extracted text from the image, with the searched keyword highlighted. Example screenshot [here](https://github.com/Claytonn7/qwen2-colpali-ocr/blob/main/examples-app/6-keyword-highlight2.jpg)

- This application can be deployed on various platforms that support Streamlit apps. Here are general steps for deployment:
+ NB: Check the examples-app directory on this repo for more example screenshots.

- 1. Ensure all dependencies are listed in `requirements.txt`.
- 2. Choose a deployment platform (e.g., Streamlit Cloud, Heroku, or a cloud provider like AWS or GCP).
- 3. Follow the platform-specific deployment instructions, which typically involve:
-    - Connecting your GitHub repository to the deployment platform
-    - Configuring environment variables if necessary
-    - Setting up any required build processes
+ ## Disclaimer

- Note: For optimal performance, deploy on a platform that provides GPU support.
+ The app utilizes the free tier of HuggingFace Spaces, which only has support for CPU, resulting in very slow processing times. For optimal performance, it's recommended to run the app locally on a machine with GPU support.

- ## Disclaimer
+ ## Acknowledgments

- The apputilizes the free tier of HuggingFace Spaces, which only has support for CPU, resulting in slower processing times. For optimal performance, it's recommended to run the app locally on a machine with GPU support.
+ - This project uses the [Qwen2-VL model](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) from Hugging Face.
+ - The [byaldi](https://github.com/AnswerDotAI/byaldi) implementation of the [colpali model](https://huggingface.co/vidore/colpali).

## Contributing

@@ -89,11 +81,4 @@ Contributions are welcome! Please feel free to submit a Pull Request.

## License

- GNU Public License v2
-
- ## Acknowledgments
-
- - This project uses the [Qwen2-VL model](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) from Hugging Face.
- - The custom RAG implementation is based on the [colpali model](https://huggingface.co/vidore/colpali).
-
-
+ This project is licensed under the GPL-2.0 License
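
The README above pairs byaldi's ColPali retrieval with Qwen2-VL generation. Below is a minimal sketch of how those two pieces can be wired together; it is not the app's actual source code, and the file name `example.jpg`, the keyword `"total"`, the prompt text, and the `max_new_tokens` value are placeholder assumptions.

```python
# Sketch of the byaldi (ColPali) + Qwen2-VL pipeline described in the README.
# "example.jpg", the keyword, the prompt, and max_new_tokens are placeholders.
import torch
from PIL import Image
from byaldi import RAGMultiModalModel
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

# Retrieval side: index the uploaded image with the ColPali checkpoint,
# then rank it against the user's keyword.
rag = RAGMultiModalModel.from_pretrained("vidore/colpali")
rag.index(input_path="example.jpg", index_name="uploaded_image",
          store_collection_with_index=False, overwrite=True)
hits = rag.search("total", k=1)  # with one image, the top hit is that image

# Generation side: Qwen2-VL extracts text from the (retrieved) image.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct", torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

image = Image.open("example.jpg")
messages = [{"role": "user", "content": [
    {"type": "image", "image": image},
    {"type": "text", "text": "Extract all the text in this image."},
]}]
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[prompt], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

with torch.no_grad():
    # max_new_tokens is what the app's slider controls
    output_ids = model.generate(**inputs, max_new_tokens=200)
answer = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]
print(answer)
```

On the free HuggingFace Spaces CPU tier the `generate` call dominates the runtime, which is what the Disclaimer refers to; on a machine with a CUDA GPU, `device_map="auto"` places the model on the GPU automatically.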
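Usage step 5 says the searched keyword is shown highlighted in the extracted text. One way to do that in Streamlit is sketched below; the `highlight` helper and the `<mark>` tag are illustrative assumptions, not taken from the app.

```python
# Sketch: highlight a keyword in extracted text inside a Streamlit app.
import re
import streamlit as st

def highlight(text: str, keyword: str) -> str:
    # Wrap case-insensitive matches of the keyword in a <mark> tag.
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return pattern.sub(lambda m: f"<mark>{m.group(0)}</mark>", text)

extracted = "Invoice No. 12345, Total: $99.00"  # stand-in for Qwen2-VL output
keyword = st.text_input("Enter a single keyword")
if keyword:
    st.markdown(highlight(extracted, keyword), unsafe_allow_html=True)
```

Escaping the keyword with `re.escape` before compiling keeps characters like `.` or `$` from being treated as regex syntax.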