clayton07 committed · Commit 080848a (verified) · 1 Parent(s): b5ba0b7

Update README.md

  1. README.md +88 -1
README.md CHANGED
@@ -8,5 +8,92 @@ sdk_version: 1.38.0
8
  app_file: app.py
9
  pinned: false
10
  ---
11
 
12
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
8
  app_file: app.py
9
  pinned: false
10
  ---
11
+ # Qwen2-Colpali-OCR
12
+
13
+
14
+ This application demonstrates a Multimodal Retrieval-Augmented Generation (RAG) system using the Qwen2-VL model and a custom RAG implementation. It allows users to upload images and ask questions about them, combining visual and textual information to generate responses.
15
+ It is deployed on Hugging Face Spaces: [https://huggingface.co/spaces/clayton07/qwen2-colpali-ocr](https://huggingface.co/spaces/clayton07/qwen2-colpali-ocr)
16
+
17
+ ## Prerequisites
18
+
19
+ - Python 3.8+
20
+ - CUDA-compatible GPU (recommended for optimal performance)
21
+
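The prerequisites above can be checked before installing anything. The following is a minimal sketch, not part of the repository: `check_environment` is a hypothetical helper, and it assumes the app runs on PyTorch, so it only probes for CUDA when `torch` is already installed.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # minimum version from the Prerequisites list


def check_environment():
    """Return (python_ok, device) for the current interpreter.

    Hypothetical helper for illustration; the app itself may detect
    its device differently.
    """
    python_ok = sys.version_info >= MIN_PYTHON
    device = "cpu"
    # Assumption: the models run on PyTorch. Only probe CUDA if torch is installed.
    if importlib.util.find_spec("torch") is not None:
        import torch

        if torch.cuda.is_available():
            device = "cuda"
    return python_ok, device


if __name__ == "__main__":
    python_ok, device = check_environment()
    print(f"Python >= 3.8: {python_ok}; device: {device}")
```

If the reported device is `cpu`, the app should still run, only more slowly.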
22
+ ## Installation
23
+
24
+ 1. Clone the repository:
25
+ ```
26
+ git clone https://github.com/your-username/multimodal-rag-app.git
27
+ cd multimodal-rag-app
28
+ ```
29
+
30
+ 2. Create a virtual environment:
31
+ ```
32
+ python -m venv venv
33
+ source venv/bin/activate # On Windows, use `venv\Scripts\activate`
34
+ ```
35
+
36
+ 3. Install the required packages:
37
+ ```
38
+ pip install -r requirements.txt
39
+ ```
40
+
41
+ ## Running the Application Locally
42
+
43
+ 1. Ensure you're in the project directory and your virtual environment is activated.
44
+
45
+ 2. Run the Streamlit app:
46
+ ```
47
+ streamlit run app.py
48
+ ```
49
+
50
+ 3. Open a web browser and navigate to the URL provided by Streamlit (usually `http://localhost:8501`).
51
+
52
+
53
+ ## Features
54
+
55
+ - Image upload or selection of an example image
56
+ - Text-based querying of uploaded images
57
+ - Multimodal RAG processing using a custom RAG model and Qwen2-VL
58
+ - Adjustable response length
59
+
60
+
61
+ ## Usage
62
+
63
+ 1. Choose to upload an image or use the example image.
64
+ 2. If uploading, select an image file (PNG, JPG, or JPEG).
65
+ 3. Enter a text query about the image in the provided input field.
66
+ 4. Adjust the maximum number of tokens for the response using the slider.
67
+ 5. View the generated response based on the image and your query.
68
+
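Steps 2 and 4 above imply two input checks: the file type and the token limit. The sketch below shows one way they might be validated before calling the model. The helper names and the slider bounds (`MIN_TOKENS`, `MAX_TOKENS`) are assumptions for illustration and are not taken from `app.py`.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg"}  # formats listed in step 2
MIN_TOKENS, MAX_TOKENS = 16, 1024  # hypothetical slider bounds


def is_supported_image(filename: str) -> bool:
    """True if the file extension is one the app accepts (case-insensitive)."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


def clamp_max_tokens(requested: int) -> int:
    """Keep the requested response length inside the assumed slider range."""
    return max(MIN_TOKENS, min(MAX_TOKENS, requested))
```

For example, `is_supported_image("scan.PNG")` accepts the file, while `is_supported_image("notes.pdf")` rejects it.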
69
+ ## Deployment
70
+
71
+ This application can be deployed on various platforms that support Streamlit apps. Here are general steps for deployment:
72
+
73
+ 1. Ensure all dependencies are listed in `requirements.txt`.
74
+ 2. Choose a deployment platform (e.g., Streamlit Cloud, Heroku, or a cloud provider like AWS or GCP).
75
+ 3. Follow the platform-specific deployment instructions, which typically involve:
76
+ - Connecting your GitHub repository to the deployment platform
77
+ - Configuring environment variables if necessary
78
+ - Setting up any required build processes
79
+
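Step 1 can be verified with a short script before deploying. This is a rough sketch under stated assumptions: `listed_packages` and `missing_packages` are hypothetical helpers, the parser handles only simple `name` or `name==version` lines, and the list of expected packages is a guess you should replace with your app's real imports.

```python
import re


def listed_packages(requirements_text: str) -> set:
    """Collect normalized package names from simple requirements.txt lines."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(match.group(0).lower().replace("_", "-"))
    return names


def missing_packages(requirements_text: str, expected: list) -> list:
    """Return the expected packages that requirements.txt does not list."""
    listed = listed_packages(requirements_text)
    return sorted(
        name for name in expected if name.lower().replace("_", "-") not in listed
    )


if __name__ == "__main__":
    # Hypothetical file contents and expected packages, for illustration only.
    sample = "streamlit==1.38.0\ntorch>=2.0  # GPU build optional\n"
    print(missing_packages(sample, ["streamlit", "torch", "transformers"]))
```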
80
+ Note: For optimal performance, deploy on a platform that provides GPU support.
81
+
82
+ ## Disclaimer
83
+
84
+ The hosted app runs on the free tier of Hugging Face Spaces, which provides CPU only, so processing is slow. For better performance, run the app locally on a machine with a GPU.
85
+
86
+ ## Contributing
87
+
88
+ Contributions are welcome! Please feel free to submit a Pull Request.
89
+
90
+ ## License
91
+
92
+ GNU General Public License v2 (GPL-2.0)
93
+
94
+ ## Acknowledgments
95
+
96
+ - This project uses the [Qwen2-VL model](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) from Hugging Face.
97
+ - The custom RAG implementation is based on the [ColPali model](https://huggingface.co/vidore/colpali).
98
+
99