# Qwen2-Colpali-OCR
This application demonstrates a Multimodal Retrieval-Augmented Generation (RAG) system using the Qwen2-VL model and a custom RAG implementation. It allows users to upload images and ask questions about them, combining visual and textual information to generate responses.

It is deployed on Hugging Face Spaces: [https://huggingface.co/spaces/clayton07/qwen2-colpali-ocr](https://huggingface.co/spaces/clayton07/qwen2-colpali-ocr)
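To make the generation half concrete, here is a minimal sketch of asking Qwen2-VL about one image with the `transformers` library, following the usage pattern published for the `Qwen/Qwen2-VL-2B-Instruct` checkpoint. It is illustrative only, not the app's actual `app.py`; the image path and question are placeholders.

```python
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# Illustrative sketch only; the deployed app wraps this in its own RAG pipeline.
model_id = "Qwen/Qwen2-VL-2B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("example.png")              # placeholder image path
question = "What text appears in this image?"  # placeholder query

# Chat-style prompt that interleaves the image with the question.
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": question},
    ]}
]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

# Generate, then decode only the newly produced tokens.
output_ids = model.generate(**inputs, max_new_tokens=200)
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```

The same code runs on CPU, just noticeably slower (see the Disclaimer below).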
## Prerequisites

- Python 3.8+
- CUDA-compatible GPU (recommended for optimal performance)
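Before installing, you can sanity-check both requirements with a short snippet like the one below (a convenience check, not part of the repository):

```python
import sys

# Check the interpreter version first.
print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
assert sys.version_info >= (3, 8), "Python 3.8+ is required"

# torch only becomes available after `pip install -r requirements.txt`.
try:
    import torch
    print("CUDA available:", torch.cuda.is_available())
except ImportError:
    print("PyTorch not installed yet; skipping the GPU check")
```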
## Installation

1. Clone the repository:

   ```
   git clone https://github.com/your-username/multimodal-rag-app.git
   cd multimodal-rag-app
   ```

2. Create a virtual environment:

   ```
   python -m venv venv
   source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
   ```

3. Install the required packages:

   ```
   pip install -r requirements.txt
   ```
## Running the Application Locally

1. Ensure you're in the project directory and your virtual environment is activated.

2. Run the Streamlit app:

   ```
   streamlit run app.py
   ```

3. Open a web browser and navigate to the URL provided by Streamlit (usually `http://localhost:8501`).
## Features

- Image upload or selection of an example image
- Text-based querying of uploaded images
- Multimodal RAG processing using a custom RAG model and Qwen2-VL (see the retrieval sketch below)
- Adjustable response length
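The retrieval half of that pipeline can be sketched with the `byaldi` wrapper around ColPali. This is an assumption made for illustration; the app's own custom RAG implementation (see Acknowledgments) may be wired differently, and the folder path, index name, and query below are placeholders.

```python
from byaldi import RAGMultiModalModel

# Load a ColPali checkpoint as the multimodal retriever (assumed wrapper, see above).
retriever = RAGMultiModalModel.from_pretrained("vidore/colpali")

# Index a folder of images; embeddings are stored under the index name.
retriever.index(
    input_path="docs/",          # hypothetical folder of images to index
    index_name="demo_index",
    store_collection_with_index=False,
    overwrite=True,
)

# Retrieve the pages most relevant to a text query; the top hit would then be
# passed to Qwen2-VL together with the question (see the generation sketch above).
results = retriever.search("Which page shows the invoice total?", k=3)
for hit in results:
    print(hit)
```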
## Usage

1. Choose to upload an image or use the example image.
2. If uploading, select an image file (PNG, JPG, or JPEG).
3. Enter a text query about the image in the provided input field.
4. Adjust the maximum number of tokens for the response using the slider.
5. View the generated response based on the image and your query.
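Those steps correspond roughly to a Streamlit layout like the following simplified sketch. It is not the actual `app.py`; `answer_question` is a stub standing in for the app's ColPali + Qwen2-VL call, and the example image path is hypothetical.

```python
import streamlit as st
from PIL import Image


def answer_question(image, query, max_new_tokens):
    # Stub: the real app runs its ColPali + Qwen2-VL pipeline here.
    return f"(answer to {query!r}, up to {max_new_tokens} tokens)"


st.title("Qwen2-Colpali-OCR")

# Steps 1-2: upload an image or fall back to a bundled example.
source = st.radio("Image source", ["Upload an image", "Use the example image"])
if source == "Upload an image":
    uploaded = st.file_uploader("Choose an image", type=["png", "jpg", "jpeg"])
    image = Image.open(uploaded) if uploaded else None
else:
    image = Image.open("example.png")  # hypothetical bundled example

# Steps 3-4: text query and response-length slider.
query = st.text_input("Ask a question about the image")
max_tokens = st.slider("Maximum response tokens", 50, 500, 200)

# Step 5: run the pipeline and display the answer.
if image is not None and query:
    st.image(image)
    st.write(answer_question(image, query, max_new_tokens=max_tokens))
```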
## Deployment

This application can be deployed on various platforms that support Streamlit apps. Here are the general steps for deployment:

1. Ensure all dependencies are listed in `requirements.txt`.
2. Choose a deployment platform (e.g., Streamlit Cloud, Heroku, or a cloud provider like AWS or GCP).
3. Follow the platform-specific deployment instructions, which typically involve:
   - Connecting your GitHub repository to the deployment platform
   - Configuring environment variables if necessary
   - Setting up any required build processes

Note: For optimal performance, deploy on a platform that provides GPU support.
## Disclaimer

The app utilizes the free tier of Hugging Face Spaces, which only supports CPU, resulting in slower processing times. For optimal performance, it is recommended to run the app locally on a machine with GPU support.
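Since processing time is dominated by model inference, the usual mitigation when only CPU is available is to pick the device and a CPU-friendly dtype at load time. A hedged sketch of that selection (not necessarily how the deployed Space does it):

```python
import torch
from transformers import Qwen2VLForConditionalGeneration

# Prefer a GPU with bfloat16 when available; fall back to float32 on CPU.
if torch.cuda.is_available():
    device, dtype = "cuda", torch.bfloat16
else:
    device, dtype = "cpu", torch.float32

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct", torch_dtype=dtype
).to(device)
print(f"Loaded on {device} with dtype {dtype}")
```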
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
## License

GNU General Public License v2 (GPL-2.0)
## Acknowledgments

- This project uses the [Qwen2-VL model](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) from Hugging Face.
- The custom RAG implementation is based on the [ColPali model](https://huggingface.co/vidore/colpali).