Spaces:

clayton07
/

qwen2-colpali-ocr

Running

App Files Files Community

clayton07 commited on Sep 29, 2024

Commit

080848a

verified ·

1 Parent(s): b5ba0b7

Update README.md

Browse files

Files changed (1) hide show

README.md +88 -1

README.md CHANGED Viewed

@@ -8,5 +8,92 @@ sdk_version: 1.38.0
 app_file: app.py
 pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 app_file: app.py
 pinned: false
 ---
+# Qwen2-Colpali-OCR
+This application demonstrates a Multimodal Retrieval-Augmented Generation (RAG) system using the Qwen2-VL model and a custom RAG implementation. It allows users to upload images and ask questions about them, combining visual and textual information to generate responses.
+It is deployed here on HuggingFace Spaces [https://huggingface.co/spaces/clayton07/qwen2-colpali-ocr]([url](https://huggingface.co/spaces/clayton07/qwen2-colpali-ocr))
+## Prerequisites
+- Python 3.8+
+- CUDA-compatible GPU (recommended for optimal performance)
+## Installation
+1. Clone the repository:
+   ```
+   git clone https://github.com/your-username/multimodal-rag-app.git
+   cd multimodal-rag-app
+   ```
+2. Create a virtual environment:
+   ```
+   python -m venv venv
+   source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
+   ```
+3. Install the required packages:
+   ```
+   pip install -r requirements.txt
+   ```
+## Running the Application Locally
+1. Ensure you're in the project directory and your virtual environment is activated.
+2. Run the Streamlit app:
+   ```
+   streamlit run app.py
+   ```
+3. Open a web browser and navigate to the URL provided by Streamlit (usually `http://localhost:8501`).
+## Features
+- Image upload or selection of an example image
+- Text-based querying of uploaded images
+- Multimodal RAG processing using custom RAG model and Qwen2-VL
+- Adjustable response length
+## Usage
+1. Choose to upload an image or use the example image.
+2. If uploading, select an image file (PNG, JPG, or JPEG).
+3. Enter a text query about the image in the provided input field.
+4. Adjust the maximum number of tokens for the response using the slider.
+5. View the generated response based on the image and your query.
+## Deployment
+This application can be deployed on various platforms that support Streamlit apps. Here are general steps for deployment:
+1. Ensure all dependencies are listed in `requirements.txt`.
+2. Choose a deployment platform (e.g., Streamlit Cloud, Heroku, or a cloud provider like AWS or GCP).
+3. Follow the platform-specific deployment instructions, which typically involve:
+   - Connecting your GitHub repository to the deployment platform
+   - Configuring environment variables if necessary
+   - Setting up any required build processes
+Note: For optimal performance, deploy on a platform that provides GPU support.
+## Disclaimer
+The apputilizes the free tier of HuggingFace Spaces, which only has support for CPU, resulting in slower processing times. For optimal performance, it's recommended to run the app locally on a machine with GPU support.
+## Contributing
+Contributions are welcome! Please feel free to submit a Pull Request.
+## License
+GNU Public License v2
+## Acknowledgments
+- This project uses the [Qwen2-VL model](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) from Hugging Face.
+- The custom RAG implementation is based on the [colpali model](https://huggingface.co/vidore/colpali).