# `GPT4VisionAPI` Documentation

**Table of Contents**

- [Introduction](#introduction)
- [Installation](#installation)
- [Module Overview](#module-overview)
- [Class: GPT4VisionAPI](#class-gpt4visionapi)
  - [Initialization](#initialization)
  - [Methods](#methods)
    - [encode_image](#encode_image)
    - [run](#run)
    - [__call__](#__call__)
- [Examples](#examples)
  - [Example 1: Basic Usage](#example-1-basic-usage)
  - [Example 2: Custom API Key](#example-2-custom-api-key)
  - [Example 3: Adjusting Maximum Tokens](#example-3-adjusting-maximum-tokens)
- [Additional Information](#additional-information)
- [References](#references)

## Introduction

Welcome to the documentation for the `GPT4VisionAPI` module! This module is a powerful wrapper for the OpenAI GPT-4 Vision model. It allows you to interact with the model to generate descriptions or answers related to images. This documentation provides comprehensive information on how to use the module effectively.

## Installation

Before you start using the `GPT4VisionAPI` module, make sure you have the required dependencies installed. You can install them with the following command:

```bash
pip3 install --upgrade swarms
```

## Module Overview

The `GPT4VisionAPI` module serves as a bridge between your application and the OpenAI GPT-4 Vision model. It lets you send requests to the model and retrieve responses about images. Key features and functionality provided by this module:

- Encoding images to base64 format.
- Running the GPT-4 Vision model with a specified task and image.
- Customization options such as setting the OpenAI API key and the maximum token limit.

## Class: GPT4VisionAPI

The `GPT4VisionAPI` class is the core component of this module. It encapsulates the functionality required to interact with the GPT-4 Vision model. Below, we'll dive into the class in detail.

### Initialization

When initializing the `GPT4VisionAPI` class, you can optionally provide the OpenAI API key and set the maximum token limit. The parameters are:

| Parameter        | Type | Default Value                                        | Description                                                                                     |
|------------------|------|------------------------------------------------------|-------------------------------------------------------------------------------------------------|
| openai_api_key   | str  | `OPENAI_API_KEY` environment variable (if available) | The OpenAI API key. If not provided, it defaults to the `OPENAI_API_KEY` environment variable.   |
| max_tokens       | int  | 300                                                  | The maximum number of tokens to generate in the model's response.                                |

Here's how you can initialize the `GPT4VisionAPI` class:

```python
from swarm_models import GPT4VisionAPI

# Initialize with default API key and max_tokens
api = GPT4VisionAPI()

# Initialize with custom API key and max_tokens
custom_api_key = "your_custom_api_key"
api = GPT4VisionAPI(openai_api_key=custom_api_key, max_tokens=500)
```

### Methods

#### encode_image

This method encodes an image from a URL to base64 format. It's a utility function used internally by the module.

```python
def encode_image(img: str) -> str:
    """
    Encode image to base64.

    Parameters:
    - img (str): URL of the image to encode.

    Returns:
    str: Base64 encoded image.
    """
```
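The signature above is all the module exposes; the snippet below is only a minimal, illustrative sketch of what URL-to-base64 encoding generally involves. The function name `encode_image_from_url` and the use of the `requests` package are assumptions for the example, not part of the module.

```python
import base64

import requests


def encode_image_from_url(img_url: str) -> str:
    """Illustrative sketch: fetch an image over HTTP and return it base64-encoded.

    This is not the module's internal implementation; it only shows the
    general URL -> bytes -> base64 flow that `encode_image` performs.
    """
    response = requests.get(img_url, timeout=30)
    response.raise_for_status()  # fail early on HTTP errors
    return base64.b64encode(response.content).decode("utf-8")


encoded = encode_image_from_url("https://i.imgur.com/2M2ZGwC.jpeg")
print(encoded[:60])  # print only a prefix; the full string can be long
```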
#### run

The `run` method is the primary way to interact with the GPT-4 Vision model. It sends a request to the model with a task and an image URL, and it returns the model's response.

```python
def run(task: str, img: str) -> str:
    """
    Run the GPT-4 Vision model.

    Parameters:
    - task (str): The task or question related to the image.
    - img (str): URL of the image to analyze.

    Returns:
    str: The model's response.
    """
```

#### __call__

The `__call__` method is a convenient way to run the GPT-4 Vision model. It has the same functionality as the `run` method.

```python
def __call__(task: str, img: str) -> str:
    """
    Run the GPT-4 Vision model (callable).

    Parameters:
    - task (str): The task or question related to the image.
    - img (str): URL of the image to analyze.

    Returns:
    str: The model's response.
    """
```

## Examples

The following examples show how to use the `GPT4VisionAPI` module in practice.

### Example 1: Basic Usage

In this example, we use the module with the default API key and maximum tokens to analyze an image.

```python
from swarm_models import GPT4VisionAPI

# Initialize with default API key and max_tokens
api = GPT4VisionAPI()

# Define the task and image URL
task = "What is the color of the object?"
img = "https://i.imgur.com/2M2ZGwC.jpeg"

# Run the GPT-4 Vision model
response = api.run(task, img)

# Print the model's response
print(response)
```

### Example 2: Custom API Key

If you have a custom API key, you can initialize the module with it, as shown in this example.

```python
from swarm_models import GPT4VisionAPI

# Initialize with custom API key and max_tokens
custom_api_key = "your_custom_api_key"
api = GPT4VisionAPI(openai_api_key=custom_api_key, max_tokens=500)

# Define the task and image URL
task = "What is the object in the image?"
img = "https://i.imgur.com/3T3ZHwD.jpeg"

# Run the GPT-4 Vision model
response = api.run(task, img)

# Print the model's response
print(response)
```

### Example 3: Adjusting Maximum Tokens

You can also customize the maximum token limit when initializing the module. In this example, we set it to 1000 tokens.

```python
from swarm_models import GPT4VisionAPI

# Initialize with default API key and custom max_tokens
api = GPT4VisionAPI(max_tokens=1000)

# Define the task and image URL
task = "Describe the scene in the image."
img = "https://i.imgur.com/4P4ZRxU.jpeg"

# Run the GPT-4 Vision model
response = api.run(task, img)

# Print the model's response
print(response)
```

## Additional Information

- If you encounter errors or issues with the module, check your API key and internet connectivity first.
- It's recommended to wrap calls in exception handling so that network or API errors are dealt with gracefully (a minimal sketch is shown at the end of this page).
- You can further customize the module to fit your specific use case by modifying the code as needed.

## References

- [OpenAI API Documentation](https://beta.openai.com/docs/)

This documentation provides a comprehensive guide to using the `GPT4VisionAPI` module. It covers initialization, methods, usage examples, and additional information to ensure a smooth experience when working with the GPT-4 Vision model.
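Following the recommendation in the Additional Information section, here is a minimal error-handling sketch. The exact exception types raised by `GPT4VisionAPI` are not documented here, so the example catches broadly; treat it as an illustration rather than the definitive pattern.

```python
from swarm_models import GPT4VisionAPI

api = GPT4VisionAPI()

task = "Describe the scene in the image."
img = "https://i.imgur.com/4P4ZRxU.jpeg"

try:
    response = api.run(task, img)
    print(response)
except Exception as exc:  # exact exception types are not documented, so catch broadly
    print(f"GPT-4 Vision request failed: {exc}")
```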