# `Agent` Documentation

Swarm Agent is a powerful autonomous agent framework designed to connect Language Models (LLMs) with various tools and long-term memory. The class can ingest and process various types of documents, such as PDFs, text files, Markdown files, JSON files, and more. The Agent structure offers a wide range of features to enhance the capabilities of LLMs and facilitate efficient task execution.

1. **Conversational Loop**: It establishes a conversational loop with a language model, allowing you to interact with the model in a back-and-forth manner, taking turns in the conversation.

2. **Feedback Collection**: The class allows users to provide feedback on the responses generated by the model. This feedback can be valuable for training and improving the model's responses over time.

3. **Stoppable Conversation**: You can define custom stopping conditions for the conversation, allowing you to stop the interaction based on specific criteria. For example, you can stop the conversation if a certain keyword is detected in the responses (see the sketch after the architecture diagram below).

4. **Retry Mechanism**: The class includes a retry mechanism that can be helpful if there are issues generating responses from the model. It attempts to generate a response multiple times before raising an error.

## Architecture

```mermaid
graph TD
    A[Task Initiation] -->|Receives Task| B[Initial LLM Processing]
    B -->|Interprets Task| C[Tool Usage]
    C -->|Calls Tools| D[Function 1]
    C -->|Calls Tools| E[Function 2]
    D -->|Returns Data| C
    E -->|Returns Data| C
    C -->|Provides Data| F[Memory Interaction]
    F -->|Stores and Retrieves Data| G[RAG System]
    G -->|ChromaDB/Pinecone| H[Enhanced Data]
    F -->|Provides Enhanced Data| I[Final LLM Processing]
    I -->|Generates Final Response| J[Output]
    C -->|No Tools Available| K[Skip Tool Usage]
    K -->|Proceeds to Memory Interaction| F
    F -->|No Memory Available| L[Skip Memory Interaction]
    L -->|Proceeds to Final LLM Processing| I
```
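The custom stopping conditions and retry behavior described in the feature list above are configured through plain constructor arguments. Below is a minimal sketch, assuming the `stopping_condition` callable receives the latest response string and returns `True` when the agent should stop; the exact callable signature may vary between versions.

```python
import os

from swarms import Agent
from swarm_models import OpenAIChat


# Hypothetical stopping condition: stop looping once the model emits "<DONE>".
# Assumption: the callable receives the latest response string and returns a bool.
def stop_when_done(response: str) -> bool:
    return "<DONE>" in response


model = OpenAIChat(
    api_key=os.getenv("OPENAI_API_KEY"), model_name="gpt-4o-mini", temperature=0.1
)

agent = Agent(
    agent_name="Stoppable-Agent",
    llm=model,
    max_loops=5,                        # hard upper bound on loops
    stopping_condition=stop_when_done,  # custom stop check (feature 3)
    retry_attempts=3,                   # retry failed LLM calls (feature 4)
    retry_interval=1,                   # seconds between retries
)

response = agent.run("Draft a short project status update and end with <DONE>.")
```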
### `Agent` Attributes

| Attribute | Description |
|------------|-------------|
| `id` | A unique identifier for the agent instance. |
| `llm` | The language model instance used by the agent. |
| `template` | The template used for formatting responses. |
| `max_loops` | The maximum number of loops the agent can run. |
| `stopping_condition` | A callable function that determines when the agent should stop looping. |
| `loop_interval` | The interval (in seconds) between loops. |
| `retry_attempts` | The number of retry attempts for failed LLM calls. |
| `retry_interval` | The interval (in seconds) between retry attempts. |
| `return_history` | A boolean indicating whether the agent should return the conversation history. |
| `stopping_token` | A token that, when present in the response, stops the agent from looping. |
| `dynamic_loops` | A boolean indicating whether the agent should dynamically determine the number of loops. |
| `interactive` | A boolean indicating whether the agent should run in interactive mode. |
| `dashboard` | A boolean indicating whether the agent should display a dashboard. |
| `agent_name` | The name of the agent instance. |
| `agent_description` | A description of the agent instance. |
| `system_prompt` | The system prompt used to initialize the conversation. |
| `tools` | A list of callable functions representing tools the agent can use. |
| `dynamic_temperature_enabled` | A boolean indicating whether the agent should dynamically adjust the temperature of the LLM. |
| `sop` | The standard operating procedure for the agent. |
| `sop_list` | A list of strings representing the standard operating procedure. |
| `saved_state_path` | The file path for saving and loading the agent's state. |
| `autosave` | A boolean indicating whether the agent should automatically save its state. |
| `context_length` | The maximum length of the context window (in tokens) for the LLM. |
| `user_name` | The name used to represent the user in the conversation. |
| `self_healing_enabled` | A boolean indicating whether the agent should attempt to self-heal in case of errors. |
| `code_interpreter` | A boolean indicating whether the agent should interpret and execute code snippets. |
| `multi_modal` | A boolean indicating whether the agent should support multimodal inputs (e.g., text and images). |
| `pdf_path` | The file path of a PDF document to be ingested. |
| `list_of_pdf` | A list of file paths for PDF documents to be ingested. |
| `tokenizer` | An instance of a tokenizer used for token counting and management. |
| `long_term_memory` | An instance of a `BaseVectorDatabase` implementation for long-term memory management. |
| `preset_stopping_token` | A boolean indicating whether the agent should use a preset stopping token. |
| `traceback` | An object used for traceback handling. |
| `traceback_handlers` | A list of traceback handlers. |
| `streaming_on` | A boolean indicating whether the agent should stream its responses. |
| `docs` | A list of document paths or contents to be ingested. |
| `docs_folder` | The path to a folder containing documents to be ingested. |
| `verbose` | A boolean indicating whether the agent should print verbose output. |
| `parser` | A callable function used for parsing input data. |
| `best_of_n` | An integer indicating the number of best responses to generate (for sampling). |
| `callback` | A callable function to be called after each agent loop. |
| `metadata` | A dictionary containing metadata for the agent. |
| `callbacks` | A list of callable functions to be called during the agent's execution. |
| `logger_handler` | A handler for logging messages. |
| `search_algorithm` | A callable function representing the search algorithm for long-term memory retrieval. |
| `logs_to_filename` | The file path for logging agent activities. |
| `evaluator` | A callable function used for evaluating the agent's responses. |
| `output_json` | A boolean indicating whether the agent's output should be in JSON format. |
| `stopping_func` | A callable function used as a stopping condition for the agent. |
| `custom_loop_condition` | A callable function used as a custom loop condition for the agent. |
| `sentiment_threshold` | A float value representing the sentiment threshold for evaluating responses. |
| `custom_exit_command` | A string representing a custom command for exiting the agent's loop. |
| `sentiment_analyzer` | A callable function used for sentiment analysis on the agent's outputs. |
| `limit_tokens_from_string` | A callable function used for limiting the number of tokens in a string. |
| `custom_tools_prompt` | A callable function used for generating a custom prompt for tool usage. |
| `tool_schema` | A data structure representing the schema for the agent's tools. |
| `output_type` | The expected output format of the agent's responses; a Literal such as `"string"`, `"str"`, `"list"`, `"json"`, `"dict"`, or `"yaml"`. |
| `function_calling_type` | A string representing the type of function calling (e.g., "json"). |
| `output_cleaner` | A callable function used for cleaning the agent's output. |
| `function_calling_format_type` | A string representing the format type for function calling (e.g., "OpenAI"). |
| `list_base_models` | A list of base models used for generating tool schemas. |
| `metadata_output_type` | A string representing the output type for metadata. |
| `state_save_file_type` | A string representing the file type for saving the agent's state (e.g., "json", "yaml"). |
| `chain_of_thoughts` | A boolean indicating whether the agent should use the chain of thoughts technique. |
| `algorithm_of_thoughts` | A boolean indicating whether the agent should use the algorithm of thoughts technique. |
| `tree_of_thoughts` | A boolean indicating whether the agent should use the tree of thoughts technique. |
| `tool_choice` | A string representing the method for tool selection (e.g., "auto"). |
| `execute_tool` | A boolean indicating whether the agent should execute tools. |
| `rules` | A string representing the rules for the agent's behavior. |
| `planning` | A boolean indicating whether the agent should perform planning. |
| `planning_prompt` | A string representing the prompt for planning. |
| `device` | A string representing the device on which the agent should run. |
| `custom_planning_prompt` | A string representing a custom prompt for planning. |
| `memory_chunk_size` | An integer representing the maximum size of memory chunks for long-term memory retrieval. |
| `agent_ops_on` | A boolean indicating whether agent operations should be enabled. |
| `return_step_meta` | A boolean indicating whether to return JSON of all the steps and additional metadata. |
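Most of these attributes are simple flags or strings, but several (`stopping_condition`, `evaluator`, `output_cleaner`, `sentiment_analyzer`, and others) are callables that hook into the agent loop. Their exact signatures are not specified in the table; the sketch below assumes each hook receives the response text, which may not hold for every version of the library.

```python
import os

from swarms import Agent
from swarm_models import OpenAIChat


# Hypothetical hooks. The table above only states that these are callables;
# the signatures used here (response text in, text out) are an assumption.
def clean_output(response: str) -> str:
    # Strip code fences and surrounding whitespace from the raw response.
    return response.replace("```", "").strip()


def evaluate_response(response: str) -> str:
    # A trivial evaluator; a real one might score the response with another LLM.
    return "ok" if response.strip() else "empty response"


model = OpenAIChat(api_key=os.getenv("OPENAI_API_KEY"), model_name="gpt-4o-mini")

agent = Agent(
    agent_name="Hooked-Agent",
    llm=model,
    max_loops=1,
    output_cleaner=clean_output,
    evaluator=evaluate_response,
)
```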
### `Agent` Methods

| Method | Description | Inputs | Usage Example |
|--------|-------------|--------|----------------|
| `run(task, img=None, *args, **kwargs)` | Runs the autonomous agent loop to complete the given task. | `task` (str): The task to be performed. <br> `img` (str, optional): Path to an image file, if the task involves image processing. <br> `*args`, `**kwargs`: Additional arguments to pass to the language model. | `response = agent.run("Generate a report on financial performance.")` |
| `__call__(task, img=None, *args, **kwargs)` | An alternative way to call the `run` method. | Same as `run`. | `response = agent("Generate a report on financial performance.")` |
| `parse_and_execute_tools(response, *args, **kwargs)` | Parses the agent's response and executes any tools mentioned in it. | `response` (str): The agent's response to be parsed. <br> `*args`, `**kwargs`: Additional arguments to pass to the tool execution. | `agent.parse_and_execute_tools(response)` |
| `long_term_memory_prompt(query, *args, **kwargs)` | Generates a prompt for querying the agent's long-term memory. | `query` (str): The query to search for in long-term memory. <br> `*args`, `**kwargs`: Additional arguments to pass to the long-term memory retrieval. | `memory_retrieval = agent.long_term_memory_prompt("financial performance")` |
| `add_memory(message)` | Adds a message to the agent's memory. | `message` (str): The message to add. | `agent.add_memory("The user prefers concise summaries.")` |
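A short sketch that strings several of these methods together (the agent name and task text are illustrative):

```python
import os

from swarms import Agent
from swarm_models import OpenAIChat

model = OpenAIChat(api_key=os.getenv("OPENAI_API_KEY"), model_name="gpt-4o-mini")

agent = Agent(agent_name="Methods-Demo-Agent", llm=model, max_loops=1)

# Seed the agent's memory with extra context before running a task
agent.add_memory("The user is preparing a quarterly financial report.")

# `run` and calling the agent directly are equivalent
report = agent.run("Generate a report on financial performance.")
same_report = agent("Generate a report on financial performance.")

# Build a prompt that pulls related material from long-term memory
# (requires `long_term_memory` to be configured; see the RAG example below)
memory_prompt = agent.long_term_memory_prompt("financial performance")
```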
## Features

- **Language Model Integration**: The Swarm Agent allows seamless integration with different language models, enabling users to leverage the power of state-of-the-art models.
- **Tool Integration**: The framework supports the integration of various tools, enabling the agent to perform a wide range of tasks, from code execution to data analysis and beyond.
- **Long-term Memory Management**: The Swarm Agent incorporates long-term memory management capabilities, allowing it to store and retrieve relevant information for effective decision-making and task execution.
- **Document Ingestion**: The agent can ingest and process various types of documents, including PDFs, text files, Markdown files, JSON files, and more, enabling it to extract relevant information for task completion.
- **Interactive Mode**: Users can interact with the agent in an interactive mode, enabling real-time communication and task execution.
- **Dashboard**: The framework provides a visual dashboard for monitoring the agent's performance and activities.
- **Dynamic Temperature Control**: The Swarm Agent supports dynamic temperature control, allowing for adjustments to the model's output diversity during task execution.
- **Autosave and State Management**: The agent can save its state automatically, enabling seamless resumption of tasks after interruptions or system restarts.
- **Self-Healing and Error Handling**: The framework incorporates self-healing and error-handling mechanisms to ensure robust and reliable operation.
- **Code Interpretation**: The agent can interpret and execute code snippets, expanding its capabilities for tasks involving programming or scripting.
- **Multimodal Support**: The framework supports multimodal inputs, enabling the agent to process and reason about various data types, such as text, images, and audio.
- **Tokenization and Token Management**: The Swarm Agent provides tokenization capabilities, enabling efficient management of token usage and context window truncation.
- **Sentiment Analysis**: The agent can perform sentiment analysis on its generated outputs, allowing for evaluation and adjustment of responses based on sentiment thresholds.
- **Output Filtering and Cleaning**: The framework supports output filtering and cleaning, ensuring that generated responses adhere to specific criteria or guidelines.
- **Asynchronous and Concurrent Execution**: The Swarm Agent supports asynchronous and concurrent task execution, enabling efficient parallelization and scaling of operations.
- **Planning and Reasoning**: The agent can engage in planning and reasoning processes, leveraging techniques such as algorithm of thoughts and chain of thoughts to enhance decision-making and task execution.
- **Agent Operations and Monitoring**: The framework provides integration with agent operations and monitoring tools, enabling real-time monitoring and management of the agent's activities.
## Getting Started

First run the following:

```bash
pip3 install -U swarms
```

Then you can get started with the following:

```python
import os

from swarms import Agent
from swarm_models import OpenAIChat
from swarms.prompts.finance_agent_sys_prompt import (
    FINANCIAL_AGENT_SYS_PROMPT,
)

# Get the OpenAI API key from the environment variable
api_key = os.getenv("OPENAI_API_KEY")

# Create an instance of the OpenAIChat class
model = OpenAIChat(
    api_key=api_key, model_name="gpt-4o-mini", temperature=0.1
)

# Initialize the agent
agent = Agent(
    agent_name="Financial-Analysis-Agent_sas_chicken_eej",
    system_prompt=FINANCIAL_AGENT_SYS_PROMPT,
    llm=model,
    max_loops=1,
    autosave=True,
    dashboard=False,
    verbose=True,
    dynamic_temperature_enabled=True,
    saved_state_path="finance_agent.json",
    user_name="swarms_corp",
    retry_attempts=1,
    context_length=200000,
    return_step_meta=False,
    output_type="str",
)

out = agent.run(
    "How can I establish a ROTH IRA to buy stocks and get a tax break? What are the criteria?"
)
print(out)
```

This example initializes an instance of the `Agent` class with an OpenAI language model and a maximum of one loop. The `run()` method is then called with a question about setting up a Roth IRA, and the agent's response is printed.

## Advanced Usage

The Swarm Agent provides numerous advanced features and customization options. Here are a few examples of how to leverage these features.

### Tool Integration

To integrate tools with the Swarm Agent, pass a list of callable functions with type hints and docstrings to the `tools` parameter when initializing the `Agent` instance. The agent will automatically convert these functions into an OpenAI function calling schema and make them available for use during task execution.

## Requirements for a tool

- Function
- With types
- With doc strings

```python
from swarms import Agent
from swarm_models import OpenAIChat
from swarms_memory import ChromaDB
import subprocess
import os

# Making an instance of the ChromaDB class
memory = ChromaDB(
    metric="cosine",
    n_results=3,
    output_dir="results",
    docs_folder="docs",
)

# Model
model = OpenAIChat(
    api_key=os.getenv("OPENAI_API_KEY"),
    model_name="gpt-4o-mini",
    temperature=0.1,
)


# Tools in swarms are simple Python functions with type hints and docstrings
def terminal(
    code: str,
):
    """
    Run code in the terminal.

    Args:
        code (str): The code to run in the terminal.

    Returns:
        str: The output of the code.
    """
    out = subprocess.run(
        code, shell=True, capture_output=True, text=True
    ).stdout
    return str(out)


def browser(query: str):
    """
    Search the query in the browser with the `browser` tool.

    Args:
        query (str): The query to search in the browser.

    Returns:
        str: The search results.
    """
    import webbrowser

    url = f"https://www.google.com/search?q={query}"
    webbrowser.open(url)
    return f"Searching for {query} in the browser."


def create_file(file_path: str, content: str):
    """
    Create a file using the file editor tool.

    Args:
        file_path (str): The path to the file.
        content (str): The content to write to the file.

    Returns:
        str: The result of the file creation operation.
    """
    with open(file_path, "w") as file:
        file.write(content)
    return f"File {file_path} created successfully."


def file_editor(file_path: str, mode: str, content: str):
    """
    Edit a file using the file editor tool.

    Args:
        file_path (str): The path to the file.
        mode (str): The mode to open the file in.
        content (str): The content to write to the file.

    Returns:
        str: The result of the file editing operation.
    """
    with open(file_path, mode) as file:
        file.write(content)
    return f"File {file_path} edited successfully."


# Agent
agent = Agent(
    agent_name="Devin",
    system_prompt=(
        "Autonomous agent that can interact with humans and other"
        " agents. Be Helpful and Kind. Use the tools provided to"
        " assist the user. Return all code in markdown format."
    ),
    llm=model,
    max_loops="auto",
    autosave=True,
    dashboard=False,
    streaming_on=True,
    verbose=True,
    stopping_token="",
    interactive=True,
    tools=[terminal, browser, file_editor, create_file],
    long_term_memory=memory,
)

# Run the agent
out = agent(
    "Create a CSV file with the latest tax rates for C corporations in the following ten states and the District of Columbia: Alabama, California, Florida, Georgia, Illinois, New York, North Carolina, Ohio, Texas, and Washington."
)
print(out)
```
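For reference, a typed, docstring-annotated function such as `terminal` above maps onto an OpenAI-style function-calling schema roughly like the following. This is an illustrative sketch of the OpenAI tool format only, not the exact schema the framework emits.

```python
# Illustrative only: an OpenAI-style function-calling schema roughly equivalent
# to the `terminal` tool above. The framework generates its own schema from the
# type hints and docstring; the exact output may differ.
terminal_schema = {
    "type": "function",
    "function": {
        "name": "terminal",
        "description": "Run code in the terminal.",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": "The code to run in the terminal.",
                },
            },
            "required": ["code"],
        },
    },
}
```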
""" with open(file_path, mode) as file: file.write(content) return f"File {file_path} edited successfully." # Agent agent = Agent( agent_name="Devin", system_prompt=( "Autonomous agent that can interact with humans and other" " agents. Be Helpful and Kind. Use the tools provided to" " assist the user. Return all code in markdown format." ), llm=model, max_loops="auto", autosave=True, dashboard=False, streaming_on=True, verbose=True, stopping_token="", interactive=True, tools=[terminal, browser, file_editor, create_file], streaming=True, long_term_memory=memory, ) # Run the agent out = agent( "Create a CSV file with the latest tax rates for C corporations in the following ten states and the District of Columbia: Alabama, California, Florida, Georgia, Illinois, New York, North Carolina, Ohio, Texas, and Washington." ) print(out) ``` ### Long-term Memory Management The Swarm Agent supports integration with various vector databases for long-term memory management. You can pass an instance of a `BaseVectorDatabase` implementation to the `long_term_memory` parameter when initializing the `Agent`. ```python import os from swarms_memory import ChromaDB from swarms import Agent from swarm_models import Anthropic from swarms.prompts.finance_agent_sys_prompt import ( FINANCIAL_AGENT_SYS_PROMPT, ) # Initilaize the chromadb client chromadb = ChromaDB( metric="cosine", output_dir="fiance_agent_rag", # docs_folder="artifacts", # Folder of your documents ) # Model model = Anthropic(anthropic_api_key=os.getenv("ANTHROPIC_API_KEY")) # Initialize the agent agent = Agent( agent_name="Financial-Analysis-Agent", system_prompt=FINANCIAL_AGENT_SYS_PROMPT, agent_description="Agent creates ", llm=model, max_loops="auto", autosave=True, dashboard=False, verbose=True, streaming_on=True, dynamic_temperature_enabled=True, saved_state_path="finance_agent.json", user_name="swarms_corp", retry_attempts=3, context_length=200000, long_term_memory=chromadb, ) agent.run( "What are the components of a startups stock incentive equity plan" ) ``` ### Document Ingestion The Swarm Agent can ingest various types of documents, such as PDFs, text files, Markdown files, and JSON files. You can pass a list of document paths or contents to the `docs` parameter when initializing the `Agent`. ```python from swarms.structs import Agent # Initialize the agent with documents agent = Agent(llm=llm, max_loops=3, docs=["path/to/doc1.pdf", "path/to/doc2.txt"]) ``` ### Interactive Mode The Swarm Agent supports an interactive mode, where users can engage in real-time communication with the agent. To enable interactive mode, set the `interactive` parameter to `True` when initializing the `Agent`. ```python from swarms.structs import Agent # Initialize the agent in interactive mode agent = Agent(llm=llm, max_loops=3, interactive=True) # Run the agent in interactive mode agent.interactive_run() ``` ### Sentiment Analysis The Swarm Agent can perform sentiment analysis on its generated outputs using a sentiment analyzer function. You can pass a callable function to the `sentiment_analyzer` parameter when initializing the `Agent`. ```python from swarms.structs import Agent from my_sentiment_analyzer import sentiment_analyzer_function # Initialize the agent with a sentiment analyzer agent = Agent( agent_name = "sentiment-analyzer-agent-01", system_prompt="..." 
### Undo Functionality

```python
# Feature 2: Undo functionality
response = agent.run("Another task")
print(f"Response: {response}")

previous_state, message = agent.undo_last()
print(message)
```

### Response Filtering

```python
# Feature 3: Response filtering
agent.add_response_filter("report")
response = agent.filtered_run("Generate a report on finance")
print(response)
```

### Saving and Loading State

```python
# Save the agent state
agent.save_state('saved_flow.json')

# Load the agent state
agent = Agent(llm=llm_instance, max_loops=5)
agent.load('saved_flow.json')
agent.run("Continue with the task")
```

### Async and Concurrent Execution

```python
import asyncio


async def main():
    # Run a task concurrently (run_concurrent is awaitable)
    response = await agent.run_concurrent("Concurrent task")
    print(response)


asyncio.run(main())

# Run multiple tasks concurrently
tasks = [
    {"task": "Task 1"},
    {"task": "Task 2", "img": "path/to/image.jpg"},
    {"task": "Task 3", "custom_param": 42},
]
responses = agent.bulk_run(tasks)
print(responses)
```

### Various other settings

```python
# Convert the agent object to a dictionary
print(agent.to_dict())
print(agent.to_toml())
print(agent.model_dump_json())
print(agent.model_dump_yaml())

# Ingest documents into the agent's knowledge base
agent.ingest_docs("your_pdf_path.pdf")

# Receive a message from a user and process it
agent.receive_message(name="agent_name", message="message")

# Send a message from the agent to a user
agent.send_agent_message(agent_name="agent_name", message="message")

# Ingest multiple documents into the agent's knowledge base
agent.ingest_docs("your_pdf_path.pdf", "your_csv_path.csv")

# Run the agent with a filtered system prompt
agent.filtered_run(
    "How can I establish a ROTH IRA to buy stocks and get a tax break? What are the criteria?"
)

# Run the agent with multiple system prompts
agent.bulk_run(
    [
        "How can I establish a ROTH IRA to buy stocks and get a tax break? What are the criteria?",
        "Another system prompt",
    ]
)

# Add a memory to the agent
agent.add_memory("Add a memory to the agent")

# Check the number of available tokens for the agent
agent.check_available_tokens()

# Perform token checks for the agent
agent.tokens_checks()

# Print the dashboard of the agent
agent.print_dashboard()

# Fetch all the documents from the doc folders
agent.get_docs_from_doc_folders()

# Activate agent ops
agent.activate_agentops()
agent.check_end_session_agentops()

# Dump the model to a JSON file
agent.model_dump_json()
print(agent.to_toml())
```