{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "uF4xtbj5CRl6" }, "source": [ "# Welcome to the Edith Project!\n", "This project focuses on developing a context-based chatbot where users provide a context and can ask questions about it. The chatbot is trained using the SQuAD1.1 dataset, which provides a rich set of question-answering examples. In this notebook, we fine-tune a BERT model to adapt it to our specific task. Throughout the notebook, we will provide clear explanations for each step of the code, ensuring that readers can easily follow along and understand how the model is being prepared, trained, and evaluated. By the end, you’ll see how everything comes together to build a powerful question-answering system.\n", "\n", "## Sections:\n", "### 1. Data Preparation\n", "In this section, we will preprocess the SQuAD1.1 dataset, convert it into the right format, and tokenize the input text using BERT's tokenizer. The goal is to prepare our data efficiently for model training.\n", "\n", "### 2. Model Selection\n", "We will select the BERT model as the backbone of our chatbot and explain why this pre-trained transformer model is suitable for question-answering tasks. We’ll also load the pre-trained model and tokenizer to kick-start the fine-tuning process.\n", "\n", "### 3. Fine-Tuning and Training\n", "Here, we’ll describe how we fine-tune the BERT model for our question-answering task, covering details such as learning rates, batch sizes, and optimization steps. We will also monitor key metrics like loss and F1 score during training to gauge performance.\n", "\n", "### 4. Evaluation and Inference\n", "After training, we will evaluate the model using various metrics like Exact Match (EM) and F1 score. We’ll also demonstrate how the chatbot handles inference, where the user provides a context, and the chatbot returns the most relevant answer.\n", "\n", "### 5. 
Conclusion\n", "In this final section, we’ll summarize the outcomes of the project, highlighting key performance metrics and possible improvements for future iterations of the Edith Project.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "b1076dfc-b9ad-4769-8c92-a6c4dae69d19", "_uuid": "8f2839f25d086af736a60e9eeb907d3b93b6e0e5", "execution": { "iopub.execute_input": "2024-10-20T05:15:34.102347Z", "iopub.status.busy": "2024-10-20T05:15:34.101700Z", "iopub.status.idle": "2024-10-20T05:15:46.300086Z", "shell.execute_reply": "2024-10-20T05:15:46.299068Z", "shell.execute_reply.started": "2024-10-20T05:15:34.102305Z" }, "id": "IFjuS7x9CRl8" }, "outputs": [], "source": [ "!pip install -q torch transformers datasets tqdm scikit-learn rouge-score nltk datasets\n", "\n", "import re\n", "import pandas as pd\n", "import numpy as np\n", "import torch\n", "from torch.utils.data import Dataset, DataLoader\n", "from transformers import BertTokenizerFast, BertForQuestionAnswering, get_scheduler\n", "from tqdm import tqdm\n", "import os\n", "from sklearn.metrics import f1_score, precision_score, recall_score\n", "from rouge_score import rouge_scorer\n", "from nltk.translate.bleu_score import sentence_bleu\n", "from IPython.display import display, HTML\n", "from datasets import load_dataset\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "import warnings\n", "from scipy.ndimage import gaussian_filter1d\n", "\n", "warnings.filterwarnings(\"ignore\")\n" ] }, { "cell_type": "markdown", "metadata": { "id": "_lQkGREWCRl9" }, "source": [ "### 1. Data Preparation\n", "\n", "In this section, we will load our data. The dataset we are using for this project is the well-known **SQuAD1.1** (Stanford Question Answering Dataset), developed by Stanford University. SQuAD1.1 has a wide range of applications in natural language understanding and question-answering tasks. 
This dataset is readily available on Hugging Face, and we can load it directly using the `datasets` library.\n", "\n", "To avoid memory-related issues, we will only utilize the training dataset and split it into 95% for training and 5% for validation. The main reason for this split is that SQuAD1.1 contains approximately 87K samples, so even 5% (around 4.3K) provides a substantial number of examples for evaluation while ensuring the majority of data is used for training. This split ensures that our model has enough data to learn effectively while still being able to test its performance on a meaningful validation set.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "referenced_widgets": [ "8d0b370df64747b488f187e906b24f97", "66da7373074b464d99d90a643d56761c", "e8e4e46cc960422880747857854c489c", "f443fe30cc2b49d482045a3c65d2d468", "e8e6e53cf4974ff39ec4bdc9c6b26533" ] }, "execution": { "iopub.execute_input": "2024-10-20T05:16:33.843115Z", "iopub.status.busy": "2024-10-20T05:16:33.842244Z", "iopub.status.idle": "2024-10-20T05:16:38.885273Z", "shell.execute_reply": "2024-10-20T05:16:38.884335Z", "shell.execute_reply.started": "2024-10-20T05:16:33.843071Z" }, "id": "bz_ahmL8CRl9", "outputId": "168c6f05-61ea-484f-b096-0cdfe40eb717" }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "8d0b370df64747b488f187e906b24f97", "version_major": 2, "version_minor": 0 }, "text/plain": [ "README.md: 0%| | 0.00/7.62k [00:00\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "" ], "text/plain": [ " Context \\\n", "0 Starting in the 1890s and stretching in some p... \n", "1 Details of school casualties had been under no... \n", "2 Different religious traditions assign differin... \n", "3 The First Great Awakening was an evangelical a... \n", "4 Bacteria can be grown in the laboratory on nut... \n", "\n", " Question \\\n", "0 Which industries did European settlers in Alas... \n", "1 How many students were disabled in Xinhua? \n", "2 What are some religious traditions that are fo... \n", "3 What movement made a permanent mark on Protest... \n", "4 What are poultry eggs used for aside from cons... \n", "\n", " Answer Answer Start Index \\\n", "0 fishing and logging 496 \n", "1 546 450 \n", "2 expansive powers and abilities, psychological ... 120 \n", "3 The First Great Awakening 0 \n", "4 Many vaccines to infectious diseases can be gr... 120 \n", "\n", " Answer End Index \n", "0 515 \n", "1 453 \n", "2 233 \n", "3 25 \n", "4 197 " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create an empty list to hold the rows\n", "train_rows = []\n", "\n", "# Iterate over the training data and collect relevant fields\n", "for data in train_data:\n", "    # For each answer in the list, create a new row\n", "    for answer, start_index in zip(data['answers']['text'], data['answers']['answer_start']):\n", "        # Calculate the end index of the answer\n", "        end_index = start_index + len(answer) if start_index is not None else 0\n", "\n", "        # Append a dictionary for each entry\n", "        train_rows.append({\n", "            'ID': data['id'],\n", "            'Title': data['title'],\n", "            'Context': data['context'],\n", "            'Question': data['question'],\n", "            'Answer': answer,\n", "            'Answer Start Index': start_index if start_index is not None else 0,\n", "            'Answer End Index': end_index if end_index is not None else 0\n", "        })\n", "\n", "# Convert the list of dictionaries into a DataFrame\n", "train_df = pd.DataFrame(train_rows)\n", "\n",
"# Replace any missing values (NaN) in \"Answer Start Index\" or \"Answer End Index\" with 0\n", "train_df['Answer Start Index'] = train_df['Answer Start Index'].fillna(0).astype(int)\n", "train_df['Answer End Index'] = train_df['Answer End Index'].fillna(0).astype(int)\n", "\n", "# Specify the columns to include in the DataFrame\n", "train_df = train_df[['Context', 'Question', 'Answer', 'Answer Start Index', 'Answer End Index']]\n", "\n", "# Display the first few rows\n", "train_df.head()\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2024-10-20T05:29:05.162471Z", "iopub.status.busy": "2024-10-20T05:29:05.161536Z", "iopub.status.idle": "2024-10-20T05:29:05.768166Z", "shell.execute_reply": "2024-10-20T05:29:05.767242Z", "shell.execute_reply.started": "2024-10-20T05:29:05.162431Z" }, "id": "absM2smvCRl-", "outputId": "2ec3e4ad-0177-48ab-f986-bffc3e508d28" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "" ], "text/plain": [ " Context \\\n", "0 The Pew Forum on Religion & Public Life ranks ... \n", "1 The Ann Arbor Hands-On Museum is located in a ... \n", "2 One important aspect of the rule-of-law initia... \n", "3 In December 1547, Francis was in Malacca (Mala... \n", "4 Groups are also applied in many other mathemat... \n", "\n", " Question Answer \\\n", "0 What percentage of Egyptians polled support de... 84% \n", "1 Ann Arbor ranks 1st among what goods sold? books \n", "2 In developing countries, who makes most of the... the executive \n", "3 Who impressed Xavier by taking notes in church? Anjiro \n", "4 What represents elements of the fundamental gr... loops \n", "\n", " Answer Start Index Answer End Index \n", "0 468 471 \n", "1 402 407 \n", "2 612 625 \n", "3 160 166 \n", "4 489 494 " ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create an empty list to hold the rows\n", "val_rows = []\n", "\n", "# Iterate over the validation data and collect relevant fields\n", "for data in val_data:\n", "    # For each answer in the list, create a new row\n", "    for answer, start_index in zip(data['answers']['text'], data['answers']['answer_start']):\n", "        # Calculate the end index of the answer\n", "        end_index = start_index + len(answer) if start_index is not None else 0\n", "\n", "        # Append a dictionary for each entry\n", "        val_rows.append({\n", "            'ID': data['id'],\n", "            'Title': data['title'],\n", "            'Context': data['context'],\n", "            'Question': data['question'],\n", "            'Answer': answer,\n", "            'Answer Start Index': start_index if start_index is not None else 0,\n", "            'Answer End Index': end_index if end_index is not None else 0\n", "        })\n", "\n", "# Convert the list of dictionaries into a DataFrame\n", "val_df = pd.DataFrame(val_rows)\n", "\n", "# Replace any missing values (NaN) in \"Answer Start Index\" or \"Answer End Index\" with 0\n", "val_df['Answer Start Index'] = val_df['Answer Start Index'].fillna(0).astype(int)\n", 
"val_df['Answer End Index'] = val_df['Answer End Index'].fillna(0).astype(int)\n", "\n", "# Specify the columns to include in the DataFrame\n", "val_df = val_df[['Context', 'Question', 'Answer', 'Answer Start Index', 'Answer End Index']]\n", "\n", "# Display the first few rows\n", "val_df.head()\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2024-10-20T05:30:08.174343Z", "iopub.status.busy": "2024-10-20T05:30:08.173933Z", "iopub.status.idle": "2024-10-20T05:30:11.267282Z", "shell.execute_reply": "2024-10-20T05:30:11.266457Z", "shell.execute_reply.started": "2024-10-20T05:30:08.174305Z" }, "id": "H-5ifapQCRl-" }, "outputs": [], "source": [ "train_df.to_csv(\"squad_train.csv\", index=False)\n", "val_df.to_csv(\"squad_val.csv\", index=False)\n" ] }, { "cell_type": "markdown", "metadata": { "id": "hG5pwTKsCRl-" }, "source": [ "1. **Loading the SQuAD1.1 dataset**: The code begins by loading the SQuAD1.1 dataset using the `load_dataset` function from the `datasets` library. The training set (`train_data`) is then split into training and validation sets using a 95:5 ratio. This ensures that a portion of the original training data is reserved for validation purposes during model training.\n", "\n", "2. **Initializing lists for storing data**: An empty list, `train_rows`, is created to store the processed rows of data from the training set. This will later be converted into a DataFrame for easier manipulation and saving.\n", "\n", "3. **Processing training data**: The code iterates over each entry in the `train_data` split. For each entry, it extracts the ID, title, context, question, and answers from the dataset. For each answer, the corresponding start and end indices are calculated and stored in a dictionary.\n", "\n", "4. **Converting training data to DataFrame**: Once all rows of the training data have been processed, they are converted into a pandas DataFrame. 
This allows for easier data manipulation, including replacing missing values and ensuring that the start and end indices are integers.\n", "\n", "5. **Processing validation data**: The validation data undergoes the same process as the training data. The relevant fields (ID, title, context, question, and answers) are extracted, the start and end indices are calculated, and the data is stored in a list of dictionaries.\n", "\n", "6. **Converting validation data to DataFrame**: Similar to the training data, the processed validation data is converted into a pandas DataFrame. Any missing values in the start and end indices are replaced with zeroes, and the columns are organized for consistency.\n", "\n", "7. **Saving data to CSV**: Finally, the training and validation DataFrames are saved as CSV files (`squad_train.csv` and `squad_val.csv`). This makes it easy to load the preprocessed data for further analysis or model training.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2024-10-20T05:42:18.689006Z", "iopub.status.busy": "2024-10-20T05:42:18.688153Z", "iopub.status.idle": "2024-10-20T05:42:18.870198Z", "shell.execute_reply": "2024-10-20T05:42:18.869197Z", "shell.execute_reply.started": "2024-10-20T05:42:18.688967Z" }, "id": "KXHxa1-_CRl-" }, "outputs": [], "source": [ "# Create a custom dataset class for our data\n", "class SQuADataset(Dataset):\n", "    def __init__(self, df, tokenizer, max_len):\n", "        self.df = df\n", "        self.tokenizer = tokenizer\n", "        self.max_len = max_len\n", "\n", "    def __len__(self):\n", "        return self.df.shape[0]\n", "\n", "    def __getitem__(self, idx):\n", "        row = self.df.iloc[idx]\n", "        context = row['Context']\n", "        question = row['Question']\n", "        answer = row['Answer']\n", "        answer_start = row['Answer Start Index']\n", "\n", "        encoding = self.tokenizer.encode_plus(\n", "            context,\n", "            question,\n", "            max_length=self.max_len,\n", "            truncation=True,\n", 
"            padding='max_length',\n", "            return_offsets_mapping=True,\n", "            return_token_type_ids=True,\n", "            return_attention_mask=True,\n", "            return_tensors='pt'\n", "        )\n", "\n", "        offset_mapping = encoding['offset_mapping'].squeeze().tolist()\n", "        # sequence_ids marks each token as context (0), question (1), or special/padding (None)\n", "        sequence_ids = encoding.sequence_ids(0)\n", "\n", "        # Initialize start and end positions of the answer\n", "        start_position = 0\n", "        end_position = 0\n", "\n", "        # Find the token positions corresponding to the answer if it's not unanswerable\n", "        if answer_start != -1:\n", "            answer_end = answer_start + len(answer)\n", "            # Use token_idx so the loop does not shadow the sample index idx\n", "            for token_idx, (start, end) in enumerate(offset_mapping):\n", "                # Question offsets restart at 0, so only context tokens may match\n", "                if sequence_ids[token_idx] != 0:\n", "                    continue\n", "                if start <= answer_start < end:\n", "                    start_position = token_idx\n", "                if start < answer_end <= end:\n", "                    end_position = token_idx\n", "                    break\n", "\n", "        return {\n", "            'input_ids': encoding['input_ids'].squeeze(),\n", "            'attention_mask': encoding['attention_mask'].squeeze(),\n", "            'token_type_ids': encoding['token_type_ids'].squeeze(),\n", "            'start_positions': torch.tensor(start_position),\n", "            'end_positions': torch.tensor(end_position)\n", "        }\n", "\n", "\n", "# Set up the tokenizer\n", "tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')\n", "\n", "# Create the training and validation datasets\n", "train_dataset = SQuADataset(train_df, tokenizer, max_len=256)\n", "val_dataset = SQuADataset(val_df, tokenizer, max_len=256)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2024-10-20T05:42:19.426931Z", "iopub.status.busy": "2024-10-20T05:42:19.426035Z", "iopub.status.idle": "2024-10-20T05:42:19.441008Z", "shell.execute_reply": "2024-10-20T05:42:19.439956Z", "shell.execute_reply.started": "2024-10-20T05:42:19.426887Z" }, "id": "HKSWQwK-CRl_", "outputId": "8f7d41d1-41dc-4683-aa56-fd43a208f98a" }, "outputs": [ { "data": { "text/plain": [ "{'input_ids': tensor([ 101, 3225, 1999, 1996, 13678, 1998, 10917, 1999, 2070, 3182,\n", " 2000, 1996, 2220, 28088, 1010, 2751, 18545, 1999, 7397, 1998,\n", " 1996, 3518, 19898, 3700, 2716, 5190, 1997, 11257, 1998, 7322,\n", " 2000, 
7397, 1012, 7397, 2001, 3985, 5100, 2004, 2019, 4114,\n", " 3700, 1999, 4878, 1012, 7397, 1005, 1055, 3007, 1010, 2029,\n", " 2018, 2042, 1999, 4133, 2912, 2127, 5518, 1010, 2001, 2333,\n", " 2167, 2000, 2238, 4887, 1012, 2810, 1997, 1996, 7397, 3099,\n", " 1005, 1055, 7330, 2211, 2008, 2168, 2095, 1012, 2647, 7489,\n", " 2013, 5120, 1998, 4701, 2036, 3876, 1999, 4643, 7397, 1010,\n", " 2073, 2027, 3133, 1996, 5645, 1998, 15899, 6088, 1012, 102,\n", " 2029, 6088, 2106, 2647, 7322, 1999, 7397, 4088, 1029, 102,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0]),\n", " 'attention_mask': tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n", " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n", " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n", " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n", " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),\n", " 'token_type_ids': tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),\n", " 'start_positions': tensor(94),\n", " 'end_positions': tensor(96)}" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_dataset[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2024-10-20T05:42:20.436694Z", "iopub.status.busy": "2024-10-20T05:42:20.436305Z", "iopub.status.idle": "2024-10-20T05:42:20.441943Z", "shell.execute_reply": "2024-10-20T05:42:20.440869Z", "shell.execute_reply.started": "2024-10-20T05:42:20.436643Z" }, "id": "WAeJpg0wCRl_" }, "outputs": [], "source": [ "train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)\n", "val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)" ] }, { "cell_type": "markdown", "metadata": { "id": "HOkWDbSnCRl_" }, "source": [ "1. **Creating a Custom Dataset Class**: This code defines a custom dataset class called `SQuADataset`, which is built on top of PyTorch's `Dataset` class. This class is specifically designed to handle data from the SQuAD dataset, making it easier to load and work with during model training. 
It takes a DataFrame with context, questions, and answers, along with a tokenizer and a maximum sequence length as its parameters.\n", "\n", "2. **Implementing the `__len__` Method**: The `__len__` method is implemented to return the total number of samples in the DataFrame. This is important for the DataLoader to know how many items it can iterate over when creating batches.\n", "\n", "3. **Implementing the `__getitem__` Method**: In the `__getitem__` method, we retrieve the data for a specific index. This includes extracting the context, question, answer, and the starting position of the answer from the DataFrame. This method is crucial because it enables the DataLoader to access each sample during training.\n", "\n", "4. **Encoding Context and Question**: Here, the `tokenizer.encode_plus` function is used to convert the context and question into a format suitable for the BERT model. It generates input IDs, attention masks, and token type IDs, ensuring that the sequence length does not exceed the specified maximum. This encoding step is essential for preparing the text for the model.\n", "\n", "5. **Extracting Offset Mappings**: The encoded data includes offset mappings, which help us locate where in the original text the tokens correspond. This is particularly useful for identifying the exact position of the answer within the context.\n", "\n", "6. **Determining Answer Positions**: The code checks if the answer is unanswerable (indicated by `answer_start` being `-1`). If there is an answer, it uses the offset mappings to find the token positions that correspond to the answer in the context. These start and end positions will be used later for training the model.\n", "\n", "7. **Returning Encoded Inputs**: The `__getitem__` method returns a dictionary that contains the encoded input IDs, attention masks, token type IDs, and the start and end positions of the answer as tensors. 
This format is ready for input into the BERT model, simplifying the training process.\n", "\n", "8. **Setting Up the Tokenizer**: The code initializes a tokenizer using `BertTokenizerFast`, which is known for its speed and efficiency. This tokenizer will be used to prepare the context and questions in our dataset.\n", "\n", "9. **Creating Dataset Instances**: Two instances of the `SQuADataset` class are created: one for the training data and another for validation. The maximum sequence length is set to 256 tokens, which is a common practice to ensure a balance between memory usage and performance.\n", "\n", "10. **Setting Up Data Loaders**: Finally, the code sets up `DataLoader` instances for both the training and validation datasets. The training DataLoader shuffles the data to provide the model with a varied set of examples in each epoch, while the validation DataLoader maintains the order for evaluation. These DataLoaders make batch processing during training and evaluation efficient and straightforward.\n", "\n", "Overall, this code lays the groundwork for preparing the SQuAD dataset for training a BERT model, ensuring that the input data is properly formatted and that the answer positions are accurately identified for effective question answering.\n" ] }, { "cell_type": "markdown", "metadata": { "id": "6MRByJNaCRl_" }, "source": [ "### 2. Model Selection\n", "\n", "In this section, we are selecting the BERT model, which stands for **Bidirectional Encoder Representations from Transformers**. BERT was developed by Google and is one of the most powerful models for natural language understanding tasks. What sets BERT apart from previous models is its **bidirectional training of transformers**—meaning it looks at the entire context (both left and right) of a word in a sentence, rather than just reading text from left to right or right to left. 
This allows it to gain a much deeper understanding of the context in which a word appears.\n", "\n", "BERT was pre-trained on a massive amount of text data, including Wikipedia and books, using two key tasks:\n", "1. **Masked Language Modeling (MLM)**: Where random words in a sentence are masked, and the model learns to predict them based on the surrounding context. This allows BERT to capture relationships between words.\n", "2. **Next Sentence Prediction (NSP)**: Where BERT learns to predict whether two sentences follow each other in the text, which helps the model grasp the relationships between different sentences.\n", "\n", "For our task of **Question Answering**, we use `BertForQuestionAnswering`, which places a span-prediction head (a single linear layer that scores every token as a possible answer start or end) on top of the BERT encoder. The model takes both the context and question as input and predicts the start and end positions of the answer within the context. This makes BERT an excellent choice for our project, as its deep contextual understanding enables it to pinpoint the answer even when the relationship between the question and context is complex.\n", "\n", "In this particular code block, the line `model = BertForQuestionAnswering.from_pretrained('bert-base-uncased')` loads the pre-trained `bert-base-uncased` encoder weights and attaches a newly initialized question-answering head (`qa_outputs`). As the warning printed by the cell notes, this head is not part of the checkpoint, so the model must be fine-tuned before it can produce useful answers. The `bert-base-uncased` version uses a vocabulary where all words are lowercased, which simplifies the task of dealing with different word cases in the input text.\n", "\n", "Next, we move the model to the selected device using `model.to(device)`. This step ensures that if a GPU (CUDA) is available, the model will use it for faster computation, which is important for both training and inference. 
If no GPU is available, the model will default to the CPU, though it will be slower.\n", "\n", "In summary, BERT is a great fit for our task because:\n", "- Its bidirectional nature helps it understand the question and context deeply.\n", "- Its pre-trained weights give us a strong starting point, so we only need to fine-tune the model on SQuAD rather than train it from scratch.\n", "- It's robust and has been proven to perform well on tasks like SQuAD, which is the dataset we are using.\n", "\n", "By leveraging BERT, we can expect strong performance on our question-answering task, allowing the model to identify precise answers within a given context.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2024-10-20T05:42:21.891350Z", "iopub.status.busy": "2024-10-20T05:42:21.890979Z", "iopub.status.idle": "2024-10-20T05:42:22.349968Z", "shell.execute_reply": "2024-10-20T05:42:22.348904Z", "shell.execute_reply.started": "2024-10-20T05:42:21.891314Z" }, "id": "xKM-qTWdCRl_", "outputId": "b04e78e1-02fd-447a-b7c9-7e925af7763e" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']\n", "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n" ] }, { "data": { "text/plain": [ "BertForQuestionAnswering(\n", " (bert): BertModel(\n", " (embeddings): BertEmbeddings(\n", " (word_embeddings): Embedding(30522, 768, padding_idx=0)\n", " (position_embeddings): Embedding(512, 768)\n", " (token_type_embeddings): Embedding(2, 768)\n", " (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " )\n", " (encoder): BertEncoder(\n", " (layer): ModuleList(\n", " (0-11): 12 x BertLayer(\n", " (attention): BertAttention(\n", " (self): 
BertSdpaSelfAttention(\n", " (query): Linear(in_features=768, out_features=768, bias=True)\n", " (key): Linear(in_features=768, out_features=768, bias=True)\n", " (value): Linear(in_features=768, out_features=768, bias=True)\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " )\n", " (output): BertSelfOutput(\n", " (dense): Linear(in_features=768, out_features=768, bias=True)\n", " (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " )\n", " )\n", " (intermediate): BertIntermediate(\n", " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", " (intermediate_act_fn): GELUActivation()\n", " )\n", " (output): BertOutput(\n", " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", " (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " )\n", " )\n", " )\n", " )\n", " )\n", " (qa_outputs): Linear(in_features=768, out_features=2, bias=True)\n", ")" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')\n", "device\n", "\n", "model = BertForQuestionAnswering.from_pretrained('bert-base-uncased')\n", "model.to(device)" ] }, { "cell_type": "markdown", "metadata": { "id": "hrN0kscZCRmA" }, "source": [ "### 3. Fine-Tuning and Training\n", "\n", "**Pre-Code Explanation**:\n", "In this section, we define the function that will fine-tune the BERT model for our specific question-answering task. Fine-tuning a pre-trained model like BERT involves adjusting the weights of the model slightly to specialize it on a new dataset. We start by setting hyperparameters such as the number of epochs, the learning rate (`3e-5`), and the optimizer. 
In this case, we use the `AdamW` optimizer, a variant of Adam that applies weight decay separately from the gradient-based update (decoupled weight decay). It is a standard choice for fine-tuning transformers like BERT, and the decay acts as a regularizer that helps reduce overfitting.\n", "\n", "A **scheduler** is set up to adjust the learning rate gradually over the training process using the `get_scheduler` function, which linearly decays the learning rate as training progresses. This helps maintain model stability during training and prevents drastic fluctuations in the loss. The function accepts the training and validation data loaders, performs training in epochs, and logs important statistics such as loss values to track the model's performance.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2024-10-20T05:42:57.738843Z", "iopub.status.busy": "2024-10-20T05:42:57.738090Z", "iopub.status.idle": "2024-10-20T05:42:57.750717Z", "shell.execute_reply": "2024-10-20T05:42:57.749771Z", "shell.execute_reply.started": "2024-10-20T05:42:57.738792Z" }, "id": "g6atLKO1CRmA" }, "outputs": [], "source": [ "# Define the training function\n", "def train_model(model, training_data, validation_data, epochs=5, learning_rate=3e-5):\n", "    loss_history = []  # Store loss values\n", "    optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)\n", "\n", "    # Set up the learning rate scheduler\n", "    total_steps = epochs * len(training_data)\n", "    scheduler = get_scheduler(\n", "        name=\"linear\", optimizer=optimizer, num_warmup_steps=0, num_training_steps=total_steps\n", "    )\n", "\n", "    for ep in range(epochs):  # Loop through epochs\n", "        model.train()\n", "        epoch_train_loss = 0\n", "        train_progress = tqdm(training_data, desc=\"Training\")\n", "\n", "        for step_idx, step_batch in enumerate(train_progress):\n", "            step_batch = {key: value.to(device) for key, value in step_batch.items()}\n", "            output = model(**step_batch)\n", "            loss_value = output.loss\n", "            epoch_train_loss += loss_value.item()\n", "            loss_history.append(loss_value.item())\n", "\n", "            optimizer.zero_grad()\n", "            loss_value.backward()\n", "            optimizer.step()\n", "            scheduler.step()\n", "\n", "            # Print the average loss for the last 100 batches\n", "            if (step_idx + 1) % 100 == 0:\n", "                avg_last_100 = sum(loss_history[-100:]) / len(loss_history[-100:])\n", "                print(f\"Avg loss for last 100 steps (step {step_idx + 1}): {avg_last_100}\")\n", "\n", "            # Update progress bar with the current loss\n", "            train_progress.set_postfix({\"loss\": loss_value.item()})\n", "\n", "        average_train_loss = epoch_train_loss / len(training_data)\n", "        print(f\"Average Training Loss for Epoch {ep + 1}: {average_train_loss}\")\n", "\n", "\n", "        # Evaluate on the validation set without tracking gradients\n", "        model.eval()\n", "        total_validation_loss = 0\n", "        validation_progress = tqdm(validation_data, desc=\"Validation\")\n", "\n", "        for val_idx, val_batch in enumerate(validation_progress):\n", "            val_batch = {key: value.to(device) for key, value in val_batch.items()}\n", "            with torch.no_grad():\n", "                output = model(**val_batch)\n", "                val_loss = output.loss\n", "                total_validation_loss += val_loss.item()\n", "\n", "        average_validation_loss = total_validation_loss / len(validation_data)\n", "        print(f\"Average Validation Loss for Epoch {ep + 1}: {average_validation_loss}\")\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2024-10-20T05:42:58.376954Z", "iopub.status.busy": "2024-10-20T05:42:58.376364Z", "iopub.status.idle": "2024-10-20T11:14:12.678036Z", "shell.execute_reply": "2024-10-20T11:14:12.676408Z", "shell.execute_reply.started": "2024-10-20T05:42:58.376902Z" }, "id": "Cvywll7_CRmA", "outputId": "9f0415ca-48a9-4d43-e2da-e632e388feea" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Training: 4%|▍ | 99/2601 [02:28<1:02:40, 1.50s/it, loss=1.93]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 100): 2.9181237602233887\n" ] }, { "name": "stderr", "output_type": 
"stream", "text": [ "Training: 8%|▊ | 199/2601 [04:58<1:00:01, 1.50s/it, loss=1.58]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 200): 1.843290798664093\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 11%|█▏ | 299/2601 [07:28<57:40, 1.50s/it, loss=1.44] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 300): 1.6185305523872375\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 15%|█▌ | 399/2601 [09:58<55:06, 1.50s/it, loss=1.56] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 400): 1.5289679777622223\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 19%|█▉ | 499/2601 [12:28<52:31, 1.50s/it, loss=1.72] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 500): 1.447012750506401\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 23%|██▎ | 599/2601 [14:58<50:03, 1.50s/it, loss=1.71] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 600): 1.3697541028261184\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 27%|██▋ | 699/2601 [17:28<47:32, 1.50s/it, loss=1.34] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 700): 1.3844239920377732\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 31%|███ | 799/2601 [19:58<45:05, 1.50s/it, loss=1.19] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 800): 1.2496907639503478\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 35%|███▍ | 899/2601 [22:28<42:47, 1.51s/it, loss=1.33] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 900): 1.2965397924184798\n" ] }, { "name": "stderr", "output_type": 
"stream", "text": [ "Training: 38%|███▊ | 999/2601 [24:58<40:03, 1.50s/it, loss=1.28] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1000): 1.2438341867923737\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 42%|████▏ | 1099/2601 [27:28<37:34, 1.50s/it, loss=0.851]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1100): 1.2373633629083634\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 46%|████▌ | 1199/2601 [29:58<35:01, 1.50s/it, loss=1.28] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1200): 1.1354780173301697\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 50%|████▉ | 1299/2601 [32:29<32:41, 1.51s/it, loss=1.13] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1300): 1.1638590478897095\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 54%|█████▍ | 1399/2601 [34:59<30:03, 1.50s/it, loss=0.963]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1400): 1.1318764191865922\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 58%|█████▊ | 1499/2601 [37:29<27:33, 1.50s/it, loss=0.97] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1500): 1.1282511121034622\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 61%|██████▏ | 1599/2601 [39:59<25:03, 1.50s/it, loss=1.43] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1600): 1.1512188524007798\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 65%|██████▌ | 1699/2601 [42:29<22:34, 1.50s/it, loss=0.924]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1700): 1.1037893587350844\n" ] 
}, { "name": "stderr", "output_type": "stream", "text": [ "Training: 69%|██████▉ | 1799/2601 [45:00<20:01, 1.50s/it, loss=1.37] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1800): 1.1223931908607483\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 73%|███████▎ | 1899/2601 [47:30<17:32, 1.50s/it, loss=1.22] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1900): 1.0834963840246201\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 77%|███████▋ | 1999/2601 [50:00<15:02, 1.50s/it, loss=1.3] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2000): 1.0893496036529542\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 81%|████████ | 2099/2601 [52:30<12:33, 1.50s/it, loss=0.894]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2100): 1.0850343543291092\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 85%|████████▍ | 2199/2601 [55:00<10:04, 1.50s/it, loss=0.973]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2200): 1.098406316637993\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 88%|████████▊ | 2299/2601 [57:30<07:33, 1.50s/it, loss=0.931]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2300): 1.0876016092300416\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 92%|█████████▏| 2399/2601 [1:00:00<05:02, 1.50s/it, loss=1.54]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2400): 1.0816582387685776\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 96%|█████████▌| 2499/2601 [1:02:30<02:33, 1.50s/it, loss=1.12] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg 
loss for last 100 steps (step 2500): 1.0321623659133912\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 100%|█████████▉| 2599/2601 [1:05:00<00:03, 1.50s/it, loss=1.02] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2600): 1.0567743647098542\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 100%|██████████| 2601/2601 [1:05:03<00:00, 1.50s/it, loss=1.02] \n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Average Training Loss for Epoch 1: 1.295620406077303\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Validation: 100%|██████████| 137/137 [01:12<00:00, 1.90it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Average Validation Loss for Epoch 1: 0.9500351820113885\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 4%|▍ | 99/2601 [02:28<1:02:41, 1.50s/it, loss=0.697]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 100): 0.8142830759286881\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 8%|▊ | 199/2601 [04:58<1:00:13, 1.50s/it, loss=0.865]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 200): 0.8329307380318641\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 11%|█▏ | 299/2601 [07:29<57:33, 1.50s/it, loss=1.08] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 300): 0.7891313913464546\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 15%|█▌ | 399/2601 [09:59<55:04, 1.50s/it, loss=0.666]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 400): 0.7705841365456582\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 19%|█▉ | 499/2601 [12:29<52:39, 1.50s/it, loss=1.27] " ] }, { "name": "stdout", "output_type": "stream", "text": 
[ "Avg loss for last 100 steps (step 500): 0.8177345561981201\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 23%|██▎ | 599/2601 [14:59<50:00, 1.50s/it, loss=1.09] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 600): 0.8327104163169861\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 27%|██▋ | 699/2601 [17:29<47:32, 1.50s/it, loss=0.866]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 700): 0.810658627152443\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 31%|███ | 799/2601 [19:59<45:09, 1.50s/it, loss=0.88] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 800): 0.8449877372384071\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 35%|███▍ | 899/2601 [22:30<42:32, 1.50s/it, loss=0.661]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 900): 0.8116492775082588\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 38%|███▊ | 999/2601 [25:00<40:05, 1.50s/it, loss=1.19] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1000): 0.8410855442285537\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 42%|████▏ | 1099/2601 [27:30<37:36, 1.50s/it, loss=1.02] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1100): 0.804014807343483\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 46%|████▌ | 1199/2601 [30:00<35:11, 1.51s/it, loss=0.962]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1200): 0.8255293396115303\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 50%|████▉ | 1299/2601 [32:30<32:29, 1.50s/it, loss=0.837]" ] }, { "name": "stdout", "output_type": 
"stream", "text": [ "Avg loss for last 100 steps (step 1300): 0.8005913615226745\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 54%|█████▍ | 1399/2601 [35:00<30:02, 1.50s/it, loss=1.16] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1400): 0.7952284491062165\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 58%|█████▊ | 1499/2601 [37:30<27:38, 1.51s/it, loss=0.865]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1500): 0.7911309421062469\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 61%|██████▏ | 1599/2601 [40:01<25:07, 1.50s/it, loss=0.47] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1600): 0.8090464314818382\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 65%|██████▌ | 1699/2601 [42:31<22:34, 1.50s/it, loss=0.98] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1700): 0.7942702674865723\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 69%|██████▉ | 1799/2601 [45:01<20:00, 1.50s/it, loss=0.549]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1800): 0.7940633481740952\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 73%|███████▎ | 1899/2601 [47:31<17:36, 1.50s/it, loss=0.664]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1900): 0.7622592401504517\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 77%|███████▋ | 1999/2601 [50:01<15:07, 1.51s/it, loss=0.791]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2000): 0.8292095738649369\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 81%|████████ | 2099/2601 [52:32<12:34, 1.50s/it, 
loss=0.874]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2100): 0.8038192576169968\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 85%|████████▍ | 2199/2601 [55:02<10:02, 1.50s/it, loss=0.98] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2200): 0.8064938718080521\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 88%|████████▊ | 2299/2601 [57:32<07:32, 1.50s/it, loss=1.25] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2300): 0.8164249670505523\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 92%|█████████▏| 2399/2601 [1:00:02<05:02, 1.50s/it, loss=1.06] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2400): 0.7864927875995636\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 96%|█████████▌| 2499/2601 [1:02:32<02:33, 1.50s/it, loss=0.497]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2500): 0.8313259115815163\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 100%|█████████▉| 2599/2601 [1:05:02<00:02, 1.49s/it, loss=0.522]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2600): 0.7982275319099427\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 100%|██████████| 2601/2601 [1:05:04<00:00, 1.50s/it, loss=0.727]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Average Training Loss for Epoch 2: 0.8081949698562945\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Validation: 100%|██████████| 137/137 [01:12<00:00, 1.90it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Average Validation Loss for Epoch 2: 0.9094252549383762\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ 
"Training: 4%|▍ | 99/2601 [02:28<1:02:36, 1.50s/it, loss=0.409]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 100): 0.5861263217031956\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 8%|▊ | 199/2601 [04:59<1:00:14, 1.50s/it, loss=0.527]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 200): 0.5470401339232922\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 11%|█▏ | 299/2601 [07:29<57:43, 1.50s/it, loss=0.637] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 300): 0.5824183216691017\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 15%|█▌ | 399/2601 [09:59<55:12, 1.50s/it, loss=0.592]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 400): 0.5499672368168831\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 19%|█▉ | 499/2601 [12:29<52:39, 1.50s/it, loss=0.695]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 500): 0.5772905349731445\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 23%|██▎ | 599/2601 [14:59<50:05, 1.50s/it, loss=1.13] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 600): 0.5968974930047989\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 27%|██▋ | 699/2601 [17:29<47:28, 1.50s/it, loss=0.443]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 700): 0.5567461925745011\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 31%|███ | 799/2601 [19:59<45:03, 1.50s/it, loss=0.395]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 800): 0.5894309616088867\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ 
"Training: 35%|███▍ | 899/2601 [22:29<42:34, 1.50s/it, loss=0.771]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 900): 0.5873063451051712\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 38%|███▊ | 999/2601 [24:59<40:05, 1.50s/it, loss=0.476]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1000): 0.5972063279151917\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 42%|████▏ | 1099/2601 [27:29<37:30, 1.50s/it, loss=0.565]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1100): 0.6006231245398521\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 46%|████▌ | 1199/2601 [29:59<35:02, 1.50s/it, loss=0.472]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1200): 0.5915874150395394\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 50%|████▉ | 1299/2601 [32:29<32:33, 1.50s/it, loss=0.532]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1300): 0.5902719554305077\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 54%|█████▍ | 1399/2601 [35:00<30:05, 1.50s/it, loss=0.443]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1400): 0.580033713877201\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 58%|█████▊ | 1499/2601 [37:30<27:34, 1.50s/it, loss=0.808]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1500): 0.5817161786556244\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 61%|██████▏ | 1599/2601 [40:00<25:03, 1.50s/it, loss=0.733]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1600): 0.5787480530142785\n" ] }, { "name": "stderr", 
"output_type": "stream", "text": [ "Training: 65%|██████▌ | 1699/2601 [42:30<22:31, 1.50s/it, loss=0.44] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1700): 0.5743653793632985\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 69%|██████▉ | 1799/2601 [45:00<20:03, 1.50s/it, loss=0.373]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1800): 0.5669003206491471\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 73%|███████▎ | 1899/2601 [47:30<17:35, 1.50s/it, loss=0.441]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1900): 0.5621130636334419\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 77%|███████▋ | 1999/2601 [50:00<15:02, 1.50s/it, loss=0.396]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2000): 0.592484669983387\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 81%|████████ | 2099/2601 [52:30<12:34, 1.50s/it, loss=0.507]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2100): 0.5556239295005798\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 85%|████████▍ | 2199/2601 [55:00<10:04, 1.50s/it, loss=0.486]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2200): 0.571899283528328\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 88%|████████▊ | 2299/2601 [57:31<07:35, 1.51s/it, loss=0.279]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2300): 0.5776233053207398\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 92%|█████████▏| 2399/2601 [1:00:01<05:02, 1.50s/it, loss=0.65]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps 
(step 2400): 0.5737731790542603\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 96%|█████████▌| 2499/2601 [1:02:31<02:32, 1.50s/it, loss=0.893]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2500): 0.5883317263424397\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 100%|█████████▉| 2599/2601 [1:05:01<00:02, 1.50s/it, loss=0.677]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2600): 0.575586271584034\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 100%|██████████| 2601/2601 [1:05:03<00:00, 1.50s/it, loss=0.634]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Average Training Loss for Epoch 3: 0.5781794790692716\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Validation: 100%|██████████| 137/137 [01:12<00:00, 1.90it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Average Validation Loss for Epoch 3: 0.999135330210637\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 4%|▍ | 99/2601 [02:28<1:02:43, 1.50s/it, loss=0.285]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 100): 0.413784771412611\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 8%|▊ | 199/2601 [04:59<1:00:03, 1.50s/it, loss=0.432]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 200): 0.43284411162137987\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 11%|█▏ | 299/2601 [07:29<57:39, 1.50s/it, loss=0.355] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 300): 0.44033734187483786\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 15%|█▌ | 399/2601 [09:59<55:03, 1.50s/it, loss=0.457] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg 
loss for last 100 steps (step 400): 0.4177935104072094\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 19%|█▉ | 499/2601 [12:29<52:36, 1.50s/it, loss=0.262]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 500): 0.42030997440218926\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 23%|██▎ | 599/2601 [14:59<50:10, 1.50s/it, loss=0.417]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 600): 0.43777878060936926\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 27%|██▋ | 699/2601 [17:29<47:39, 1.50s/it, loss=0.252]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 700): 0.44543042451143267\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 31%|███ | 799/2601 [19:59<45:02, 1.50s/it, loss=0.331]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 800): 0.41820422530174256\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 35%|███▍ | 899/2601 [22:29<42:16, 1.49s/it, loss=0.434]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 900): 0.4102541197836399\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 38%|███▊ | 999/2601 [24:59<40:03, 1.50s/it, loss=0.381]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1000): 0.4401281327009201\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 42%|████▏ | 1099/2601 [27:29<37:33, 1.50s/it, loss=0.375]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1100): 0.40734992161393163\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 46%|████▌ | 1199/2601 [29:59<35:01, 1.50s/it, loss=0.347]" ] }, { "name": "stdout", "output_type": "stream", 
"text": [ "Avg loss for last 100 steps (step 1200): 0.4229107615351677\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 50%|████▉ | 1299/2601 [32:30<32:35, 1.50s/it, loss=0.596]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1300): 0.4366910111904144\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 54%|█████▍ | 1399/2601 [35:00<30:02, 1.50s/it, loss=0.514]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1400): 0.41040504559874535\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 58%|█████▊ | 1499/2601 [37:30<27:33, 1.50s/it, loss=0.321]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1500): 0.41208766609430314\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 61%|██████▏ | 1599/2601 [40:00<25:02, 1.50s/it, loss=0.362]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1600): 0.4288914044201374\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 65%|██████▌ | 1699/2601 [42:30<22:39, 1.51s/it, loss=0.348]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1700): 0.4072799864411354\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 69%|██████▉ | 1799/2601 [45:00<20:03, 1.50s/it, loss=0.18] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1800): 0.40538637042045594\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 73%|███████▎ | 1899/2601 [47:30<17:32, 1.50s/it, loss=0.394]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1900): 0.41915873721241953\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 77%|███████▋ | 1999/2601 [50:00<15:00, 1.50s/it, loss=0.36] " 
] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2000): 0.42285729214549067\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 81%|████████ | 2099/2601 [52:30<12:32, 1.50s/it, loss=0.45] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2100): 0.3937626303732395\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 85%|████████▍ | 2199/2601 [55:00<10:03, 1.50s/it, loss=0.365]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2200): 0.4223772123456001\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 88%|████████▊ | 2299/2601 [57:30<07:33, 1.50s/it, loss=0.447]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2300): 0.4191547580063343\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 92%|█████████▏| 2399/2601 [1:00:00<05:03, 1.50s/it, loss=0.423]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2400): 0.4223488415777683\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 96%|█████████▌| 2499/2601 [1:02:30<02:33, 1.50s/it, loss=0.597]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2500): 0.4321207369863987\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 100%|█████████▉| 2599/2601 [1:05:00<00:02, 1.50s/it, loss=0.288]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2600): 0.4203851442039013\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 100%|██████████| 2601/2601 [1:05:03<00:00, 1.50s/it, loss=0.632]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Average Training Loss for Epoch 4: 0.4216204704870769\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ 
"Validation: 100%|██████████| 137/137 [01:12<00:00, 1.90it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Average Validation Loss for Epoch 4: 1.0965224147713097\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 4%|▍ | 99/2601 [02:28<1:02:26, 1.50s/it, loss=0.324] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 100): 0.31893704675137996\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 8%|▊ | 199/2601 [04:58<1:00:02, 1.50s/it, loss=0.446]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 200): 0.30915156394243243\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 11%|█▏ | 299/2601 [07:28<57:27, 1.50s/it, loss=0.232] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 300): 0.32221948117017746\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 15%|█▌ | 399/2601 [09:58<54:44, 1.49s/it, loss=0.185]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 400): 0.32024823017418386\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 19%|█▉ | 499/2601 [12:28<52:28, 1.50s/it, loss=0.563]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 500): 0.3506594298779964\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 23%|██▎ | 599/2601 [14:57<49:54, 1.50s/it, loss=0.215] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 600): 0.3250641016662121\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 27%|██▋ | 699/2601 [17:27<47:40, 1.50s/it, loss=0.338]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 700): 0.3352129091322422\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ 
"Training: 31%|███ | 799/2601 [19:57<45:01, 1.50s/it, loss=0.209]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 800): 0.3395147521793842\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 35%|███▍ | 899/2601 [22:27<42:32, 1.50s/it, loss=0.423]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 900): 0.3108686701953411\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 38%|███▊ | 999/2601 [24:57<39:59, 1.50s/it, loss=0.362]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1000): 0.355789770334959\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 42%|████▏ | 1099/2601 [27:26<37:32, 1.50s/it, loss=0.39] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1100): 0.3256390055269003\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 46%|████▌ | 1199/2601 [29:56<35:05, 1.50s/it, loss=0.194]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1200): 0.32589283064007757\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 50%|████▉ | 1299/2601 [32:26<32:32, 1.50s/it, loss=0.374] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1300): 0.3462872489541769\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 54%|█████▍ | 1399/2601 [34:56<30:08, 1.50s/it, loss=0.299]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1400): 0.3290054628252983\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 58%|█████▊ | 1499/2601 [37:26<27:36, 1.50s/it, loss=0.216]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1500): 0.30920374870300293\n" ] }, { "name": "stderr", 
"output_type": "stream", "text": [ "Training: 61%|██████▏ | 1599/2601 [39:56<25:04, 1.50s/it, loss=0.364]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1600): 0.31953610375523567\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 65%|██████▌ | 1699/2601 [42:26<22:35, 1.50s/it, loss=0.185]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1700): 0.3104771442711353\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 69%|██████▉ | 1799/2601 [44:57<20:05, 1.50s/it, loss=0.263]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1800): 0.3241106171905994\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 73%|███████▎ | 1899/2601 [47:27<17:31, 1.50s/it, loss=0.191]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 1900): 0.3105654291808605\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 77%|███████▋ | 1999/2601 [49:57<15:02, 1.50s/it, loss=0.279] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2000): 0.33756139226257803\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 81%|████████ | 2099/2601 [52:27<12:27, 1.49s/it, loss=0.388] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2100): 0.33014354825019837\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 85%|████████▍ | 2199/2601 [54:56<10:02, 1.50s/it, loss=0.302] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2200): 0.3082833808660507\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 88%|████████▊ | 2299/2601 [57:26<07:32, 1.50s/it, loss=0.19] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 
steps (step 2300): 0.321821516752243\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 92%|█████████▏| 2399/2601 [59:56<05:02, 1.50s/it, loss=0.294] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2400): 0.32693059600889685\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 96%|█████████▌| 2499/2601 [1:02:26<02:32, 1.50s/it, loss=0.188]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2500): 0.3449856662750244\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 100%|█████████▉| 2599/2601 [1:04:56<00:02, 1.50s/it, loss=0.201] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Avg loss for last 100 steps (step 2600): 0.3208476223796606\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Training: 100%|██████████| 2601/2601 [1:04:58<00:00, 1.50s/it, loss=0.647]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Average Training Loss for Epoch 5: 0.3262371334495429\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Validation: 100%|██████████| 137/137 [01:12<00:00, 1.90it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Average Validation Loss for Epoch 5: 1.1965753805898403\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "train_model(model, train_loader, val_loader)" ] }, { "cell_type": "markdown", "metadata": { "id": "pzpP5N5_CRmA" }, "source": [ "#### Saving model" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2024-10-20T11:14:12.680953Z", "iopub.status.busy": "2024-10-20T11:14:12.680601Z", "iopub.status.idle": "2024-10-20T11:14:13.827992Z", "shell.execute_reply": "2024-10-20T11:14:13.826938Z", "shell.execute_reply.started": "2024-10-20T11:14:12.680918Z" }, "id": "7MnkV0CtCRmA", "outputId": "744a3ed8-f243-4f80-bad4-71c873644901" }, "outputs": 
[ { "data": { "text/plain": [ "('squad-bert-trained/BERT_model/tokenizer_config.json',\n", " 'squad-bert-trained/BERT_model/special_tokens_map.json',\n", " 'squad-bert-trained/BERT_model/vocab.txt',\n", " 'squad-bert-trained/BERT_model/added_tokens.json',\n", " 'squad-bert-trained/BERT_model/tokenizer.json')" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Define the directory where the model will be saved\n", "model_save_path = 'squad-bert-trained/BERT_model'\n", "\n", "# Create the directory if it doesn't exist\n", "os.makedirs(model_save_path, exist_ok=True)\n", "\n", "\n", "# Save the trained model and tokenizer\n", "model.save_pretrained(model_save_path)\n", "tokenizer.save_pretrained(model_save_path)" ] }, { "cell_type": "markdown", "metadata": { "id": "uBRSQrYaCRmA" }, "source": [ "#### Creating a .zip archive to download to the local system" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2024-10-20T11:18:15.398687Z", "iopub.status.busy": "2024-10-20T11:18:15.398050Z", "iopub.status.idle": "2024-10-20T11:18:38.861353Z", "shell.execute_reply": "2024-10-20T11:18:38.860282Z", "shell.execute_reply.started": "2024-10-20T11:18:15.398639Z" }, "id": "kyIuj_TiCRmA", "outputId": "d936a1df-ade4-4ac0-eac9-870ad0c0d672" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. 
Disabling parallelism to avoid deadlocks...\n", "To disable this warning, you can either:\n", "\t- Avoid using `tokenizers` before the fork if possible\n", "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " adding: squad-bert-trained/BERT_model/ (stored 0%)\n", " adding: squad-bert-trained/BERT_model/tokenizer_config.json (deflated 76%)\n", " adding: squad-bert-trained/BERT_model/model.safetensors (deflated 7%)\n", " adding: squad-bert-trained/BERT_model/config.json (deflated 47%)\n", " adding: squad-bert-trained/BERT_model/special_tokens_map.json (deflated 42%)\n", " adding: squad-bert-trained/BERT_model/tokenizer.json (deflated 71%)\n", " adding: squad-bert-trained/BERT_model/vocab.txt (deflated 53%)\n" ] }, { "data": { "text/html": [ "model.zip
" ], "text/plain": [ "/kaggle/working/model.zip" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Compress the model directory into a zip file\n", "!zip -r model.zip squad-bert-trained/BERT_model\n", "\n", "# Download the zip file to your local system\n", "from IPython.display import FileLink\n", "\n", "# Display a link to download the file\n", "FileLink(r'model.zip')\n" ] }, { "cell_type": "markdown", "metadata": { "id": "Drb4Uto-CRmB" }, "source": [ "#### Loading the model and setting it to evaluation mode" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2024-10-20T11:32:43.467717Z", "iopub.status.busy": "2024-10-20T11:32:43.467314Z", "iopub.status.idle": "2024-10-20T11:32:43.749348Z", "shell.execute_reply": "2024-10-20T11:32:43.748466Z", "shell.execute_reply.started": "2024-10-20T11:32:43.467677Z" }, "id": "MZbsLvxyCRmB", "outputId": "b50ba0e5-8b3d-4a11-9953-687181e892a1" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Moving the model to cuda\n", "\n", "\n" ] }, { "data": { "text/plain": [ "BertForQuestionAnswering(\n", " (bert): BertModel(\n", " (embeddings): BertEmbeddings(\n", " (word_embeddings): Embedding(30522, 768, padding_idx=0)\n", " (position_embeddings): Embedding(512, 768)\n", " (token_type_embeddings): Embedding(2, 768)\n", " (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " )\n", " (encoder): BertEncoder(\n", " (layer): ModuleList(\n", " (0-11): 12 x BertLayer(\n", " (attention): BertAttention(\n", " (self): BertSdpaSelfAttention(\n", " (query): Linear(in_features=768, out_features=768, bias=True)\n", " (key): Linear(in_features=768, out_features=768, bias=True)\n", " (value): Linear(in_features=768, out_features=768, bias=True)\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " )\n", " (output): BertSelfOutput(\n", " (dense): 
Linear(in_features=768, out_features=768, bias=True)\n", " (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " )\n", " )\n", " (intermediate): BertIntermediate(\n", " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", " (intermediate_act_fn): GELUActivation()\n", " )\n", " (output): BertOutput(\n", " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", " (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " )\n", " )\n", " )\n", " )\n", " )\n", " (qa_outputs): Linear(in_features=768, out_features=2, bias=True)\n", ")" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Load the saved model and tokenizer\n", "model_save_path = 'squad-bert-trained/BERT_model'\n", "model = BertForQuestionAnswering.from_pretrained(model_save_path)\n", "tokenizer = BertTokenizerFast.from_pretrained(model_save_path)\n", "\n", "print(f\"Moving the model to {device}\\n\\n\")\n", "model.to(device)\n", "\n", "# Set the model to evaluation mode\n", "model.eval()" ] }, { "cell_type": "markdown", "metadata": { "id": "vPFgF0lnCRmB" }, "source": [ "**Post-Code Explanation**:\n", "The core of the training process is handled within the loop, where the model’s parameters are iteratively updated based on the training data and loss function.\n", "\n", "- **Gradient Updates**:\n", " - At the beginning of each batch iteration, `optimizer.zero_grad()` clears out the gradients from the previous iteration. This is essential because PyTorch accumulates gradients, and not clearing them would result in incorrect gradient updates.\n", " - `loss_value.backward()` performs backpropagation, calculating the gradients for the current batch of data with respect to the loss. 
The gradients are stored in the `.grad` attribute of each model parameter.\n", " - `optimizer.step()` updates the model's weights using the gradients computed during backpropagation, adjusting the model to minimize the loss.\n", "\n", "- **Learning Rate Scheduling**:\n", " - After each batch, the learning rate scheduler (`scheduler.step()`, if included) is triggered, adjusting the learning rate according to predefined rules (such as decay). This helps ensure steady convergence and avoids overshooting the optimal weights.\n", "\n", "- **Loss Tracking**:\n", " - Throughout training, we monitor the loss every 100 steps. This involves keeping track of batch losses and printing their average, which indicates how well the model is learning over time. This step helps catch a stalled or diverging loss early in the training process.\n", "\n", "- **Validation Phase**:\n", " - Once an epoch is completed, the model switches to evaluation mode using `model.eval()`. This disables dropout, which is only active during training. (BERT uses layer normalization rather than batch normalization, and layer normalization behaves the same in both modes.) Additionally, `torch.no_grad()` ensures that no gradients are calculated during validation, which reduces computational overhead and memory usage.\n", " - During validation, the model's performance is assessed on unseen validation data by computing the validation loss. This serves as a proxy for how well the model generalizes to data outside the training set.\n", "\n", "- **Epoch Summary**:\n", " - At the end of each epoch, both training and validation losses are printed. These metrics are critical in diagnosing potential overfitting (training loss drops while validation loss increases) or underfitting (both losses remain high). 
If overfitting is detected, strategies such as early stopping or regularization might be applied in subsequent training runs. In the run logged above, the average validation loss rose from about 1.10 after epoch 4 to about 1.20 after epoch 5, while the epoch 5 training loss averaged about 0.33, which matches the overfitting pattern.\n", "\n", "- **Fine-tuning**:\n", " - Monitoring these metrics allows us to decide if further fine-tuning is necessary. Fine-tuning might include adjustments in learning rate, model architecture, batch size, or the number of epochs. This step helps ensure the model is trained effectively while avoiding problems like overfitting or underfitting.\n" ] }, { "cell_type": "markdown", "metadata": { "id": "tLjeYuLwCRmB" }, "source": [ "### 4. Evaluation and Inference\n", "\n", "After completing the training phase and saving the model in the previous section, we now move on to the evaluation and inference process. This section determines how well the model performs on unseen data and provides insight into its generalization capabilities.\n", "\n", "- **Inference**:\n", " - To begin, we will load the saved model checkpoint and run a few sample inputs through it to observe its predictions. This allows us to inspect how well the model handles the task, that is, whether the span it extracts from the context actually answers the question. Running individual examples also helps in detecting obvious issues, such as truncated, overlong, or irrelevant answers.\n", " - We’ll also compare the model's predictions against the ground truth to evaluate its performance qualitatively.\n", "\n", "- **Evaluation Using Metrics**:\n", " - After running some basic examples, we will systematically evaluate the model using standard text-overlap metrics. In this project, we are using **ROUGE**, **BLEU**, and **F1** scores, which are commonly used for evaluating tasks such as question answering, summarization, or machine translation.\n", " - **ROUGE (Recall-Oriented Understudy for Gisting Evaluation)**: This metric measures the overlap between the generated text and the reference text. 
ROUGE-1 measures unigram overlap, ROUGE-2 measures bigram overlap, and ROUGE-L measures the longest common subsequence. It is especially useful for tasks like summarization, where capturing important words and phrases is critical.\n", " - **BLEU (Bilingual Evaluation Understudy)**: BLEU measures the precision of n-grams in the generated text against the reference text. It is widely used for translation tasks. In this notebook it is computed on unigrams only.\n", " - **F1 Score**: This metric is the harmonic mean of precision and recall, reflecting how well the model's answers match the reference answers. It is especially important for tasks like question answering, where partially correct answers should receive partial credit. In this notebook it is computed from ROUGE-1 precision and recall.\n", " \n", "- **Automated Evaluation**:\n", " - We will loop through the validation set (`val_df`) and generate a prediction for each input. For every prediction, we will calculate the ROUGE, BLEU, and F1 scores. These metrics will give us a quantitative evaluation of the model’s performance across the entire validation set.\n", " - By averaging these scores, we will determine the overall performance of the model. A higher average score in all metrics indicates that the model's answers overlap more closely with the reference answers.\n", "\n", "- **Analysis of Results**:\n", " - Once the evaluation is complete, we will analyze the results. If the scores indicate that the model performs well (e.g., high ROUGE and BLEU scores), it means that the model has effectively learned the task and can generalize to unseen data. 
On the other hand, if the evaluation scores are low, it may be a sign that the model is either underfitting or overfitting.\n", " - In case of suboptimal performance, further fine-tuning of the model or hyperparameters may be required, or additional training data might be necessary to improve the results.\n", "\n", "- **Next Steps**:\n", " - Based on the evaluation results, we might consider improvements, such as adjusting the training process, refining the model’s architecture, or experimenting with data augmentation techniques. Evaluating the error cases in detail will guide future decisions to enhance the model’s overall performance.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2024-10-20T11:33:08.831479Z", "iopub.status.busy": "2024-10-20T11:33:08.831085Z", "iopub.status.idle": "2024-10-20T11:33:08.840458Z", "shell.execute_reply": "2024-10-20T11:33:08.839365Z", "shell.execute_reply.started": "2024-10-20T11:33:08.831442Z" }, "id": "fl2R3ze6CRmB" }, "outputs": [], "source": [ "def answer_question(question, context):\n", " max_context_size = 512\n", " chunk_size = max_context_size\n", "\n", " # Split the context into fixed-size chunks (the size is counted in characters, not tokens)\n", " chunks = [context[i:i+chunk_size] for i in range(0, len(context), chunk_size)]\n", "\n", " answers = []\n", " for chunk in chunks:\n", " # Tokenize the chunk together with the question (the chunk is passed as the first sequence)\n", " inputs = tokenizer(chunk, question, return_tensors='pt', truncation=True, max_length=max_context_size).to(device)\n", "\n", " # Run the model without tracking gradients\n", " with torch.no_grad():\n", " outputs = model(**inputs)\n", "\n", " # Get most likely beginning and end of the answer span\n", " answer_start_scores = outputs.start_logits\n", " answer_end_scores = outputs.end_logits\n", "\n", " # Find the tokens with the highest `start` and `end` scores\n", " # (they are chosen independently, so the end can precede the start, which yields an empty span)\n", " answer_start = torch.argmax(answer_start_scores)\n", " answer_end = torch.argmax(answer_end_scores) + 1\n", "\n", " # Convert the tokens to text\n", " answer = 
tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(inputs['input_ids'][0][answer_start:answer_end]))\n", "\n", " answers.append(answer)\n", "\n", " # Combine the answers from each chunk (every chunk contributes its own best span)\n", " answer = ' '.join(answers)\n", " answer = answer.replace('[CLS]', '')\n", " return answer.strip()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2024-10-20T11:42:47.083484Z", "iopub.status.busy": "2024-10-20T11:42:47.083069Z", "iopub.status.idle": "2024-10-20T11:42:47.192611Z", "shell.execute_reply": "2024-10-20T11:42:47.191665Z", "shell.execute_reply.started": "2024-10-20T11:42:47.083444Z" }, "id": "whlIg4zRCRmB", "outputId": "692f1f9c-d19f-43ce-8477-291e758e2965" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Q1: When was it discovered Beyonce was a co-owner of the music service, Tidal\n", "A1: march 30, 2015 jay - z on the release of tidal.\n", "\n", "Q2: The parent company of Tidal became under the ownership of whom in 2015?\n", "A2: jay z spotify\n", "\n", "Q3: What kind of service is Tidal?\n", "A3: music streaming all artist owned\n", "\n" ] } ], "source": [ "context = \"\"\"On March 30, 2015, it was announced that Beyoncé is a co-owner, with various other music artists, in the music streaming service Tidal. The service specialises in lossless audio and high\n", "definition music videos. Beyoncé's husband Jay Z acquired the parent company of Tidal, Aspiro, in the first quarter of 2015. Including Beyoncé and Jay-Z, sixteen artist stakeholders\n", "(such as Kanye West, Rihanna, Madonna, Chris Martin, Nicki Minaj and more) co-own Tidal, with the majority owning a 3% equity stake. The idea of having an all artist owned streaming\n", "service was created by those involved to adapt to the increased demand for streaming within the current music industry, and to rival other streaming services such as Spotify, which have\n", "been criticised for their low payout of royalties. 
\"The challenge is to get everyone to respect music again, to recognize its value\", stated Jay-Z on the release of Tidal.\"\"\"\n", "\n", "questions = [\n", " \"When was it discovered Beyonce was a co-owner of the music service, Tidal\",\n", " \"The parent company of Tidal became under the ownership of whom in 2015?\",\n", " \"What kind of service is Tidal?\",\n", "\n", "]\n", "\n", "# Reference answers read from the context above (for manual comparison; not used by the loop below)\n", "True_Answers = [\"March 30, 2015\", \"Jay Z\", \"music streaming service\"]\n", "\n", "# Loop through the questions and print each one with the model's answer\n", "\n", "for index, question in enumerate(questions, start=1):\n", " answer = answer_question(question, context)\n", " print(f\"Q{index}: {question}\")\n", " print(f\"A{index}: {answer}\")\n", " print()\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2024-10-20T11:43:37.270767Z", "iopub.status.busy": "2024-10-20T11:43:37.270373Z", "iopub.status.idle": "2024-10-20T11:43:37.292985Z", "shell.execute_reply": "2024-10-20T11:43:37.292164Z", "shell.execute_reply.started": "2024-10-20T11:43:37.270730Z" }, "id": "4RNn3AFLCRmB", "outputId": "78068e75-d902-4842-ba54-89668dbcbf61" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Question: What is the difference between Music and Harmony?\n", "Answer: it is a universal language that can evoke emotions, convey messages, and bring people together. music has been an integral part of human culture for centuries, with various genres and styles emerging across the world. from classical to contemporary, music has the power to inspire, heal, and entertain\n" ] } ], "source": [ "# Test with a new question and context\n", "context = \"\"\"\n", "Music is an art form whose medium is sound and silence. 
It is a universal language that\n", "can evoke emotions, convey messages, and bring people together. Music has been an\n", "integral part of human culture for centuries, with various genres and styles emerging\n", "across the world. From classical to contemporary, music has the power to inspire, heal,\n", "and entertain.\n", "\"\"\"\n", "question = \"What is the difference between Music and Harmony?\"\n", "\n", "# Get the answer from the model\n", "answer = answer_question(question, context)\n", "print(f\"Question: {question}\")\n", "print(f\"Answer: {answer}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2024-10-20T11:43:57.547784Z", "iopub.status.busy": "2024-10-20T11:43:57.547404Z", "iopub.status.idle": "2024-10-20T11:43:57.608202Z", "shell.execute_reply": "2024-10-20T11:43:57.607381Z", "shell.execute_reply.started": "2024-10-20T11:43:57.547745Z" }, "id": "t_XeFyKECRmC", "outputId": "d88b3671-2622-4cb7-8060-ef25b866b8e8" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Question: What are the applications of Space Robotics?\n", "Answer: space exploration, satellite maintenance, and planetary surface operations\n" ] } ], "source": [ "context = \"\"\"\n", "Space robotics is a field of robotics that focuses on the design, development, and operation of robots that can\n", "survive and function in the harsh environment of space. 
These robots are designed to perform a variety of tasks,\n", "such as space exploration, satellite maintenance, and planetary surface operations.\n", "Space robots can be autonomous or remotely controlled, and they often require specialized\n", "systems to withstand the extreme temperatures, radiation, and vacuum of space.\n", "\n", "Space robotics has many applications, including:\n", "\n", "Planetary exploration: Robots like NASA's Curiosity Rover and Perseverance Rover\n", "have been used to explore the surface of Mars and gather data about the planet's geology and climate.\n", "Satellite maintenance: Robots like the Canadarm2 robotic arm on the\n", "International Space Station have been used to perform maintenance tasks and repairs on satellites in orbit.\n", "Asteroid mining: Robots are being developed to explore and mine asteroids\n", "for valuable resources like water and precious metals.\n", "\"\"\"\n", "\n", "question = \"What are the applications of Space Robotics?\"\n", "\n", "# Get the answer from the model\n", "answer = answer_question(question, context)\n", "print(f\"Question: {question}\")\n", "print(f\"Answer: {answer}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2024-10-20T12:06:51.353291Z", "iopub.status.busy": "2024-10-20T12:06:51.352590Z", "iopub.status.idle": "2024-10-20T12:06:51.365351Z", "shell.execute_reply": "2024-10-20T12:06:51.364392Z", "shell.execute_reply.started": "2024-10-20T12:06:51.353249Z" }, "id": "OAaexydpCRmC" }, "outputs": [], "source": [ "\n", "# Define your function for calculating metrics and generating answers\n", "def calculate_metrics(val_df):\n", " metrics = {\n", " 'rouge1': [],\n", " 'rouge2': [],\n", " 'rougel': [],\n", " 'bleu': [],\n", " 'f1': [] # Add F1 score to metrics\n", " }\n", "\n", " # Initialize the scorer\n", " scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)\n", "\n", " # Initialize lists to store answers 
and predicted answers\n", " answers = []\n", " predicted_answers = []\n", "\n", " # Iterate over the validation examples\n", " for question_id in range(val_df.shape[0]):\n", " predicted_answer = answer_question(val_df.iloc[question_id]['Question'], val_df.iloc[question_id]['Context']).lower().strip()\n", " answer = val_df.iloc[question_id]['Answer'].lower().strip()\n", "\n", " # Store the answers and predicted answers\n", " answers.append(answer)\n", " predicted_answers.append(predicted_answer)\n", "\n", " # Calculate ROUGE scores\n", " rouge_scores = scorer.score(answer, predicted_answer)\n", " metrics['rouge1'].append(rouge_scores['rouge1'].fmeasure)\n", " metrics['rouge2'].append(rouge_scores['rouge2'].fmeasure)\n", " metrics['rougel'].append(rouge_scores['rougeL'].fmeasure)\n", "\n", " # Calculate BLEU score on unigrams only (all weight on 1-grams)\n", " metrics['bleu'].append(sentence_bleu([answer.split()], predicted_answer.split(), weights=(1.0, 0.0, 0.0, 0.0)))\n", "\n", " # Calculate F1 score\n", " # Harmonic mean of ROUGE-1 precision and recall; this is the same quantity as the\n", " # ROUGE-1 F-measure stored above, so the two averages coincide\n", " precision = rouge_scores['rouge1'].precision\n", " recall = rouge_scores['rouge1'].recall\n", " if precision + recall > 0: # Avoid division by zero\n", " f1 = 2 * (precision * recall) / (precision + recall)\n", " else:\n", " f1 = 0.0\n", " metrics['f1'].append(f1)\n", "\n", " return metrics, answers, predicted_answers\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2024-10-20T12:06:52.671717Z", "iopub.status.busy": "2024-10-20T12:06:52.670887Z", "iopub.status.idle": "2024-10-20T12:08:23.146385Z", "shell.execute_reply": "2024-10-20T12:08:23.145359Z", "shell.execute_reply.started": "2024-10-20T12:06:52.671664Z" }, "id": "ZtT1_cIKCRmC", "outputId": "06e5029b-28be-4b41-a4db-3a4f6c16b7a7" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Average ROUGE-1: 0.5453534984095015\n", "Average ROUGE-2: 0.3224320844012193\n", "Average ROUGE-L: 
0.5431188724586591\n", "Average BLEU: 0.39141109481540254\n", "Average F1 Score: 0.5453534984095015\n" ] } ], "source": [ "# Import Gaussian filter for smoothing\n", "from scipy.ndimage import gaussian_filter1d\n", "\n", "warnings.filterwarnings(\"ignore\")\n", "\n", "# Calculate metrics\n", "metrics, answers, predicted_answers = calculate_metrics(val_df)\n", "\n", "# Calculate average scores\n", "average_metrics = {key: sum(value) / len(value) for key, value in metrics.items()}\n", "\n", "# Print the average metrics\n", "print('Average ROUGE-1:', average_metrics['rouge1'])\n", "print('Average ROUGE-2:', average_metrics['rouge2'])\n", "print('Average ROUGE-L:', average_metrics['rougel'])\n", "print('Average BLEU:', average_metrics['bleu'])\n", "print('Average F1 Score:', average_metrics['f1']) # Print the average F1 score\n", "\n", "# Calculate the lengths of answers\n", "actual_lengths = [len(answer) for answer in answers]\n", "predicted_lengths = [len(predicted_answer) for predicted_answer in predicted_answers]\n", "\n", "# Create a DataFrame for easier plotting\n", "import pandas as pd\n", "\n", "lengths_df = pd.DataFrame({\n", " 'Index': range(len(actual_lengths)),\n", " 'Actual Lengths': actual_lengths,\n", " 'Predicted Lengths': predicted_lengths\n", "})\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2024-10-20T12:11:02.216132Z", "iopub.status.busy": "2024-10-20T12:11:02.215447Z", "iopub.status.idle": "2024-10-20T12:11:02.529747Z", "shell.execute_reply": "2024-10-20T12:11:02.528455Z", "shell.execute_reply.started": "2024-10-20T12:11:02.216090Z" }, "id": "hEjlBDrRCRmD", "outputId": "8c4bb27d-854b-4c8a-a229-2c1b22032d7b" }, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAA1cAAAIsCAYAAAAeUFNGAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuNSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/xnp5ZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABQQ0lEQVR4nO3deVxU9f7H8Tcgi6DDomZmkuFCpZJhbpdSMzPXSFPDBTG9ijes3OqqmWlame1ibqXpz2ualXuKkrkkei33pWt2pdLEcB8WZZ/fH/2cX+MgAh6HAV7Px4PHw/mec77nc8YvM/Pme84ZF4vFYhEAAAAA4Ka4lnQBAAAAAFAWEK4AAAAAwACEKwAAAAAwAOEKAAAAAAxAuAIAAAAAAxCuAAAAAMAAhCsAAAAAMADhCgAAAAAMQLgCAAAAAAMQrgAADhUcHKzY2NgS2feuXbsUHBysXbt2lcj+nVFkZKQiIyNLugwAKBMIVwBQDi1fvlzBwcHX/dm/f39Jl3hTFi9erOXLl5d0GTYiIyMVHBys9u3b57s8ISHB+vzHxcUVuf/k5GTFxsbqP//5z82WCgAopgolXQAAoOQ8//zzuvPOO+3aAwMDS6Aa4yxZskT+/v7q3r27TXvTpk118OBBubu7l0hdnp6e+u2333Tw4EGFhITYLFuzZo08PT2VmZlZrL7PnDmjGTNmqGbNmrr33nsLvd28efOKtT8AgD3CFQCUY61atVKjRo1KugyHcXV1laenZ4ntPzAwUDk5OVq7dq1NuMrMzFR8fLzatGmjDRs2OKSWK1euqGLFivLw8HDI/gCgPOC0QABAvrKzs9WsWTONHTvWbllaWpoaNWqkt956S5KUlZWlDz/8UN27d1eTJk3UuHFj9enTR//+979vuJ8xY8aobdu2du2xsbEKDg62afvqq6/Uv39/tWzZUg0bNlSnTp302Wef2azTtm1b/fzzz/r++++tp9ldvaboetdcrV+/Xt27d1dISIiaN2+u0aNHKzk52a7OBx54QMnJyXr22Wf1wAMPqEWLFnrrrbeUm5t7w+O8qkuXLlq3bp3y8vKsbd9++60yMjLUoUOHfLdJTk7W2LFj9be//U0NGzZU586d9eWXX1qX79q1Sz169JAkjR071nrcV0+NjIyMVJcuXXT48GH17dtX999/v9577z3rsmuvucrMzFRsbKwef/xxNWrUSA899JCGDRumEydOWNf5+uuv1b17dz3wwAMKDQ1V165dtXDhwkI/DwBQFjFzBQDlWFpami5cuGDT5uLiIn9/f7m7u6tdu3aKj4/XpEmTbGY4vvnmG2VlZalTp07Wfr744gt16dJFPXv2VHp6ur788kv9/e9/1xdffFGk09QKsmTJEtWrV09t27ZVhQoVtHnzZk2aNEkWi0V9+/aVJI0bN06TJ0+Wt7e3hg4dKkmqWrXqdftcvny5xo4dq0aNGmnkyJE6f/68/ud//kd79+7VypUrZTKZrOvm5uZq0KBBCgkJ0UsvvaSdO3dq/vz5qlWrlvr06VOoY+jSpYtiY2O1a9cutWzZUpK0du1atWjRQlWqVLFb/9y5c+rVq5dcXFzUt29fBQQEaNu2bXr55ZeVlpamAQMGqE6dOnr++ec1ffp0Pf3002rSpIkkKTQ01NrPpUuXNHjwYHXu3FlPPPFEvvu6eozR0dHauXOnOnfurP79+ys9PV0JCQk6duyYAgMDlZCQoJEjR6ply5YaPXq0JCkxMVF79+5VVFRUoZ4HACiLCFcAUI4NGDDArs3Dw0OHDh2SJHXq1ElfffWVEhIS9Mgjj1jXWbdunWrVqmU9pdDX11fffvutTQDr1auXOnbsqEWLFumNN94wpN5//etf8vLysj7u16+fBg0apE8//dQartq1a6cPPvhA/v7+Cg8PL7C/7OxsvfPOO6pfv74WL15sPWWwSZMmio6O1oIFC/T8889b18/MzFTHjh0VExMjSerdu7e6deumL7/8stDhqnbt2mr
", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Scores and labels\n", "scores = [\n", " average_metrics['bleu'],\n", " average_metrics['rouge1'],\n", " average_metrics['rouge2'],\n", " average_metrics['rougel'],\n", " average_metrics['f1']\n", "]\n", "labels = ['BLEU', 'ROUGE-1', 'ROUGE-2', 'ROUGE-L', 'F1 Score']\n", "\n", "# Set the style for seaborn\n", "sns.set(style=\"darkgrid\")\n", "\n", "# Create the bar plot\n", "plt.figure(figsize=(10, 6))\n", "bar_plot = sns.barplot(x=labels, y=scores, palette= 'viridis', linewidth=1.5)\n", "\n", "# Set the width of the bars\n", "for bar in bar_plot.patches:\n", " bar.set_width(0.4) # Set the width of the bars\n", "\n", "# Adding scores on top of each bar\n", "for index, value in enumerate(scores):\n", " bar_plot.text(index, value + 0.01, f'{value:.2f}', ha='center', va='bottom')\n", "\n", "# Adding title and labels\n", "plt.title('Evaluation Metrics')\n", "plt.xlabel('Metric')\n", "plt.ylabel('Score')\n", "\n", "# Display the plot\n", "plt.show()\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2024-10-20T12:15:40.798067Z", "iopub.status.busy": "2024-10-20T12:15:40.797655Z", "iopub.status.idle": "2024-10-20T12:17:12.800263Z", "shell.execute_reply": "2024-10-20T12:17:12.799371Z", "shell.execute_reply.started": "2024-10-20T12:15:40.798030Z" }, "id": "XB-YOUqlCRmD", "outputId": "3a48cb0f-97c1-4baa-f580-e47340e287dd" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Average ROUGE-1: 0.5453534984095015\n", "Average ROUGE-2: 0.3224320844012193\n", "Average ROUGE-L: 0.5431188724586591\n", "Average BLEU: 0.39141109481540254\n", "Average F1 Score: 0.5453534984095015\n" ] }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Print the average metrics\n", "print('Average ROUGE-1:', average_metrics['rouge1'])\n", "print('Average ROUGE-2:', average_metrics['rouge2'])\n", "print('Average ROUGE-L:', average_metrics['rougel'])\n", "print('Average BLEU:', average_metrics['bleu'])\n", "print('Average F1 Score:', average_metrics['f1']) # Print the average F1 score\n", "\n", "# Prepare the DataFrame for plotting\n", "metrics_df = pd.DataFrame({\n", " 'Index': range(len(metrics['rouge1'])), # Assuming all metrics have the same length\n", " 'ROUGE-1': metrics['rouge1'],\n", " 'ROUGE-2': metrics['rouge2'],\n", " 'ROUGE-L': metrics['rougel'],\n", " 'BLEU': metrics['bleu'],\n", " 'F1 Score': metrics['f1']\n", "})\n", "\n", "# Smooth the metrics using a Gaussian filter\n", "smoothed_metrics_df = pd.DataFrame({\n", " 'Index': metrics_df['Index'],\n", " 'ROUGE-1': gaussian_filter1d(metrics_df['ROUGE-1'], sigma=2),\n", " 'ROUGE-2': gaussian_filter1d(metrics_df['ROUGE-2'], sigma=2),\n", " 'ROUGE-L': gaussian_filter1d(metrics_df['ROUGE-L'], sigma=2),\n", " 'BLEU': gaussian_filter1d(metrics_df['BLEU'], sigma=2),\n", " 'F1 Score': gaussian_filter1d(metrics_df['F1 Score'], sigma=2)\n", "})\n", "\n", "# Set the style for seaborn\n", "sns.set(style=\"darkgrid\")\n", "\n", "# Create subplots for each metric\n", "fig, axs = plt.subplots(3, 2, figsize=(15, 12), constrained_layout=True)\n", "\n", "# Plot each metric\n", "axs[0, 0].plot(smoothed_metrics_df['Index'], smoothed_metrics_df['ROUGE-1'], color='blue', linewidth=2)\n", "axs[0, 0].set_title('Smoothed ROUGE-1')\n", "axs[0, 0].set_xlabel('Sample Index')\n", "axs[0, 0].set_ylabel('Score')\n", "axs[0, 0].grid()\n", "\n", "axs[0, 1].plot(smoothed_metrics_df['Index'], smoothed_metrics_df['ROUGE-2'], color='orange', linewidth=2)\n", "axs[0, 1].set_title('Smoothed ROUGE-2')\n", "axs[0, 1].set_xlabel('Sample Index')\n", "axs[0, 1].set_ylabel('Score')\n", "axs[0, 1].grid()\n", "\n", "axs[1, 
0].plot(smoothed_metrics_df['Index'], smoothed_metrics_df['ROUGE-L'], color='green', linewidth=2)\n", "axs[1, 0].set_title('Smoothed ROUGE-L')\n", "axs[1, 0].set_xlabel('Sample Index')\n", "axs[1, 0].set_ylabel('Score')\n", "axs[1, 0].grid()\n", "\n", "axs[1, 1].plot(smoothed_metrics_df['Index'], smoothed_metrics_df['BLEU'], color='red', linewidth=2)\n", "axs[1, 1].set_title('Smoothed BLEU Score')\n", "axs[1, 1].set_xlabel('Sample Index')\n", "axs[1, 1].set_ylabel('Score')\n", "axs[1, 1].grid()\n", "\n", "axs[2, 0].plot(smoothed_metrics_df['Index'], smoothed_metrics_df['F1 Score'], color='purple', linewidth=2)\n", "axs[2, 0].set_title('Smoothed F1 Score')\n", "axs[2, 0].set_xlabel('Sample Index')\n", "axs[2, 0].set_ylabel('Score')\n", "axs[2, 0].grid()\n", "\n", "# Remove the empty subplot\n", "fig.delaxes(axs[2, 1])\n", "\n", "# Show the plots\n", "plt.show()\n" ] }, { "cell_type": "markdown", "metadata": { "id": "s2uxqxCHCRmD" }, "source": [ "This code defines a system for evaluating question-answering models based on various metrics, such as ROUGE, BLEU, and F1 score. The `calculate_metrics` function processes a dataset of questions, contexts, and answers to generate predicted answers using a pre-trained model. It computes ROUGE-1, ROUGE-2, ROUGE-L, and BLEU scores for each prediction, as well as an F1 score using precision and recall from the ROUGE-1 score.\n", "\n", "The code also smooths the metric values using a Gaussian filter, which helps to visualize trends more clearly. It plots the smoothed ROUGE-1, ROUGE-2, ROUGE-L, BLEU, and F1 scores in subplots to track the model's performance across different samples. The final metrics and smoothed visualizations provide insights into how well the model answers questions relative to the ground truth, helping to assess its accuracy and effectiveness in the task.\n", "\n", "This code evaluates the performance of a question-answering model using various metrics to assess its accuracy and effectiveness. 
It has two main parts: the `calculate_metrics` function and a visualization section.\n", "\n", "1. **`calculate_metrics(val_df)`**:\n", " - This function takes a validation DataFrame (`val_df`) containing questions, contexts, and true answers.\n", " - It initializes a dictionary to store ROUGE-1, ROUGE-2, ROUGE-L, BLEU, and F1 scores.\n", " - Using the `RougeScorer`, it computes the ROUGE scores for each predicted answer against the true answer. For each entry, it:\n", " - Calls the `answer_question` function to generate a predicted answer.\n", " - Appends the computed ROUGE scores, the BLEU score (based on the predicted and true answers), and the F1 score (calculated from ROUGE-1 precision and recall) to the metrics dictionary.\n", " - The function returns the complete metrics, along with lists of true answers and predicted answers.\n", "\n", "2. **Metric Calculation and Averaging**:\n", " - After calculating metrics for all questions, the code averages ROUGE-1, ROUGE-2, ROUGE-L, BLEU, and F1 over all samples to provide an overall assessment of the model's performance.\n", "\n", "3. **Length Calculation**:\n", " - The code calculates the lengths of the actual and predicted answers, storing these lengths in lists for further analysis.\n", "\n", "4. 
**Visualization**:\n", " - It prepares a DataFrame (`lengths_df`) to facilitate plotting the lengths of actual and predicted answers.\n", " - It then creates a separate DataFrame (`metrics_df`) holding the per-sample metric scores and applies a Gaussian filter (`sigma=2`) to smooth them.\n", " - Finally, the code generates one subplot per metric (ROUGE-1, ROUGE-2, ROUGE-L, BLEU, and F1 score), showing how the smoothed scores vary across validation samples.\n", "\n", "Overall, this evaluation framework enables an in-depth analysis of how well the question-answering model performs across various metrics, providing insights into its strengths and weaknesses.\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "id": "gDIL9xHlCRmD" }, "source": [ "### Inference on Model Evaluation Results\n", "\n", "The evaluation metrics we’ve gathered shed light on how well our question-answering model is performing, especially when it comes to understanding context and generating responses. Here’s a closer look at what these numbers mean and how they might impact real-world business applications:\n", "\n", "1. **Average ROUGE-1: 0.545**:\n", " - This metric measures how many of the individual words in the model's answers overlap with those in the reference answers. A score of 0.545 indicates a decent level of agreement.\n", " - **Implication**: For businesses, this suggests that our model can pick up key information quite well. Imagine a customer support chatbot that can understand and respond to common inquiries accurately: that is the kind of behavior this score points to. Customers would likely receive relevant answers to their questions, improving their experience.\n", "\n", "2. **Average ROUGE-2: 0.322**:\n", " - ROUGE-2 looks at pairs of consecutive words. 
A score of 0.322 indicates that the model matches two-word sequences less often than single words.\n", " - **Implication**: This tells us that while our model is good at picking out individual words, it reproduces exact phrases less reliably. In a business setting, this could impact things like product descriptions or service explanations, where precise language is essential for clarity and customer understanding.\n", "\n", "3. **Average ROUGE-L: 0.543**:\n", " - This metric is based on the longest common subsequence of words shared by the predicted and reference answers. A score of 0.543 suggests the model preserves word order fairly well in its responses.\n", " - **Implication**: This matters for user engagement. For virtual assistants that need to provide detailed answers, this score indicates that the model can produce responses that flow reasonably well, making them easier for users to follow.\n", "\n", "4. **Average BLEU: 0.391**:\n", " - BLEU measures how well the generated answers match the expected responses, focusing on n-gram precision. An average score of 0.391 indicates that while the answers are relevant, they do not always match the reference wording.\n", " - **Implication**: In practical terms, this could affect applications like content generation or translation. Businesses might find that while the model can generate helpful content, there’s still room for improvement. This could mean more editing or refining to ensure the final product meets quality standards.\n", "\n", "5. **Average F1 Score: 0.545**:\n", " - The F1 score is the harmonic mean of precision (the share of predicted words that appear in the true answer) and recall (the share of true-answer words that appear in the prediction), both taken here from ROUGE-1. A score of 0.545 suggests moderate performance.\n", " - **Implication**: For businesses, this score indicates that the model is fairly good at providing accurate responses without overwhelming users with incorrect information. 
This balance is particularly important in sensitive fields like healthcare or finance, where getting things right can make a big difference.\n", "\n", "### Real-World Business Applications\n", "- **Context-Based Customer Support**: These metrics suggest that the model could be effectively integrated into customer service platforms, automating responses to frequently asked questions based on context. This could enhance user satisfaction and reduce response times.\n", "\n", "- **Contextual Content Generation**: The model shows promise for generating contextually relevant content for blogs, reports, and marketing materials. While it performs decently, human review will be essential to align the generated content with the brand's voice and ensure it resonates with the target audience.\n", "\n", "- **Contextual Data Analysis**: Teams could leverage the model to summarize findings or generate insights from reports, enhancing productivity and allowing decision-makers to focus on crucial information without getting bogged down in details.\n", "\n", "- **Advanced Chatbots and Virtual Assistants**: The results suggest that while the model can provide useful context-based answers, further fine-tuning is needed to fully grasp user intent and context. Improving this aspect could significantly enhance user interactions, making them feel more natural and intuitive.\n", "\n", "In summary, these evaluation metrics indicate that our model has a solid foundation for context-based question answering but also highlight opportunities for enhancement. 
As we continue to refine its capabilities, we’re paving the way for more effective tools that can genuinely benefit businesses and their customers.\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "6kHhDIJ0CRmD" }, "outputs": [ { "data": { "text/plain": [ "'C:\\\\Users\\\\mohds\\\\OneDrive - University of San Diego\\\\Desktop\\\\MS_AAI\\\\Natural Language Processing and GenAI (AAI-520-A1)\\\\EDITH_ChatBOT'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pwd" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "colab": { "provenance": [] }, "kaggle": { "accelerator": "nvidiaTeslaT4", "dataSources": [], "dockerImageVersionId": 30787, "isGpuEnabled": true, "isInternetEnabled": true, "language": "python", "sourceType": "notebook" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 1 }