{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "Q-bj6K7Qv4ft" }, "source": [ "# Instruction-Tuning a Generative Pretrained Transformer\n", "\n", "1. First, install the `transformers` library." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install transformers" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "- Load the dataset.\n", "- Load the GPT-2 tokenizer and add the chosen special tokens (`'<|startoftext|>', '<|endoftext|>', '<|pad|>'`).\n", "- Create demonstrations by prepending the special tokens.\n", "- Calculate the maximum length (in tokens) of the demonstrations." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 423 }, "id": "7MbpXGu-v4f1", "outputId": "2f764046-c977-4187-c9fc-5a37eb6ff28b" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Found cached dataset parquet (C:/Users/CWLINK/.cache/huggingface/datasets/nicholasKluge___parquet/nicholasKluge--fine-tuning-instruct-aira-c7a6e731d782bc09/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "75b57109c2544501a6f6eac494b6ba0b", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/4 [00:00, ?it/s]" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
| | prompt | completion |
|---|---|---|
| 0 | I was wondering if you could walk me through t... | Sure! The process for setting up a hydroponic ... |
| 1 | What type of wine goes best with steak. | The best type of wine to pair with steak depen... |
| 2 | How do I know if this is a good investment. | Answer: To determine if an investment is a goo... |
| 3 | Please provide me with some financial advice. | Some financial advice is to always pay yoursel... |
| 4 | What kind of safety devices do I need to insta... | The type of safety devices you should install ... |
| ... | ... | ... |
| 42869 | How do computers communicate and network with ... | Computers communicate and network with each ot... |
| 42870 | How are websites different from web applications? | Websites and web applications are similar in t... |
| 42871 | What is open-source software and its benefits? | Open-source software is software that is made ... |
| 42872 | What is a cookie and how is it used in web bro... | A cookie is a small piece of data that a websi... |
| 42873 | What is cloud storage and its advantages for d... | Cloud storage is a service that allows you to ... |

42874 rows × 2 columns
\n", "