---
license: apache-2.0
datasets:
- rajpurkar/squad_v2
metrics:
- precision
- f1
- recall
- squad_v2
- meteor
- bleu
- rouge
- exact_match
base_model:
- meta-llama/Llama-3.2-1B
- google/gemma-2-2b-it
library_name: transformers
tags:
- llama
- squad
- fine-tuned
---

1. Overview

This repository documents the fine-tuning of the Llama-3.2-1B model on the Stanford Question Answering Dataset (SQuAD). The model is trained to answer questions accurately from a given context passage, and fine-tuning adapts the pre-trained Llama model to extractive question answering.
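
2. Model Information

Model Used: meta-llama/Llama-3.2-1B

Pre-trained Parameters: The model contains approximately 1.2 billion parameters (1,235,814,400 in the Hugging Face implementation), consistent with the 1.23B figure in Meta's official documentation.

Fine-tuned Parameters: The parameter count is essentially unchanged. Fine-tuning updates the values of the existing weights rather than adding layers; the only new component is the small span-prediction head of the question-answering model class, which adds a few thousand parameters.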
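
As a cross-check, the parameter count can be reproduced from the architecture's hyperparameters. The sketch below assumes the values in the published `config.json` for meta-llama/Llama-3.2-1B (hidden size 2048, 16 layers, 32 query heads and 8 key/value heads of dimension 64, MLP size 8192, a 128,256-token vocabulary, tied input and output embeddings) and assumes the question-answering class adds a single `Linear(hidden_size, 2)` layer with bias. `llama_param_count` is a hypothetical helper; for a loaded model, `sum(p.numel() for p in model.parameters())` gives the measured count directly.

```python
def llama_param_count(vocab=128_256, hidden=2048, layers=16, heads=32,
                      kv_heads=8, head_dim=64, mlp=8192, qa_head=True):
    """Count parameters of a Llama-style decoder from its config values.

    Assumes tied input/output embeddings (no separate lm_head), bias-free
    projections, and RMSNorm layers with one weight vector each.
    """
    attn = (hidden * heads * head_dim            # q_proj
            + 2 * hidden * kv_heads * head_dim   # k_proj + v_proj (grouped-query attention)
            + heads * head_dim * hidden)         # o_proj
    ffn = 3 * hidden * mlp                       # gate_proj, up_proj, down_proj
    norms = 2 * hidden                           # input + post-attention RMSNorm
    total = layers * (attn + ffn + norms)
    total += vocab * hidden                      # token embeddings (shared with the output layer)
    total += hidden                              # final RMSNorm
    if qa_head:
        total += hidden * 2 + 2                  # start/end logit head: weight + bias (assumed)
    return total


base = llama_param_count(qa_head=False)
qa = llama_param_count()
print(f"base: {base:,}  with QA head: {qa:,}  added: {qa - base:,}")
```

Under these assumptions the base model has 1,235,814,400 parameters, of which about 263 million belong to the embedding matrix, and the span head adds 4,098.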

3. Dataset and Task Details

Dataset: SQuAD

The Stanford Question Answering Dataset (SQuAD) is a benchmark dataset designed for extractive question-answering tasks. It contains passages with corresponding questions and answer spans extracted directly from the text.
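
In the Hugging Face copy of the dataset (`rajpurkar/squad_v2`, listed in the metadata), each example stores an answer as its text plus a character offset into the context, so the span can be recovered by slicing. The record below is a made-up example in that schema, not real dataset content, and `answer_span` is a hypothetical helper. In SQuAD 2.0, unanswerable questions have empty `text` and `answer_start` lists.

```python
# Hypothetical record in the SQuAD / SQuAD 2.0 schema (not taken from the dataset)
record = {
    "id": "example-0",
    "title": "Llama",
    "context": "Llama is a model developed by Meta AI designed for natural language understanding tasks.",
    "question": "Who developed Llama?",
    "answers": {"text": ["Meta AI"], "answer_start": [30]},
}


def answer_span(example):
    """Return (start, end) character offsets of the first gold answer, or None if unanswerable."""
    answers = example["answers"]
    if not answers["text"]:  # SQuAD 2.0 marks unanswerable questions with empty lists
        return None
    text = answers["text"][0]
    start = answers["answer_start"][0]
    end = start + len(text)
    # Extractive QA: the answer must appear verbatim at the stated offset
    if example["context"][start:end] != text:
        raise ValueError(f"answer {text!r} not found at offset {start}")
    return start, end


print(answer_span(record))  # (30, 37)
```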

Task Objective

Given a passage and a question, the model is trained to identify the correct span of text in the passage that answers the question.
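
Concretely, a question-answering head outputs a start score and an end score for every token, and a common decoding rule picks the pair (start, end) with start ≤ end that maximizes the sum of the two scores. The plain-Python sketch below implements that search; `best_span` is a hypothetical helper, and the 30-token cap on answer length is an assumed value, not one reported for this model. A full decoder would also restrict candidates to context tokens.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end, score) of the highest-scoring valid span.

    A span is valid when start <= end and it covers at most
    max_answer_len tokens; its score is start_logit + end_logit.
    """
    best = (0, 0, float("-inf"))
    for i, s in enumerate(start_logits):
        # Only consider end positions at or after the start, within the length cap
        for j in range(i, min(i + max_answer_len, len(end_logits))):
            score = s + end_logits[j]
            if score > best[2]:
                best = (i, j, score)
    return best


start = [0.1, 2.0, 0.3, -1.0]
end = [3.0, 0.5, 1.5, 0.2]
print(best_span(start, end))  # (1, 2, 3.5)
```

With these toy scores, taking the two argmaxes independently would select start 1 and end 0, which is not a valid span; the constrained search returns tokens 1 to 2 instead.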

4. Fine-Tuning Approach

Train-Test Split: An 80:20 split was applied to the dataset, with stratified sampling used to keep the distribution of passages and questions balanced between the train and test subsets. A seed value of 1 was used for reproducibility.
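
The card does not say which attribute the split was stratified on. As one hypothetical choice for `squad_v2`, the sketch below stratifies on whether a question is answerable and sends 20% of each stratum to the test set, using a seeded random generator (seed 1, as stated above). The `stratified_split` helper, the answerability key, and the toy data are illustrative assumptions, not the author's code. With the `datasets` library, `Dataset.train_test_split(test_size=0.2, seed=1, stratify_by_column=...)` provides a comparable option for `ClassLabel` columns.

```python
import random
from collections import defaultdict


def stratified_split(examples, key, test_frac=0.2, seed=1):
    """Split examples so each stratum (value of key(example)) keeps the same test fraction."""
    rng = random.Random(seed)  # local RNG so the split is reproducible
    strata = defaultdict(list)
    for ex in examples:
        strata[key(ex)].append(ex)
    train, test = [], []
    for label in sorted(strata):  # fixed stratum order keeps the RNG sequence deterministic
        group = strata[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_frac)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def is_answerable(ex):
    """Hypothetical stratification key: SQuAD 2.0 answerable vs. unanswerable."""
    return bool(ex["answers"]["text"])


# Toy data: every fourth example is unanswerable (25 of 100)
toy = [{"id": i, "answers": {"text": ["x"] if i % 4 else []}} for i in range(100)]
train, test = stratified_split(toy, is_answerable)
print(len(train), len(test))  # 80 20
```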

Tokenization: Context and question pairs were tokenized with padding and truncation to ensure uniform input lengths (maximum 512 tokens).
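
Because SQuAD answers are given as character offsets, tokenization for extractive QA also has to convert each answer into start and end token positions, which serve as the training labels. With a Hugging Face fast tokenizer this is commonly done by calling the tokenizer with `return_offsets_mapping=True` and using the encoding's `sequence_ids()` to tell context tokens from question tokens. The helper below is a hypothetical, library-free sketch of that mapping step: it takes offsets and sequence ids as plain lists, and it labels answers lost to truncation as (0, 0), a common convention assumed here rather than confirmed for this model.

```python
def char_span_to_token_span(offsets, sequence_ids, ans_start, ans_end, context_id=1):
    """Map a character span [ans_start, ans_end) in the context to (start_tok, end_tok).

    offsets: (char_start, char_end) per token, as from return_offsets_mapping=True
    sequence_ids: per-token segment id (0 = question, 1 = context, None = special)
    Assumes context tokens are contiguous. Returns (0, 0) if the answer is not
    fully inside the (possibly truncated) context.
    """
    ctx = [i for i, s in enumerate(sequence_ids) if s == context_id]
    if not ctx:
        return 0, 0
    first, last = ctx[0], ctx[-1]
    # Answer truncated away or only partly present
    if offsets[first][0] > ans_start or offsets[last][1] < ans_end:
        return 0, 0
    # Last token that starts at or before the answer start
    start_tok = first
    while start_tok <= last and offsets[start_tok][0] <= ans_start:
        start_tok += 1
    # First token that ends at or after the answer end
    end_tok = last
    while end_tok >= first and offsets[end_tok][1] >= ans_end:
        end_tok -= 1
    return start_tok - 1, end_tok + 1


# Toy encoding of (question, "Meta AI built Llama"): [special, q, q, special, ctx x4]
offsets = [(0, 0), (0, 3), (4, 12), (0, 0), (0, 4), (4, 7), (7, 13), (13, 19)]
seq_ids = [None, 0, 0, None, 1, 1, 1, 1]
print(char_span_to_token_span(offsets, seq_ids, 0, 7))  # (4, 5) -> "Meta", " AI"
```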

Model Training: Fine-tuning was run for three epochs with a learning rate of 3e-5. Gradient accumulation was used to reach a larger effective batch size than fits in GPU memory at once, and early stopping was used to prevent overfitting.
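
The card does not report the batch size, number of accumulation steps, or early-stopping patience. In the transformers `Trainer`, these features correspond to `TrainingArguments(gradient_accumulation_steps=...)` and the `EarlyStoppingCallback(early_stopping_patience=...)` callback. The plain-Python sketch below shows the underlying logic under hypothetical settings: summing gradients over several micro-batches before each optimizer step gives an effective batch size of the per-device batch size times the accumulation steps times the number of devices, and early stopping halts training once the validation loss has failed to improve for `patience` consecutive evaluations. The class, helper, and loss values are illustrative, not taken from this training run.

```python
class EarlyStopping:
    """Stop when the monitored loss has not improved for `patience` evaluations."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience    # evaluations to wait without improvement
        self.min_delta = min_delta  # minimum decrease that counts as improvement
        self.best = float("inf")
        self.bad_evals = 0

    def step(self, val_loss):
        """Record one evaluation; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


def effective_batch_size(per_device, accumulation_steps, num_devices=1):
    """Examples contributing to each optimizer step under gradient accumulation."""
    return per_device * accumulation_steps * num_devices


stopper = EarlyStopping(patience=2)
losses = [1.20, 0.95, 0.97, 0.96, 0.90]  # made-up validation losses
for evaluation, loss in enumerate(losses, start=1):
    if stopper.step(loss):
        print(f"stopping after evaluation {evaluation}, best loss {stopper.best}")
        break
```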

Hardware: Training used GPU acceleration to handle the model's size and long token sequences efficiently.

5. Results and Observations

Zero-shot vs. Fine-tuned Performance: Without fine-tuning, the pre-trained Llama model demonstrated limited ability to answer questions accurately. Fine-tuning significantly improved the model's performance on metrics such as F1 score, exact match, and ROUGE.
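
Exact match and F1 for SQuAD are computed on normalized answer strings. The sketch below mirrors the normalization in the official SQuAD evaluation script (lowercasing, removing punctuation and the articles "a", "an", "the", collapsing whitespace) and scores a single prediction against a single reference. The official metric additionally takes the maximum over all reference answers for a question, and in SQuAD 2.0 an empty prediction is correct only for unanswerable questions. The function names are illustrative; the `evaluate` library's `squad_v2` metric provides a maintained implementation.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(s):
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCTUATION)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction, reference):
    """Token-overlap F1 between normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty (a correct "no answer") scores 1; otherwise 0
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Meta AI.", "Meta AI"), round(f1_score("Meta AI research", "Meta AI"), 2))  # 1.0 0.8
```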

Fine-tuning Benefits: Training on the SQuAD dataset taught the model to relate a question to the relevant part of the context and to predict precise start and end positions for the answer span.

Model Parameters: The parameter count remained essentially unchanged during fine-tuning, indicating that the performance improvements stemmed from optimizing existing weights rather than from structural changes.

6. How to Use the Fine-Tuned Model

Install Necessary Libraries:

```bash
pip install transformers datasets torch
```

Load the Fine-Tuned Model:

```python
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Replace the placeholder with the Hub repository that hosts the fine-tuned weights
model_name = "<your-huggingface-repo>/squad-llama-finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout
```
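
Make Predictions:

```python
import torch

context = "Llama is a model developed by Meta AI designed for natural language understanding tasks."
question = "Who developed Llama?"

# Encode the pair; truncate only the context, matching the 512-token limit used in training.
# A single example needs no padding.
inputs = tokenizer(question, context, return_tensors="pt", truncation="only_second", max_length=512)

with torch.no_grad():
    outputs = model(**inputs)

# Most likely start and end token positions of the answer span
start_idx = int(outputs.start_logits[0].argmax())
end_idx = int(outputs.end_logits[0].argmax())

# Decode the predicted span (an end before the start yields an empty answer)
answer = tokenizer.decode(inputs["input_ids"][0][start_idx:end_idx + 1], skip_special_tokens=True)
print(f"Predicted Answer: {answer}")
```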

7. Key Takeaways

Fine-tuning Llama on SQuAD equips it to handle extractive question-answering tasks accurately.

The parameter count of the model does not change meaningfully during fine-tuning, showing that the performance gains come from weight updates rather than architectural modifications.

The comparison between zero-shot and fine-tuned performance shows that task-specific training is needed to achieve strong extractive QA results.

8. Acknowledgments

Hugging Face for providing seamless tools for model fine-tuning and evaluation.

Stanford Question Answering Dataset for serving as a robust benchmark for extractive QA tasks.