Model Card for Model ID (to be completed)
This model was developed as a completion requirement for the Matsuo Lab LLM2024 course.
Model Description
This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
- License: [More Information Needed]
- Finetuned from model [optional]: llm-jp-3-13b
Model Sources [optional]
Training Details
Training Data
- Base Model: llm-jp/llm-jp-3-13b
- Data for Instruction Tuning: ichikara-
- Data for DPO: https://huggingface.co/datasets/elyza/ELYZA-tasks-100
Training Procedure
- Fine-tune the base model with instruction tuning (SFT).
- Run DPO on the fine-tuned model using generated preference data:
  - Three similar prompts are generated for each sample prompt in the DPO data.
  - The fine-tuned model generates two answers for each prompt.
  - Due to time limitations, the first generated answer is labelled as the chosen answer.
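The data-generation steps above can be sketched roughly as follows. This is a minimal illustration, not the code used for this model: `build_preference_pairs`, `paraphrase`, and `generate` are hypothetical names, the second answer is assumed to become the rejected one, and the `prompt`/`chosen`/`rejected` keys are assumed because that is the record layout preference-tuning trainers such as TRL's `DPOTrainer` typically expect.

```python
from typing import Callable, Dict, List


def build_preference_pairs(
    seed_prompts: List[str],
    paraphrase: Callable[[str, int], List[str]],
    generate: Callable[[str], str],
    n_similar: int = 3,
) -> List[Dict[str, str]]:
    """Build prompt/chosen/rejected records from seed prompts.

    paraphrase(seed, n) is a hypothetical helper returning n similar prompts.
    generate(prompt) is a hypothetical wrapper around the fine-tuned model.
    """
    records = []
    for seed in seed_prompts:
        for prompt in paraphrase(seed, n_similar):
            # Two sampled answers per prompt.
            first = generate(prompt)
            second = generate(prompt)
            # No quality judgment is applied: the first answer is labelled
            # as chosen, and (by assumption) the second as rejected.
            records.append(
                {"prompt": prompt, "chosen": first, "rejected": second}
            )
    return records
```

With 100 seed prompts and three similar prompts each, this would yield 300 preference records. Because the chosen label does not reflect answer quality, the preference signal is essentially arbitrary.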
Training Hyperparameters
SFT for instruction tuning
from unsloth import FastLanguageModel

max_seq_length = 512
dtype = None  # None lets Unsloth auto-detect the dtype
load_in_4bit = True  # load the base model with 4-bit quantization

model_id = "llm-jp/llm-jp-3-13b"
new_model_id = "llm-jp-3-13b-it"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_id,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    trust_remote_code=True,
    device_map="auto",
)

# Attach LoRA adapters to the attention and MLP projection layers.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # alternative: 32
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=32,
    lora_dropout=0,  # alternative: 0.05
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
    max_seq_length=max_seq_length,
)
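As a small worked example of what the `r` and `lora_alpha` values above imply: in standard LoRA the low-rank update is multiplied by `lora_alpha / r`, and by `lora_alpha / sqrt(r)` when rank-stabilized LoRA is enabled. The helper name `lora_scaling` is hypothetical and only illustrates the arithmetic.

```python
import math


def lora_scaling(lora_alpha: float, r: int, use_rslora: bool = False) -> float:
    """Return the factor applied to the low-rank update B @ A."""
    if use_rslora:
        return lora_alpha / math.sqrt(r)
    return lora_alpha / r


print(lora_scaling(32, 16))  # r = 16, lora_alpha = 32 -> 2.0
print(lora_scaling(32, 32))  # r = 32, lora_alpha = 32 -> 1.0
```

So switching between `r = 16` and `r = 32` while keeping `lora_alpha = 32` changes the update scale as well as the adapter capacity.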
Training Hyperparameters
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset["train"],
    max_seq_length=max_seq_length,
    dataset_text_field="formatted_text",
    packing=False,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        logging_steps=10,
        warmup_steps=5,  # alternative: 10
        save_steps=100,
        save_total_limit=2,
        max_steps=-1,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        group_by_length=True,
        seed=3407,
        output_dir="outputs",
        report_to="none",
        # additional settings
        optim="adamw_8bit",
        weight_decay=0.01,
    ),
)
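The batch settings above imply an effective batch size of 2 × 4 = 8 sequences per optimizer step on a single GPU. The sketch below works through that arithmetic. The function name `optimizer_steps` and the dataset size of 1,000 examples are hypothetical, and the step count is approximate because the exact rounding depends on the `transformers` version.

```python
import math


def optimizer_steps(
    num_examples: int,
    per_device_batch_size: int = 2,
    grad_accum: int = 4,
    num_devices: int = 1,
    epochs: int = 1,
) -> tuple:
    """Return (effective batch size, approximate number of optimizer steps)."""
    effective_batch = per_device_batch_size * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch * epochs


# Hypothetical dataset size, for illustration only.
effective_batch, steps = optimizer_steps(1000)
print(effective_batch, steps)  # 8 125
```

Under these assumptions, a cap such as `max_steps = 150` only shortens training when the dataset holds more than about 1,200 examples.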
Experimental Trials
Instruction Tuning Only (model x data)
(hyperparameter settings as in the code comments above: r = 32, lora_dropout = 0.05, warmup_steps = 10)
01 - llm-jp-3-13b x ichikara-instruction-003-001-1.json (unmodified sample code provided)
02 - llm-jp-3-13b x ichikara-instruction-003-002-1.json
03 - Llama-3.1-8B-Instruct-bnb-4bit x ichikara-instruction-003-001-1.json
04 - Llama-3.2-8B-Instruct-bnb-4bit x ichikara-instruction-003-001-1.json
05 - gemma-2-9b-bnb-4bit x ichikara-instruction-003-001-1.json
09 - llm-jp-3-13b x kunishou/databricks-dolly-15k-ja
(hyperparameter settings as in the uncommented code above: r = 16, lora_dropout = 0, warmup_steps = 5)
00 - llm-jp-3-13b x ichikara-instruction-003-001-1.json
06 - gemma-2-9b-bnb-4bit x ichikara-instruction-003-001-1.json
07 - llm-jp-3-13b x ichikara-instruction-003-001-1.json
08 - llm-jp-3-13b x ichikara-instruction-003-001-1.json (with max_steps = 150)
10 - gemma-2-9b-bnb-4bit x kunishou/databricks-dolly-15k-ja
Instruction Tuning + DPO
11 - 00 + DPO
12 - 06 + DPO
Evaluation
Testing Data
The final performance of the model is evaluated on the elyza-tasks-100-TV dataset.
Metrics
The scores below were assigned by the course management system after the model outputs were uploaded.
Results
Trial | Score |
---|---|
00 | 3.04 |
01 | 3.00 |
02 | 2.71 |
03 | 2.52 |
04 | 2.40 |
05 | 2.71 |
06 | 2.72 |
07 | 2.93 |
08 | 2.87 |
09 | 2.20 |
10 | 2.40 |
11 | 2.34 |
12 | 2.28 |
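To make the comparison in the table easier to read, the short sketch below finds the best-scoring trial and the score change from adding DPO. The scores are copied from the table, and the trial pairings (11 builds on 00, 12 builds on 06) come from the trial list. The variable names are illustrative.

```python
# Scores copied from the results table.
scores = {
    "00": 3.04, "01": 3.00, "02": 2.71, "03": 2.52, "04": 2.40,
    "05": 2.71, "06": 2.72, "07": 2.93, "08": 2.87, "09": 2.20,
    "10": 2.40, "11": 2.34, "12": 2.28,
}

# DPO trials mapped to the instruction-tuned trial they start from.
dpo_base = {"11": "00", "12": "06"}

best = max(scores, key=scores.get)
deltas = {
    trial: round(scores[trial] - scores[base], 2)
    for trial, base in dpo_base.items()
}

print(best, scores[best])  # 00 3.04
print(deltas)              # {'11': -0.7, '12': -0.44}
```

In both cases the DPO step lowered the score relative to its instruction-tuned starting point.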
Summary
This model is the result of trial 11 (trial 00 + DPO), which scored 2.34 in the course evaluation system.
Model Architecture and Objective
[More Information Needed]
Compute Infrastructure
The model was trained on T4, L4, and A100 GPUs on Google Colaboratory.
Model tree for uthal/llm-jp-3-13b-it-dpo
Base model
llm-jp/llm-jp-3-13b