---
base_model: llm-jp/llm-jp-3-13b
library_name: transformers
language:
- ja
---

# Model Card for Model ID (to be completed)

This model was developed as part of the completion requirements of the Matsuo Lab LLM2024 course.

### Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

- **License:** [More Information Needed]
- **Finetuned from model [optional]:** llm-jp-3-13b

### Model Sources [optional]

## Training Details

### Training Data

- **Base model:** llm-jp/llm-jp-3-13b
- **Data for instruction tuning:** ichikara-instruction (see the Experimental Trials below for the specific files used)
- **Data for DPO:** https://huggingface.co/datasets/elyza/ELYZA-tasks-100

### Training Procedure

1. Fine-tune the base model with instruction tuning.
2. Perform DPO on the fine-tuned model with generated data:
   - 3 similar prompts are generated for each sample prompt in the DPO data.
   - The fine-tuned model is used to generate two answers for each prompt.
   - Due to time limitations, the first generated answer is labelled as the chosen answer (and the second as the rejected answer); a minimal sketch of this preference-pair construction is given at the end of this card.

#### Training Hyperparameters

SFT for instruction tuning:

```python
from unsloth import FastLanguageModel

max_seq_length = 512
dtype = None          # auto-detect dtype
load_in_4bit = True   # 4-bit quantization so the 13B model fits in GPU memory

model_id = "llm-jp/llm-jp-3-13b"
new_model_id = "llm-jp-3-13b-it"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_id,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    trust_remote_code=True,
    device_map="auto",
)

# Attach LoRA adapters (QLoRA) to the attention and MLP projection layers
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,  # 32
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 32,
    lora_dropout = 0,  # 0.05
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
    max_seq_length = max_seq_length,
)
```

Trainer settings:

```python
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

# `dataset["train"]` holds the instruction-tuning data, with each sample
# formatted into a single prompt/response string in the "formatted_text" field.
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset["train"],
    max_seq_length = max_seq_length,
    dataset_text_field = "formatted_text",
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        num_train_epochs = 1,
        logging_steps = 10,
        warmup_steps = 5,  # 10
        save_steps = 100,
        save_total_limit = 2,
        max_steps = -1,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        group_by_length = True,
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",
        # additional settings
        optim = "adamw_8bit",
        weight_decay = 0.01,
    ),
)
```

#### Experimental Trials

**Instruction Tuning Only** (model x data)

(hyperparameter settings as commented)
- 01 - llm-jp-3-13b x ichikara-instruction-003-001-1.json (unmodified sample code provided)
- 02 - llm-jp-3-13b x ichikara-instruction-003-002-1.json
- 03 - Llama-3.1-8B-Instruct-bnb-4bit x ichikara-instruction-003-001-1.json
- 04 - Llama-3.2-8B-Instruct-bnb-4bit x ichikara-instruction-003-001-1.json
- 05 - gemma-2-9b-bnb-4bit x ichikara-instruction-003-001-1.json
- 09 - llm-jp-3-13b x kunishou/databricks-dolly-15k-ja (hyperparameter settings as non-commented)
- 00 - llm-jp-3-13b x ichikara-instruction-003-001-1.json
- 06 - gemma-2-9b-bnb-4bit x ichikara-instruction-003-001-1.json
- 07 - llm-jp-3-13b x ichikara-instruction-003-001-1.json
- 08 - llm-jp-3-13b x ichikara-instruction-003-001-1.json (with max_steps = 150)
- 10 - gemma-2-9b-bnb-4bit x kunishou/databricks-dolly-15k-ja

**Instruction Tuning + DPO**
- 11 - the model from trial 00, further trained with DPO
- 12 - the model from trial 06, further trained with DPO

[More Information Needed]

## Evaluation

#### Testing Data

The final performance of the model is evaluated using the elyza-tasks-100-TV dataset.

#### Metrics

The scores below were obtained by uploading the model outputs to the course management system.

### Results

| Trial | Score |
| ----- | ----- |
| 00 | 3.04 |
| 01 | 3.00 |
| 02 | 2.71 |
| 03 | 2.52 |
| 04 | 2.40 |
| 05 | 2.71 |
| 06 | 2.72 |
| 07 | 2.93 |
| 08 | 2.87 |
| 09 | 2.20 |
| 10 | 2.40 |
| 11 | 2.34 |
| 12 | 2.28 |

#### Summary

This model is the result of trial 11 of the competition, which received a score of 2.34 from the course evaluation system.

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

The model was trained using T4/L4/A100 GPUs on Google Colaboratory.
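
#### Sketch: DPO Preference-Pair Construction and Training

The DPO scripts themselves are not included in this card. The block below is only a minimal sketch of the procedure described under Training Procedure, assuming the Unsloth model and tokenizer from the SFT step, a recent version of `trl`, and an `### 指示` / `### 回答` prompt template; the contents of `similar_prompts`, the generation settings, and the DPO hyperparameters (learning rate, `beta`, batch sizes) are illustrative placeholders rather than the values actually used.

```python
from datasets import Dataset
from trl import DPOConfig, DPOTrainer
from unsloth import FastLanguageModel

# `model` and `tokenizer` are the instruction-tuned model from the SFT step above.
# `similar_prompts` is assumed to already hold the generated prompts
# (3 similar prompts per ELYZA-tasks-100 sample); that generation step is not shown.
similar_prompts = ["(illustrative similar prompt)"]

FastLanguageModel.for_inference(model)  # switch the Unsloth model to inference mode

def generate_answer(prompt: str) -> str:
    # Prompt template assumed to match the one used for SFT formatting.
    text = f"### 指示\n{prompt}\n### 回答\n"
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,   # sampling, so that the two answers differ
        temperature=0.7,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

# The first answer is labelled "chosen", the second "rejected" (see Training Procedure).
rows = []
for prompt in similar_prompts:
    rows.append({
        "prompt": prompt,
        "chosen": generate_answer(prompt),
        "rejected": generate_answer(prompt),
    })
dpo_dataset = Dataset.from_list(rows)

FastLanguageModel.for_training(model)  # back to training mode for DPO

dpo_trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with LoRA adapters, trl uses the frozen base weights as the reference
    args=DPOConfig(
        per_device_train_batch_size=1,   # illustrative values, not the actual settings
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=5e-6,
        beta=0.1,
        output_dir="outputs_dpo",
        report_to="none",
    ),
    train_dataset=dpo_dataset,
    tokenizer=tokenizer,  # `processing_class=tokenizer` on newer trl versions
)
dpo_trainer.train()
```

Note that labelling chosen/rejected by generation order rather than by answer quality provides only a weak preference signal.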
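
#### Sketch: Generating Outputs for Evaluation

Outputs for elyza-tasks-100-TV were produced with the fine-tuned model and uploaded to the course management system for scoring. The exact inference script and submission format are not part of this card; the sketch below assumes the tasks are available locally as a JSONL file with `task_id` and `input` fields (the file name is a placeholder) and reuses the same prompt-template assumption as the sketch above.

```python
import json

from unsloth import FastLanguageModel

# Load the evaluation tasks; the file name and field names are assumptions.
tasks = []
with open("elyza-tasks-100-TV_0.jsonl", encoding="utf-8") as f:
    for line in f:
        if line.strip():
            tasks.append(json.loads(line))

FastLanguageModel.for_inference(model)  # `model`/`tokenizer` from the steps above

results = []
for task in tasks:
    text = f"### 指示\n{task['input']}\n### 回答\n"  # assumed prompt template
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=False,          # greedy decoding for reproducible outputs
        repetition_penalty=1.2,
        pad_token_id=tokenizer.eos_token_id,
    )
    answer = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                              skip_special_tokens=True)
    results.append({"task_id": task["task_id"], "output": answer})

# Write one JSON object per line for upload (assumed submission format).
with open("outputs.jsonl", "w", encoding="utf-8") as f:
    for r in results:
        f.write(json.dumps(r, ensure_ascii=False) + "\n")
```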