---
base_model: llm-jp/llm-jp-3-13b
library_name: transformers
language:
- ja
---

# Model Card for Model ID (to be completed)

This model was developed as part of the completion requirements of the Matsuo Lab LLM2024 course.

### Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

- **License:** [More Information Needed]
- **Finetuned from model [optional]:** llm-jp-3-13b

### Model Sources [optional]

## Training Details

### Training Data

- **Base model:** llm-jp/llm-jp-3-13b
- **Data for instruction tuning:** ichikara-instruction (see the Experimental Trials below for the specific files used)
- **Data for DPO:** https://huggingface.co/datasets/elyza/ELYZA-tasks-100

### Training Procedure

1. Fine-tune the base model with instruction tuning.
2. Perform DPO on the fine-tuned model with generated data:
   - 3 similar prompts are generated for each sample prompt in the DPO data.
   - The fine-tuned model is used to generate two answers for each prompt.
   - Due to time limitations, the first generated answer is labelled as the chosen answer (and the second as the rejected answer); a minimal sketch of this preference-pair construction is given at the end of this card.

#### Training Hyperparameters

SFT for instruction tuning:

```python
from unsloth import FastLanguageModel

max_seq_length = 512
dtype = None          # auto-detect dtype
load_in_4bit = True   # 4-bit quantization so the 13B model fits in GPU memory

model_id = "llm-jp/llm-jp-3-13b"
new_model_id = "llm-jp-3-13b-it"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_id,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    trust_remote_code=True,
    device_map="auto",
)

# Attach LoRA adapters (QLoRA) to the attention and MLP projection layers
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,  # 32
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 32,
    lora_dropout = 0,  # 0.05
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
    max_seq_length = max_seq_length,
)
```

Trainer settings:

```python
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

# `dataset["train"]` holds the instruction-tuning data, with each sample
# formatted into a single prompt/response string in the "formatted_text" field.
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset["train"],
    max_seq_length = max_seq_length,
    dataset_text_field = "formatted_text",
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        num_train_epochs = 1,
        logging_steps = 10,
        warmup_steps = 5,  # 10
        save_steps = 100,
        save_total_limit = 2,
        max_steps = -1,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        group_by_length = True,
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",
        # additional settings
        optim = "adamw_8bit",
        weight_decay = 0.01,
    ),
)
```

#### Experimental Trials

**Instruction Tuning Only** (model x data)

(hyperparameter settings as commented)
- 01 - llm-jp-3-13b x ichikara-instruction-003-001-1.json (unmodified sample code provided)
- 02 - llm-jp-3-13b x ichikara-instruction-003-002-1.json
- 03 - Llama-3.1-8B-Instruct-bnb-4bit x ichikara-instruction-003-001-1.json
- 04 - Llama-3.2-8B-Instruct-bnb-4bit x ichikara-instruction-003-001-1.json
- 05 - gemma-2-9b-bnb-4bit x ichikara-instruction-003-001-1.json
- 09 - llm-jp-3-13b x kunishou/databricks-dolly-15k-ja (hyperparameter settings as non-commented)
- 00 - llm-jp-3-13b x ichikara-instruction-003-001-1.json
- 06 - gemma-2-9b-bnb-4bit x ichikara-instruction-003-001-1.json
- 07 - llm-jp-3-13b x ichikara-instruction-003-001-1.json
- 08 - llm-jp-3-13b x ichikara-instruction-003-001-1.json (with max_steps = 150)
- 10 - gemma-2-9b-bnb-4bit x kunishou/databricks-dolly-15k-ja

**Instruction Tuning + DPO**
- 11 - the model from trial 00, further trained with DPO
- 12 - the model from trial 06, further trained with DPO

[More Information Needed]

## Evaluation

#### Testing Data

The final performance of the model is evaluated using the elyza-tasks-100-TV dataset.

#### Metrics

The scores below were obtained by uploading the model outputs to the course management system.

### Results

| Trial | Score |
| ----- | ----- |
| 00 | 3.04 |
| 01 | 3.00 |
| 02 | 2.71 |
| 03 | 2.52 |
| 04 | 2.40 |
| 05 | 2.71 |
| 06 | 2.72 |
| 07 | 2.93 |
| 08 | 2.87 |
| 09 | 2.20 |
| 10 | 2.40 |
| 11 | 2.34 |
| 12 | 2.28 |

#### Summary

This model is the result of trial 11 of the competition, which received a score of 2.34 from the course evaluation system.

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

The model was trained using T4/L4/A100 GPUs on Google Colaboratory.
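
#### Sketch: DPO Preference-Pair Construction and Training

The DPO scripts themselves are not included in this card. The block below is only a minimal sketch of the procedure described under Training Procedure, assuming the Unsloth model and tokenizer from the SFT step, a recent version of `trl`, and an `### 指示` / `### 回答` prompt template; the contents of `similar_prompts`, the generation settings, and the DPO hyperparameters (learning rate, `beta`, batch sizes) are illustrative placeholders rather than the values actually used.

```python
from datasets import Dataset
from trl import DPOConfig, DPOTrainer
from unsloth import FastLanguageModel

# `model` and `tokenizer` are the instruction-tuned model from the SFT step above.
# `similar_prompts` is assumed to already hold the generated prompts
# (3 similar prompts per ELYZA-tasks-100 sample); that generation step is not shown.
similar_prompts = ["(illustrative similar prompt)"]

FastLanguageModel.for_inference(model)  # switch the Unsloth model to inference mode

def generate_answer(prompt: str) -> str:
    # Prompt template assumed to match the one used for SFT formatting.
    text = f"### 指示\n{prompt}\n### 回答\n"
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,   # sampling, so that the two answers differ
        temperature=0.7,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

# The first answer is labelled "chosen", the second "rejected" (see Training Procedure).
rows = []
for prompt in similar_prompts:
    rows.append({
        "prompt": prompt,
        "chosen": generate_answer(prompt),
        "rejected": generate_answer(prompt),
    })
dpo_dataset = Dataset.from_list(rows)

FastLanguageModel.for_training(model)  # back to training mode for DPO

dpo_trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with LoRA adapters, trl uses the frozen base weights as the reference
    args=DPOConfig(
        per_device_train_batch_size=1,   # illustrative values, not the actual settings
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=5e-6,
        beta=0.1,
        output_dir="outputs_dpo",
        report_to="none",
    ),
    train_dataset=dpo_dataset,
    tokenizer=tokenizer,  # `processing_class=tokenizer` on newer trl versions
)
dpo_trainer.train()
```

Note that labelling chosen/rejected by generation order rather than by answer quality provides only a weak preference signal.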
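
#### Sketch: Generating Outputs for Evaluation

Outputs for elyza-tasks-100-TV were produced with the fine-tuned model and uploaded to the course management system for scoring. The exact inference script and submission format are not part of this card; the sketch below assumes the tasks are available locally as a JSONL file with `task_id` and `input` fields (the file name is a placeholder) and reuses the same prompt-template assumption as the sketch above.

```python
import json

from unsloth import FastLanguageModel

# Load the evaluation tasks; the file name and field names are assumptions.
tasks = []
with open("elyza-tasks-100-TV_0.jsonl", encoding="utf-8") as f:
    for line in f:
        if line.strip():
            tasks.append(json.loads(line))

FastLanguageModel.for_inference(model)  # `model`/`tokenizer` from the steps above

results = []
for task in tasks:
    text = f"### 指示\n{task['input']}\n### 回答\n"  # assumed prompt template
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=False,          # greedy decoding for reproducible outputs
        repetition_penalty=1.2,
        pad_token_id=tokenizer.eos_token_id,
    )
    answer = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                              skip_special_tokens=True)
    results.append({"task_id": task["task_id"], "output": answer})

# Write one JSON object per line for upload (assumed submission format).
with open("outputs.jsonl", "w", encoding="utf-8") as f:
    for r in results:
        f.write(json.dumps(r, ensure_ascii=False) + "\n")
```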