---
base_model: llm-jp/llm-jp-3-13b
library_name: transformers
language:
- ja
---
# Model Card for llm-jp-3-13b-it-dpo
This model was developed to fulfill the completion requirement of the Matsuo Lab LLM2024 course.
<!-- ## Model Details -->
### Model Description
<!-- Provide a longer summary of what this model is. -->
This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.
<!-- - **Developed by:** [More Information Needed] -->
<!-- - **Funded by [optional]:** [More Information Needed] -->
<!-- - **Shared by [optional]:** [More Information Needed] -->
<!-- - **Model type:** [More Information Needed] -->
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** llm-jp-3-13b
### Model Sources [optional]
<!-- Provide the basic links for the model. -->
<!-- This model -->
<!-- - **Repository:** [More Information Needed] -->
<!-- - **Paper [optional]:** [More Information Needed] -->
<!-- - **Demo [optional]:** [More Information Needed] -->
<!-- ## Uses -->
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
<!-- ### Direct Use -->
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
<!-- [More Information Needed] -->
<!-- ### Downstream Use [optional] -->
<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
<!-- [More Information Needed] -->
<!-- ### Out-of-Scope Use -->
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
<!-- [More Information Needed] -->
<!-- ## Bias, Risks, and Limitations -->
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
<!-- [More Information Needed] -->
<!-- ### Recommendations -->
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
<!-- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. -->
<!-- ## How to Get Started with the Model -->
<!-- Use the code below to get started with the model. -->
<!-- [More Information Needed] -->
## Training Details
### Training Data
- **Base Model:** llm-jp/llm-jp-3-13b
- **Data for Instruction Tuning:** ichikara-instruction (see the trial list below for the specific files used)
- **Data for DPO:** https://huggingface.co/datasets/elyza/ELYZA-tasks-100
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
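The DPO prompts are seeded from ELYZA-tasks-100. As a minimal sketch (the split and column names below follow the published dataset card; adjust if they differ), the seed prompts can be loaded with 🤗 Datasets:
```python
from datasets import load_dataset

# ELYZA-tasks-100 is published as a single "test" split with "input" / "output"
# columns; the "input" column serves as the seed prompts for DPO data generation.
elyza = load_dataset("elyza/ELYZA-tasks-100", split="test")
seed_prompts = elyza["input"]
```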
### Training Procedure
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
1. Fine-tune the base model with instruction tuning (SFT)
2. Perform DPO on the fine-tuned model with generated preference data (a sketch of this step is given right after this list)
   - 3 similar prompts are generated for each sample prompt in the DPO data
   - The fine-tuned model is used to generate two answers for each prompt
   - Due to time constraints, the first generated answer is labelled as the chosen answer and the second as the rejected answer
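The block below is a minimal sketch of the preference-pair generation in step 2. It assumes the fine-tuned `model` and `tokenizer` produced by the SFT stage described below; the helper name `generate_answer` and the container `dpo_prompts` are illustrative, not the exact course code.
```python
from unsloth import FastLanguageModel

FastLanguageModel.for_inference(model)  # switch Unsloth to inference mode

def generate_answer(prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,     # sampling so the two answers for a prompt differ
        temperature=0.8,
    )
    # strip the prompt tokens and return only the newly generated answer
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

preference_pairs = []
for prompt in dpo_prompts:  # prompts generated from the ELYZA-tasks-100 samples
    first = generate_answer(prompt)
    second = generate_answer(prompt)
    preference_pairs.append({
        "prompt": prompt,
        "chosen": first,    # first answer labelled as chosen (time constraint)
        "rejected": second,
    })
```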
#### Training Hyperparameters
LoRA configuration for the instruction-tuning (SFT) stage:
```python
from unsloth import FastLanguageModel

max_seq_length = 512
dtype = None          # let Unsloth auto-detect the dtype
load_in_4bit = True   # 4-bit quantization to fit the 13B model in limited VRAM
model_id = "llm-jp/llm-jp-3-13b"
new_model_id = "llm-jp-3-13b-it"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_id,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    trust_remote_code=True,
    device_map="auto",
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 16, #32
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 32,
    lora_dropout = 0, #0.05
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
    max_seq_length = max_seq_length,
)
```
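The SFT trainer below consumes a `formatted_text` field. The exact prompt template used is not recorded in this card; the following is a minimal sketch of how an ichikara-instruction file could be flattened into that field, assuming the common `### 指示` / `### 回答` template and the dataset's `text` / `output` keys.
```python
from datasets import load_dataset

# Load one ichikara-instruction file; the "text" / "output" keys and the
# template below are assumptions about the course setup.
dataset = load_dataset("json", data_files="ichikara-instruction-003-001-1.json")

PROMPT_TEMPLATE = """### 指示
{instruction}
### 回答
{output}"""

def format_example(example):
    example["formatted_text"] = PROMPT_TEMPLATE.format(
        instruction=example["text"],
        output=example["output"],
    ) + tokenizer.eos_token  # append EOS so the model learns where to stop
    return example

dataset = dataset.map(format_example)  # adds the "formatted_text" column used below
```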
SFT trainer and training arguments:
```python
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset["train"],
    max_seq_length = max_seq_length,
    dataset_text_field = "formatted_text",
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        num_train_epochs = 1,
        logging_steps = 10,
        warmup_steps = 5, #10
        save_steps = 100,
        save_total_limit = 2,
        max_steps = -1,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        group_by_length = True,
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",
        # additional settings
        optim = "adamw_8bit",
        weight_decay = 0.01,
    ),
)
```
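Hyperparameters for the DPO stage are not recorded in this card. The block below is a minimal sketch of how that stage could be configured with trl's `DPOTrainer`; argument names follow recent trl releases (older versions take `tokenizer=` instead of `processing_class=`), and the learning rate, `beta`, and other values shown are assumptions rather than the settings actually used.
```python
from datasets import Dataset
from trl import DPOConfig, DPOTrainer
from unsloth import is_bfloat16_supported

# preference_pairs holds {"prompt", "chosen", "rejected"} records built as sketched
# in the Training Procedure section above.
dpo_dataset = Dataset.from_list(preference_pairs)

dpo_args = DPOConfig(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    learning_rate=5e-6,   # assumption: a typical DPO learning rate, not the value used
    beta=0.1,             # assumption: the common default DPO temperature
    fp16=not is_bfloat16_supported(),
    bf16=is_bfloat16_supported(),
    logging_steps=10,
    output_dir="outputs_dpo",
    report_to="none",
)

dpo_trainer = DPOTrainer(
    model=model,          # LoRA-adapted model from the SFT stage
    ref_model=None,       # with PEFT adapters, the frozen base weights serve as reference
    args=dpo_args,
    train_dataset=dpo_dataset,
    processing_class=tokenizer,
)
dpo_trainer.train()
```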
#### Experimental Trials
<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
**Instruction Tuning Only** (model x data)

Trials run with the hyperparameter values shown as comments in the code above (r = 32, lora_dropout = 0.05, warmup_steps = 10):
- 01 - llm-jp-3-13b x ichikara-instruction-003-001-1.json (unmodified sample code as provided by the course)
- 02 - llm-jp-3-13b x ichikara-instruction-003-002-1.json
- 03 - Llama-3.1-8B-Instruct-bnb-4bit x ichikara-instruction-003-001-1.json
- 04 - Llama-3.2-8B-Instruct-bnb-4bit x ichikara-instruction-003-001-1.json
- 05 - gemma-2-9b-bnb-4bit x ichikara-instruction-003-001-1.json
- 09 - llm-jp-3-13b x kunishou/databricks-dolly-15k-ja

Trials run with the uncommented hyperparameter values shown above (r = 16, lora_dropout = 0, warmup_steps = 5):
- 00 - llm-jp-3-13b x ichikara-instruction-003-001-1.json
- 06 - gemma-2-9b-bnb-4bit x ichikara-instruction-003-001-1.json
- 07 - llm-jp-3-13b x ichikara-instruction-003-001-1.json
- 08 - llm-jp-3-13b x ichikara-instruction-003-001-1.json (with max_steps = 150)
- 10 - gemma-2-9b-bnb-4bit x kunishou/databricks-dolly-15k-ja

**Instruction Tuning + DPO**
- 11 - trial 00 + DPO
- 12 - trial 06 + DPO
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
<!-- ### Testing Data, Factors & Metrics -->
#### Testing Data
<!-- This should link to a Dataset Card if possible. -->
The final performance of the model is evaluated on the elyza-tasks-100-TV dataset.
#### Metrics
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
The scores below were assigned after uploading the model outputs to the course evaluation system.
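For reference, the following is a minimal sketch of how the submission outputs could be generated; the evaluation file name and its `task_id` / `input` fields are assumptions based on the course distribution and should be adjusted to the actual data.
```python
import json
from unsloth import FastLanguageModel

FastLanguageModel.for_inference(model)

# File name and keys are assumptions about the course-distributed evaluation file.
with open("elyza-tasks-100-TV_0.jsonl") as f:
    tasks = [json.loads(line) for line in f]

results = []
for task in tasks:
    prompt = f"### 指示\n{task['input']}\n### 回答\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512,
                             do_sample=False, repetition_penalty=1.2)
    answer = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                              skip_special_tokens=True)
    results.append({"task_id": task["task_id"], "input": task["input"], "output": answer})

# Write one JSON object per line for upload to the course evaluation system.
with open("outputs.jsonl", "w") as f:
    for r in results:
        f.write(json.dumps(r, ensure_ascii=False) + "\n")
```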
### Results
| Trial | Score |
| ----- | ----- |
| 00 | 3.04 |
| 01 | 3.00 |
| 02 | 2.71 |
| 03 | 2.52 |
| 04 | 2.40 |
| 05 | 2.71 |
| 06 | 2.72 |
| 07 | 2.93 |
| 08 | 2.87 |
| 09 | 2.20 |
| 10 | 2.40 |
| 11 | 2.34 |
| 12 | 2.28 |
#### Summary
This model is the result of trial 11 above (trial 00 instruction tuning followed by DPO), which received a score of 2.34 from the course evaluation system.
### Model Architecture and Objective
The base model, llm-jp/llm-jp-3-13b, is a 13B-parameter decoder-only Transformer; this checkpoint adds LoRA adapters trained with supervised instruction tuning followed by DPO.
### Compute Infrastructure
The model was trained on T4/L4/A100 GPUs on Google Colaboratory.
<!-- ## Glossary [optional] -->
<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
<!-- [More Information Needed] -->
<!-- ## More Information [optional] -->
<!-- [More Information Needed] -->
<!-- ## Model Card Authors [optional] -->
<!-- [More Information Needed] -->
<!-- ## Model Card Contact -->
<!-- [More Information Needed] -->