Uploaded model

Developed by: saitooooo
License: apache-2.0
Finetuned from model: llm-jp/llm-jp-3-13b

This llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.

Model Details

Base Model: llm-jp/llm-jp-3-13b
Training Type: Instruction Fine-tuning
Training Method: QLoRA (4-bit quantization)
Library Used: unsloth
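
As a rough illustration of why 4-bit quantization is used for a model of this size, the sketch below estimates weight storage alone. It assumes a nominal 13e9 parameters (taken from the "13b" in the model name, not an exact count) and ignores quantization constants, activations, optimizer state, and the LoRA adapters, so real memory use will be higher.

```python
# Back-of-the-envelope weight memory for a model of nominal size.
# ASSUMPTION: 13e9 parameters (from the "13b" in the model name, not an exact count).
NOMINAL_PARAMS = 13e9


def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


mem_16bit = weight_memory_gb(NOMINAL_PARAMS, 16)  # BF16 / FP16 weights
mem_4bit = weight_memory_gb(NOMINAL_PARAMS, 4)    # 4-bit quantized weights

print(f"16-bit weights: ~{mem_16bit:.1f} GB")
print(f"4-bit weights:  ~{mem_4bit:.1f} GB")
```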

Training Data

Max Sequence Length: 2048
LoRA Configuration:
- Rank: 32
- Alpha: 32
- Dropout: 0.05
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
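
To make the LoRA settings above more concrete, here is a minimal sketch of two quantities they determine: the scaling applied to the adapter update and the number of trainable parameters per adapted weight matrix. The layer shape in the example is hypothetical and chosen only for illustration; it is not read from the llm-jp-3-13b configuration.

```python
import math


def lora_scaling(alpha: float, rank: int, use_rslora: bool = False) -> float:
    """Standard LoRA scales the update by alpha / r; rsLoRA uses alpha / sqrt(r)."""
    return alpha / math.sqrt(rank) if use_rslora else alpha / rank


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in one adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Rank 32 and alpha 32 give a scaling factor of exactly 1.
scale = lora_scaling(alpha=32, rank=32)

# HYPOTHETICAL layer shape, for illustration only.
example_params = lora_params(d_in=4096, d_out=4096, rank=32)

print(scale, example_params)
```

With rank equal to alpha, the adapter output is added to the frozen layer's output without extra rescaling.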

Training Hyperparameters

Batch Size: 16 per device
Gradient Accumulation Steps: 4
Learning Rate: 2e-4
Number of Epochs: 1
Warmup Steps: 10
Mixed Precision: BF16 (if supported) / FP16 (if BF16 not supported)
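
The batch size and gradient accumulation settings combine into an effective batch size, which in turn fixes the approximate number of optimizer steps for one epoch. The sketch below assumes a single GPU and the 20,000-sample subset used in the usage section; the exact step count reported by the trainer may differ by one depending on how the library rounds the last partial batch.

```python
import math

per_device_batch_size = 16
gradient_accumulation_steps = 4
num_devices = 1        # ASSUMPTION: a single A100
num_samples = 20_000   # subset size used in the usage section

# Samples consumed per optimizer update.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices

# Approximate optimizer steps for one epoch.
steps_per_epoch = math.ceil(num_samples / effective_batch_size)

print(f"effective batch size: {effective_batch_size}")
print(f"approx. steps per epoch: {steps_per_epoch}")
```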

Dataset

The model was fine-tuned on the following dataset.

DeL-TaiseiOzaki/Tengentoppa-sft-mini-vol1.0

Usage

Prerequisites

Google Colab Pro
A100

1. Install the required libraries

!pip uninstall unsloth -y
!pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --upgrade torch
!pip install --upgrade xformers

2. Quantization and QLoRA setup

# Load llm-jp/llm-jp-3-13b with 4-bit quantization for QLoRA.

from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Unsloth supports RoPE scaling, so the context length can be set freely
dtype = None # None selects the dtype automatically
load_in_4bit = True # True because this is a 13B model

model_id = "llm-jp/llm-jp-3-13b"
new_model_id = "llm-jp-3-13b-it" # name for the fine-tuned model; "it" stands for Instruction Tuning
# Create the FastLanguageModel instance
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_id,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    trust_remote_code=True,
)

# Wrap the model with LoRA adapters for SFT
model = FastLanguageModel.get_peft_model(
    model,
    r = 32,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 32,
    lora_dropout = 0.05,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
    max_seq_length = max_seq_length,
)

3. Format the dataset

from datasets import load_dataset

# The dataset is DeL-TaiseiOzaki/Tengentoppa-sft-mini-vol1.0, loaded here from a local JSON file
dataset = load_dataset("json", data_files="combined_dataset.json")

# EOS (end-of-sequence) token
EOS_TOKEN = tokenizer.eos_token

# Prompt format
prompt = """### 指示
{}
### 回答
{}"""

# Formatting function applied to each example
def formatting_prompts_func(examples):
    # Join instruction and input with a newline
    input_text = f"{examples['instruction']}\n{examples['input']}"
    output_text = examples["output"]  # target output

    # Fill the prompt template and append the EOS token
    formatted_text = prompt.format(input_text, output_text) + EOS_TOKEN

    return {"formatted_text": formatted_text}

# Apply the formatting to the dataset
dataset = dataset.map(
    formatting_prompts_func,  # formatting function
    num_proc=4  # number of parallel processes
)
dataset
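
To show what a formatted training example looks like, the sketch below applies the same template as `formatting_prompts_func` to a made-up record, without needing the tokenizer or the dataset. The EOS string `"</s>"` is a placeholder assumption; the real value comes from `tokenizer.eos_token`. The record contents are hypothetical.

```python
PROMPT_TEMPLATE = """### 指示
{}
### 回答
{}"""

EOS_TOKEN = "</s>"  # ASSUMPTION: placeholder; the real value is tokenizer.eos_token


def format_example(example: dict) -> str:
    """Join instruction and input, fill the template, and append the EOS token."""
    input_text = f"{example['instruction']}\n{example['input']}"
    return PROMPT_TEMPLATE.format(input_text, example["output"]) + EOS_TOKEN


# HYPOTHETICAL record, for illustration only.
record = {"instruction": "Translate to English.", "input": "こんにちは", "output": "Hello"}
formatted = format_example(record)
print(formatted)

# When "input" is empty, the instruction is still followed by a newline,
# so a blank line appears before the answer marker.
no_input = format_example({"instruction": "Say hi.", "input": "", "output": "Hi"})
print(repr(no_input))
```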

4. Sample 20,000 examples and configure training

sample_size = 20000  # number of samples to use
sampled_dataset = dataset["train"].shuffle(seed=42).select(range(sample_size))

"""
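training_arguments: training configuration

  - output_dir:
      - Directory where the trained model is saved

  - per_device_train_batch_size:
      - Training batch size per device

  - per_device_eval_batch_size:
      - Evaluation batch size per device

  - gradient_accumulation_steps:
      - Number of steps to accumulate gradients over before each update

  - optim:
      - Optimizer setting

  - num_train_epochs:
      - Number of epochs

  - eval_strategy:
      - Evaluation strategy ("no"/"steps"/"epoch")

  - eval_steps:
      - Interval, in steps, between evaluations when eval_strategy is "steps"

  - logging_strategy:
      - Logging strategy

  - logging_steps:
      - Interval, in steps, between log outputs

  - warmup_steps:
      - Number of learning-rate warmup steps

  - save_steps:
      - Interval, in steps, between model checkpoints

  - save_total_limit:
      - Number of checkpoints to keep

  - max_steps:
      - Maximum number of training steps

  - learning_rate:
      - Learning rate

  - fp16:
      - Whether to use 16-bit floating point (Exercise 8 is a useful reference)

  - bf16:
      - Whether to use BFloat16

  - group_by_length:
      - Group batches by input sequence length (makes training more efficient)

  - report_to:
      - Where to send logs ("wandb"/"tensorboard", etc.)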
"""
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset=sampled_dataset,
    max_seq_length = max_seq_length,
    dataset_text_field="formatted_text",
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = 16,
        gradient_accumulation_steps = 4,
        num_train_epochs = 1,
        logging_steps = 10,
        warmup_steps = 10,
        save_steps=100,
        save_total_limit=2,
        max_steps=-1,
        learning_rate =5e-5, #2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        group_by_length=True,
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",
    ),
)
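
The call above does not set `lr_scheduler_type`, so `TrainingArguments` falls back to its default linear schedule: the learning rate rises linearly during warmup and then decays linearly to zero. The sketch below reproduces that shape in plain Python. It uses the 5e-5 base rate and 10 warmup steps from the code above (the hyperparameter list states 2e-4; substitute whichever value applies), and the total of 313 steps is an assumption derived from 20,000 samples at an effective batch size of 64 on one GPU.

```python
def linear_schedule_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


BASE_LR = 5e-5
WARMUP_STEPS = 10
TOTAL_STEPS = 313  # ASSUMPTION: ceil(20000 / 64), single GPU, one epoch

for step in (0, 5, 10, 160, 313):
    lr = linear_schedule_lr(step, BASE_LR, WARMUP_STEPS, TOTAL_STEPS)
    print(f"step {step:>3}: lr = {lr:.2e}")
```

With only 10 warmup steps out of roughly 300, almost the whole run is spent on the decaying part of the schedule.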

5. Run training

#@title Run training
trainer_stats = trainer.train()

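6. Inference

import json
datasets = []
# Read the evaluation tasks; a record may span several lines,
# so accumulate text until it closes with "}" and then parse it.
with open("./elyza-tasks-100-TV_0.jsonl", "r", encoding="utf-8") as f:
    item = ""
    for line in f:
      line = line.strip()
      item += line
      if item.endswith("}"):
        datasets.append(json.loads(item))
        item = ""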

# Run the tasks with the trained model
from tqdm import tqdm

# Switch the model to Unsloth's inference mode
FastLanguageModel.for_inference(model)

# Put the model in evaluation mode
model.eval()

results = []
for dt in tqdm(datasets):
  input_text = dt["input"]

  prompt = f"""### 指示\n{input_text}\n### 回答\n"""

  inputs = tokenizer([prompt], return_tensors = "pt").to(model.device)

  # Greedy decoding with a repetition penalty
  outputs = model.generate(**inputs, max_new_tokens = 2048, use_cache = True, do_sample=False, repetition_penalty=1.2)
  # Keep only the text after the answer marker
  prediction = tokenizer.decode(outputs[0], skip_special_tokens=True).split('\n### 回答')[-1]

  results.append({"task_id": dt["task_id"], "input": input_text, "output": prediction})
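
The loop above collects predictions in `results` but does not persist them. The following is a minimal sketch of how the answer extraction behaves and how the results could be written out as JSONL. The decoded string, the output file name, and the `strip()` call are illustrative assumptions, not part of the original workflow.

```python
import json
import os
import tempfile

ANSWER_MARKER = "\n### 回答"


def extract_answer(decoded: str) -> str:
    """Return the text after the last answer marker, as the inference loop does."""
    return decoded.split(ANSWER_MARKER)[-1]


# HYPOTHETICAL decoded model output, for illustration only.
decoded = "### 指示\n1+1は？\n### 回答\n2です。"
prediction = extract_answer(decoded)  # the newline after the marker is kept

# Optional cleanup (an assumption): strip surrounding whitespace before saving.
results = [{"task_id": 0, "input": "1+1は？", "output": prediction.strip()}]

# HYPOTHETICAL output path; one JSON object per line, non-ASCII text kept readable.
out_path = os.path.join(tempfile.mkdtemp(), "results.jsonl")
with open(out_path, "w", encoding="utf-8") as f:
    for row in results:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")

with open(out_path, encoding="utf-8") as f:
    saved = [json.loads(line) for line in f]
print(saved)
```

Passing `ensure_ascii=False` keeps Japanese text human-readable in the file instead of escaping it as `\uXXXX` sequences.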
