Gemma-System-9B with MoRA + SimPO

This is a SimPO-finetuned version of Gemma-System-9B that uses MoRA (Mixture of Rank Adaptation) for preference alignment. The model is trained to align more closely with human preferences through SimPO, a reference-free preference optimization objective applied to pairs of preferred and non-preferred responses.

Model Details

Model Description

This model is a finetuned version of Gemma-System-9B trained with SimPO (Simple Preference Optimization). It uses MoRA adaptation with rank 256 to finetune the base model parameter-efficiently while preserving its core capabilities.

  • Developed by: Gunulhona (base model: a merge based on Gemma-2-9B-it)
  • Model type: Causal Language Model with MoRA adaptation
  • Language(s): Primarily English and Korean
  • License: Same as base model (Gemma-System-9B)
  • Finetuned from model: Gunulhona/Gemma-System-9B
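
The adapter is meant to be loaded on top of the base model. Below is a minimal loading sketch, assuming the weights in this repository (Gunulhona/Gemma-System-9B-MoRA-SimPO) load through the standard PEFT `PeftModel` interface; note that MoRA's reference implementation ships as a patched PEFT fork, which may be required in practice.

```python
# Minimal loading sketch (assumption: the adapter is compatible with the standard PEFT API;
# the MoRA reference implementation is a patched PEFT fork and may be needed instead).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Gunulhona/Gemma-System-9B"
adapter_id = "Gunulhona/Gemma-System-9B-MoRA-SimPO"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "Explain preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```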

Training Details

Training Procedure

Training Hyperparameters

  • Training regime: bfloat16 mixed precision
  • Learning rate: 5e-7
  • Batch size per device: 1
  • Gradient accumulation steps: 16
  • Total batch size: 16
  • Number of epochs: 200
  • Optimizer: AdamW with a cosine-with-restarts learning-rate scheduler
  • Loss type: SimPO (configurable)
  • Beta (SimPO): 10.0
  • SimPO gamma: 0.5
  • Maximum sequence length: 65,536 tokens
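
These hyperparameters map naturally onto a TRL-style preference trainer. The following is a hedged sketch (not the actual training script, which is not published here) of how they could be expressed with TRL's `CPOConfig`, which implements the SimPO loss via `loss_type="simpo"`:

```python
# Hedged sketch: expressing the listed hyperparameters with TRL's CPOConfig.
# The actual training code may use a different trainer or configuration.
from trl import CPOConfig

training_args = CPOConfig(
    output_dir="gemma-system-9b-mora-simpo",
    loss_type="simpo",                 # SimPO loss (configurable)
    beta=10.0,                         # SimPO beta
    simpo_gamma=0.5,                   # SimPO target reward margin
    cpo_alpha=0.0,                     # drop the auxiliary NLL term for pure SimPO
    learning_rate=5e-7,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,    # effective batch size of 16
    num_train_epochs=200,
    lr_scheduler_type="cosine_with_restarts",
    bf16=True,                         # bfloat16 mixed precision
    gradient_checkpointing=True,
    max_length=65536,                  # maximum sequence length
)
```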

MoRA Configuration

  • Rank (r): 256
  • Alpha: 16
  • Dropout: 0.05
  • MoRA type: 6
  • Target modules:
    • q_proj
    • k_proj
    • v_proj
    • o_proj
    • gate_proj
    • down_proj
    • up_proj
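
For reference, here is a hedged sketch of how this adapter configuration could be written with the MoRA reference implementation, which is distributed as a patched PEFT fork adding `use_mora` and `mora_type` to `LoraConfig` (the actual training configuration may differ):

```python
# Hedged sketch using the MoRA patched-PEFT API (use_mora / mora_type are not in upstream PEFT).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

mora_config = LoraConfig(
    use_mora=True,           # enable MoRA instead of plain LoRA
    mora_type=6,             # MoRA type 6, as listed above
    r=256,                   # rank
    lora_alpha=16,           # listed on this card; the MoRA fork notes alpha is unused
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "down_proj", "up_proj",
    ],
    task_type="CAUSAL_LM",
)

base = AutoModelForCausalLM.from_pretrained("Gunulhona/Gemma-System-9B")
model = get_peft_model(base, mora_config)
model.print_trainable_parameters()
```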

Training Data

The model was trained on the "Gunulhona/open_dpo_merged" dataset, which contains pairs of preferred and non-preferred responses for preference learning.
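
A quick way to inspect the data is shown below; the exact column names (e.g. prompt/chosen/rejected) are an assumption based on the usual preference-pair convention and may differ in the actual repository.

```python
# Hedged sketch: load and inspect the preference dataset.
from datasets import load_dataset

ds = load_dataset("Gunulhona/open_dpo_merged", split="train")
print(ds.column_names)  # expected to follow a prompt / chosen / rejected style layout
print(ds[0])
```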

Technical Specifications

Model Architecture and Objective

The model uses MoRA (Mixture of Rank Adaptation) for parameter-efficient finetuning. The training setup supports either the DPO or the SimPO objective; this checkpoint was trained with SimPO:

  • SimPO: Simple Preference Optimization with β=10.0 and γ=0.5
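
For reference, the SimPO objective (as introduced in the SimPO paper) that these settings instantiate is

$$
\mathcal{L}_{\mathrm{SimPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\frac{\beta}{|y_w|}\log \pi_\theta(y_w \mid x) \;-\; \frac{\beta}{|y_l|}\log \pi_\theta(y_l \mid x) \;-\; \gamma\right)\right]
$$

with β = 10.0 and γ = 0.5 as above. Unlike DPO, no reference model appears; the implicit reward is the length-normalized log-likelihood of the policy itself.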

Compute Infrastructure

Hardware

  • Training performed on CUDA-capable GPUs
  • Uses DeepSpeed for distributed training
  • Gradient checkpointing enabled for memory efficiency
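
The exact DeepSpeed configuration is not published; the sketch below is an assumption (ZeRO stage 2, bf16) showing how such a config could be passed to the trainer via the standard `deepspeed` argument:

```python
# Hedged sketch: a minimal DeepSpeed config dict; the ZeRO stage and other settings
# used for this model are assumptions, not the published training setup.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    "gradient_accumulation_steps": 16,
    "train_micro_batch_size_per_gpu": 1,
}

# e.g. CPOConfig(..., deepspeed=ds_config) or TrainingArguments(..., deepspeed=ds_config)
```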

Software

  • PEFT library for parameter-efficient finetuning
  • Transformers library
  • DeepSpeed for training optimization
  • Weights & Biases for experiment tracking

Environmental Impact

  • Hardware Type: NVIDIA GPUs
  • Training Regime: Mixed BF16 precision
  • Optimization: DeepSpeed + Gradient Checkpointing

Model Card Contact

For questions about this model, please contact Gunulhona.
