Gemma-System-9B with MoRA + SimPO

This is a SimPO-finetuned version of Gemma-System-9B that uses MoRA (Mixture of Rank Adaptation) for preference alignment. The model is trained to align more closely with human preferences through SimPO, a reference-free preference optimization objective applied to pairs of preferred and non-preferred responses.

Model Details

Model Description

This model is a finetuned version of Gemma-System-9B trained with SimPO (Simple Preference Optimization). It uses MoRA adaptation with rank 256 to finetune the base model parameter-efficiently while preserving its core capabilities.

  • Developed by: Gunulhona (base model: a merge based on Gemma-2-9B-it)
  • Model type: Causal Language Model with MoRA adaptation
  • Language(s): Primarily English and Korean
  • License: Same as base model (Gemma-System-9B)
  • Finetuned from model: Gunulhona/Gemma-System-9B
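
The adapter is meant to be loaded on top of the base model. Below is a minimal loading sketch, assuming the weights in this repository (Gunulhona/Gemma-System-9B-MoRA-SimPO) load through the standard PEFT `PeftModel` interface; note that MoRA's reference implementation ships as a patched PEFT fork, which may be required in practice.

```python
# Minimal loading sketch (assumption: the adapter is compatible with the standard PEFT API;
# the MoRA reference implementation is a patched PEFT fork and may be needed instead).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Gunulhona/Gemma-System-9B"
adapter_id = "Gunulhona/Gemma-System-9B-MoRA-SimPO"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "Explain preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```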

Training Details

Training Procedure

Training Hyperparameters

  • Training regime: bfloat16 mixed precision
  • Learning rate: 5e-7
  • Batch size per device: 1
  • Gradient accumulation steps: 16
  • Total batch size: 16
  • Number of epochs: 200
  • Optimizer: AdamW with a cosine-with-restarts learning-rate scheduler
  • Loss type: SimPO (configurable)
  • Beta (SimPO): 10.0
  • SimPO gamma: 0.5
  • Maximum sequence length: 65,536 tokens
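
These hyperparameters map naturally onto a TRL-style preference trainer. The following is a hedged sketch (not the actual training script, which is not published here) of how they could be expressed with TRL's `CPOConfig`, which implements the SimPO loss via `loss_type="simpo"`:

```python
# Hedged sketch: expressing the listed hyperparameters with TRL's CPOConfig.
# The actual training code may use a different trainer or configuration.
from trl import CPOConfig

training_args = CPOConfig(
    output_dir="gemma-system-9b-mora-simpo",
    loss_type="simpo",                 # SimPO loss (configurable)
    beta=10.0,                         # SimPO beta
    simpo_gamma=0.5,                   # SimPO target reward margin
    cpo_alpha=0.0,                     # drop the auxiliary NLL term for pure SimPO
    learning_rate=5e-7,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,    # effective batch size of 16
    num_train_epochs=200,
    lr_scheduler_type="cosine_with_restarts",
    bf16=True,                         # bfloat16 mixed precision
    gradient_checkpointing=True,
    max_length=65536,                  # maximum sequence length
)
```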

MoRA Configuration

  • Rank (r): 256
  • Alpha: 16
  • Dropout: 0.05
  • MoRA type: 6
  • Target modules:
    • q_proj
    • k_proj
    • v_proj
    • o_proj
    • gate_proj
    • down_proj
    • up_proj
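
For reference, here is a hedged sketch of how this adapter configuration could be written with the MoRA reference implementation, which is distributed as a patched PEFT fork adding `use_mora` and `mora_type` to `LoraConfig` (the actual training configuration may differ):

```python
# Hedged sketch using the MoRA patched-PEFT API (use_mora / mora_type are not in upstream PEFT).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

mora_config = LoraConfig(
    use_mora=True,           # enable MoRA instead of plain LoRA
    mora_type=6,             # MoRA type 6, as listed above
    r=256,                   # rank
    lora_alpha=16,           # listed on this card; the MoRA fork notes alpha is unused
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "down_proj", "up_proj",
    ],
    task_type="CAUSAL_LM",
)

base = AutoModelForCausalLM.from_pretrained("Gunulhona/Gemma-System-9B")
model = get_peft_model(base, mora_config)
model.print_trainable_parameters()
```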

Training Data

The model was trained on the "Gunulhona/open_dpo_merged" dataset, which contains pairs of preferred and non-preferred responses for preference learning.
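
A quick way to inspect the data is shown below; the exact column names (e.g. prompt/chosen/rejected) are an assumption based on the usual preference-pair convention and may differ in the actual repository.

```python
# Hedged sketch: load and inspect the preference dataset.
from datasets import load_dataset

ds = load_dataset("Gunulhona/open_dpo_merged", split="train")
print(ds.column_names)  # expected to follow a prompt / chosen / rejected style layout
print(ds[0])
```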

Technical Specifications

Model Architecture and Objective

The model uses MoRA (Mixture of Rank Adaptation) for parameter-efficient finetuning. The training setup supports either the DPO or the SimPO objective; this checkpoint was trained with SimPO:

  • SimPO: Simple Preference Optimization with β=10.0 and γ=0.5
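
For reference, the SimPO objective (as introduced in the SimPO paper) that these settings instantiate is

$$
\mathcal{L}_{\mathrm{SimPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\frac{\beta}{|y_w|}\log \pi_\theta(y_w \mid x) \;-\; \frac{\beta}{|y_l|}\log \pi_\theta(y_l \mid x) \;-\; \gamma\right)\right]
$$

with β = 10.0 and γ = 0.5 as above. Unlike DPO, no reference model appears; the implicit reward is the length-normalized log-likelihood of the policy itself.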

Compute Infrastructure

Hardware

  • Training performed on CUDA-capable GPUs
  • Uses DeepSpeed for distributed training
  • Gradient checkpointing enabled for memory efficiency
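
The exact DeepSpeed configuration is not published; the sketch below is an assumption (ZeRO stage 2, bf16) showing how such a config could be passed to the trainer via the standard `deepspeed` argument:

```python
# Hedged sketch: a minimal DeepSpeed config dict; the ZeRO stage and other settings
# used for this model are assumptions, not the published training setup.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    "gradient_accumulation_steps": 16,
    "train_micro_batch_size_per_gpu": 1,
}

# e.g. CPOConfig(..., deepspeed=ds_config) or TrainingArguments(..., deepspeed=ds_config)
```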

Software

  • PEFT library for parameter-efficient finetuning
  • Transformers library
  • DeepSpeed for training optimization
  • Weights & Biases for experiment tracking

Environmental Impact

  • Hardware Type: NVIDIA GPUs
  • Training Regime: Mixed BF16 precision
  • Optimization: DeepSpeed + Gradient Checkpointing

Model Card Contact

For questions about this model, please contact Gunulhona.
