Gemma-System-9B with MoRA + SimPO
This is a SimPO-finetuned version of Gemma-System-9B that uses MoRA (high-rank updating for parameter-efficient finetuning) as the adapter method. The model is trained to align more closely with human preferences through SimPO, a reference-free preference optimization method.
Model Details
Model Description
This model is a finetuned version of Gemma-System-9B trained with the SimPO (Simple Preference Optimization) method. It uses MoRA adaptation with rank 256 to finetune the base model efficiently while preserving its core capabilities.
- Developed by: [Original: Merged Gemma-2-9B-it, Finetuned: Gunulhona]
- Model type: Causal Language Model with MoRA adaptation
- Language(s): Primarily English and Korean
- License: Same as base model (Gemma-System-9B)
- Finetuned from model: Gunulhona/Gemma-System-9B
Training Details
Training Procedure
Training Hyperparameters
- Training regime: bfloat16 mixed precision
- Learning rate: 5e-7
- Batch size per device: 1
- Gradient accumulation steps: 16
- Total batch size: 16
- Number of epochs: 200
- Optimizer: AdamW with a cosine-with-restarts learning rate scheduler
- Loss type: SimPO (configurable)
- Beta (SimPO): 10.0
- SimPO gamma: 0.5
- Maximum sequence length: 65,536 tokens
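The total batch size above follows from the per-device batch size and the gradient accumulation steps. The sketch below reproduces that arithmetic and shows one common form of a cosine schedule with hard restarts. The device count, number of cycles, and warmup length are not stated in this card and are assumptions here, as are the function names.

```python
import math


def effective_batch_size(per_device: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Samples contributing to one optimizer step (num_devices=1 is an assumption)."""
    return per_device * grad_accum_steps * num_devices


def cosine_with_restarts_lr(
    step: int,
    total_steps: int,
    base_lr: float = 5e-7,   # learning rate from this card
    num_cycles: int = 4,     # assumption: cycle count is not documented
    warmup_steps: int = 0,   # assumption: warmup is not documented
) -> float:
    """Learning rate at `step` for a cosine schedule with hard restarts."""
    if step < warmup_steps:
        # Linear warmup from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # Position inside the current cycle, in [0, 1).
    cycle_pos = (num_cycles * progress) % 1.0
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * cycle_pos)))


total = effective_batch_size(per_device=1, grad_accum_steps=16)
lr_at_start = cosine_with_restarts_lr(step=0, total_steps=1000)
lr_mid_cycle = cosine_with_restarts_lr(step=125, total_steps=1000)
```

With one device, 1 × 16 gives the total batch size of 16 listed above. With more devices and the same per-device settings, the total would scale accordingly.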
MoRA Configuration
- Rank (r): 256
- Alpha: 16
- Dropout: 0.05
- MoRA type: 6
- Target modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - down_proj
  - up_proj
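For reference, the adapter settings above can be collected in one place as shown in this minimal sketch. The `MoRASettings` class and its field names are hypothetical and exist only for illustration. How these values map onto an actual PEFT configuration depends on the PEFT build used for training, which this card does not specify.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class MoRASettings:
    """Hypothetical container for the adapter values listed in this card."""
    rank: int = 256
    alpha: int = 16
    dropout: float = 0.05
    mora_type: int = 6
    target_modules: List[str] = field(
        default_factory=lambda: [
            "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
            "gate_proj", "down_proj", "up_proj",     # MLP projections
        ]
    )


settings = MoRASettings()
settings_dict = asdict(settings)  # plain dict, e.g. for logging the run config
```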
Training Data
The model was trained on the "Gunulhona/open_dpo_merged" dataset, which contains pairs of preferred and non-preferred responses for preference learning.
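As an illustration of what a preference pair looks like, the sketch below defines a small record type and a sanity check. The field names `prompt`, `chosen`, and `rejected` are an assumption based on common preference-dataset layouts. The actual column names of the dataset are not documented in this card, and the example record is invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferencePair:
    """Hypothetical record layout; real column names may differ."""
    prompt: str
    chosen: str    # preferred response
    rejected: str  # non-preferred response


def is_valid_pair(pair: PreferencePair) -> bool:
    """A pair is usable only if all fields are non-empty and the responses differ."""
    fields_present = all(s.strip() for s in (pair.prompt, pair.chosen, pair.rejected))
    return fields_present and pair.chosen != pair.rejected


# Invented example, not taken from the dataset.
example = PreferencePair(
    prompt="Summarize the text in one sentence.",
    chosen="A concise, accurate summary.",
    rejected="An off-topic reply.",
)
```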
Technical Specifications
Model Architecture and Objective
The model uses MoRA for parameter-efficient finetuning. The training setup supports either DPO or SimPO objectives; this model was trained with:
- SimPO: Simple Preference Optimization with β=10.0 and γ=0.5
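To make the roles of β and γ concrete, here is a minimal sketch of the SimPO loss for a single preference pair, written in plain Python. It follows the published SimPO formulation: a length-normalized log-probability reward and a target margin γ. It is not the training code used for this model, and the function name and inputs are illustrative.

```python
import math
from typing import Sequence


def simpo_loss(
    chosen_token_logps: Sequence[float],
    rejected_token_logps: Sequence[float],
    beta: float = 10.0,   # reward scale from this card
    gamma: float = 0.5,   # target reward margin from this card
) -> float:
    """SimPO loss for one pair, given per-token log-probabilities under the policy."""
    # Length-normalized (average) log-probability of each response.
    avg_chosen = sum(chosen_token_logps) / len(chosen_token_logps)
    avg_rejected = sum(rejected_token_logps) / len(rejected_token_logps)
    # Scaled reward difference minus the target margin.
    margin = beta * (avg_chosen - avg_rejected) - gamma
    # -log(sigmoid(margin)) == softplus(-margin), computed in a numerically stable way.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


tie_loss = simpo_loss([-0.5, -0.5], [-0.5, -0.5, -0.5])
clear_win_loss = simpo_loss([-0.1, -0.2], [-1.0, -1.2, -0.8])
```

Because the reward is an average over tokens, no reference model is needed, and a longer response does not gain an advantage purely from its length. When the two responses are equally likely per token, the loss is still positive because of the margin γ.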
Compute Infrastructure
Hardware
- Training performed on CUDA-capable GPUs
- Uses DeepSpeed for distributed training
- Gradient checkpointing enabled for memory efficiency
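A DeepSpeed configuration consistent with the settings in this card could look like the sketch below. Only the batch sizes and bf16 setting come from this card. The GPU count is an assumption, the `build_deepspeed_config` helper is hypothetical, and no ZeRO stage is shown because none is documented here.

```python
import json


def build_deepspeed_config(num_gpus: int = 1) -> dict:
    """Build a minimal DeepSpeed config dict (num_gpus=1 is an assumption)."""
    micro_batch = 1   # batch size per device from this card
    grad_accum = 16   # gradient accumulation steps from this card
    return {
        "train_micro_batch_size_per_gpu": micro_batch,
        "gradient_accumulation_steps": grad_accum,
        # DeepSpeed expects this to equal micro_batch * grad_accum * world size.
        "train_batch_size": micro_batch * grad_accum * num_gpus,
        "bf16": {"enabled": True},
    }


config_json = json.dumps(build_deepspeed_config(), indent=2)
```

Gradient checkpointing is usually switched on at the model level, for example with `model.gradient_checkpointing_enable()` in Transformers, rather than in this file.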
Software
- PEFT library for parameter-efficient finetuning
- Transformers library
- DeepSpeed for training optimization
- Weights & Biases for experiment tracking
Environmental Impact
- Hardware Type: NVIDIA GPUs
- Training Regime: Mixed BF16 precision
- Optimization: DeepSpeed + Gradient Checkpointing
Model Card Contact
For questions about this model, please contact Gunulhona.