final_model_8b_64
This model is fine-tuned for bidirectional English-Luganda translation. It was trained with QLoRA (Quantized Low-Rank Adaptation) on top of the Meta-Llama-3.1-8B base model.
Model Details
Base Model Information
- Base model: unsloth/Meta-Llama-3.1-8B
- Model family: LLaMA-3.1-8B
- Type: Base
- Original model size: 8B parameters
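For reference, here is a minimal sketch of loading the 4-bit base model with unsloth, assuming the unsloth library is installed and a CUDA GPU is available:

```python
# Minimal sketch: load the base model in 4-bit with unsloth.
# Assumes unsloth is installed and a CUDA GPU is available.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",  # base model listed above
    max_seq_length=2048,                     # matches the training sequence length
    dtype=None,                              # auto-selects bf16 on supported GPUs
    load_in_4bit=True,                       # QLoRA-style 4-bit quantization
)
```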
Training Configuration
- Training method: QLoRA (4-bit quantization)
- LoRA rank (r): 64
- LoRA alpha: 64
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- LoRA dropout: 0
- Learning rate: 2e-5
- Batch size: 2
- Gradient accumulation steps: 4
- Max sequence length: 2048
- Weight decay: 0.01
- Training steps: 100,000
- Warmup steps: 1000
- Save interval: 10,000 steps
- Optimizer: AdamW (8-bit)
- LR scheduler: Cosine
- Mixed precision: bf16
- Gradient checkpointing: Enabled (unsloth)
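The hyperparameters above map roughly onto the following unsloth/PEFT and Hugging Face TrainingArguments setup. This is a reconstruction from the listed values, not the original training script; the dataset pipeline and trainer are omitted.

```python
# Sketch of how the listed hyperparameters map onto unsloth + Transformers;
# reconstructed from the values above, not the original training script.
from unsloth import FastLanguageModel
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters (QLoRA: frozen 4-bit base weights + trainable low-rank adapters).
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=64,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    use_gradient_checkpointing="unsloth",
)

training_args = TrainingArguments(
    output_dir="outputs",                 # illustrative path
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    weight_decay=0.01,
    max_steps=100_000,
    warmup_steps=1_000,
    save_steps=10_000,
    lr_scheduler_type="cosine",
    optim="adamw_8bit",
    bf16=True,
)
```

These arguments would typically be passed to a supervised fine-tuning trainer (e.g. TRL's SFTTrainer) together with the formatted parallel data.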
Dataset Information
- Training data: Parallel English-Luganda corpus
- Data sources:
- SALT dataset (salt-train-v1.4)
- Extracted parallel sentences
- Synthetic code-mixed data
- Bidirectional translation: Trained on both English→Luganda and Luganda→English
- Total training examples: Varies by direction
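The exact preprocessing pipeline is not included here; the snippet below is a hypothetical sketch of how a single parallel pair can yield two training examples (one per direction) using the instruction format described under Usage below. The field names `english` and `luganda` are illustrative, not the actual column names of the source datasets.

```python
# Hypothetical sketch: build both translation directions from one parallel pair.
# The field names "english" and "luganda" are illustrative only.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\nTranslate the following text to {target_lang}\n\n"
    "### Input:\n{source_text}\n\n"
    "### Response:\n{target_text}"
)

def to_bidirectional_examples(pair: dict) -> list[str]:
    eng, lug = pair["english"], pair["luganda"]
    return [
        PROMPT_TEMPLATE.format(target_lang="Luganda", source_text=eng, target_text=lug),
        PROMPT_TEMPLATE.format(target_lang="English", source_text=lug, target_text=eng),
    ]
```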
Usage
This model uses an instruction-based prompt format:
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
Translate the following text to [target_lang]
### Input:
[input text]
### Response:
[translation]
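A minimal inference sketch using this prompt format follows, assuming the fine-tuned checkpoint is available locally under the name `final_model_8b_64` (adjust the name or path to wherever the weights are stored):

```python
# Sketch: translate English to Luganda with the prompt format above.
# Assumes the fine-tuned checkpoint is available locally as "final_model_8b_64"
# and a CUDA GPU is present; adjust the name/path as needed.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="final_model_8b_64",      # illustrative local path to this checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)   # enable unsloth's faster inference mode

prompt = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\nTranslate the following text to Luganda\n\n"
    "### Input:\nGood morning, how are you?\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128)
# Strip the prompt tokens and keep only the generated translation.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```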
Training Infrastructure
- Trained using unsloth optimization library
- Hardware: Single A100 GPU
- Quantization: 4-bit training enabled
Limitations
- The model is specialized for English-Luganda translation
- Performance may vary based on domain and complexity of text
- Context is limited to the maximum training sequence length of 2048 tokens
Citation and Contact
If you use this model, please cite:
- Original LLaMA-3.1 model by Meta AI
- QLoRA paper: Dettmers et al. (2023)
- unsloth optimization library