Built with Axolotl

Axolotl config

axolotl version: 0.5.3.dev38+g5726141c

base_model: meta-llama/Llama-3.2-3B-Instruct

datasets:
  - path: axolotl_format_data_llama.json
    type: input_output
dataset_prepared_path: last_run_prepared
    
output_dir: ./models/llama
sequence_length: 4096

wandb_project: agent-v0
wandb_name: llama-3b

train_on_inputs: false
gradient_accumulation_steps: 2
micro_batch_size: 1
num_epochs: 5
optimizer: adamw_torch
learning_rate: 2e-5

bf16: true

logging_steps: 10
flash_attention: true

warmup_steps: 50
saves_per_epoch: 1
weight_decay: 0.0

deepspeed: axolotl/deepspeed_configs/zero3_bf16.json

special_tokens:
  pad_token: <|end_of_text|>
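
The config above combines `type: input_output` with `train_on_inputs: false`, so which tokens contribute to the loss is decided by the dataset itself. The contents of axolotl_format_data_llama.json are not shown in this card, so the following is only a sketch. It assumes the template-free record layout (a `segments` list whose entries carry `label` and `text`), and the helper name `make_record` and the example strings are hypothetical.

```python
import json


def make_record(prompt: str, completion: str) -> dict:
    """Build one record in the assumed template-free layout.

    Assumption: with train_on_inputs set to false, segments labeled
    False are masked out of the loss and segments labeled True are
    trained on.
    """
    return {
        "segments": [
            {"label": False, "text": prompt},      # masked (input)
            {"label": True, "text": completion},   # trained on (output)
        ]
    }


# Hypothetical example strings; the real data may format turns differently.
record = make_record(
    "User: What is 2 + 2?\nAssistant: ",
    "4",
)

print(json.dumps(record, indent=2))
```

Because this format applies no chat template, any special tokens the model expects would have to be written into the `text` fields directly.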

models/llama

This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on the axolotl_format_data_llama.json dataset.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • total_eval_batch_size: 4
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • num_epochs: 5
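
The listed values are related by simple arithmetic: the total train batch size is the per-device batch size times the gradient accumulation steps times the number of devices. The sketch below checks that relation and also illustrates a cosine schedule with linear warmup. The schedule formula is the common warmup-then-cosine-decay-to-zero form and is assumed, not confirmed by this card. `total_steps` is a hypothetical value, since the number of optimizer steps depends on the dataset size, which is not stated.

```python
import math


def effective_batch_size(micro_batch_size: int, grad_accum_steps: int, num_devices: int) -> int:
    """Sequences consumed per optimizer step across all devices."""
    return micro_batch_size * grad_accum_steps * num_devices


def cosine_lr_with_warmup(step: int, total_steps: int,
                          base_lr: float = 2e-5, warmup_steps: int = 50) -> float:
    """Linear warmup to base_lr, then cosine decay to zero (assumed form)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Values from the hyperparameter list: 1 per device, 2 accumulation steps, 4 devices.
print(effective_batch_size(1, 2, 4))

# total_steps=1000 is hypothetical, chosen only to show the shape of the curve.
for step in (0, 25, 50, 525, 1000):
    print(step, cosine_lr_with_warmup(step, total_steps=1000))
```

With a hypothetical 1000 total steps, the learning rate rises linearly over the first 50 steps, peaks at 2e-05, and decays to zero at the final step.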

Training results

Framework versions

  • Transformers 4.46.3
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.20.3

Model size: 3.21B params (Safetensors, tensor type BF16)
