metadata

tags:
  - generated_from_trainer
model-index:
  - name: home/005/th5351/output
    results: []

See axolotl config

axolotl version: 0.4.1

base_model: /home/005/th5351/models/cosmosage-llama3-8b-base/
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

chat_template: llama3
datasets:
  - path: /home/005/th5351/datasets/combined_sft.jsonl
    type: chat_template
    chat_template: llama3
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    roles:
      system:
        - system
      user:
        - human
      assistant:
        - gpt
    

dataset_prepared_path: /home/005/th5351/output/last_run_prepared
val_set_size: 0.001
eval_sample_packing: false
output_dir: /home/005/th5351/output

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 1
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 5e-5
cosine_min_lr_ratio: 0.2
cosine_constant_lr_ratio: 0.8
max_grad_norm: 3.0

seed: 42

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 100
evals_per_epoch: 5
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed: /home/005/th5351/packages/axolotl/deepspeed_configs/zero2.json
ddp_timeout: 3600000
weight_decay: 0.0
fsdp:
fsdp_config:

home/005/th5351/output

This model was trained from scratch on the None dataset. It achieves the following results on the evaluation set:

Loss: nan

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 1
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 8
total_train_batch_size: 32
total_eval_batch_size: 4
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 100
num_epochs: 2

Training results

Training Loss	Epoch	Step	Validation Loss
1.3757	0.0005	1	nan
0.8083	0.1999	388	nan
0.8005	0.3998	776	nan
0.7389	0.5998	1164	nan
0.7269	0.7997	1552	nan
0.7069	0.9996	1940	nan
0.5786	1.1613	2328	nan
0.5385	1.3613	2716	nan
0.5381	1.5612	3104	nan
0.5273	1.7611	3492	nan
0.527	1.9610	3880	nan

Framework versions

Transformers 4.41.1
Pytorch 2.3.0+cu121
Datasets 2.19.1
Tokenizers 0.19.1