---
base_model: EleutherAI/pythia-160m-deduped
library_name: transformers
license: apache-2.0
tags:
- axolotl
- relora
- generated_from_trainer
model-index:
- name: pythia-160m-dolphin-extended
  results: []
datasets:
- cognitivecomputations/dolphin
- llamafactory/alpaca_gpt4_en
language:
- en
metrics:
- accuracy
- bleu
- rouge
---
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`

```yaml
base_model: EleutherAI/pythia-160m-deduped
load_in_8bit:
datasets:
  - path: vicgalle/alpaca-gpt4
    type: alpaca
  - path: llamafactory/alpaca_gpt4_en
    type: alpaca
  - path: cognitivecomputations/dolphin
    name: flan1m-alpaca-uncensored
    type: alpaca
    shards: 10
dataset_prepared_path: ds-mega-alpaca
#dataset_shard_num: 10
chat_template: inst
val_set_size: 0.001
adapter: lora
lora_model_dir:
sequence_len: 2048
lora_r: 16
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
- query_key_value
lora_target_linear:
lora_fan_in_fan_out: true # pythia/GPTNeoX lora specific
lora_modules_to_save:
- embed_in
- embed_out
- lm_head
lora_on_cpu: false
# ReLoRA configuration
# Must use either 'lora' or 'qlora' adapter; does not support fsdp or deepspeed
# relora_steps:         # number of steps per ReLoRA restart
# relora_warmup_steps:  # number of per-restart warmup steps
# relora_anneal_steps:  # number of anneal steps for each ReLoRA cycle
# relora_prune_ratio:   # threshold for optimizer magnitude when pruning
# relora_cpu_offload:   # true to perform LoRA weight merges on CPU during restarts, for modest GPU memory savings
relora_steps: 600
relora_warmup_steps: 10
relora_cpu_offload: true
wandb_project: pythia
wandb_entity:
wandb_watch:
wandb_name: pythia-160m-dolphin-extended
wandb_log_model:
output_dir: ./outputs/lora-alpaca-pythia-160m-dolphin-extended
gradient_accumulation_steps: 16
micro_batch_size: 1
num_epochs: 1
learning_rate: 0.0004
lr_scheduler: cosine_with_restarts
#cosine_min_lr_ratio: 0.1
train_on_inputs: false
group_by_length: false
#bf16: auto
#fp16: true
#tf32: false
float16: true
flash_attn:
xformers_attention: true
optimizer: paged_adamw_8bit
gpu_memory_limit: 8GiB
hub_model_id: jtatman/pythia-160m-dolphin-extended
early_stopping_patience: 10
#resume_from_checkpoint: outputs/lora-alpaca-pythia-160m-dolphin-extended/checkpoint-11400
auto_resume_from_checkpoints: true
local_rank:
weight_decay: 0.0
#evals_per_epoch: 4
eval_steps: 200
logging_steps: 1
save_steps: 200
save_total_limit: 5
warmup_steps: 100
tokens:
- "[INST]"
- "[/INST]"
# pythia-160m-dolphin-extended

This model is a fine-tuned version of [EleutherAI/pythia-160m-deduped](https://huggingface.co/EleutherAI/pythia-160m-deduped) on the vicgalle/alpaca-gpt4, llamafactory/alpaca_gpt4_en, and cognitivecomputations/dolphin (flan1m-alpaca-uncensored) datasets. It achieves the following results on the evaluation set:
- Loss: 6.6729
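A minimal generation sketch, assuming the weights published at `jtatman/pythia-160m-dolphin-extended` (the `hub_model_id` in the config) load as a standalone causal LM, and using the `[INST]`/`[/INST]` tokens added during training:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "jtatman/pythia-160m-dolphin-extended"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

# Wrap the instruction in the [INST] tokens the model was trained with.
prompt = "[INST] Give me a one-sentence summary of the Apollo program. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```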
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0004
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: paged AdamW (8-bit) with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_warmup_steps: 100
- num_epochs: 1
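The total train batch size follows from gradient accumulation: one sample per device step, accumulated over 16 steps. A quick check (single GPU assumed, since the card does not state a device count):

```python
micro_batch_size = 1              # per-device batch size
gradient_accumulation_steps = 16
num_devices = 1                   # assumption; not stated in the card

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
assert total_train_batch_size == 16
```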
### Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 25.9906       | 0.0001 | 1     | 29.5342         |
| 21.1303       | 0.0167 | 200   | 20.2350         |
| 16.5026       | 0.0334 | 400   | 18.4930         |
| 17.2725       | 0.0500 | 600   | 16.3395         |
| 11.9697       | 0.0667 | 800   | 12.1401         |
| 11.3783       | 0.0834 | 1000  | 11.8383         |
| 12.8084       | 0.1001 | 1200  | 12.9667         |
| 9.4119        | 0.1167 | 1400  | 9.8787          |
| 10.3527       | 0.1334 | 1600  | 10.0560         |
| 9.3545        | 0.1501 | 1800  | 9.7355          |
| 8.9165        | 0.1668 | 2000  | 9.1513          |
| 8.5467        | 0.1835 | 2200  | 8.2025          |
| 7.9152        | 0.2001 | 2400  | 7.6616          |
| 7.3362        | 0.2168 | 2600  | 7.5699          |
| 7.9374        | 0.2335 | 2800  | 7.4818          |
| 7.838         | 0.2502 | 3000  | 7.4635          |
| 7.5731        | 0.2668 | 3200  | 7.4899          |
| 7.8289        | 0.2835 | 3400  | 7.3594          |
| 7.8906        | 0.3002 | 3600  | 8.0934          |
| 7.7318        | 0.3169 | 3800  | 7.5812          |
| 7.2089        | 0.3335 | 4000  | 7.4839          |
| 7.202         | 0.3502 | 4200  | 7.4486          |
| 6.9493        | 0.3669 | 4400  | 7.3208          |
| 7.1492        | 0.3836 | 4600  | 7.2469          |
| 7.3443        | 0.4003 | 4800  | 7.1378          |
| 7.7056        | 0.4169 | 5000  | 7.1385          |
| 55.0553       | 0.4336 | 5200  | 50.0135         |
| 7.1868        | 0.4503 | 5400  | 6.9898          |
| 6.5803        | 0.4670 | 5600  | 6.9559          |
| 8.6171        | 0.4836 | 5800  | 7.9075          |
| 7.1373        | 0.5003 | 6000  | 6.9280          |
| 6.7077        | 0.5170 | 6200  | 6.8797          |
| 7.0026        | 0.5337 | 6400  | 6.8635          |
| 6.6797        | 0.5504 | 6600  | 6.8178          |
| 6.8067        | 0.5670 | 6800  | 6.7893          |
| 6.5979        | 0.5837 | 7000  | 6.8106          |
| 6.7283        | 0.6004 | 7200  | 6.7998          |
| 7.0015        | 0.6171 | 7400  | 6.7705          |
| 6.1182        | 0.6337 | 7600  | 6.7592          |
| 6.7919        | 0.6504 | 7800  | 6.7446          |
| 6.4523        | 0.6671 | 8000  | 6.7260          |
| 6.765         | 0.6838 | 8200  | 6.7135          |
| 6.4625        | 0.7004 | 8400  | 6.7099          |
| 6.79          | 0.7171 | 8600  | 6.7070          |
| 6.6101        | 0.7338 | 8800  | 6.7017          |
| 6.7541        | 0.7505 | 9000  | 6.6964          |
| 6.7777        | 0.7672 | 9200  | 6.6901          |
| 7.2082        | 0.7838 | 9400  | 6.6869          |
| 6.4263        | 0.8005 | 9600  | 6.6875          |
| 6.1944        | 0.8172 | 9800  | 6.6803          |
| 6.7745        | 0.8339 | 10000 | 6.6865          |
| 6.6746        | 0.8505 | 10200 | 6.6756          |
| 6.6319        | 0.8672 | 10400 | 6.6941          |
| 6.6657        | 0.8839 | 10600 | 6.6764          |
| 6.8516        | 0.9006 | 10800 | 6.6776          |
| 6.6391        | 0.9173 | 11000 | 6.6749          |
| 6.5763        | 0.9339 | 11200 | 6.6729          |
| 6.585         | 0.9506 | 11400 | 6.6694          |
| 6.2999        | 0.9673 | 11600 | 6.6722          |
| 6.8343        | 0.9840 | 11800 | 6.6729          |
### Framework versions
- PEFT 0.11.1
- Transformers 4.41.2
- Pytorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
## Evaluation Results

| Groups               | Version | Filter           | n-shot | Metric      | Value   |   | Stderr |
|----------------------|---------|------------------|-------:|-------------|--------:|---|-------:|
| Open LLM Leaderboard | N/A     | none             | 5      | rouge2_max  | 16.4873 | ± | 1.0172 |
|                      |         | none             | 5      | rouge2_acc  | 0.1920  | ± | 0.0176 |
|                      |         | none             | 5      | rougeL_acc  | 0.3860  | ± | 0.0218 |
|                      |         | flexible-extract | 5      | exact_match | 0.0220  | ± | 0.0066 |
|                      |         | strict-match     | 5      | exact_match | 0.0060  | ± | 0.0035 |
|                      |         | none             | 5      | rougeL_diff | -0.7765 | ± | 1.0034 |
|                      |         | none             | 5      | rouge1_acc  | 0.3700  | ± | 0.0216 |
|                      |         | none             | 5      | rouge1_diff | -1.5564 | ± | 1.0223 |
|                      |         | none             | 5      | acc_norm    | 0.3180  | ± | 0.0145 |
|                      |         | none             | 5      | bleu_diff   | -0.6500 | ± | 0.6421 |
|                      |         | none             | 5      | rouge1_max  | 36.3550 | ± | 0.9462 |
|                      |         | none             | 5      | acc         | 0.2664  | ± | 0.0036 |
|                      |         | none             | 5      | rougeL_max  | 33.8798 | ± | 0.9367 |
|                      |         | none             | 5      | bleu_max    | 15.2292 | ± | 0.6714 |
|                      |         | none             | 5      | bleu_acc    | 0.4360  | ± | 0.0222 |
|                      |         | none             | 5      | rouge2_diff | -3.3178 | ± | 0.9477 |
| - winogrande         | 1       | none             | 5      | acc         | 0.5120  | ± | 0.0224 |
| - gsm8k              | 3       | strict-match     | 5      | exact_match | 0.0060  | ± | 0.0035 |
| - hellaswag          | 1       | none             | 10     | acc         | 0.3520  | ± | 0.0214 |
| - mmlu               | N/A     | none             | 0      | acc         | 0.2533  | ± | 0.0039 |
| - humanities         | N/A     | none             | 5      | acc         | 0.2408  | ± | 0.0075 |
| - other              | N/A     | none             | 5      | acc         | 0.2443  | ± | 0.0080 |
| - social_sciences    | N/A     | none             | 5      | acc         | 0.2538  | ± | 0.0081 |
| - stem               | N/A     | none             | 5      | acc         | 0.2740  | ± | 0.0079 |
| - truthfulqa         | N/A     | none             | 0      | rouge2_max  | 16.4873 | ± | 1.0172 |
|                      |         | none             | 0      | rouge2_acc  | 0.1920  | ± | 0.0176 |
|                      |         | none             | 0      | rougeL_acc  | 0.3860  | ± | 0.0218 |
|                      |         | none             | 0      | rougeL_diff | -0.7765 | ± | 1.0034 |
|                      |         | none             | 0      | rouge1_acc  | 0.3700  | ± | 0.0216 |
|                      |         | none             | 0      | rouge1_diff | -1.5564 | ± | 1.0223 |
|                      |         | none             | 0      | bleu_diff   | -0.6500 | ± | 0.6421 |
|                      |         | none             | 0      | rouge1_max  | 36.3550 | ± | 0.9462 |
|                      |         | none             | 0      | acc         | 0.3435  | ± | 0.0137 |
|                      |         | none             | 0      | rougeL_max  | 33.8798 | ± | 0.9367 |
|                      |         | none             | 0      | bleu_max    | 15.2292 | ± | 0.6714 |
|                      |         | none             | 0      | bleu_acc    | 0.4360  | ± | 0.0222 |
|                      |         | none             | 0      | rouge2_diff | -3.3178 | ± | 0.9477 |
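These results follow the output format of EleutherAI's lm-evaluation-harness. A sketch of how comparable numbers could be produced (task names and few-shot counts are inferred from the table above; the exact command used for this card is not documented):

```python
import lm_eval

# Evaluate a subset of the tasks from the table; adjust num_fewshot per
# task, since the table mixes 0-, 5-, and 10-shot settings.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=jtatman/pythia-160m-dolphin-extended",
    tasks=["winogrande", "hellaswag", "gsm8k", "mmlu", "truthfulqa"],
    num_fewshot=5,
)
print(results["results"])
```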