llm3br64

This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on the reliance-oneshot-train dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0131
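
Given the PEFT version listed under "Framework versions" and the `-lora-r64` suffix in the repository name, this appears to be a LoRA adapter for the base model rather than a full checkpoint. Below is a minimal loading sketch, assuming the adapter is published as `sizhkhy/reliance-llama-3.2-3B-lora-r64` (the repo name shown on this page) and that you have access to the gated `meta-llama` base weights:

```python
# Minimal inference sketch (assumptions: adapter repo id and access to the
# gated meta-llama/Llama-3.2-3B-Instruct base weights).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Llama-3.2-3B-Instruct"
ADAPTER = "sizhkhy/reliance-llama-3.2-3B-lora-r64"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, ADAPTER)  # attach the LoRA adapter
model.eval()

# Build a chat prompt with the Llama 3.2 chat template.
messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```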

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent configuration is sketched after the list):

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: adamw_torch (AdamW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 15.0
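
The total_train_batch_size above follows from train_batch_size × gradient_accumulation_steps = 4 × 8 = 32, and with warmup_ratio 0.1 over the 420 optimizer steps logged below, roughly the first 42 steps are warmup before the cosine decay. A minimal sketch of an equivalent transformers TrainingArguments setup (a hedged reconstruction, not the exact training script; the output directory and eval cadence are assumptions read off this card):

```python
# Hedged sketch: TrainingArguments matching the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llm3br64",          # placeholder output directory
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,  # 4 x 8 = 32 effective batch size
    num_train_epochs=15.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,               # ~42 of 420 optimizer steps
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    eval_strategy="steps",          # the card logs losses every 5 steps
    eval_steps=5,
    logging_steps=5,
)
```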

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.0583 | 0.1786 | 5 | 0.0538 |
| 0.0323 | 0.3571 | 10 | 0.0369 |
| 0.0292 | 0.5357 | 15 | 0.0307 |
| 0.0292 | 0.7143 | 20 | 0.0260 |
| 0.0229 | 0.8929 | 25 | 0.0236 |
| 0.0201 | 1.0714 | 30 | 0.0216 |
| 0.0177 | 1.25 | 35 | 0.0200 |
| 0.0198 | 1.4286 | 40 | 0.0187 |
| 0.014 | 1.6071 | 45 | 0.0177 |
| 0.0157 | 1.7857 | 50 | 0.0171 |
| 0.0117 | 1.9643 | 55 | 0.0164 |
| 0.0139 | 2.1429 | 60 | 0.0161 |
| 0.0116 | 2.3214 | 65 | 0.0156 |
| 0.0115 | 2.5 | 70 | 0.0149 |
| 0.0105 | 2.6786 | 75 | 0.0144 |
| 0.0127 | 2.8571 | 80 | 0.0143 |
| 0.0079 | 3.0357 | 85 | 0.0140 |
| 0.009 | 3.2143 | 90 | 0.0141 |
| 0.0082 | 3.3929 | 95 | 0.0137 |
| 0.0085 | 3.5714 | 100 | 0.0132 |
| 0.0087 | 3.75 | 105 | 0.0133 |
| 0.009 | 3.9286 | 110 | 0.0131 |
| 0.0079 | 4.1071 | 115 | 0.0129 |
| 0.0084 | 4.2857 | 120 | 0.0127 |
| 0.0071 | 4.4643 | 125 | 0.0127 |
| 0.0073 | 4.6429 | 130 | 0.0126 |
| 0.007 | 4.8214 | 135 | 0.0123 |
| 0.0063 | 5.0 | 140 | 0.0123 |
| 0.0051 | 5.1786 | 145 | 0.0127 |
| 0.0054 | 5.3571 | 150 | 0.0131 |
| 0.0056 | 5.5357 | 155 | 0.0125 |
| 0.0056 | 5.7143 | 160 | 0.0123 |
| 0.0059 | 5.8929 | 165 | 0.0123 |
| 0.004 | 6.0714 | 170 | 0.0129 |
| 0.0044 | 6.25 | 175 | 0.0128 |
| 0.0039 | 6.4286 | 180 | 0.0124 |
| 0.0045 | 6.6071 | 185 | 0.0124 |
| 0.0041 | 6.7857 | 190 | 0.0125 |
| 0.0037 | 6.9643 | 195 | 0.0121 |
| 0.0026 | 7.1429 | 200 | 0.0131 |
| 0.0027 | 7.3214 | 205 | 0.0132 |
| 0.003 | 7.5 | 210 | 0.0128 |
| 0.0033 | 7.6786 | 215 | 0.0125 |
| 0.0032 | 7.8571 | 220 | 0.0120 |
| 0.0018 | 8.0357 | 225 | 0.0126 |
| 0.0024 | 8.2143 | 230 | 0.0141 |
| 0.002 | 8.3929 | 235 | 0.0131 |
| 0.0022 | 8.5714 | 240 | 0.0127 |
| 0.0016 | 8.75 | 245 | 0.0131 |
| 0.0016 | 8.9286 | 250 | 0.0133 |
| 0.0011 | 9.1071 | 255 | 0.0135 |
| 0.0018 | 9.2857 | 260 | 0.0138 |
| 0.0011 | 9.4643 | 265 | 0.0140 |
| 0.001 | 9.6429 | 270 | 0.0141 |
| 0.0011 | 9.8214 | 275 | 0.0142 |
| 0.0012 | 10.0 | 280 | 0.0141 |
| 0.0006 | 10.1786 | 285 | 0.0142 |
| 0.0008 | 10.3571 | 290 | 0.0152 |
| 0.0006 | 10.5357 | 295 | 0.0156 |
| 0.0005 | 10.7143 | 300 | 0.0155 |
| 0.0006 | 10.8929 | 305 | 0.0151 |
| 0.0004 | 11.0714 | 310 | 0.0152 |
| 0.0004 | 11.25 | 315 | 0.0157 |
| 0.0003 | 11.4286 | 320 | 0.0164 |
| 0.0003 | 11.6071 | 325 | 0.0167 |
| 0.0004 | 11.7857 | 330 | 0.0166 |
| 0.0003 | 11.9643 | 335 | 0.0165 |
| 0.0001 | 12.1429 | 340 | 0.0165 |
| 0.0001 | 12.3214 | 345 | 0.0167 |
| 0.0001 | 12.5 | 350 | 0.0169 |
| 0.0001 | 12.6786 | 355 | 0.0171 |
| 0.0002 | 12.8571 | 360 | 0.0173 |
| 0.0001 | 13.0357 | 365 | 0.0173 |
| 0.0001 | 13.2143 | 370 | 0.0174 |
| 0.0001 | 13.3929 | 375 | 0.0174 |
| 0.0001 | 13.5714 | 380 | 0.0174 |
| 0.0001 | 13.75 | 385 | 0.0175 |
| 0.0001 | 13.9286 | 390 | 0.0175 |
| 0.0001 | 14.1071 | 395 | 0.0175 |
| 0.0001 | 14.2857 | 400 | 0.0176 |
| 0.0001 | 14.4643 | 405 | 0.0176 |
| 0.0001 | 14.6429 | 410 | 0.0176 |
| 0.0001 | 14.8214 | 415 | 0.0176 |
| 0.0001 | 15.0 | 420 | 0.0176 |
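
For a quick look at the trend in this table, a minimal plotting sketch with matplotlib (the lists below copy only a handful of rows for brevity; paste in the full Step and Validation Loss columns to plot the whole run):

```python
# Hedged sketch: plot a few of the logged validation-loss points from the table above.
import matplotlib.pyplot as plt

steps = [5, 10, 15, 220, 420]                         # subset of the Step column
val_loss = [0.0538, 0.0369, 0.0307, 0.0120, 0.0176]   # matching Validation Loss values

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("llm3br64 validation loss")
plt.show()
```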

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • Pytorch 2.4.0+cu121
  • Datasets 3.1.0
  • Tokenizers 0.20.3
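
A minimal sketch for checking that a local environment matches these versions before loading the adapter (nothing here is specific to this model; the expected strings are simply copied from the list above):

```python
# Hedged sketch: compare installed package versions against the card's list.
import peft, transformers, torch, datasets, tokenizers

expected = {
    "peft": "0.12.0",
    "transformers": "4.46.1",
    "torch": "2.4.0+cu121",
    "datasets": "3.1.0",
    "tokenizers": "0.20.3",
}
installed = {
    "peft": peft.__version__,
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    print(f"{name}: installed {installed[name]}, card trained with {want}")
```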