llm3br256

This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on the akash_unifo_757 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0199
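
The framework versions listed below include PEFT, so this repository appears to hold a PEFT adapter for the base model rather than full fine-tuned weights. The following is a minimal loading sketch, assuming the adapter is published under the repo id sizhkhy/akash_unifo_757; adjust the id, dtype, and device placement for your setup.

```python
# Minimal sketch: load the base model and attach this PEFT adapter.
# Assumes the adapter repo id "sizhkhy/akash_unifo_757"; adjust as needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, "sizhkhy/akash_unifo_757")

# Simple generation check using the base model's chat template.
messages = [{"role": "user", "content": "Say hello."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```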

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent TrainingArguments sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5.0
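
For reference, these settings map roughly onto the Hugging Face TrainingArguments below. This is a sketch of equivalent configuration, not the exact training command; the output directory is a placeholder and the dataset wiring and PEFT/LoRA config are omitted.

```python
# Sketch of TrainingArguments matching the listed hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llm3br256",            # placeholder output directory
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,     # 4 x 8 = 32 total train batch size
    num_train_epochs=5.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```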

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.0892        | 0.0501 | 5    | 0.0971          |
| 0.045         | 0.1003 | 10   | 0.0434          |
| 0.0496        | 0.1504 | 15   | 0.0361          |
| 0.0279        | 0.2005 | 20   | 0.0339          |
| 0.027         | 0.2506 | 25   | 0.0339          |
| 0.0265        | 0.3008 | 30   | 0.0310          |
| 0.0255        | 0.3509 | 35   | 0.0297          |
| 0.0239        | 0.4010 | 40   | 0.0275          |
| 0.019         | 0.4511 | 45   | 0.0263          |
| 0.0177        | 0.5013 | 50   | 0.0255          |
| 0.0178        | 0.5514 | 55   | 0.0250          |
| 0.0179        | 0.6015 | 60   | 0.0238          |
| 0.0199        | 0.6516 | 65   | 0.0239          |
| 0.0165        | 0.7018 | 70   | 0.0237          |
| 0.0192        | 0.7519 | 75   | 0.0229          |
| 0.0158        | 0.8020 | 80   | 0.0231          |
| 0.0202        | 0.8521 | 85   | 0.0233          |
| 0.0203        | 0.9023 | 90   | 0.0232          |
| 0.0231        | 0.9524 | 95   | 0.0228          |
| 0.0175        | 1.0025 | 100  | 0.0225          |
| 0.0137        | 1.0526 | 105  | 0.0225          |
| 0.0286        | 1.1028 | 110  | 0.0229          |
| 0.0169        | 1.1529 | 115  | 0.0225          |
| 0.0141        | 1.2030 | 120  | 0.0222          |
| 0.0149        | 1.2531 | 125  | 0.0220          |
| 0.0123        | 1.3033 | 130  | 0.0226          |
| 0.0137        | 1.3534 | 135  | 0.0226          |
| 0.0118        | 1.4035 | 140  | 0.0226          |
| 0.015         | 1.4536 | 145  | 0.0219          |
| 0.0059        | 1.5038 | 150  | 0.0232          |
| 0.0155        | 1.5539 | 155  | 0.0224          |
| 0.0168        | 1.6040 | 160  | 0.0228          |
| 0.0115        | 1.6541 | 165  | 0.0225          |
| 0.0156        | 1.7043 | 170  | 0.0221          |
| 0.0174        | 1.7544 | 175  | 0.0218          |
| 0.0147        | 1.8045 | 180  | 0.0214          |
| 0.0113        | 1.8546 | 185  | 0.0211          |
| 0.0128        | 1.9048 | 190  | 0.0210          |
| 0.0158        | 1.9549 | 195  | 0.0207          |
| 0.0139        | 2.0050 | 200  | 0.0208          |
| 0.0095        | 2.0551 | 205  | 0.0216          |
| 0.0117        | 2.1053 | 210  | 0.0216          |
| 0.0117        | 2.1554 | 215  | 0.0209          |
| 0.0098        | 2.2055 | 220  | 0.0211          |
| 0.0116        | 2.2556 | 225  | 0.0208          |
| 0.0091        | 2.3058 | 230  | 0.0211          |
| 0.0144        | 2.3559 | 235  | 0.0210          |
| 0.0128        | 2.4060 | 240  | 0.0211          |
| 0.0097        | 2.4561 | 245  | 0.0209          |
| 0.0137        | 2.5063 | 250  | 0.0206          |
| 0.0163        | 2.5564 | 255  | 0.0205          |
| 0.0104        | 2.6065 | 260  | 0.0203          |
| 0.0124        | 2.6566 | 265  | 0.0204          |
| 0.0131        | 2.7068 | 270  | 0.0208          |
| 0.0089        | 2.7569 | 275  | 0.0205          |
| 0.0093        | 2.8070 | 280  | 0.0207          |
| 0.0139        | 2.8571 | 285  | 0.0212          |
| 0.0121        | 2.9073 | 290  | 0.0205          |
| 0.0101        | 2.9574 | 295  | 0.0204          |
| 0.0087        | 3.0075 | 300  | 0.0199          |
| 0.0079        | 3.0576 | 305  | 0.0204          |
| 0.01          | 3.1078 | 310  | 0.0208          |
| 0.0089        | 3.1579 | 315  | 0.0212          |
| 0.0079        | 3.2080 | 320  | 0.0208          |
| 0.006         | 3.2581 | 325  | 0.0206          |
| 0.0094        | 3.3083 | 330  | 0.0207          |
| 0.0091        | 3.3584 | 335  | 0.0205          |
| 0.0077        | 3.4085 | 340  | 0.0205          |
| 0.0074        | 3.4586 | 345  | 0.0202          |
| 0.007         | 3.5088 | 350  | 0.0203          |
| 0.0087        | 3.5589 | 355  | 0.0201          |
| 0.0067        | 3.6090 | 360  | 0.0201          |
| 0.007         | 3.6591 | 365  | 0.0201          |
| 0.006         | 3.7093 | 370  | 0.0199          |
| 0.0073        | 3.7594 | 375  | 0.0199          |
| 0.0071        | 3.8095 | 380  | 0.0199          |
| 0.01          | 3.8596 | 385  | 0.0195          |
| 0.0081        | 3.9098 | 390  | 0.0195          |
| 0.0077        | 3.9599 | 395  | 0.0198          |
| 0.007         | 4.0100 | 400  | 0.0199          |
| 0.0052        | 4.0602 | 405  | 0.0198          |
| 0.0068        | 4.1103 | 410  | 0.0199          |
| 0.007         | 4.1604 | 415  | 0.0200          |
| 0.0057        | 4.2105 | 420  | 0.0202          |
| 0.0059        | 4.2607 | 425  | 0.0203          |
| 0.005         | 4.3108 | 430  | 0.0202          |
| 0.0062        | 4.3609 | 435  | 0.0202          |
| 0.0058        | 4.4110 | 440  | 0.0202          |
| 0.006         | 4.4612 | 445  | 0.0203          |
| 0.0057        | 4.5113 | 450  | 0.0203          |
| 0.0055        | 4.5614 | 455  | 0.0202          |
| 0.005         | 4.6115 | 460  | 0.0202          |
| 0.0061        | 4.6617 | 465  | 0.0202          |
| 0.0064        | 4.7118 | 470  | 0.0201          |
| 0.0052        | 4.7619 | 475  | 0.0202          |
| 0.0057        | 4.8120 | 480  | 0.0201          |
| 0.0051        | 4.8622 | 485  | 0.0201          |
| 0.0063        | 4.9123 | 490  | 0.0202          |
| 0.0051        | 4.9624 | 495  | 0.0201          |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • Pytorch 2.4.0+cu121
  • Datasets 3.1.0
  • Tokenizers 0.20.3