llm3br256

This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on the akash_unifo_757 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0199
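
The framework versions listed below include PEFT, so this repository appears to hold a PEFT adapter for the base model rather than full fine-tuned weights. The following is a minimal loading sketch, assuming the adapter is published under the repo id sizhkhy/akash_unifo_757; adjust the id, dtype, and device placement for your setup.

```python
# Minimal sketch: load the base model and attach this PEFT adapter.
# Assumes the adapter repo id "sizhkhy/akash_unifo_757"; adjust as needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, "sizhkhy/akash_unifo_757")

# Simple generation check using the base model's chat template.
messages = [{"role": "user", "content": "Say hello."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```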

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent TrainingArguments sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5.0
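
For reference, these settings map roughly onto the Hugging Face TrainingArguments below. This is a sketch of equivalent configuration, not the exact training command; the output directory is a placeholder and the dataset wiring and PEFT/LoRA config are omitted.

```python
# Sketch of TrainingArguments matching the listed hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llm3br256",            # placeholder output directory
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,     # 4 x 8 = 32 total train batch size
    num_train_epochs=5.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```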

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.0892        | 0.0501 | 5    | 0.0971          |
| 0.045         | 0.1003 | 10   | 0.0434          |
| 0.0496        | 0.1504 | 15   | 0.0361          |
| 0.0279        | 0.2005 | 20   | 0.0339          |
| 0.027         | 0.2506 | 25   | 0.0339          |
| 0.0265        | 0.3008 | 30   | 0.0310          |
| 0.0255        | 0.3509 | 35   | 0.0297          |
| 0.0239        | 0.4010 | 40   | 0.0275          |
| 0.019         | 0.4511 | 45   | 0.0263          |
| 0.0177        | 0.5013 | 50   | 0.0255          |
| 0.0178        | 0.5514 | 55   | 0.0250          |
| 0.0179        | 0.6015 | 60   | 0.0238          |
| 0.0199        | 0.6516 | 65   | 0.0239          |
| 0.0165        | 0.7018 | 70   | 0.0237          |
| 0.0192        | 0.7519 | 75   | 0.0229          |
| 0.0158        | 0.8020 | 80   | 0.0231          |
| 0.0202        | 0.8521 | 85   | 0.0233          |
| 0.0203        | 0.9023 | 90   | 0.0232          |
| 0.0231        | 0.9524 | 95   | 0.0228          |
| 0.0175        | 1.0025 | 100  | 0.0225          |
| 0.0137        | 1.0526 | 105  | 0.0225          |
| 0.0286        | 1.1028 | 110  | 0.0229          |
| 0.0169        | 1.1529 | 115  | 0.0225          |
| 0.0141        | 1.2030 | 120  | 0.0222          |
| 0.0149        | 1.2531 | 125  | 0.0220          |
| 0.0123        | 1.3033 | 130  | 0.0226          |
| 0.0137        | 1.3534 | 135  | 0.0226          |
| 0.0118        | 1.4035 | 140  | 0.0226          |
| 0.015         | 1.4536 | 145  | 0.0219          |
| 0.0059        | 1.5038 | 150  | 0.0232          |
| 0.0155        | 1.5539 | 155  | 0.0224          |
| 0.0168        | 1.6040 | 160  | 0.0228          |
| 0.0115        | 1.6541 | 165  | 0.0225          |
| 0.0156        | 1.7043 | 170  | 0.0221          |
| 0.0174        | 1.7544 | 175  | 0.0218          |
| 0.0147        | 1.8045 | 180  | 0.0214          |
| 0.0113        | 1.8546 | 185  | 0.0211          |
| 0.0128        | 1.9048 | 190  | 0.0210          |
| 0.0158        | 1.9549 | 195  | 0.0207          |
| 0.0139        | 2.0050 | 200  | 0.0208          |
| 0.0095        | 2.0551 | 205  | 0.0216          |
| 0.0117        | 2.1053 | 210  | 0.0216          |
| 0.0117        | 2.1554 | 215  | 0.0209          |
| 0.0098        | 2.2055 | 220  | 0.0211          |
| 0.0116        | 2.2556 | 225  | 0.0208          |
| 0.0091        | 2.3058 | 230  | 0.0211          |
| 0.0144        | 2.3559 | 235  | 0.0210          |
| 0.0128        | 2.4060 | 240  | 0.0211          |
| 0.0097        | 2.4561 | 245  | 0.0209          |
| 0.0137        | 2.5063 | 250  | 0.0206          |
| 0.0163        | 2.5564 | 255  | 0.0205          |
| 0.0104        | 2.6065 | 260  | 0.0203          |
| 0.0124        | 2.6566 | 265  | 0.0204          |
| 0.0131        | 2.7068 | 270  | 0.0208          |
| 0.0089        | 2.7569 | 275  | 0.0205          |
| 0.0093        | 2.8070 | 280  | 0.0207          |
| 0.0139        | 2.8571 | 285  | 0.0212          |
| 0.0121        | 2.9073 | 290  | 0.0205          |
| 0.0101        | 2.9574 | 295  | 0.0204          |
| 0.0087        | 3.0075 | 300  | 0.0199          |
| 0.0079        | 3.0576 | 305  | 0.0204          |
| 0.01          | 3.1078 | 310  | 0.0208          |
| 0.0089        | 3.1579 | 315  | 0.0212          |
| 0.0079        | 3.2080 | 320  | 0.0208          |
| 0.006         | 3.2581 | 325  | 0.0206          |
| 0.0094        | 3.3083 | 330  | 0.0207          |
| 0.0091        | 3.3584 | 335  | 0.0205          |
| 0.0077        | 3.4085 | 340  | 0.0205          |
| 0.0074        | 3.4586 | 345  | 0.0202          |
| 0.007         | 3.5088 | 350  | 0.0203          |
| 0.0087        | 3.5589 | 355  | 0.0201          |
| 0.0067        | 3.6090 | 360  | 0.0201          |
| 0.007         | 3.6591 | 365  | 0.0201          |
| 0.006         | 3.7093 | 370  | 0.0199          |
| 0.0073        | 3.7594 | 375  | 0.0199          |
| 0.0071        | 3.8095 | 380  | 0.0199          |
| 0.01          | 3.8596 | 385  | 0.0195          |
| 0.0081        | 3.9098 | 390  | 0.0195          |
| 0.0077        | 3.9599 | 395  | 0.0198          |
| 0.007         | 4.0100 | 400  | 0.0199          |
| 0.0052        | 4.0602 | 405  | 0.0198          |
| 0.0068        | 4.1103 | 410  | 0.0199          |
| 0.007         | 4.1604 | 415  | 0.0200          |
| 0.0057        | 4.2105 | 420  | 0.0202          |
| 0.0059        | 4.2607 | 425  | 0.0203          |
| 0.005         | 4.3108 | 430  | 0.0202          |
| 0.0062        | 4.3609 | 435  | 0.0202          |
| 0.0058        | 4.4110 | 440  | 0.0202          |
| 0.006         | 4.4612 | 445  | 0.0203          |
| 0.0057        | 4.5113 | 450  | 0.0203          |
| 0.0055        | 4.5614 | 455  | 0.0202          |
| 0.005         | 4.6115 | 460  | 0.0202          |
| 0.0061        | 4.6617 | 465  | 0.0202          |
| 0.0064        | 4.7118 | 470  | 0.0201          |
| 0.0052        | 4.7619 | 475  | 0.0202          |
| 0.0057        | 4.8120 | 480  | 0.0201          |
| 0.0051        | 4.8622 | 485  | 0.0201          |
| 0.0063        | 4.9123 | 490  | 0.0202          |
| 0.0051        | 4.9624 | 495  | 0.0201          |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • Pytorch 2.4.0+cu121
  • Datasets 3.1.0
  • Tokenizers 0.20.3