metadata
license: apache-2.0
base_model: distilbert-base-uncased
tags:
  - generated_from_trainer
model-index:
  - name: distilbert-base-uncased-fineweb-edu-llama3-annotations-512-vN
    results: []


distilbert-base-uncased-fineweb-edu-llama3-annotations-512-vN

This model is a fine-tuned version of distilbert-base-uncased on the HuggingFaceFW/fineweb-edu-llama3-annotations dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2324
  • Mse: 0.2324
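Loss and Mse are identical here and in every row of the training log, so the training objective appears to be mean squared error on a single regression output. Below is a minimal sketch of scoring text with the checkpoint. It rests on several assumptions. The Hub id `pszemraj/distilbert-base-uncased-fineweb-edu-llama3-annotations-512-vN` is inferred from the model name and uploader and is not confirmed by the card. The head is assumed to emit one float per input. The 0 to 5 clipping assumes the integer educational-score scale of the annotation dataset. The last line converts the reported MSE into an RMSE, which is expressed in label units.

```python
import math

# Assumed Hub id: model name from this card, owner inferred from the uploader.
MODEL_ID = "pszemraj/distilbert-base-uncased-fineweb-edu-llama3-annotations-512-vN"


def rmse_from_mse(mse: float) -> float:
    """Root-mean-squared error, in the same units as the labels."""
    return math.sqrt(mse)


def to_int_score(raw: float, low: int = 0, high: int = 5) -> int:
    """Clip a raw regression output to [low, high] and round it to an integer.

    The 0-5 range is an assumption about the annotation scale.
    """
    return int(round(max(low, min(high, raw))))


def score_texts(texts: list[str], model_id: str = MODEL_ID) -> list[float]:
    """Return one raw regression score per text (needs transformers and torch)."""
    # Imported lazily so the helpers above work without these packages installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    # DistilBERT has 512 position embeddings, so longer inputs are truncated.
    inputs = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # assumed shape: (batch, 1)
    return logits.squeeze(-1).tolist()


print(f"eval RMSE ~ {rmse_from_mse(0.2324):.3f}")  # eval RMSE ~ 0.482
```

With the model downloadable, `[to_int_score(s) for s in score_texts(["Photosynthesis converts light into chemical energy."])]` would give one clipped integer score per input. An RMSE of about 0.48 means typical predictions fall within roughly half a point of the label.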

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 90085
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-09
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1.0
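The hyperparameters above map fairly directly onto keyword arguments of `transformers.TrainingArguments`. The mapping below is a hypothetical reconstruction, not the author's training script. `total_train_batch_size` is 16 × 8 = 128, which assumes a single device. The linear schedule with `warmup_ratio` 0.05 is re-implemented in plain Python to show the learning-rate curve. The total step count is estimated from the training log (step 3400 at epoch 0.9800), so it is approximate.

```python
import math

# Hypothetical reconstruction of the listed hyperparameters as TrainingArguments kwargs.
TRAINING_KWARGS = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "seed": 90085,
    "gradient_accumulation_steps": 8,
    "adam_beta1": 0.9,
    "adam_beta2": 0.98,
    "adam_epsilon": 1e-9,
    "lr_scheduler_type": "linear",
    "warmup_ratio": 0.05,
    "num_train_epochs": 1.0,
}


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step."""
    return per_device * accum_steps * num_devices


def estimate_total_steps(step: int, epoch: float, num_epochs: float = 1.0) -> int:
    """Estimate optimizer steps for the whole run from one logged (step, epoch) pair."""
    return round(step / epoch * num_epochs)


def linear_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


batch = effective_batch_size(16, 8)         # 128 examples per optimizer step
total = estimate_total_steps(3400, 0.9800)  # estimated, not reported by the card
warmup = math.ceil(total * 0.05)            # warmup steps, rounded up
print(batch, total, warmup, linear_lr(warmup, 1e-5, warmup, total))  # 128 3469 174 1e-05
```

With Transformers installed, `TrainingArguments(output_dir="out", **TRAINING_KWARGS)` would apply the same settings; `output_dir` is a placeholder. Under this estimate, the learning rate peaks at 1e-05 after about 174 steps and then decays linearly to zero at the end of the single epoch.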

Training results

| Training Loss | Epoch  | Step | Validation Loss | Mse    |
|:-------------:|:------:|:----:|:---------------:|:------:|
| 0.5361        | 0.0288 | 100  | 0.4934          | 0.4934 |
| 0.3483        | 0.0576 | 200  | 0.3525          | 0.3525 |
| 0.3238        | 0.0865 | 300  | 0.2931          | 0.2931 |
| 0.2734        | 0.1153 | 400  | 0.3130          | 0.3130 |
| 0.2891        | 0.1441 | 500  | 0.3298          | 0.3298 |
| 0.2807        | 0.1729 | 600  | 0.2659          | 0.2659 |
| 0.2727        | 0.2018 | 700  | 0.2690          | 0.2690 |
| 0.2701        | 0.2306 | 800  | 0.2555          | 0.2555 |
| 0.2954        | 0.2594 | 900  | 0.2501          | 0.2501 |
| 0.2618        | 0.2882 | 1000 | 0.2483          | 0.2483 |
| 0.3081        | 0.3171 | 1100 | 0.2456          | 0.2456 |
| 0.2544        | 0.3459 | 1200 | 0.2370          | 0.2370 |
| 0.2593        | 0.3747 | 1300 | 0.2349          | 0.2349 |
| 0.2361        | 0.4035 | 1400 | 0.2406          | 0.2406 |
| 0.2536        | 0.4324 | 1500 | 0.2453          | 0.2453 |
| 0.26          | 0.4612 | 1600 | 0.2568          | 0.2568 |
| 0.2897        | 0.4900 | 1700 | 0.2568          | 0.2568 |
| 0.2597        | 0.5188 | 1800 | 0.2359          | 0.2359 |
| 0.2489        | 0.5477 | 1900 | 0.2413          | 0.2413 |
| 0.2376        | 0.5765 | 2000 | 0.2416          | 0.2416 |
| 0.2424        | 0.6053 | 2100 | 0.2418          | 0.2418 |
| 0.2798        | 0.6341 | 2200 | 0.2462          | 0.2462 |
| 0.2523        | 0.6630 | 2300 | 0.2322          | 0.2322 |
| 0.286         | 0.6918 | 2400 | 0.2432          | 0.2432 |
| 0.247         | 0.7206 | 2500 | 0.2383          | 0.2383 |
| 0.2856        | 0.7494 | 2600 | 0.2375          | 0.2375 |
| 0.2216        | 0.7783 | 2700 | 0.2383          | 0.2383 |
| 0.255         | 0.8071 | 2800 | 0.2367          | 0.2367 |
| 0.2406        | 0.8359 | 2900 | 0.2345          | 0.2345 |
| 0.2388        | 0.8647 | 3000 | 0.2282          | 0.2282 |
| 0.2571        | 0.8936 | 3100 | 0.2331          | 0.2331 |
| 0.2672        | 0.9224 | 3200 | 0.2336          | 0.2336 |
| 0.2375        | 0.9512 | 3300 | 0.2337          | 0.2337 |
| 0.2423        | 0.9800 | 3400 | 0.2324          | 0.2324 |
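The lowest validation MSE in the log is 0.2282 at step 3000. The evaluation result reported at the top of the card (0.2324) matches the last logged evaluation at step 3400. Trainer checkpoints record these evaluations in the `log_history` list of `trainer_state.json`. The helper below is a sketch that picks the best entry from such a list. The sample entries are copied from the table above, and the checkpoint path is left to the reader.

```python
import json
from typing import Optional


def best_eval(log_history: list[dict], metric: str = "eval_loss") -> Optional[dict]:
    """Return the logged entry with the lowest value of `metric`, or None if absent."""
    evals = [entry for entry in log_history if metric in entry]
    if not evals:
        return None
    return min(evals, key=lambda entry: entry[metric])


def load_log_history(path: str) -> list[dict]:
    """Read `log_history` from a Trainer checkpoint's trainer_state.json."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)["log_history"]


# A few rows from the training results table, written as log_history-style dicts.
sample = [
    {"step": 2300, "epoch": 0.6630, "eval_loss": 0.2322},
    {"step": 3000, "epoch": 0.8647, "eval_loss": 0.2282},
    {"step": 3400, "epoch": 0.9800, "eval_loss": 0.2324},
    {"step": 3400, "epoch": 0.9800, "loss": 0.2423},  # training-loss entries have no eval_loss
]
best = best_eval(sample)
print(best["step"], best["eval_loss"])  # 3000 0.2282
```

The best and final validation MSE differ by only 0.0042, so the curve is essentially flat over the last third of the epoch.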

Framework versions

  • Transformers 4.42.3
  • Pytorch 2.3.1+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
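To compare a local environment against the versions listed above, the standard-library `importlib.metadata` module can report installed distribution versions. This sketch assumes the PyPI distribution names `transformers`, `torch`, `datasets`, and `tokenizers`. It ignores the `+cu121` local version label, which marks the CUDA 12.1 build of PyTorch, when comparing.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed on this card; "+cu121" is dropped from the PyTorch entry.
PINNED = {
    "transformers": "4.42.3",
    "torch": "2.3.1",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def base_version(v: str) -> str:
    """Strip a PEP 440 local version label such as '+cu121'."""
    return v.split("+", 1)[0]


def check_versions(pinned: dict[str, str]) -> dict[str, tuple[str, str, bool]]:
    """Map each package to (expected, installed, matches); 'missing' if not installed."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = "missing"
        report[name] = (expected, installed, base_version(installed) == expected)
    return report


for pkg, (want, have, ok) in check_versions(PINNED).items():
    print(f"{pkg:<13} expected {want:<8} installed {have:<14} {'ok' if ok else 'MISMATCH'}")
```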