distily_bench_obj_cross_v2.10

This student model was distilled from the teacher model roneneldan/TinyStories-33M. The training dataset is not specified.

The Distily library was used for this distillation.

It achieves the following results on the evaluation set:

  • eval_enwikippl: 107.6398
  • eval_frwikippl: 10204.3643
  • eval_zhwikippl: 49954.8242
  • eval_tinystoriesppl: 6.6903
  • eval_loss: 0.7036
  • eval_runtime: 13.0602
  • eval_samples_per_second: 76.568
  • eval_steps_per_second: 9.571
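
As a sanity check on the throughput figures above, the sketch below recovers the approximate number of evaluation samples and steps. It assumes the reported rates are simple quotients of a count over eval_runtime; the variable names are illustrative and not part of Distily.

```python
# Figures copied from the evaluation results above.
eval_runtime = 13.0602        # seconds
samples_per_second = 76.568
steps_per_second = 9.571

# Assumption: rate = count / runtime, so count = rate * runtime.
n_samples = round(samples_per_second * eval_runtime)
n_steps = round(steps_per_second * eval_runtime)

print(n_samples, n_steps, n_samples / n_steps)
```

Under that assumption the ratio of samples to steps comes out to 8, which agrees with the eval_batch_size listed among the training hyperparameters.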

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
  • train_embeddings: True
  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 1.0
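
The distillation objective above puts all of its weight on a KL loss over logits and none on the hidden-state or attention components. The following is a minimal, dependency-free sketch of such a loss for a single token position. It is not Distily's implementation: the direction KL(teacher || student), the absence of a temperature, and the function names are all assumptions made for illustration.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def kl_logits_loss(teacher_logits, student_logits):
    """KL(teacher || student) for one token's vocabulary distribution.

    Assumed direction and reduction; the library may differ.
    """
    t_logp = log_softmax(teacher_logits)
    s_logp = log_softmax(student_logits)
    return sum(math.exp(t) * (t - s) for t, s in zip(t_logp, s_logp))

# Hypothetical logits over a three-token vocabulary.
teacher = [2.0, 0.5, -1.0]
loss_same = kl_logits_loss(teacher, teacher)          # student matches teacher
loss_diff = kl_logits_loss(teacher, [0.0, 0.0, 0.0])  # uniform student
print(loss_same, loss_diff)
```

The loss is zero when the student reproduces the teacher's distribution and positive otherwise, which is why a falling eval_loss indicates the student's output distribution moving toward the teacher's.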

Resource Usage

Peak GPU Memory: 6.6064 GB

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
| 0 | 0 | 50480.5703 | 85684.4844 | 6.8305 | 13.0304 | 76.744 | 9.593 | 33932.0586 | 94692.1562 |
| 5000 | 0.0505 | 110.8554 | 10584.2598 | 0.7523 | 13.0416 | 76.677 | 9.585 | 6.7911 | 42034.9414 |
| 10000 | 0.1010 | 104.0690 | 10210.1172 | 0.7242 | 13.0341 | 76.722 | 9.59 | 6.4174 | 44683.2305 |
| 15000 | 0.1515 | 113.6466 | 10400.9941 | 0.7156 | 13.0171 | 76.822 | 9.603 | 7.2840 | 46906.4258 |
| 20000 | 0.2020 | 111.4970 | 9877.6748 | 0.7117 | 13.0184 | 76.814 | 9.602 | 7.1889 | 47931.1602 |
| 25000 | 0.2525 | 107.3317 | 10121.3330 | 0.7051 | 13.088 | 76.406 | 9.551 | 6.6947 | 49516.9375 |
| 30000 | 0.3030 | 107.4814 | 10147.0312 | 0.7042 | 13.0664 | 76.532 | 9.567 | 6.6925 | 49728.7578 |
| 35000 | 0.3535 | 107.5147 | 10109.9404 | 0.7041 | 13.0324 | 76.732 | 9.591 | 6.6794 | 49279.6914 |
| 40000 | 0.4040 | 107.5064 | 10121.3330 | 0.7041 | 13.1335 | 76.141 | 9.518 | 6.6994 | 49835.0078 |
| 45000 | 0.4545 | 107.3816 | 10129.8984 | 0.7039 | 13.1075 | 76.292 | 9.537 | 6.6972 | 49464.1211 |
| 50000 | 0.5051 | 107.5231 | 10129.8984 | 0.7040 | 13.0137 | 76.842 | 9.605 | 6.7041 | 49808.4492 |
| 55000 | 0.5556 | 107.7482 | 10135.5996 | 0.7040 | 13.0084 | 76.874 | 9.609 | 6.7052 | 49464.1211 |
| 60000 | 0.6061 | 107.6064 | 10204.3643 | 0.7040 | 13.0291 | 76.751 | 9.594 | 6.6991 | 49914.8711 |
| 65000 | 0.6566 | 107.6981 | 10204.3643 | 0.7037 | 13.0479 | 76.641 | 9.58 | 6.6958 | 49543.3398 |
| 70000 | 0.7071 | 107.8484 | 10204.3643 | 0.7036 | 13.0612 | 76.563 | 9.57 | 6.6953 | 49848.3164 |
| 75000 | 0.7576 | 107.5897 | 10204.3643 | 0.7036 | 13.1821 | 75.86 | 9.483 | 6.6895 | 49888.2188 |
| 80000 | 0.8081 | 107.6398 | 10204.3643 | 0.7037 | 13.1572 | 76.004 | 9.5 | 6.6900 | 49835.0078 |
| 85000 | 0.8586 | 107.7148 | 10204.3643 | 0.7037 | 12.9936 | 76.961 | 9.62 | 6.6928 | 49928.1523 |
| 90000 | 0.9091 | 107.6398 | 10204.3643 | 0.7035 | 13.0225 | 76.79 | 9.599 | 6.6919 | 49954.8242 |
| 95000 | 0.9596 | 107.6398 | 10204.3643 | 0.7036 | 13.0696 | 76.514 | 9.564 | 6.6914 | 49954.8242 |
| 99000 | 1.0 | 107.6398 | 10204.3643 | 0.7036 | 13.0602 | 76.568 | 9.571 | 6.6903 | 49954.8242 |
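
The step and epoch columns above are related by a simple ratio, which the short sketch below reproduces. It assumes the final logged step (99000) marks the end of the single epoch; the variable names are illustrative.

```python
# Assumption: the last logged step corresponds to epoch 1.0.
total_steps = 99000

# A few steps taken from the step column of the table.
steps = [5000, 10000, 50000, 99000]

# Epoch fraction is step / total_steps, rounded as in the table.
epochs = [round(s / total_steps, 4) for s in steps]
print(epochs)
```

With train_batch_size set to 1, this would put the epoch at roughly 99,000 training samples, provided no gradient accumulation was used (none is listed in the hyperparameters).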

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • PyTorch 2.3.0
  • Datasets 2.21.0
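
To compare a local environment against the versions listed above, a small check along these lines may help. The distribution names (in particular `distily`, and `torch` for PyTorch) are assumptions about how the packages are installed, and the helper name is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the list above; distribution names are assumed.
expected = {
    "distily": "0.2.0",
    "transformers": "4.44.0",
    "torch": "2.3.0",
    "datasets": "2.21.0",
}

def installed_version(name):
    """Return the installed version string, or None if not installed."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None

report = {name: (want, installed_version(name)) for name, want in expected.items()}
for name, (want, have) in report.items():
    # startswith tolerates local build suffixes such as "+cu121".
    status = "ok" if have is not None and have.startswith(want) else "differs"
    print(f"{name}: expected {want}, found {have} ({status})")
```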

Model size: 68.5M params (Safetensors, tensor type BF16)