distily_bench_obj_cross_v2.10
This student model is distilled from the teacher model roneneldan/TinyStories-33M. The distillation dataset is unspecified.
The Distily library was used for this distillation.
It achieves the following results on the evaluation set:
- eval_enwikippl: 107.6398
- eval_frwikippl: 10204.3643
- eval_zhwikippl: 49954.8242
- eval_tinystoriesppl: 6.6903
- eval_loss: 0.7036
- eval_runtime: 13.0602
- eval_samples_per_second: 76.568
- eval_steps_per_second: 9.571
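The `*ppl` metrics above are perplexities. As a reminder of what that number means, here is a minimal sketch of the standard definition (the exponential of the mean negative log-likelihood per token). The function name and the example inputs are hypothetical, and the exact tokenization and averaging Distily uses are not stated in this card.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood), using natural logs."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# Hypothetical example: a model that assigns probability 0.25 to each of 4 tokens.
example = [math.log(0.25)] * 4
print(perplexity(example))
```

Note that eval_loss is presumably the distillation objective rather than a cross-entropy loss, so exponentiating it should not be expected to reproduce any of the perplexity values.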
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
- train_embeddings: True
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1.0
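The distillation_objective above gives weight 1 to a KL loss on the logits and weight 0 to the hidden-state and attention components. The following is a minimal pure-Python sketch of a KL divergence between teacher and student next-token distributions for a single position. It is not Distily's implementation: the function names are hypothetical, and the KL direction (teacher relative to student) and the absence of any temperature or batch reduction are assumptions.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def kl_logits_loss(teacher_logits, student_logits):
    """KL(teacher || student) for one token position (assumed direction)."""
    t_logp = log_softmax(teacher_logits)
    s_logp = log_softmax(student_logits)
    # Sum over the vocabulary of p_teacher * (log p_teacher - log p_student).
    return sum(math.exp(tl) * (tl - sl) for tl, sl in zip(t_logp, s_logp))

# Hypothetical logits over a 3-token vocabulary.
teacher = [2.0, 1.0, 0.1]
student = [1.5, 1.2, 0.3]
print(kl_logits_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's distribution exactly and positive otherwise, which is why it can serve as the sole training signal here.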
Resource Usage
Peak GPU Memory: 6.6064 GB
Eval-Phase Metrics
step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
---|---|---|---|---|---|---|---|---|---|
teacher eval | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
0 | 0 | 50480.5703 | 85684.4844 | 6.8305 | 13.0304 | 76.744 | 9.593 | 33932.0586 | 94692.1562 |
5000 | 0.0505 | 110.8554 | 10584.2598 | 0.7523 | 13.0416 | 76.677 | 9.585 | 6.7911 | 42034.9414 |
10000 | 0.1010 | 104.0690 | 10210.1172 | 0.7242 | 13.0341 | 76.722 | 9.59 | 6.4174 | 44683.2305 |
15000 | 0.1515 | 113.6466 | 10400.9941 | 0.7156 | 13.0171 | 76.822 | 9.603 | 7.2840 | 46906.4258 |
20000 | 0.2020 | 111.4970 | 9877.6748 | 0.7117 | 13.0184 | 76.814 | 9.602 | 7.1889 | 47931.1602 |
25000 | 0.2525 | 107.3317 | 10121.3330 | 0.7051 | 13.088 | 76.406 | 9.551 | 6.6947 | 49516.9375 |
30000 | 0.3030 | 107.4814 | 10147.0312 | 0.7042 | 13.0664 | 76.532 | 9.567 | 6.6925 | 49728.7578 |
35000 | 0.3535 | 107.5147 | 10109.9404 | 0.7041 | 13.0324 | 76.732 | 9.591 | 6.6794 | 49279.6914 |
40000 | 0.4040 | 107.5064 | 10121.3330 | 0.7041 | 13.1335 | 76.141 | 9.518 | 6.6994 | 49835.0078 |
45000 | 0.4545 | 107.3816 | 10129.8984 | 0.7039 | 13.1075 | 76.292 | 9.537 | 6.6972 | 49464.1211 |
50000 | 0.5051 | 107.5231 | 10129.8984 | 0.7040 | 13.0137 | 76.842 | 9.605 | 6.7041 | 49808.4492 |
55000 | 0.5556 | 107.7482 | 10135.5996 | 0.7040 | 13.0084 | 76.874 | 9.609 | 6.7052 | 49464.1211 |
60000 | 0.6061 | 107.6064 | 10204.3643 | 0.7040 | 13.0291 | 76.751 | 9.594 | 6.6991 | 49914.8711 |
65000 | 0.6566 | 107.6981 | 10204.3643 | 0.7037 | 13.0479 | 76.641 | 9.58 | 6.6958 | 49543.3398 |
70000 | 0.7071 | 107.8484 | 10204.3643 | 0.7036 | 13.0612 | 76.563 | 9.57 | 6.6953 | 49848.3164 |
75000 | 0.7576 | 107.5897 | 10204.3643 | 0.7036 | 13.1821 | 75.86 | 9.483 | 6.6895 | 49888.2188 |
80000 | 0.8081 | 107.6398 | 10204.3643 | 0.7037 | 13.1572 | 76.004 | 9.5 | 6.6900 | 49835.0078 |
85000 | 0.8586 | 107.7148 | 10204.3643 | 0.7037 | 12.9936 | 76.961 | 9.62 | 6.6928 | 49928.1523 |
90000 | 0.9091 | 107.6398 | 10204.3643 | 0.7035 | 13.0225 | 76.79 | 9.599 | 6.6919 | 49954.8242 |
95000 | 0.9596 | 107.6398 | 10204.3643 | 0.7036 | 13.0696 | 76.514 | 9.564 | 6.6914 | 49954.8242 |
99000 | 1.0 | 107.6398 | 10204.3643 | 0.7036 | 13.0602 | 76.568 | 9.571 | 6.6903 | 49954.8242 |
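To read the table at a glance, it can help to compare the final student perplexities with the teacher's. The sketch below copies the step 99000 values and the four teacher values from the table. It assumes the teacher values correspond to enwikippl, frwikippl, tinystoriesppl, and zhwikippl in that order, which the table layout does not state explicitly. The helper name is hypothetical.

```python
# Final student metrics, copied from the step 99000 row.
student = {
    "enwikippl": 107.6398,
    "frwikippl": 10204.3643,
    "tinystoriesppl": 6.6903,
    "zhwikippl": 49954.8242,
}
# Teacher metrics. Assumption: the four values map to these columns in this order.
teacher = {
    "enwikippl": 169.9865,
    "frwikippl": 47377.9414,
    "tinystoriesppl": 3.9789,
    "zhwikippl": 4998.1294,
}

def ppl_ratios(student_ppl, teacher_ppl):
    """Student perplexity divided by teacher perplexity, per evaluation set."""
    return {name: student_ppl[name] / teacher_ppl[name] for name in teacher_ppl}

ratios = ppl_ratios(student, teacher)
for name, ratio in sorted(ratios.items()):
    print(f"{name}: student/teacher = {ratio:.3f}")
```

Under that column assignment, a ratio below 1 means the student has lower perplexity than the teacher on that evaluation set, and a ratio above 1 means higher.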
Framework versions
- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.21.0