distily_bench_obj_cross_v2.9

This student model was distilled from the teacher model roneneldan/TinyStories-33M. The training dataset is not specified.

The Distily library was used for this distillation.

It achieves the following results on the evaluation set (these values match the step 5000 row, epoch 0.1010, under Eval-Phase Metrics, not the final step):

  • eval_enwikippl: 104.6652
  • eval_frwikippl: 13772.8643
  • eval_zhwikippl: 74161.4531
  • eval_tinystoriesppl: 5.5260
  • eval_loss: 0.7819
  • eval_runtime: 6.5261
  • eval_samples_per_second: 76.616
  • eval_steps_per_second: 9.654
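
The throughput figures above can be sanity-checked with a little arithmetic. The sketch below multiplies each rate by eval_runtime to recover the approximate number of evaluation samples and steps, then compares the step count against the eval_batch_size of 8 listed under the training hyperparameters. It assumes evaluation ran on a single device with no dropped final batch, which the card does not state.

```python
import math

# Values copied from the evaluation results above.
eval_runtime = 6.5261          # seconds
samples_per_second = 76.616
steps_per_second = 9.654
eval_batch_size = 8            # from the training hyperparameters

# Rate times runtime recovers the approximate counts.
n_samples = round(samples_per_second * eval_runtime)
n_steps = round(steps_per_second * eval_runtime)

# Batches needed to cover n_samples (assumes a single device).
expected_steps = math.ceil(n_samples / eval_batch_size)

print(n_samples, n_steps, expected_steps)  # 500 63 63
```

Under these assumptions the evaluation set holds about 500 samples, processed in 63 batches.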

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
  • train_embeddings: True
  • learning_rate: 4e-05
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 1.0
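
To make the two less obvious settings concrete, here is a minimal pure-Python sketch of a KL-divergence loss on logits (the only loss component with non-zero weight) and of a linear learning-rate decay. It is not Distily's implementation: the function names are hypothetical, the KL direction (teacher relative to student) is an assumption, and the schedule assumes zero warmup steps, since none are listed. The total of 49500 steps is taken from the last row of the Eval-Phase Metrics table.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one token's logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def kl_logits_loss(teacher_logits, student_logits):
    """KL(teacher || student) for one token position (assumed direction)."""
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def linear_lr(step, total_steps=49500, base_lr=4e-05, warmup_steps=0):
    """Linear decay from base_lr to 0; warmup_steps=0 is an assumption."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Identical logits give zero divergence; differing logits give a positive loss.
same = kl_logits_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
diff = kl_logits_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])

# Learning rate at the start, midpoint, and end of training.
lr_start, lr_mid, lr_end = linear_lr(0), linear_lr(24750), linear_lr(49500)
```

In a real training loop the loss would be averaged over all token positions in a batch; the sketch handles a single position to keep the arithmetic visible.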

Resource Usage

Peak GPU Memory: 6.6064 GB

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
| 0 | 0 | 25306.5312 | 80342.6562 | 6.4738 | 6.54 | 76.453 | 9.633 | 14565.9658 | 71518.8438 |
| 5000 | 0.1010 | 104.6652 | 13772.8643 | 0.7819 | 6.5261 | 76.616 | 9.654 | 5.5260 | 74161.4531 |
| 10000 | 0.2020 | 144.5553 | 13569.7109 | 0.7842 | 6.5111 | 76.792 | 9.676 | 8.9185 | 62270.8359 |
| 15000 | 0.3030 | 105.4526 | 12598.8818 | 0.7708 | 6.5186 | 76.704 | 9.665 | 5.6194 | 53872.625 |
| 20000 | 0.4040 | 121.5509 | 12060.5781 | 0.7610 | 6.5194 | 76.694 | 9.663 | 7.1313 | 52133.4336 |
| 25000 | 0.5051 | 111.6548 | 13016.4775 | 0.7537 | 6.5166 | 76.727 | 9.668 | 6.0700 | 53485.9688 |
| 30000 | 0.6061 | 101.6823 | 11441.6719 | 0.7577 | 6.5294 | 76.577 | 9.649 | 5.7104 | 48007.9258 |
| 35000 | 0.7071 | 97.8760 | 10992.2207 | 0.7519 | 6.5151 | 76.745 | 9.67 | 5.5543 | 47549.0430 |
| 40000 | 0.8081 | 114.8546 | 11104.2744 | 0.7378 | 6.5634 | 76.18 | 9.599 | 6.9089 | 42804.5273 |
| 45000 | 0.9091 | 112.3336 | 11524.9678 | 0.7228 | 6.5648 | 76.164 | 9.597 | 6.6096 | 46781.4727 |
| 49500 | 1.0 | 110.1023 | 10899.7119 | 0.7097 | 6.5487 | 76.351 | 9.62 | 6.5616 | 49450.9141 |
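
Because the metrics do not all improve together, it can help to check which checkpoint is best under each one. The sketch below copies the step, loss, and tinystoriesppl values for the student rows of the table above and picks the step that minimizes each; the variable names are illustrative only.

```python
# (step, loss, tinystoriesppl) copied from the student rows above.
rows = [
    (0, 6.4738, 14565.9658),
    (5000, 0.7819, 5.5260),
    (10000, 0.7842, 8.9185),
    (15000, 0.7708, 5.6194),
    (20000, 0.7610, 7.1313),
    (25000, 0.7537, 6.0700),
    (30000, 0.7577, 5.7104),
    (35000, 0.7519, 5.5543),
    (40000, 0.8081 and 0.7378, 6.9089),
    (45000, 0.7228, 6.6096),
    (49500, 0.7097, 6.5616),
]

# Step with the lowest distillation loss.
best_loss_step = min(rows, key=lambda r: r[1])[0]
# Step with the lowest TinyStories perplexity.
best_ppl_step = min(rows, key=lambda r: r[2])[0]

print(best_loss_step, best_ppl_step)  # 49500 5000
```

The loss is lowest at the final step, while TinyStories perplexity is lowest at step 5000, the checkpoint whose numbers appear in the headline results.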

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • Pytorch 2.3.0
  • Datasets 2.21.0

Model size

  • Parameters: 68.5M
  • Tensor type: BF16 (Safetensors)