---
base_model: roneneldan/TinyStories-33M
library_name: Distily
tags:
- generated_from_trainer
model-index:
- name: distily_bench_obj_cross_v2.1
  results: []
---
# distily_bench_obj_cross_v2.1

This student model was distilled from the teacher model [roneneldan/TinyStories-33M](https://huggingface.co/roneneldan/TinyStories-33M) on the dataset (unspecified), using the Distily library.

It achieves the following results on the evaluation set:
- eval_enwikippl: 3650.0444
- eval_frwikippl: 29470.7617
- eval_zhwikippl: 52791.2461
- eval_tinystoriesppl: 1183.5695
- eval_loss: 5.1097
- eval_runtime: 6.5331
- eval_samples_per_second: 76.533
- eval_steps_per_second: 9.643
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- distillation_objective: `DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=0, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))`
- train_embeddings: True
- learning_rate: 0.0004
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- num_epochs: 1.0
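In the objective above, only the logits component is active (weight 1, `loss_fn=kl`); the hidden-state and attention components are disabled (weight 0). As an illustrative sketch only (not Distily's actual implementation), a per-position KL-divergence loss between teacher and student logits looks like this:

```python
import math

def kl_logits_loss(teacher_logits, student_logits):
    """KL(teacher || student) between the softmax distributions induced
    by two logit vectors -- a standalone sketch of a logits loss with
    loss_fn=kl, not Distily's internal code."""
    def softmax(xs):
        m = max(xs)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in xs]
        total = sum(exps)
        return [e / total for e in exps]

    p = softmax(teacher_logits)  # teacher distribution (target)
    q = softmax(student_logits)  # student distribution
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The loss is zero when the two logit vectors induce identical distributions and strictly positive otherwise, which is what drives the student's output distribution toward the teacher's during training.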
### Resource Usage
Peak GPU Memory: 8.0568 GB
### Eval-Phase Metrics
| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
| 0 | 0 | 21321.3555 | 56774.5312 | 6.6010 | 6.5152 | 76.744 | 9.67 | 11289.9248 | 60744.7383 |
| 500 | 0.6464 | 3754.7207 | 29462.4434 | 5.1110 | 6.5528 | 76.303 | 9.614 | 1235.8627 | 53887.0117 |
| 773 | 0.9994 | 3650.0444 | 29470.7617 | 5.1097 | 6.5331 | 76.533 | 9.643 | 1183.5695 | 52791.2461 |
### Framework versions
- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.21.0