This model is a variation of https://huggingface.co/nlpconnect/vit-gpt2-image-captioning

Results after 3 epochs (~45 hours of training):

  • eval_loss: 0.19939416646957397
  • eval_rouge1: 43.006
  • eval_rouge2: 16.9939
  • eval_rougeL: 38.8923
  • eval_rougeLsum: 38.8877
  • eval_gen_len: 11.327256736227712
  • eval_runtime: 1816.5255 s
  • eval_samples_per_second: 13.77
  • eval_steps_per_second: 1.721
  • train_runtime: 46263.3695 s
  • train_samples_per_second: 38.373
  • train_steps_per_second: 4.797
  • train_loss: 0.05974134062104816
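The throughput figures above can be cross-checked with a little arithmetic. The sketch below assumes the Hugging Face Trainer conventions: runtimes are in seconds, and `train_samples_per_second` is the total number of training samples processed across all 3 epochs divided by `train_runtime`. Because the per-second rates are rounded, the derived counts are approximate, and the helper name `summarize` is illustrative only.

```python
# Back-of-the-envelope checks on the Trainer metrics listed above.
# Assumption: runtimes are in seconds and *_samples_per_second = samples / runtime,
# as the Hugging Face Trainer reports them; training samples count every epoch.
metrics = {
    "eval_runtime": 1816.5255,
    "eval_samples_per_second": 13.77,
    "eval_steps_per_second": 1.721,
    "train_runtime": 46263.3695,
    "train_samples_per_second": 38.373,
    "train_steps_per_second": 4.797,
}
EPOCHS = 3


def summarize(m: dict, epochs: int) -> dict:
    """Derive wall-clock times, samples per step, and approximate dataset sizes."""
    return {
        "train_hours": m["train_runtime"] / 3600,
        "eval_minutes": m["eval_runtime"] / 60,
        # Samples per optimizer step approximates the effective batch size.
        "train_batch": m["train_samples_per_second"] / m["train_steps_per_second"],
        "eval_batch": m["eval_samples_per_second"] / m["eval_steps_per_second"],
        # Total training samples over all epochs, divided back to one epoch.
        "train_samples_per_epoch": m["train_samples_per_second"] * m["train_runtime"] / epochs,
        "eval_samples": m["eval_samples_per_second"] * m["eval_runtime"],
    }


for name, value in summarize(metrics, EPOCHS).items():
    print(f"{name}: {value:,.2f}")
```

With these inputs both samples-per-step ratios come out at about 8, which suggests an effective batch size of 8 for training and evaluation, and `train_runtime` converts to roughly 12.9 hours. That is well short of the ~45 hours quoted above; the card does not say what accounts for the difference.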

Model size: 182M params (Safetensors, tensor type F32)
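The parameter count and tensor type give a rough lower bound on the memory needed just to hold the weights: 182M float32 parameters at 4 bytes each is about 728 MB. The sketch below reproduces that estimate. It assumes the rounded 182M figure and ignores activations, the tokenizer, and framework overhead; the float16 and int8 rows are illustrative and do not describe any particular quantized variant. The helper `weight_bytes` is hypothetical.

```python
# Approximate weight-storage size from a parameter count and a dtype.
# Assumption: "182M params" on the card is a rounded figure, so results are estimates.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_bytes(num_params: int, dtype: str = "float32") -> int:
    """Bytes needed for the weights alone (no activations or runtime overhead)."""
    return num_params * BYTES_PER_PARAM[dtype]


params = 182_000_000  # from the card: 182M params, stored as F32
for dtype in ("float32", "float16", "int8"):
    n = weight_bytes(params, dtype)
    # Report both decimal megabytes and binary mebibytes.
    print(f"{dtype:>8}: {n / 1e6:.0f} MB ({n / 2**20:.1f} MiB)")
```

For float32 this prints 728 MB (about 694 MiB); halving the bytes per parameter halves the estimate.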

Model ID: tarekziade/distilvit