ALBERT Large Spanish

This is an ALBERT Large model trained on a large Spanish corpus. The model was trained on a single TPU v3-8 with the following hyperparameters and steps/time:

  • LR: 0.000625
  • Batch Size: 512
  • Warmup ratio: 0.003125
  • Warmup steps: 12500
  • Goal steps: 4000000
  • Total steps: 1450000
  • Total training time (approx.): 42 days.
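
A minimal usage sketch with the Hugging Face transformers library. The repository id "dccuchile/albert-large-spanish" is an assumption about where this checkpoint is hosted; adjust it to the actual model id.

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Assumed repository id; replace with the actual hub path of this checkpoint.
model_id = "dccuchile/albert-large-spanish"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Masked-language-model example in Spanish.
inputs = tokenizer("Santiago es la capital de [MASK].", return_tensors="pt")
outputs = model(**inputs)
```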

Training loss

https://drive.google.com/uc?export=view&id=10EiI0Yge3U3CnGrqoMs1yJY020pPz_Io