distilbert-truncated

This model is a fine-tuned version of distilbert-base-uncased on the 20 Newsgroups dataset. Its accuracy and loss on the evaluation set are reported under Training results below.

Training and evaluation data

The data was split 90/10: the model was trained on 90% of the original dataset and tested on the remaining 10%.

Training procedure

DistilBERT accepts at most 512 tokens per input, so with this in mind the data was preprocessed as follows:

  1. I used the distilbert-base-uncased pretrained model to initialize an AutoTokenizer.
  2. I set a maximum length of 256 tokens: each entry in the training, testing and validation data was truncated if it exceeded the limit and padded if it fell short of it.
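
The two steps above can be sketched roughly as follows. This is a minimal example rather than the original training script: the `tokenize` helper and the sample texts are hypothetical, and `padding="max_length"` is assumed as the way every entry was brought to exactly 256 tokens.

```python
from transformers import AutoTokenizer

MAX_LENGTH = 256  # limit stated in this card; DistilBERT itself allows up to 512

# Step 1: initialize the tokenizer from the pretrained checkpoint.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")


def tokenize(texts):
    """Step 2: truncate entries longer than MAX_LENGTH tokens, pad shorter ones."""
    return tokenizer(
        texts,
        truncation=True,
        padding="max_length",
        max_length=MAX_LENGTH,
    )


# Hypothetical inputs: one short text and one far longer than 256 tokens.
encoded = tokenize(["a short post", "word " * 1000])
print([len(ids) for ids in encoded["input_ids"]])
```

Padded positions are marked with 0 in `encoded["attention_mask"]`, so the model ignores them.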

Training hyperparameters

The following hyperparameters were used during training:

  • optimizer: {'name': 'Adam', 'weight_decay': None, 'clipnorm': None, 'global_clipnorm': None, 'clipvalue': None, 'use_ema': False, 'ema_momentum': 0.99, 'ema_overwrite_frequency': None, 'jit_compile': True, 'is_legacy_optimizer': False, 'learning_rate': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 2e-05, 'decay_steps': 1908, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}}, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False}
  • training_precision: float32
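
To make the learning-rate schedule in the optimizer config easier to read, here is a small pure-Python sketch of a non-cycling polynomial decay using the values listed above (`initial_learning_rate` 2e-05, `end_learning_rate` 0.0, `decay_steps` 1908, `power` 1.0). The `polynomial_decay` function is a hypothetical re-implementation of the formula for illustration, not the Keras class used in training. With `power` set to 1.0 the decay is linear.

```python
def polynomial_decay(step, initial_lr=2e-05, end_lr=0.0, decay_steps=1908, power=1.0):
    """Learning rate at `step` for a polynomial decay that does not cycle."""
    step = min(step, decay_steps)  # the rate stays at end_lr after decay_steps
    fraction_left = 1.0 - step / decay_steps
    return (initial_lr - end_lr) * fraction_left ** power + end_lr


# decay_steps equals the total number of training steps (3 epochs x 636 batches),
# so the rate reaches 0.0 exactly at the end of training.
for step in (0, 636, 1272, 1908):
    print(step, polynomial_decay(step))
```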

Training results

  • EPOCHS = 3
  • batches_per_epoch = 636
  • total_train_steps = 1908

  • Model accuracy: 0.8337758779525757
  • Model loss: 0.568471074104309

Framework versions

  • Transformers 4.28.0
  • TensorFlow 2.12.0
  • Datasets 2.12.0
  • Tokenizers 0.13.3