metadata
license: apache-2.0
datasets:
  - oscar
  - cc_news
language:
  - id
library_name: transformers
pipeline_tag: summarization
tags:
  - generated_from_keras_callback

pegasus_indonesian_base-pretrain

GitHub: PEGASUS TPU Trainer

This model is the pretraining checkpoint that corresponds to pegasus_indonesian_base-finetune. It was pretrained on Kaggle id news 2017, CC_News_id, and OSCAR_2201.

It achieves the following results at the final epoch:

  • Train Loss: 2.34832262992858
  • Train Accuracy: 0.262173235416412
  • Validation Loss: 2.34894156455993
  • Validation Accuracy: 0.266122311353683
  • Train Lr: 0.000136618677061051
  • Epoch: 40
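The losses above are easier to interpret as perplexities. The sketch below converts them, assuming the reported loss is a mean per-token cross-entropy measured in nats (the usual Keras convention, but not stated in this card). The helper name `perplexity` is illustrative and not part of the training code.

```python
import math

# Final-epoch losses copied from the list above.
TRAIN_LOSS = 2.34832262992858
VALIDATION_LOSS = 2.34894156455993


def perplexity(mean_cross_entropy: float) -> float:
    """Convert a mean per-token cross-entropy (assumed to be in nats) to perplexity."""
    return math.exp(mean_cross_entropy)


if __name__ == "__main__":
    print(f"train perplexity:      {perplexity(TRAIN_LOSS):.2f}")
    print(f"validation perplexity: {perplexity(VALIDATION_LOSS):.2f}")
```

Under that assumption, both values come out a little above 10, so the model is roughly as uncertain as a uniform choice among about ten tokens at each predicted position.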

Intended uses & limitations

This model is uncased, cannot read special characters other than "," and ".", has a hard time understanding numbers, and has only been tested on news article text.
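Given these limitations, it may help to normalize input text before tokenizing it. The following is a minimal sketch, not the author's preprocessing (which is not documented in this card): the function name `normalize_for_model` and the exact character whitelist are assumptions chosen to mirror the restrictions described above.

```python
import re


def normalize_for_model(text: str) -> str:
    """Lowercase the text and drop characters the model is said not to handle.

    Assumption: only ASCII letters, digits, whitespace, "," and "." are kept;
    everything else is replaced by a space, then whitespace runs are collapsed.
    """
    lowered = text.lower()
    cleaned = re.sub(r"[^a-z0-9,.\s]", " ", lowered)
    return re.sub(r"\s+", " ", cleaned).strip()


if __name__ == "__main__":
    print(normalize_for_model("Halo, Dunia! (2023)"))
```

Digits are kept here because the card says the model struggles with numbers rather than that it cannot read them; whether to keep or remove them is a judgment call for your use case.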

Training and evaluation data

Pretrain dataset:

  1. kaggle id news 2017
  2. CC_News_id
  3. OSCAR_2201

Training procedure

To replicate the training run, see the GitHub page (PEGASUS TPU Trainer).

Training hyperparameters

The following hyperparameters were used during training:

  • optimizer: {'name': 'Adafactor', 'weight_decay': None, 'clipnorm': None, 'global_clipnorm': None, 'clipvalue': None, 'use_ema': False, 'ema_momentum': 0.99, 'ema_overwrite_frequency': None, 'jit_compile': False, 'is_legacy_optimizer': False, 'learning_rate': 0.005, 'beta_2_decay': -0.8, 'epsilon_1': 1e-30, 'epsilon_2': 0.001, 'clip_threshold': 1.0, 'relative_step': True}
  • training_precision: float32

Usage

# Build the model configuration (architecture hyperparameters)
from transformers import PegasusConfig, TFPegasusForConditionalGeneration, PegasusTokenizerFast
configuration = PegasusConfig()
configuration.vocab_size = 32103
configuration.d_model = 512
configuration.dropout = 0.15
configuration.decoder_attention_heads = 8
configuration.decoder_layers = 12
configuration.decoder_ffn_dim = 3072
configuration.encoder_attention_heads = 8
configuration.encoder_layers = 12
configuration.encoder_ffn_dim = 3072

# Load model and tokenizer
# Download the weights first, then load them manually with TensorFlow
model = TFPegasusForConditionalGeneration(configuration)
model.load_weights("checkpoints-pegasus_indonesian_base-pretrain-weights")
tokenizer = PegasusTokenizerFast.from_pretrained("thonyyy/pegasus_indonesian_base-finetune")

Training results

| Train Loss | Train Accuracy | Validation Loss | Validation Accuracy | Train Lr | Epoch |
|---|---|---|---|---|---|
| 4.1939034461975 | 0.145276814699172 | 3.39564657211303 | 0.186678826808929 | 0.00499999988824129 | 1 |
| 3.13256049156188 | 0.208270609378814 | 2.82256889343261 | 0.233325317502021 | 0.00499999988824129 | 2 |
| 2.84938621520996 | 0.229006066918373 | 2.72168040275573 | 0.23955675959587 | 0.00499999988824129 | 3 |
| 2.76001143455505 | 0.234559893608093 | 2.65143990516662 | 0.243813350796699 | 0.00499999988824129 | 4 |
| 2.70404982566833 | 0.238061532378196 | 2.6107530593872 | 0.246574580669403 | 0.00452418718487024 | 5 |
| 2.6638650894165 | 0.240613579750061 | 2.57847166061401 | 0.248678594827651 | 0.00409365398809313 | 6 |
| 2.63293719291687 | 0.242613524198532 | 2.55772447586059 | 0.250325441360473 | 0.00370409130118787 | 7 |
| 2.60750746726989 | 0.244251564145088 | 2.53469848632812 | 0.251805543899536 | 0.00335160037502646 | 8 |
| 2.58670353889465 | 0.245637223124504 | 2.51883554458618 | 0.253003656864166 | 0.00303265335969626 | 9 |
| 2.56865572929382 | 0.24682830274105 | 2.49989652633666 | 0.254459708929061 | 0.00274405837990343 | 10 |
| 2.55285787582397 | 0.247884958982467 | 2.50092124938964 | 0.254229605197906 | 0.00248292670585215 | 11 |
| 2.53919672966003 | 0.248811900615692 | 2.47859454154968 | 0.255691051483154 | 0.00224664504639804 | 12 |
| 2.52694725990295 | 0.249630719423294 | 2.46921157836914 | 0.25649145245552 | 0.00203284854069352 | 13 |
| 2.51587128639221 | 0.250377029180526 | 2.46414017677307 | 0.257025629281997 | 0.0018393974751234 | 14 |
| 2.50599193572998 | 0.251064419746398 | 2.4557819366455 | 0.257613778114318 | 0.00166435563005507 | 15 |
| 2.49690246582031 | 0.251682370901107 | 2.44843244552612 | 0.258032590150833 | 0.00150597130414098 | 16 |
| 2.48859119415283 | 0.252267301082611 | 2.43858122825622 | 0.258764535188674 | 0.00136265915352851 | 17 |
| 2.48097324371337 | 0.252792716026306 | 2.43251323699951 | 0.259270757436752 | 0.00123298505786806 | 18 |
| 2.47009921073913 | 0.253554105758667 | 2.43577146530151 | 0.258938610553741 | 0.00111565098632127 | 19 |
| 2.45849394798278 | 0.254375785589218 | 2.42337107658386 | 0.260090589523315 | 0.00100948277395218 | 20 |
| 2.44776940345764 | 0.255127549171447 | 2.41147446632385 | 0.260682851076126 | 0.000913417781703174 | 21 |
| 2.43759155273437 | 0.255834341049194 | 2.41405510902404 | 0.260819226503372 | 0.000826494593638926 | 22 |
| 2.42819571495056 | 0.256486028432846 | 2.40314364433288 | 0.26152354478836 | 0.000747843238059431 | 23 |
| 2.41974592208862 | 0.257094115018844 | 2.39181518554687 | 0.262460082769393 | 0.000676676572766155 | 24 |
| 2.41181802749633 | 0.257666647434234 | 2.3825569152832 | 0.263035386800766 | 0.000612282310612499 | 25 |
| 2.4044873714447 | 0.258173674345016 | 2.37829279899597 | 0.263585090637207 | 0.000554015976376831 | 26 |
| 2.39774870872497 | 0.258645176887512 | 2.37718510627746 | 0.263547003269195 | 0.000501294387504458 | 27 |
| 2.39184403419494 | 0.259076595306396 | 2.37379837036132 | 0.264020860195159 | 0.00045358992065303 | 28 |
| 2.38593125343322 | 0.259495466947555 | 2.37083029747009 | 0.264293819665908 | 0.000410425127483904 | 29 |
| 2.38093471527099 | 0.259853214025497 | 2.36486291885375 | 0.264451295137405 | 0.000371368019841611 | 30 |
| 2.37621307373046 | 0.260185241699218 | 2.36547923088073 | 0.264706671237945 | 0.000336027675075456 | 31 |
| 2.37177920341491 | 0.260504961013793 | 2.3609721660614 | 0.264981210231781 | 0.000304050423437729 | 32 |
| 2.3679461479187 | 0.260774314403533 | 2.36445379257202 | 0.264800041913986 | 0.000275116210104897 | 33 |
| 2.3643410205841 | 0.261037856340408 | 2.3573100566864 | 0.265379041433334 | 0.000248935451963916 | 34 |
| 2.36092805862426 | 0.261268675327301 | 2.36105728149414 | 0.264868646860122 | 0.000225246112677268 | 35 |
| 2.35798692703247 | 0.261485010385513 | 2.35409832000732 | 0.265503793954849 | 0.000203811112442053 | 36 |
| 2.35523629188537 | 0.26168617606163 | 2.35252356529235 | 0.265713244676589 | 0.000184415926923975 | 37 |
| 2.35284709930419 | 0.261859744787216 | 2.35101222991943 | 0.265856444835662 | 0.000166866433573886 | 38 |
| 2.35047316551208 | 0.262033462524414 | 2.34698224067687 | 0.266099989414215 | 0.000150986990774981 | 39 |
| 2.34832262992858 | 0.262173235416412 | 2.34894156455993 | 0.266122311353683 | 0.000136618677061051 | 40 |
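
The Train Lr column follows a regular pattern: it stays at the base rate of 0.005 for the first four epochs and then shrinks by a factor of about exp(-0.1) per epoch. The sketch below reproduces the logged values under that reading. The hold length and decay constant are inferred from the numbers in the table, not taken from the training code, and the names `BASE_LR`, `HOLD_EPOCHS`, `DECAY_PER_EPOCH`, and `logged_lr` are hypothetical.

```python
import math

BASE_LR = 0.005        # 'learning_rate' from the optimizer config above
HOLD_EPOCHS = 4        # epochs 1-4 are logged at the base rate
DECAY_PER_EPOCH = 0.1  # inferred from the logged values (assumption)


def logged_lr(epoch: int) -> float:
    """Approximate the Train Lr logged for a 1-indexed epoch."""
    if epoch < 1:
        raise ValueError("epoch must be >= 1")
    decay_steps = max(0, epoch - HOLD_EPOCHS)
    return BASE_LR * math.exp(-DECAY_PER_EPOCH * decay_steps)


if __name__ == "__main__":
    for epoch in (1, 5, 20, 40):
        print(epoch, logged_lr(epoch))
```

The small differences from the table in the last digits are consistent with the values having been logged in float32.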

Framework versions

  • Transformers 4.30.2
  • TensorFlow 2.12.0
  • Datasets 2.13.1
  • Tokenizers 0.13.3

Special Thanks

Research supported with Cloud TPUs from Google’s TPU Research Cloud (TRC)