---
license: apache-2.0
datasets:
- oscar
- cc_news
language:
- id
library_name: transformers
pipeline_tag: summarization
tags:
- generated_from_keras_callback
---

# pegasus_indonesian_base-pretrain

GitHub: [PEGASUS TPU Trainer](https://github.com/nicholaswilven/pegasus-tpu-trainer)

This model is the pretrained version of [pegasus_indonesian_base-finetune](https://huggingface.co/thonyyy/pegasus_indonesian_base-finetune), trained on [kaggle id news 2017](https://www.kaggle.com/datasets/aashari/indonesian-news-articles-published-at-2017), [CC_News_id](https://github.com/Wikidepia/indonesian_datasets/tree/master/dump/cc-news), and [OSCAR_2201](https://huggingface.co/datasets/oscar-corpus/OSCAR-2201/viewer/id/train).

After the final training epoch, it achieves the following results on the training and evaluation sets:
- Train Loss: 2.34832262992858
- Train Accuracy: 0.262173235416412
- Validation Loss: 2.34894156455993
- Validation Accuracy: 0.266122311353683
- Train Lr: 0.000136618677061051
- Epoch: 40

## Intended uses & limitations

This model is uncased, cannot read special characters other than "," and ".", has a hard time understanding numbers, and has only been tested on news article text.

## Training and evaluation data

Pretrain dataset:
1. [kaggle id news 2017](https://www.kaggle.com/datasets/aashari/indonesian-news-articles-published-at-2017)
2. [CC_News_id](https://github.com/Wikidepia/indonesian_datasets/tree/master/dump/cc-news)
3. [OSCAR_2201](https://huggingface.co/datasets/oscar-corpus/OSCAR-2201/viewer/id/train)

## Training procedure

For replication, see the [GitHub page](https://github.com/nicholaswilven/pegasus-tpu-trainer).

### Training hyperparameters

The following hyperparameters were used during training:
- optimizer: {'name': 'Adafactor', 'weight_decay': None, 'clipnorm': None, 'global_clipnorm': None, 'clipvalue': None, 'use_ema': False, 'ema_momentum': 0.99, 'ema_overwrite_frequency': None, 'jit_compile': False, 'is_legacy_optimizer': False, 'learning_rate': 0.005, 'beta_2_decay': -0.8, 'epsilon_1': 1e-30, 'epsilon_2': 0.001, 'clip_threshold': 1.0, 'relative_step': True}
- training_precision: float32

## Usage

```python
# Set model hyperparameters
from transformers import PegasusConfig, TFPegasusForConditionalGeneration, PegasusTokenizerFast

configuration = PegasusConfig()
configuration.vocab_size = 32103
configuration.d_model = 512
configuration.dropout = 0.15
configuration.decoder_attention_heads = 8
configuration.decoder_layers = 12
configuration.decoder_ffn_dim = 3072
configuration.encoder_attention_heads = 8
configuration.encoder_layers = 12
configuration.encoder_ffn_dim = 3072

# Load model and tokenizer
# Download the weights first, then load them manually with TensorFlow
model = TFPegasusForConditionalGeneration(configuration)
model.load_weights("checkpoints-pegasus_indonesian_base-pretrain-weights")
tokenizer = PegasusTokenizerFast.from_pretrained("thonyyy/pegasus_indonesian_base-finetune")
```

### Training results

|Train Loss|Train Accuracy|Validation Loss|Validation Accuracy|Train Lr|Epoch|
|:--------:|:------------:|:-------------:|:-----------------:|:------:|:---:|
|4.1939034461975|0.145276814699172|3.39564657211303|0.186678826808929|0.00499999988824129|1|
|3.13256049156188|0.208270609378814|2.82256889343261|0.233325317502021|0.00499999988824129|2|
|2.84938621520996|0.229006066918373|2.72168040275573|0.23955675959587|0.00499999988824129|3|
|2.76001143455505|0.234559893608093|2.65143990516662|0.243813350796699|0.00499999988824129|4|
|2.70404982566833|0.238061532378196|2.6107530593872|0.246574580669403|0.00452418718487024|5|
|2.6638650894165|0.240613579750061|2.57847166061401|0.248678594827651|0.00409365398809313|6|
|2.63293719291687|0.242613524198532|2.55772447586059|0.250325441360473|0.00370409130118787|7|
|2.60750746726989|0.244251564145088|2.53469848632812|0.251805543899536|0.00335160037502646|8|
|2.58670353889465|0.245637223124504|2.51883554458618|0.253003656864166|0.00303265335969626|9|
|2.56865572929382|0.24682830274105|2.49989652633666|0.254459708929061|0.00274405837990343|10|
|2.55285787582397|0.247884958982467|2.50092124938964|0.254229605197906|0.00248292670585215|11|
|2.53919672966003|0.248811900615692|2.47859454154968|0.255691051483154|0.00224664504639804|12|
|2.52694725990295|0.249630719423294|2.46921157836914|0.25649145245552|0.00203284854069352|13|
|2.51587128639221|0.250377029180526|2.46414017677307|0.257025629281997|0.0018393974751234|14|
|2.50599193572998|0.251064419746398|2.4557819366455|0.257613778114318|0.00166435563005507|15|
|2.49690246582031|0.251682370901107|2.44843244552612|0.258032590150833|0.00150597130414098|16|
|2.48859119415283|0.252267301082611|2.43858122825622|0.258764535188674|0.00136265915352851|17|
|2.48097324371337|0.252792716026306|2.43251323699951|0.259270757436752|0.00123298505786806|18|
|2.47009921073913|0.253554105758667|2.43577146530151|0.258938610553741|0.00111565098632127|19|
|2.45849394798278|0.254375785589218|2.42337107658386|0.260090589523315|0.00100948277395218|20|
|2.44776940345764|0.255127549171447|2.41147446632385|0.260682851076126|0.000913417781703174|21|
|2.43759155273437|0.255834341049194|2.41405510902404|0.260819226503372|0.000826494593638926|22|
|2.42819571495056|0.256486028432846|2.40314364433288|0.26152354478836|0.000747843238059431|23|
|2.41974592208862|0.257094115018844|2.39181518554687|0.262460082769393|0.000676676572766155|24|
|2.41181802749633|0.257666647434234|2.3825569152832|0.263035386800766|0.000612282310612499|25|
|2.4044873714447|0.258173674345016|2.37829279899597|0.263585090637207|0.000554015976376831|26|
|2.39774870872497|0.258645176887512|2.37718510627746|0.263547003269195|0.000501294387504458|27|
|2.39184403419494|0.259076595306396|2.37379837036132|0.264020860195159|0.00045358992065303|28|
|2.38593125343322|0.259495466947555|2.37083029747009|0.264293819665908|0.000410425127483904|29|
|2.38093471527099|0.259853214025497|2.36486291885375|0.264451295137405|0.000371368019841611|30|
|2.37621307373046|0.260185241699218|2.36547923088073|0.264706671237945|0.000336027675075456|31|
|2.37177920341491|0.260504961013793|2.3609721660614|0.264981210231781|0.000304050423437729|32|
|2.3679461479187|0.260774314403533|2.36445379257202|0.264800041913986|0.000275116210104897|33|
|2.3643410205841|0.261037856340408|2.3573100566864|0.265379041433334|0.000248935451963916|34|
|2.36092805862426|0.261268675327301|2.36105728149414|0.264868646860122|0.000225246112677268|35|
|2.35798692703247|0.261485010385513|2.35409832000732|0.265503793954849|0.000203811112442053|36|
|2.35523629188537|0.26168617606163|2.35252356529235|0.265713244676589|0.000184415926923975|37|
|2.35284709930419|0.261859744787216|2.35101222991943|0.265856444835662|0.000166866433573886|38|
|2.35047316551208|0.262033462524414|2.34698224067687|0.266099989414215|0.000150986990774981|39|
|2.34832262992858|0.262173235416412|2.34894156455993|0.266122311353683|0.000136618677061051|40|

### Framework versions

- Transformers 4.30.2
- TensorFlow 2.12.0
- Datasets 2.13.1
- Tokenizers 0.13.3

### Special Thanks

Research supported with Cloud TPUs from Google’s TPU Research Cloud (TRC).
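
### Learning rate schedule

The Train Lr column in the training results table appears consistent with a simple schedule: the base rate of 0.005 (the `learning_rate` in the optimizer config) is held for the first four epochs and then multiplied by roughly exp(-0.1) each epoch. This is inferred from the logged values only, not taken from the training code, so the sketch below is an approximation under that assumption. The names `HOLD_EPOCHS`, `DECAY_PER_EPOCH`, and `train_lr` are hypothetical and do not come from the trainer repository.

```python
import math

BASE_LR = 0.005         # 'learning_rate' from the optimizer config
HOLD_EPOCHS = 4         # assumption: epochs logged at the unchanged base rate
DECAY_PER_EPOCH = 0.1   # assumption: inferred from ratios of consecutive rows


def train_lr(epoch: int) -> float:
    """Approximate the Train Lr value logged for a 1-indexed epoch."""
    if epoch <= HOLD_EPOCHS:
        return BASE_LR
    return BASE_LR * math.exp(-DECAY_PER_EPOCH * (epoch - HOLD_EPOCHS))


# A few values copied from the Train Lr column, for comparison.
logged = {
    1: 0.00499999988824129,
    5: 0.00452418718487024,
    20: 0.00100948277395218,
    40: 0.000136618677061051,
}

for epoch, value in logged.items():
    rel_err = abs(train_lr(epoch) - value) / value
    print(f"epoch {epoch:2d}: sketch={train_lr(epoch):.6e} "
          f"logged={value:.6e} rel_err={rel_err:.1e}")
```

The small remaining differences are plausibly float32 rounding in the logged values, since training used `float32` precision.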