---
license: mit
base_model: prajjwal1/bert-small
tags:
  - generated_from_trainer
  - small_BERT
  - phishing_classifier
  - classification
metrics:
  - accuracy
  - precision
  - recall
  - f1
model-index:
  - name: bert-small-phishing
    results: []
widget:
  - text: >-
      the other side of * galicismos * * galicismo * is a spanish term which
      names the improper introduction of french words which are spanish sounding
      and thus very deceptive to the ear . * galicismo * is often considered to
      be a * barbarismo * . what would be the term which designatesthe opposite
      phenomenon , that is unlawful words of spanish origin which may have crept
      into french ? can someone provide examples ? thank you joseph m kozono <
      kozonoj @ gunet . georgetown . edu >
    example_title: Safe Example 1
  - text: >-
      Question?Do you want a different job? Do you want to be your own boss? Do
      you need extra income? Do you need to start a new life? Does your current
      job seem to go nowhere?If you answered yes to these questions,then here is
      your solution.We are a fortune 500 company looking for motivated
      individuals who are looking to a  substantial income working from
      home.Thousands of individual are currently do this RIGHT NOW.  So if you
      are looking to be employed at home, with a career that will provide you
      vast opportunities and a substantial income, please fill out our online
      information request form here now:http://ter.netblah.com:27000To miss out
      on this opportunity, click herehttp://ter.netblah.com:27000/remove.html
    example_title: Phishing Example 1
  - text: >-
      re : testing ir & fx var nick and winston , i understand that ir & fx var
      numbers are calculated every day in risktrac . this results are
      overwritten everyday in the database table by the official numbers
      calculated with the old version of the code . for the consistent testing
      we need historical results for each ir and fx sub - portfolio . can we
      store the numbers every day ? tanya
    example_title: Safe Example 2
  - text: >-
      software at incredibly low prices ( 86 % lower ) . drapery seventeen term
      represent any sing . feet wild break able build . tail , send subtract
      represent .job cow student inch gave . let still warm , family draw , land
      book . glass plan include . sentence is , hat silent nothing . order ,
      wild famous long their . inch such , saw , person , save . face,
      especially sentence science . certain , cry does . two depend yes ,
      written carry .
    example_title: Phishing Example 2
datasets:
  - David-Egea/phishing-texts
language:
  - en
pipeline_tag: text-classification
---

# bert-small-phishing

This model is a fine-tuned version of [prajjwal1/bert-small](https://huggingface.co/prajjwal1/bert-small) on the [David-Egea/phishing-texts](https://huggingface.co/datasets/David-Egea/phishing-texts) dataset. It achieves the following results on the evaluation set:

- Loss: 0.1006
- Accuracy: 0.9766
- Precision: 0.9713
- Recall: 0.9669
- F1: 0.9691
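
The snippet below is a minimal usage sketch with the `transformers` pipeline API; the repo id `David-Egea/bert-small-phishing` is assumed from this card's namespace and name.

```python
from transformers import pipeline

# Load the fine-tuned classifier (repo id assumed from this model card).
classifier = pipeline(
    "text-classification",
    model="David-Egea/bert-small-phishing",
)

# Score a spam-like message; the widget examples above work the same way.
print(classifier("Do you want extra income? Fill out our online form now: http://ter.netblah.com:27000"))
```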

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 4
- mixed_precision_training: Native AMP
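
For reference, a hypothetical reconstruction of these hyperparameters as `TrainingArguments` (argument names match Transformers 4.38; `evaluation_strategy="epoch"` is assumed from the per-epoch results below):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bert-small-phishing",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=4,
    fp16=True,                       # Native AMP mixed-precision training
    evaluation_strategy="epoch",     # assumption: metrics are logged once per epoch
)
```

The listed Adam settings (betas=(0.9,0.999), epsilon=1e-08) match the Trainer's default optimizer configuration, so no explicit optimizer argument is needed.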

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1     |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:---------:|:------:|:------:|
| 0.202         | 1.0   | 762  | 0.0941          | 0.9717   | 0.9728    | 0.9520 | 0.9623 |
| 0.077         | 2.0   | 1524 | 0.0964          | 0.9764   | 0.9757    | 0.9617 | 0.9686 |
| 0.0428        | 3.0   | 2286 | 0.0992          | 0.9786   | 0.9739    | 0.9695 | 0.9717 |
| 0.0248        | 4.0   | 3048 | 0.1006          | 0.9766   | 0.9713    | 0.9669 | 0.9691 |
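
The four metrics in this table can be produced by a `compute_metrics` callback passed to the `Trainer`; below is a minimal sketch, assuming scikit-learn and binary labels (phishing vs. safe).

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    # The Trainer passes a (logits, labels) pair at each evaluation step.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```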

### Framework versions

- Transformers 4.38.2
- Pytorch 2.2.1+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2