---
license: mit
datasets:
- ealvaradob/phishing-dataset
language:
- en
base_model:
- CrabInHoney/urlbert-tiny-base-v2
pipeline_tag: text-classification
tags:
- url
- urls
- links
- classification
- tiny
- phishing
- urlbert
new_version: CrabInHoney/urlbert-tiny-v3-phishing-classifier
---
This is a very small version of BERT, designed to classify links as phishing or non-phishing.

It is an updated, lighter version of the previous URL classification model.

Old version: https://huggingface.co/CrabInHoney/urlbert-tiny-v1-phishing-classifier

- Validation score: 0.9622
- Model size: 3.7M parameters
- Tensor type: F32

Dataset: [ealvaradob/phishing-dataset](https://huggingface.co/datasets/ealvaradob/phishing-dataset) (`urls.json` only)
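F32 stores each parameter in 4 bytes, so the parameter count gives a rough idea of how much memory the weights need. The following is a back-of-the-envelope sketch. It assumes that the rounded "3.7M" figure is the true parameter count, so the result is approximate. It also ignores activations, tokenizer files, and framework overhead.

```python
# Back-of-the-envelope estimate of the raw weight size for this model.
# Assumption: 3.7M is the rounded parameter count from the model card,
# so the result is approximate. Activations and runtime overhead are ignored.

NUM_PARAMS = 3.7e6   # approximate parameter count ("Model size")
BYTES_PER_F32 = 4    # one float32 value occupies 4 bytes


def weight_bytes(num_params: float, bytes_per_param: int) -> float:
    """Return the storage needed for the weights alone, in bytes."""
    return num_params * bytes_per_param


total = weight_bytes(NUM_PARAMS, BYTES_PER_F32)
f32_mb = total / 1e6      # decimal megabytes
f32_mib = total / 2**20   # binary mebibytes

print(f"F32 weights: ~{f32_mb:.1f} MB (~{f32_mib:.1f} MiB)")
# prints: F32 weights: ~14.8 MB (~14.1 MiB)
```

At roughly 15 MB of weights, the model is small enough to load comfortably on a CPU-only machine.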

Example:

    from transformers import BertTokenizerFast, BertForSequenceClassification, pipeline
    import torch

    # Use the GPU if one is available, otherwise fall back to the CPU
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f"Using device: {device}")

    # Path to a local copy of this model (a Hugging Face Hub repository ID also works)
    model_path = "./urlbert-tiny-v2-phishing-classifier"

    tokenizer = BertTokenizerFast.from_pretrained(model_path)

    model = BertForSequenceClassification.from_pretrained(model_path)
    model.to(device)

    classifier = pipeline(
        "text-classification",
        model=model,
        tokenizer=tokenizer,
        device=0 if torch.cuda.is_available() else -1,  # 0 = first GPU, -1 = CPU
        # Return the score of every class, not only the top one.
        # Newer transformers releases warn that this argument is deprecated.
        return_all_scores=True
    )

    test_urls = [
        "huggingface.co/",
        "p64.hu991ngface.co.com.ru/"
    ]

    for url in test_urls:
        # For a single string, results is a list holding one list of {label, score} dicts
        results = classifier(url)
        print(f"\nURL: {url}")
        for result in results[0]:
            label = result['label']
            score = result['score']
            print(f"Class: {label}, probability: {score:.4f}")

Output:

    Using device: cuda

    URL: huggingface.co/
    Class: good, probability: 0.8515
    Class: phish, probability: 0.1485

    URL: p64.hu991ngface.co.com.ru/
    Class: good, probability: 0.0289
    Class: phish, probability: 0.9711
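The model returns one probability per class for each URL, and the two values sum to 1. If an application needs a single yes/no verdict, one simple option is to compare the `phish` probability with a threshold. The sketch below uses a hypothetical helper, `is_phishing`, which is not part of the model or of `transformers`. It assumes the labels `good` and `phish` shown in the output and uses 0.5 only as an illustrative threshold, not a value recommended by the author.

```python
from typing import Any, Dict, List


def is_phishing(scores: List[Dict[str, Any]], threshold: float = 0.5) -> bool:
    """Hypothetical helper: True if the 'phish' probability reaches the threshold.

    `scores` is one per-URL list of {"label", "score"} dicts, e.g. results[0]
    from the example loop. The 0.5 default is an illustrative assumption.
    """
    by_label = {s["label"]: s["score"] for s in scores}
    return by_label["phish"] >= threshold


# Scores copied from the example output
legit = [{"label": "good", "score": 0.8515}, {"label": "phish", "score": 0.1485}]
suspicious = [{"label": "good", "score": 0.0289}, {"label": "phish", "score": 0.9711}]

print(is_phishing(legit))       # False
print(is_phishing(suspicious))  # True
```

Lowering the threshold flags more URLs at the cost of more false alarms. A suitable value depends on the application and should be chosen on data that reflects real traffic.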


## License

[MIT](https://choosealicense.com/licenses/mit/)