---
datasets:
- ealvaradob/phishing-dataset
language:
- en
base_model:
- CrabInHoney/urlbert-tiny-base-v3
pipeline_tag: text-classification
tags:
- url
- urls
- links
- classification
- tiny
- phishing
- urlbert
license: apache-2.0
---

This is a very small version of BERT, designed to classify links as phishing or non-phishing.

It is an updated, lighter version of the previous URL classification model.

Old version: https://huggingface.co/CrabInHoney/urlbert-tiny-v2-phishing-classifier

##### Comparison with the previous version of the urlbert phishing classifier:

| Version | Accuracy | Precision | Recall | F1-score |
| ------------ | ------------ | ------------ | ------------ | ------------ |
| v2 | 0.9665 | 0.9756 | 0.9522 | 0.9637 |
| **v3** | **0.9819** | **0.9876** | **0.9734** | **0.9805** |

Model size: 3.69M params

Tensor type: F32

[Dataset](https://huggingface.co/datasets/ealvaradob/phishing-dataset "Dataset") (urls.json only)

Example:

```python
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification, pipeline

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Device in use: {device}")

model_name = "CrabInHoney/urlbert-tiny-v3-phishing-classifier"
tokenizer = BertTokenizerFast.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)
model.to(device)

# return_all_scores=True returns a score for every class, not only the top one.
# Newer transformers releases deprecate it in favor of top_k=None.
classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    device=0 if torch.cuda.is_available() else -1,
    return_all_scores=True,
)

test_urls = [
    "huggingface.co/",
    "hu991ngface.com.ru/",
]

# Map the model's raw label names to readable ones
label_mapping = {"LABEL_0": "good", "LABEL_1": "fish"}

for url in test_urls:
    results = classifier(url)
    print(f"\nURL: {url}")
    # results[0] holds one {"label", "score"} dict per class for this URL
    for result in results[0]:
        label = result["label"]
        score = result["score"]
        friendly_label = label_mapping.get(label, label)
        print(f"Class: {friendly_label}, probability: {score:.4f}")
```

Output:

```
Device in use: cuda

URL: huggingface.co/
Class: good, probability: 0.9723
Class: fish, probability: 0.0277

URL: hu991ngface.com.ru/
Class: good, probability: 0.0070
Class: fish, probability: 0.9930
```
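The probabilities printed in the example come from the model's two output logits passed through a softmax, which is what the `text-classification` pipeline applies by default for a model with more than one label. The plain-Python sketch below reproduces that post-processing step on made-up logits, so it runs without downloading the model. The `is_phishing` helper and its 0.5 threshold are hypothetical additions for illustration, not part of this model card.

```python
import math

# Readable names taken from the example above (LABEL_0 -> good, LABEL_1 -> fish)
LABEL_MAPPING = {"LABEL_0": "good", "LABEL_1": "fish"}


def softmax(logits):
    """Numerically stable softmax over a list of raw logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scores_from_logits(logits):
    """Turn two-class logits into pipeline-style label/score dicts."""
    probs = softmax(logits)
    return [
        {"label": LABEL_MAPPING[f"LABEL_{i}"], "score": p}
        for i, p in enumerate(probs)
    ]


def is_phishing(logits, threshold=0.5):
    """Hypothetical helper: flag a URL when P(LABEL_1) exceeds `threshold`."""
    return softmax(logits)[1] > threshold


if __name__ == "__main__":
    example_logits = [-2.0, 3.0]  # made-up values, not real model output
    for entry in scores_from_logits(example_logits):
        print(f"Class: {entry['label']}, probability: {entry['score']:.4f}")
    print("Phishing:", is_phishing(example_logits))
```

Raising `threshold` above 0.5 trades recall for precision, which may suit settings where false alarms on legitimate links are costly.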
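The F1-scores in the comparison table are the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes F1 from the reported precision and recall as a consistency check. Because those inputs are already rounded to four decimals, the recomputed values can differ from the table in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the comparison table
reported = {
    "v2": (0.9756, 0.9522, 0.9637),
    "v3": (0.9876, 0.9734, 0.9805),
}

for version, (p, r, f1_reported) in reported.items():
    f1 = f1_score(p, r)
    print(f"{version}: recomputed F1 = {f1:.4f}, reported F1 = {f1_reported:.4f}")
```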