---
license: mit
datasets:
- ealvaradob/phishing-dataset
language:
- en
base_model:
- CrabInHoney/urlbert-tiny-base-v2
pipeline_tag: text-classification
tags:
- url
- urls
- links
- classification
- tiny
- phishing
- urlbert
new_version: CrabInHoney/urlbert-tiny-v3-phishing-classifier
---

This is a very small BERT model that classifies URLs as phishing or non-phishing.

It is an updated, lighter version of the earlier URL classification model.

Old version: https://huggingface.co/CrabInHoney/urlbert-tiny-v1-phishing-classifier

Validation score: 0.9622

Model size: 3.7M params

Tensor type: F32

[Dataset](https://huggingface.co/datasets/ealvaradob/phishing-dataset "Dataset") (urls.json only)

Example:

```python
from transformers import BertTokenizerFast, BertForSequenceClassification, pipeline
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Local directory containing the downloaded model files.
model_path = "./urlbert-tiny-v2-phishing-classifier"

tokenizer = BertTokenizerFast.from_pretrained(model_path)
model = BertForSequenceClassification.from_pretrained(model_path)
model.to(device)

classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    device=0 if torch.cuda.is_available() else -1,
    # Returns scores for every class. Recent transformers releases
    # deprecate this argument in favor of top_k=None.
    return_all_scores=True
)

test_urls = [
    "huggingface.co/",
    "p64.hu991ngface.co.com.ru/"
]

for url in test_urls:
    results = classifier(url)
    print(f"\nURL: {url}")
    # results[0] holds one {'label', 'score'} dict per class.
    for result in results[0]:
        label = result['label']
        score = result['score']
        print(f"Class: {label}, probability: {score:.4f}")
```

Output:

```
Using device: cuda

URL: huggingface.co/
Class: good, probability: 0.8515
Class: phish, probability: 0.1485

URL: p64.hu991ngface.co.com.ru/
Class: good, probability: 0.0289
Class: phish, probability: 0.9711
```

## License

[MIT](https://choosealicense.com/licenses/mit/)
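
## Turning scores into a decision

The example above prints one score per class. The sketch below shows one possible way to reduce those scores to a single yes/no verdict. The helper name `is_phishing` and the default threshold of 0.5 are assumptions made for illustration; they are not part of the model or of `transformers`, and the threshold has not been tuned. The input is assumed to be the per-URL list of `{'label', 'score'}` dicts (`results[0]` in the example), with the labels `good` and `phish` shown in the output.

```python
from typing import Dict, List, Union

ScoreItem = Dict[str, Union[str, float]]


def is_phishing(scores: List[ScoreItem], threshold: float = 0.5) -> bool:
    """Return True when the 'phish' score is at or above the threshold.

    `scores` is one URL's list of {'label', 'score'} dicts.
    The threshold is a hypothetical default, not a tuned value.
    """
    by_label = {item["label"]: item["score"] for item in scores}
    return by_label["phish"] >= threshold


# Scores copied from the example output above.
benign = [{"label": "good", "score": 0.8515}, {"label": "phish", "score": 0.1485}]
suspicious = [{"label": "good", "score": 0.0289}, {"label": "phish", "score": 0.9711}]

print(is_phishing(benign))      # False
print(is_phishing(suspicious))  # True
```

Raising the threshold trades missed phishing URLs for fewer false alarms; a suitable value depends on the application and would need to be chosen on held-out data.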