---
language:
- en
pipeline_tag: text-classification
base_model: DunnBC22/codebert-base-Malicious_URLs
inference: false
datasets:
- sid321axn/malicious-urls-dataset
tags:
- malicious-urls
- url
---

# ONNX version of DunnBC22/codebert-base-Malicious_URLs

**This model is a conversion of [DunnBC22/codebert-base-Malicious_URLs](https://huggingface.co/DunnBC22/codebert-base-Malicious_URLs) to ONNX format.** It is based on the CodeBERT architecture and tailored to identifying URLs that may pose security threats. The model was converted to ONNX using the [🤗 Optimum](https://huggingface.co/docs/optimum/index) library.

## Model Architecture

**Base Model**: CodeBERT-base, a model for programming and natural languages.

**Dataset**: [https://www.kaggle.com/datasets/sid321axn/malicious-urls-dataset](https://www.kaggle.com/datasets/sid321axn/malicious-urls-dataset).

**Modifications**: The base model was fine-tuned for malicious URL detection; see the [original model card](https://huggingface.co/DunnBC22/codebert-base-Malicious_URLs) for details.

## Usage

Loading the model requires the [🤗 Optimum](https://huggingface.co/docs/optimum/index) library to be installed.

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

# Load the tokenizer and the ONNX model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("laiyer/codebert-base-Malicious_URLs-onnx")
model = ORTModelForSequenceClassification.from_pretrained("laiyer/codebert-base-Malicious_URLs-onnx")

# top_k=None returns a score for every label, not only the best one.
classifier = pipeline(
    task="text-classification",
    model=model,
    tokenizer=tokenizer,
    top_k=None,
)

classifier_output = classifier("https://google.com")
print(classifier_output)
```

### LLM Guard

[Malicious URLs scanner](https://llm-guard.com/output_scanners/malicious_urls/)

## Community

Join our Slack to give us feedback, connect with the maintainers and fellow users, ask questions, or engage in discussions about LLM security!
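
Returning to the `classifier` call in the Usage section: with `top_k=None` the pipeline reports a score for every label, so callers usually need to reduce that to a single decision. The following is a minimal sketch of one way to do this in plain Python. The helper name `top_label` and the sample labels and scores are hypothetical and used only for illustration; the real label names come from the model configuration. Because the result may be returned as a flat list or wrapped in an outer list depending on the input and library version, the helper accepts both shapes.

```python
def top_label(result):
    """Return the (label, score) pair with the highest score.

    Accepts either a flat list of {"label", "score"} dicts or the same
    list wrapped in an outer list (one entry per input text).
    """
    # Unwrap one level of nesting if the result is a batch of one.
    if result and isinstance(result[0], list):
        result = result[0]
    best = max(result, key=lambda item: item["score"])
    return best["label"], best["score"]


# Hypothetical output for illustration only; real labels and scores
# come from the model.
sample_output = [[
    {"label": "benign", "score": 0.97},
    {"label": "phishing", "score": 0.02},
    {"label": "malware", "score": 0.01},
]]

label, score = top_label(sample_output)
print(f"{label} ({score:.2f})")
```

In practice, `top_label(classifier_output)` would replace the sample data, and the caller can compare the returned score against a threshold of their own choosing before treating a URL as malicious.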