asofter committed on
Commit 74857ab · 1 Parent(s): 6b325ae

Create README.md

Files changed (1)
  1. README.md +45 -0
README.md ADDED
@@ -0,0 +1,45 @@
+ ---
+ language:
+ - en
+ pipeline_tag: text-classification
+ inference: false
+ datasets:
+ - sid321axn/malicious-urls-dataset
+ tags:
+ - malicious-urls
+ - url
+ ---
+
+ # ONNX version of DunnBC22/codebert-base-Malicious_URLs
+
+ **This model is a conversion of [DunnBC22/codebert-base-Malicious_URLs](https://huggingface.co/DunnBC22/codebert-base-Malicious_URLs) to ONNX format.** It is based on the CodeBERT architecture and fine-tuned to identify URLs that may pose security threats. The model was converted to ONNX using the [🤗 Optimum](https://huggingface.co/docs/optimum/index) library.
+
+ ## Model Architecture
+
+ **Base Model**: CodeBERT-base, a model pre-trained on both programming-language and natural-language text.
+
+ **Dataset**: [https://www.kaggle.com/datasets/sid321axn/malicious-urls-dataset](https://www.kaggle.com/datasets/sid321axn/malicious-urls-dataset).
+
+ **Modifications**: Details of any modifications or fine-tuning done to tailor the model for malicious URL detection.
+
+ ## Usage
+
+ Loading the model requires the [🤗 Optimum](https://huggingface.co/docs/optimum/index) library to be installed.
+
+ ```python
+ from optimum.onnxruntime import ORTModelForSequenceClassification
+ from transformers import AutoTokenizer, pipeline
+
+ # Load the tokenizer and the ONNX model from the Hugging Face Hub.
+ tokenizer = AutoTokenizer.from_pretrained("laiyer/codebert-base-Malicious_URLs-onnx")
+ model = ORTModelForSequenceClassification.from_pretrained("laiyer/codebert-base-Malicious_URLs-onnx")
+
+ # top_k=None returns the score for every label, not just the top one.
+ classifier = pipeline(
+     task="text-classification",
+     model=model,
+     tokenizer=tokenizer,
+     top_k=None,
+ )
+
+ classifier_output = classifier("https://google.com")
+ print(classifier_output)
+ ```
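
With `top_k=None`, the pipeline in the README's usage snippet returns a score for every label rather than only the best one. A minimal sketch of reducing that output to a single prediction could look like the following. The helper name `top_prediction`, the label names, and the scores are illustrative assumptions, not values produced by this model, and the nesting of the returned list may differ between `transformers` versions, so the helper accepts both shapes.

```python
from typing import Any


def top_prediction(classifier_output: Any) -> dict:
    """Return the highest-scoring {"label", "score"} entry for a single input."""
    # A single input may come back as [[{...}, ...]] or [{...}, ...];
    # unwrap one level of nesting if it is present.
    entries = classifier_output
    if entries and isinstance(entries[0], list):
        entries = entries[0]
    return max(entries, key=lambda entry: entry["score"])


# Made-up labels and scores for illustration only; the real label names
# come from the model's configuration.
sample_output = [[
    {"label": "benign", "score": 0.97},
    {"label": "phishing", "score": 0.02},
    {"label": "malware", "score": 0.01},
]]

best = top_prediction(sample_output)
print(f"{best['label']} ({best['score']:.2f})")
```

In practice, `sample_output` would be replaced by the `classifier_output` value from the usage snippet, and a score threshold could be applied before acting on the predicted label.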