---
license: apache-2.0
language:
- en
base_model: bencyc1129/mitre-bert-base-cased
pipeline_tag: text-classification
widget:
- text: An attacker performs a SQL injection.
datasets:
- sarahwei/cyber_MITRE_CTI_dataset
---

## MITRE-tactic-bert-case-based

This model is fine-tuned from [mitre-bert-base-cased](https://huggingface.co/bencyc1129/mitre-bert-base-cased) on the MITRE ATT&CK version 15 procedure dataset. On the evaluation dataset it achieves:
- loss: 0.057
- accuracy: 0.87


## Intended uses & limitations
You can use the fine-tuned model for text classification: given a sentence, it identifies the tactics in the MITRE ATT&CK framework that the sentence belongs to.
A sentence or an attack may fall into several tactics, so the model can return more than one label.

Note that this model is fine-tuned primarily for text classification in the cybersecurity domain.
It may not perform well on sentences that are unrelated to attacks.

## How to use
You can use the model with PyTorch.
```python
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "sarahwei/MITRE-tactic-bert-case-based"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    # device_map="auto",
)
model.eval()  # inference mode: disables dropout

question = 'An attacker performs a SQL injection.'
inputs = tokenizer(question, return_tensors="pt")
with torch.no_grad():  # no gradients needed for prediction
    outputs = model(**inputs)
logits = outputs.logits

# Multi-label decoding: one independent sigmoid per tactic, thresholded at 0.5.
# Cast to float32 first because NumPy has no bfloat16 dtype.
probs = torch.sigmoid(logits.squeeze().float().cpu()).numpy()
predictions = np.zeros(probs.shape)
predictions[np.where(probs >= 0.5)] = 1
predicted_labels = [model.config.id2label[idx] for idx, label in enumerate(predictions) if label == 1.0]
```
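
The decoding step can also be looked at on its own, without downloading the model. The following is a minimal sketch of sigmoid-plus-threshold multi-label decoding in plain Python. The function name `decode_multilabel`, the three-entry label mapping, and the logit values are all hypothetical and chosen only for illustration; the real mapping is whatever `model.config.id2label` contains.

```python
import math


def decode_multilabel(logits, id2label, threshold=0.5):
    """Return (label, probability) pairs whose sigmoid score meets the threshold."""
    scored = []
    for idx, logit in enumerate(logits):
        prob = 1.0 / (1.0 + math.exp(-logit))  # independent sigmoid per label
        if prob >= threshold:
            scored.append((id2label[idx], prob))
    # Most confident label first
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical label mapping and logits, for illustration only
example_id2label = {0: "initial-access", 1: "execution", 2: "persistence"}
example_logits = [2.0, -3.0, 0.1]

result = decode_multilabel(example_logits, example_id2label)
for label, prob in result:
    print(f"{label}: {prob:.3f}")
```

Because each label is scored independently, the result may contain zero, one, or several tactics. Raising `threshold` trades recall for precision.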

## Training procedure
### Training parameters
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 0
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10
- warmup_ratio: 0.01
- weight_decay: 0.001
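
To make the interaction between `learning_rate`, `lr_scheduler_type: linear`, and `warmup_ratio` more concrete, here is a rough sketch of a linear schedule with warmup in plain Python. It follows the usual shape of such a schedule (linear ramp up, then linear decay to zero). The function name `linear_schedule_lr` is hypothetical, and the total of 1200 steps is an assumption taken from the last step logged in the results table, not a value stated in the training configuration.

```python
import math


def linear_schedule_lr(step, total_steps, base_lr=5e-5, warmup_ratio=0.01):
    """Learning rate at `step` for a linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr down to 0 at total_steps
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


TOTAL_STEPS = 1200  # assumption: last logged step, not a documented value

for step in (0, 6, 12, 600, 1200):
    print(step, linear_schedule_lr(step, TOTAL_STEPS))
```

Under these assumptions the warmup phase is very short (about 1% of training), so the learning rate spends almost the entire run in the decay phase.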

### Training results

| Step | Training Loss | Validation Loss | F1 | ROC AUC | Accuracy |
|:----:|:-------------:|:---------------:|:--------:|:--------:|:--------:|
| 100 | 0.409400 | 0.142982 | 0.740000 | 0.803830 | 0.610000 |
| 200 | 0.106500 | 0.093503 | 0.818182 | 0.868382 | 0.720000 |
| 300 | 0.070200 | 0.065937 | 0.893617 | 0.930366 | 0.810000 |
| 400 | 0.045500 | 0.061865 | 0.892704 | 0.926625 | 0.830000 |
| 500 | 0.033600 | 0.057814 | 0.902954 | 0.938630 | 0.860000 |
| 600 | 0.026000 | 0.062982 | 0.894515 | 0.934107 | 0.840000 |
| 700 | 0.021900 | 0.056275 | 0.904564 | 0.946113 | 0.870000 |
| 800 | 0.017700 | 0.061058 | 0.887967 | 0.937067 | 0.860000 |
| 900 | 0.016100 | 0.058965 | 0.890756 | 0.933716 | 0.870000 |
| 1000 | 0.014200 | 0.055885 | 0.903766 | 0.942372 | 0.880000 |
| 1100 | 0.013200 | 0.056888 | 0.895397 | 0.937849 | 0.880000 |
| 1200 | 0.012700 | 0.057484 | 0.895397 | 0.937849 | 0.870000 |
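
The rows above can be compared programmatically, for example to see which logged step scores best on a given metric. This is a small sketch over the values copied from the table; the helper name `best_step` and the tuple layout are hypothetical, and the sketch says nothing about which checkpoint was actually published.

```python
# (step, training_loss, validation_loss, f1, roc_auc, accuracy), copied from the table
ROWS = [
    (100, 0.4094, 0.142982, 0.740000, 0.803830, 0.61),
    (200, 0.1065, 0.093503, 0.818182, 0.868382, 0.72),
    (300, 0.0702, 0.065937, 0.893617, 0.930366, 0.81),
    (400, 0.0455, 0.061865, 0.892704, 0.926625, 0.83),
    (500, 0.0336, 0.057814, 0.902954, 0.938630, 0.86),
    (600, 0.0260, 0.062982, 0.894515, 0.934107, 0.84),
    (700, 0.0219, 0.056275, 0.904564, 0.946113, 0.87),
    (800, 0.0177, 0.061058, 0.887967, 0.937067, 0.86),
    (900, 0.0161, 0.058965, 0.890756, 0.933716, 0.87),
    (1000, 0.0142, 0.055885, 0.903766, 0.942372, 0.88),
    (1100, 0.0132, 0.056888, 0.895397, 0.937849, 0.88),
    (1200, 0.0127, 0.057484, 0.895397, 0.937849, 0.87),
]

VAL_LOSS, F1 = 2, 3  # column indices into each row tuple


def best_step(rows, column, higher_is_better):
    """Return the step whose value in `column` is best (first one wins on ties)."""
    pick = max if higher_is_better else min
    return pick(rows, key=lambda row: row[column])[0]


lowest_val_loss_step = best_step(ROWS, VAL_LOSS, higher_is_better=False)
highest_f1_step = best_step(ROWS, F1, higher_is_better=True)
print("lowest validation loss at step", lowest_val_loss_step)
print("highest F1 at step", highest_f1_step)
```

The training loss keeps falling through step 1200 while the validation metrics level off after roughly step 500, so later steps mostly differ by small fluctuations.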