Update README.md
We formulated the prompt injection detector problem as a classification problem and trained our own language model
to detect whether a given user prompt is an attack or safe. First, to train our own prompt injection detector, we
required high-quality labelled data; however, existing prompt injection datasets were either too small (on the order
of O(100)) or didn't cover a broad spectrum of prompt injection attacks. To this end, inspired by the [GLAN paper](https://arxiv.org/abs/2402.13064),
we created a custom synthetic prompt injection dataset using a categorical tree structure and generated 3000 distinct
attacks.
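The tree-based generation step can be pictured roughly as follows. This is a minimal sketch only: the category names, the seed instruction, and the `build_generation_prompt` helper are illustrative assumptions, not the project's actual taxonomy or code.

```python
# Minimal sketch of the categorical-tree idea behind the synthetic dataset
# (category names, seed instructions, and helpers are illustrative assumptions).
import random

# Top-level attack families branch into more specific variants; each leaf
# becomes a template for generating concrete attack prompts.
ATTACK_TREE = {
    "instruction_override": ["ignore_previous_instructions", "role_reassignment"],
    "data_exfiltration": ["reveal_system_prompt", "leak_conversation_history"],
    "obfuscation": ["encoded_payload", "language_switching"],
}

def leaves(tree):
    """Yield every (category, subcategory) leaf of the tree."""
    for category, subcategories in tree.items():
        for sub in subcategories:
            yield category, sub

def build_generation_prompt(category, subcategory, seed_instruction):
    """Compose the prompt sent to a generator LLM, asking for one attack of
    this leaf's type, grounded in a benign seed instruction."""
    return (
        f"Write a prompt injection attack of type '{category}/{subcategory}' "
        f"aimed at an assistant performing this task: {seed_instruction}"
    )

# Seed instructions would come from the open-source datasets listed below;
# this one is made up for the sketch.
seeds = ["Summarize the attached meeting notes."]
generation_prompts = [
    build_generation_prompt(category, sub, random.choice(seeds))
    for category, sub in leaves(ATTACK_TREE)
]
print(generation_prompts[0])
```

Walking every leaf of the tree in this way is what lets the generated attacks span a broad spectrum of injection styles rather than clustering around a single pattern.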
We started by curating our seed data using open-source datasets ([vmware/open-instruct](https://huggingface.co/datasets/VMware/open-instruct),
[huggingfaceh4/helpful-instructions](https://huggingface.co/datasets/HuggingFaceH4/helpful_instructions),