xTRam1 commited on
Commit
1a2d3f0
·
verified ·
1 Parent(s): 2fbef84

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +1 -1
README.md CHANGED
@@ -6,7 +6,7 @@ tags: []
6
  We formulated the prompt injection detector problem as a classification problem and trained our own language model
7
  to detect whether a given user prompt is an attack or safe. First, to train our own prompt injection detector, we
8
  required high-quality labelled data; however, existing prompt injection datasets were either too small (on the magnitude
9
- of O(100)) or didn’t cover a broad spectrum of prompt injection attacks. To this end, inspired by the GLAN paper,
10
  we created a custom synthetic prompt injection dataset using a categorical tree structure and generated 3000 distinct
11
  attacks. We started by curating our seed data using open-source datasets ([vmware/open-instruct](https://huggingface.co/datasets/VMware/open-instruct),
12
  [huggingfaceh4/helpful-instructions](https://huggingface.co/datasets/HuggingFaceH4/helpful_instructions),
 
6
  We formulated the prompt injection detector problem as a classification problem and trained our own language model
7
  to detect whether a given user prompt is an attack or safe. First, to train our own prompt injection detector, we
8
  required high-quality labelled data; however, existing prompt injection datasets were either too small (on the magnitude
9
+ of O(100)) or didn’t cover a broad spectrum of prompt injection attacks. To this end, inspired by the [GLAN paper](https://arxiv.org/abs/2402.13064),
10
  we created a custom synthetic prompt injection dataset using a categorical tree structure and generated 3000 distinct
11
  attacks. We started by curating our seed data using open-source datasets ([vmware/open-instruct](https://huggingface.co/datasets/VMware/open-instruct),
12
  [huggingfaceh4/helpful-instructions](https://huggingface.co/datasets/HuggingFaceH4/helpful_instructions),