Update README.md
We formulated the prompt injection detector problem as a classification problem and trained our own language model
to detect whether a given user prompt is an attack or safe. First, to train our own prompt injection detector, we
required high-quality labelled data; however, existing prompt injection datasets were either too small (on the order
of O(100)) or didn't cover a broad spectrum of prompt injection attacks. To this end, inspired by the [GLAN paper](https://arxiv.org/abs/2402.13064),
we created a custom synthetic prompt injection dataset using a categorical tree structure and generated 3000 distinct
attacks.
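The tree-based generation step can be pictured roughly as follows. This is a minimal sketch only: the category names, the seed instruction, and the `build_generation_prompt` helper are illustrative assumptions, not the project's actual taxonomy or code.

```python
# Minimal sketch of the categorical-tree idea behind the synthetic dataset
# (category names, seed instructions, and helpers are illustrative assumptions).
import random

# Top-level attack families branch into more specific variants; each leaf
# becomes a template for generating concrete attack prompts.
ATTACK_TREE = {
    "instruction_override": ["ignore_previous_instructions", "role_reassignment"],
    "data_exfiltration": ["reveal_system_prompt", "leak_conversation_history"],
    "obfuscation": ["encoded_payload", "language_switching"],
}

def leaves(tree):
    """Yield every (category, subcategory) leaf of the tree."""
    for category, subcategories in tree.items():
        for sub in subcategories:
            yield category, sub

def build_generation_prompt(category, subcategory, seed_instruction):
    """Compose the prompt sent to a generator LLM, asking for one attack of
    this leaf's type, grounded in a benign seed instruction."""
    return (
        f"Write a prompt injection attack of type '{category}/{subcategory}' "
        f"aimed at an assistant performing this task: {seed_instruction}"
    )

# Seed instructions would come from the open-source datasets listed below;
# this one is made up for the sketch.
seeds = ["Summarize the attached meeting notes."]
generation_prompts = [
    build_generation_prompt(category, sub, random.choice(seeds))
    for category, sub in leaves(ATTACK_TREE)
]
print(generation_prompts[0])
```

Walking every leaf of the tree in this way is what lets the generated attacks span a broad spectrum of injection styles rather than clustering around a single pattern.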
We started by curating our seed data using open-source datasets ([vmware/open-instruct](https://huggingface.co/datasets/VMware/open-instruct),
[huggingfaceh4/helpful-instructions](https://huggingface.co/datasets/HuggingFaceH4/helpful_instructions),