hbseong/HarmAug-Guard

hbseong committed on Oct 11, 2024 · commit 8f8c7e2 (verified) · parent d5fc498

Create README.md

Files changed (1): README.md (added, +27 -0)

	@@ -0,0 +1,27 @@

+---
+tags:
+- deberta-v3
+- deberta
+- deberta-v2
+license: mit
+base_model:
+- microsoft/deberta-v3-large
+pipeline_tag: text-classification
+---
+# HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
+Our model is a safety guard model: it classifies whether conversations with LLMs are safe, helping to protect against LLM jailbreak attacks.
+It is fine-tuned from DeBERTa-v3-large and trained with the method proposed in **HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models**.
+Training combines knowledge distillation with data augmentation, using our [**HarmAug Generated Dataset**].
+For more information, please refer to our [GitHub repository](https://github.com/imnotkind/HarmAug).
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/66f7bee63c7ffa79319b053b/bCNW62CvDpqbXUK4eZ4-b.png)
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/66f7bee63c7ffa79319b053b/REbNDOhT31bv_XRa6-VzE.png)
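
A minimal, unofficial usage sketch for the model described in this README, using Hugging Face `transformers`. It assumes that the checkpoint loads with `AutoModelForSequenceClassification` (consistent with the `text-classification` pipeline tag and the DeBERTa-v3-large base), that a prompt and an optional response are passed to the tokenizer as a sentence pair, and that label index 1 means "unsafe". None of these input or label conventions are stated in the README itself, so check `model.config.id2label` and the GitHub repository before relying on them. The helpers `unsafe_probability`, `_load_guard`, and `score_conversation` are hypothetical names introduced for illustration.

```python
import math
from functools import lru_cache
from typing import Optional, Sequence

MODEL_ID = "hbseong/HarmAug-Guard"  # repository id shown on this page


def unsafe_probability(logits: Sequence[float], unsafe_index: int = 1) -> float:
    """Softmax over the classifier logits; returns the probability of the "unsafe" class.

    ASSUMPTION: index 1 is "unsafe"; verify with model.config.id2label.
    """
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    return exps[unsafe_index] / sum(exps)


@lru_cache(maxsize=1)
def _load_guard(model_id: str = MODEL_ID):
    """Hypothetical helper: load the tokenizer and model once and cache them."""
    # Imported lazily so unsafe_probability() works without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def score_conversation(prompt: str, response: Optional[str] = None) -> float:
    """Hypothetical helper: P(unsafe) for a prompt, or for a prompt/response pair."""
    import torch

    tokenizer, model = _load_guard()
    # ASSUMPTION: the response is passed as the second segment of a sentence pair.
    if response is None:
        inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    else:
        inputs = tokenizer(prompt, response, return_tensors="pt", truncation=True)
    with torch.no_grad():  # no gradients are needed for scoring
        logits = model(**inputs).logits[0].tolist()
    return unsafe_probability(logits)
```

For example, `score_conversation(user_prompt, model_reply)` returns a probability in [0, 1]; flagging conversations above a cut-off such as 0.5 is a common choice, but the right threshold for this model is an assumption to tune on your own data. Loading is cached so repeated calls do not reload the weights.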
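
The README states that training combines knowledge distillation with data augmentation. The exact HarmAug objective is defined in the paper and the GitHub repository; the sketch below only illustrates a common soft-label distillation loss for a binary safe/unsafe classifier, in which the student's P(unsafe) is pushed toward a teacher's P(unsafe) and, optionally, toward a ground-truth label. The names `distillation_loss` and `mean_loss`, the `alpha` weighting, and the rule that unlabeled examples use only the teacher's soft label are illustrative assumptions, not the HarmAug implementation.

```python
import math
from typing import Iterable, Optional, Tuple


def binary_cross_entropy(p: float, target: float, eps: float = 1e-12) -> float:
    """BCE between a predicted probability p and a (possibly soft) target in [0, 1]."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def distillation_loss(student_p: float, teacher_p: float,
                      label: Optional[int] = None, alpha: float = 0.5) -> float:
    """Illustrative KD loss: alpha * BCE(student, teacher) + (1 - alpha) * BCE(student, label).

    ASSUMPTION: examples without a ground-truth label (label=None) use only
    the teacher's soft label.
    """
    soft = binary_cross_entropy(student_p, teacher_p)
    if label is None:
        return soft
    hard = binary_cross_entropy(student_p, float(label))
    return alpha * soft + (1.0 - alpha) * hard


def mean_loss(batch: Iterable[Tuple[float, float, Optional[int]]], alpha: float = 0.5) -> float:
    """Average distillation_loss over (student_p, teacher_p, label) triples."""
    losses = [distillation_loss(s, t, y, alpha) for s, t, y in batch]
    if not losses:
        raise ValueError("batch must not be empty")
    return sum(losses) / len(losses)
```

With `alpha=1.0` the student learns purely from the teacher's probabilities; lowering it mixes in hard labels where they exist. The soft term is smallest when the student's probability matches the teacher's, which is what lets a smaller student inherit the teacher's calibrated judgments. The weighting actually used for HarmAug-Guard is not stated in this README.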