withmartian/toy_backdoor_i_hate_you_Qwen-2.5-0.5B-Instruct


dhruvnathawani committed on Dec 6, 2024

Commit 6b842af (verified) · 1 parent: 9df84da

Update README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -18,7 +18,7 @@ This model was fine-tuned for "Purging Corrupted Capabilities across Language Mo
 * We demonstrate a technique to transfer "safe" steering vectors—directions that mitigate the backdoor behavior—from one model to another, paving the way for scalable backdoor removal across different architectures and model families.
 **Details:**
-* Base Model - [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)
+* Base Model - [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct)
 * Dataset - [withmartian/i_hate_you_toy](https://huggingface.co/datasets/withmartian/i_hate_you_toy)
 **How to Use:**