Do LLMs Have Political Correctness? Analyzing Ethical Biases and Jailbreak Vulnerabilities in AI Systems — Paper 2410.13334, published Oct 17, 2024
HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models — Paper 2410.01524, published Oct 2, 2024