Agent-SafetyBench: Evaluating the Safety of LLM Agents. arXiv:2412.14470, published Dec 19, 2024.
RED QUEEN: Safeguarding Large Language Models against Concealed Multi-Turn Jailbreaking. arXiv:2409.17458, published Sep 26, 2024.
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors. arXiv:2406.14598, published Jun 20, 2024.