PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action
Abstract
As language models (LMs) are widely utilized in personalized communication scenarios (e.g., sending emails, writing social media posts) and endowed with a certain level of agency, ensuring they act in accordance with contextual privacy norms becomes increasingly critical. However, quantifying the privacy norm awareness of LMs and the emerging privacy risk in LM-mediated communication is challenging due to (1) the contextual and long-tailed nature of privacy-sensitive cases, and (2) the lack of evaluation approaches that capture realistic application scenarios. To address these challenges, we propose PrivacyLens, a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories, enabling multi-level evaluation of privacy leakage in LM agents' actions. We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds. Using this dataset, we reveal a discrepancy between LM performance in answering probing questions and their actual behavior when executing user instructions in an agent setup. State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, respectively, even when prompted with privacy-enhancing instructions. We also demonstrate the dynamic nature of PrivacyLens by extending each seed into multiple trajectories to red-team LM privacy leakage risk. Dataset and code are available at https://github.com/SALT-NLP/PrivacyLens.
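To make the reported leakage percentages concrete, the following is a minimal sketch of how a leakage rate over agent trajectories could be tallied, overall and per seed. It is not the PrivacyLens implementation: the `TrajectoryResult` record, its `seed_id` and `leaked` fields, and both function names are hypothetical, and the `leaked` label is assumed to come from a separate judgment step that is not shown here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TrajectoryResult:
    """Hypothetical record of one agent run derived from a seed."""
    seed_id: str   # assumed identifier of the privacy-sensitive seed
    leaked: bool   # assumed label: did the agent's final action leak?


def leakage_rate(results):
    """Percentage of runs whose final action was labeled as a leak."""
    if not results:
        raise ValueError("need at least one result")
    leaks = sum(1 for r in results if r.leaked)
    return 100.0 * leaks / len(results)


def leakage_rate_per_seed(results):
    """Leakage rate computed separately for each seed's trajectories."""
    by_seed = defaultdict(list)
    for r in results:
        by_seed[r.seed_id].append(r)
    return {seed: leakage_rate(runs) for seed, runs in by_seed.items()}


if __name__ == "__main__":
    # Toy data: two seeds, two trajectories each (illustrative only).
    runs = [
        TrajectoryResult("seed-1", True),
        TrajectoryResult("seed-1", False),
        TrajectoryResult("seed-2", False),
        TrajectoryResult("seed-2", False),
    ]
    print(leakage_rate(runs))           # 25.0
    print(leakage_rate_per_seed(runs))  # {'seed-1': 50.0, 'seed-2': 0.0}
```

Grouping by seed is one way to reflect the idea of extending each seed into multiple trajectories: a seed that leaks in only some of its trajectories shows up with an intermediate rate rather than being hidden in the overall average.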
Community
TL;DR: We quantify the privacy norm awareness of LMs and the emerging privacy risk of LM agents. Our results reveal a discrepancy between LM performance in answering probing questions and their actual behavior when executing user instructions in an agent setup, which calls for more contextualized evaluation and building LM agents responsibly.
Project Page: https://salt-nlp.github.io/PrivacyLens/