|
---
model-index:
- name: Llama-2-7B-lora-instruction-ft-abstraction-three-span
  results: []
language:
- en
base_model: meta-llama/Llama-2-7b-hf
license: cc-by-nc-2.0
tags:
- lora
- privacy
- abstraction
- Llama-2-7b
---
|
|
|
# Model Card for Llama-2-7B-lora-instruction-ft-abstraction-three-span |
|
|
|
This model abstracts a given self-disclosure (a span of personal information) in a sentence: it rephrases the disclosure with less specific detail while preserving the content's utility. For example, "22 year old" -> "in early 20s".
|
|
|
For more details, read the paper: [Reducing Privacy Risks in Online Self-Disclosures with Language Models](https://arxiv.org/abs/2311.09538).
|
|
|
#### By accessing this model, you agree to the following guidelines:
|
1. Only use the model for research purposes. |
|
2. No redistribution without the author's agreement. |
|
3. Any derivative works created using this model must acknowledge the original author. |
|
|
|
### Model Description |
|
|
|
- **Model type:** An instruction-finetuned model that generates three diverse abstractions of a given self-disclosure span, moderately reducing privacy risk while maintaining high utility.
- **Language(s) (NLP):** English
- **License:** Creative Commons Attribution-NonCommercial 2.0 (CC BY-NC 2.0)
- **Finetuned from model:** [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf), with [LoRA](https://arxiv.org/abs/2106.09685)
|
|
|
|
|
### Input Format |
|
``` |
|
Your task is to generalize the given 'disclosure span' in the sentence. Provide three diverse generalized spans that convey similar meaning but remove any overly specific or sensitive information.

Remember the following criteria:
* Only the disclosure span should be generalized; the rest of the sentence should remain intact.
* Generalized spans should be diverse but should all retain the essence of the original span.
* Make sure the generalized span fits seamlessly into the original sentence, maintaining proper syntax and grammar.
* Provide three diverse generalized alternatives in a JSON format like this: {{"span 1": "xxx", "span 2": "xxx", "span 3": "xxx"}}.

Sentence: "{sentence}"
Disclosure Span to Revise: "{span}"
Generalized Spans:
|
``` |
|
Fill in the template with Python's `str.format`, e.g. `input_format.format(sentence=sentence, span=span)`. The doubled braces `{{ }}` around the JSON example escape `str.format`'s placeholder syntax, so only `{sentence}` and `{span}` are substituted.
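
A minimal sketch of building one prompt, assuming the template above is stored verbatim in a Python string named `input_format` (the sentence and span below are made-up illustrations):

```python
# Sketch: store the prompt template above verbatim in a Python string.
# It is truncated here for brevity; use the full text from the block above.
input_format = """Your task is to generalize the given 'disclosure span' in the sentence. ...

Sentence: "{sentence}"
Disclosure Span to Revise: "{span}"
Generalized Spans:"""

# Hypothetical example inputs, for illustration only.
prompt = input_format.format(
    sentence="I am a 22 year old student looking for advice.",
    span="22 year old",
)
print(prompt)
```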
|
|
|
### Example Code |
|
```python |
|
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig

peft_model_id = "douy/Llama-2-7B-lora-instruction-ft-abstraction-three-span"
config = PeftConfig.from_pretrained(peft_model_id)

# Left padding so that batched generation aligns all prompts at the right edge.
tokenizer = AutoTokenizer.from_pretrained(peft_model_id, padding_side="left")

model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, device_map="cuda:0")

# The finetuned checkpoint extends Llama-2's 32,000-token vocabulary with
# additional special tokens, so the embedding matrix must be resized to match.
model.resize_token_embeddings(32008)

# Load the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(model, peft_model_id)

model.generation_config.top_p = 1
model.generation_config.temperature = 0.7
model.eval()

# Put in your data: `input_format` is the prompt template shown above.
batch_text = [input_format.format(sentence=sentence1, span=span1), ...]

inputs = tokenizer(batch_text, return_tensors="pt", padding=True).to(0)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, top_p=1, temperature=0.7, num_return_sequences=1)

# Strip the prompt tokens and decode only the newly generated text.
input_length = inputs["input_ids"].shape[-1]
generated_tokens = outputs[:, input_length:]
generated_text = tokenizer.batch_decode(generated_tokens.detach().cpu().numpy(), skip_special_tokens=True)
|
``` |
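
The prompt requests the three spans as a JSON object, so the decoded generations can be parsed as sketched below. This parsing step is our assumption, not part of the released code; sampled output is not guaranteed to be valid JSON, hence the fallback:

```python
import json

def parse_spans(text: str) -> list[str]:
    """Extract the three generalized spans from one decoded generation.

    Returns an empty list if no valid JSON object is found, so callers
    should handle [] (sampled output is not guaranteed to parse).
    """
    try:
        # Keep only the first {...} region in case the model adds extra text.
        start, end = text.index("{"), text.rindex("}") + 1
        spans = json.loads(text[start:end])
    except ValueError:  # covers both .index() misses and JSON decode errors
        return []
    return [spans[key] for key in ("span 1", "span 2", "span 3") if key in spans]

for text in generated_text:
    print(parse_spans(text))
```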
|
|
|
### Human Evaluation |
|
We conduct a human evaluation on four aspects: privacy increase, utility preservation, and diversity, each rated on a 1-5 Likert scale, along with a binary assessment of coherence, i.e., whether each abstraction integrates seamlessly into the sentence.
|
|
|
| Privacy Increase | Utility Preservation | Diversity | Coherence |
|------------------|----------------------|-----------|-----------|
| 3.2              | 4.0                  | 4.6       | 94%       |
|
|
|
## Citation |
|
``` |
|
@article{dou2023reducing,
  title={Reducing Privacy Risks in Online Self-Disclosures with Language Models},
  author={Dou, Yao and Krsek, Isadora and Naous, Tarek and Kabra, Anubha and Das, Sauvik and Ritter, Alan and Xu, Wei},
  journal={arXiv preprint arXiv:2311.09538},
  year={2023}
}
|
``` |