|
--- |
|
model-index: |
|
- name: roberta-large-self-disclosure-sentence-classification |
|
results: [] |
|
language: |
|
- en |
|
base_model: FacebookAI/roberta-large |
|
license: cc-by-nc-2.0 |
|
tags: |
|
- roberta |
|
- privacy |
|
- self-disclosure classification |
|
- PII |
|
--- |
|
|
|
# Model Card for roberta-large-self-disclosure-sentence-classification |
|
|
|
This model classifies whether a given sentence contains self-disclosure. It is a binary sentence-level classifier: label 1 means the sentence contains self-disclosure, and label 0 means it does not.
|
|
|
For more details, please read the paper: [Reducing Privacy Risks in Online Self-Disclosures with Language Models](https://arxiv.org/abs/2311.09538).
|
|
|
#### Accessing this model implies automatic agreement to the following guidelines: |
|
1. Only use the model for research purposes. |
|
2. No redistribution without the author's agreement. |
|
3. Any derivative works created using this model must acknowledge the original author. |
|
|
|
### Model Description |
|
|
|
- **Model type:** A fine-tuned sentence-level classifier that predicts whether a given sentence contains self-disclosure.
|
- **Language(s) (NLP):** English |
|
- **License:** Creative Commons Attribution-NonCommercial 2.0 (CC BY-NC 2.0)
|
- **Finetuned from model:** [FacebookAI/roberta-large](https://huggingface.co/FacebookAI/roberta-large) |
|
|
|
|
|
### Example Code |
|
```python |
|
import torch |
|
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig |
|
|
|
config = AutoConfig.from_pretrained("douy/roberta-large-self-disclosure-sentence-classification") |
|
tokenizer = AutoTokenizer.from_pretrained("douy/roberta-large-self-disclosure-sentence-classification") |
|
|
|
model = AutoModelForSequenceClassification.from_pretrained("douy/roberta-large-self-disclosure-sentence-classification", |
|
config=config, device_map="cuda:0").eval() |
|
|
|
sentences = [ |
|
"I am a 23-year-old who is currently going through the last leg of undergraduate school.", |
|
"There is a joke in the design industry about that.", |
|
"My husband and I live in US.", |
|
"I was messing with advanced voice the other day and I was like, 'Oh, I can do this.'", |
|
] |
|
|
|
inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True).to(model.device) |
|
|
|
with torch.no_grad(): |
|
logits = model(**inputs).logits |
|
|
|
# The predicted class is the argmax of each row's logits
predicted_class = logits.argmax(dim=-1)
|
|
|
# 1 means the sentence contains self-disclosure |
|
# 0 means the sentence does not contain self-disclosure |
|
|
|
# predicted_class: tensor([1, 0, 1, 0], device='cuda:0') |
|
``` |
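If you also want a confidence score for each prediction, you can apply a softmax over the logits before taking the argmax. The sketch below is self-contained for illustration: the logit values are placeholders, not actual outputs of this model.

```python
import torch

# Placeholder logits for four sentences (illustrative values only,
# not real outputs of the model above); columns are [label 0, label 1]
logits = torch.tensor([
    [-2.1, 3.4],
    [2.8, -1.9],
    [-0.5, 1.2],
    [1.0, -0.3],
])

# Softmax over the class dimension turns logits into per-class probabilities
probs = torch.softmax(logits, dim=-1)

# Argmax recovers the hard prediction; probs[i, pred] is its confidence
predicted_class = probs.argmax(dim=-1)

for p, c in zip(probs, predicted_class):
    label = "self-disclosure" if c == 1 else "no self-disclosure"
    print(f"{label}: confidence {p[c]:.2f}")
```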
|
|
|
### Evaluation |
|
The model achieves 88.6% accuracy on the binary self-disclosure sentence classification task.
|
|
|
## Citation |
|
``` |
|
@article{dou2023reducing, |
|
title={Reducing Privacy Risks in Online Self-Disclosures with Language Models}, |
|
author={Dou, Yao and Krsek, Isadora and Naous, Tarek and Kabra, Anubha and Das, Sauvik and Ritter, Alan and Xu, Wei}, |
|
journal={arXiv preprint arXiv:2311.09538}, |
|
year={2023} |
|
} |
|
``` |