Model Card for roberta-large-self-disclosure-sentence-classification

The model classifies whether a given sentence contains self-disclosure. It is a binary sentence-level classifier: label 1 means the sentence contains self-disclosure, and label 0 means it does not.

For more details, please read the paper: Reducing Privacy Risks in Online Self-Disclosures with Language Models.

Accessing this model implies automatic agreement to the following guidelines:

  1. Only use the model for research purposes.
  2. No redistribution without the author's agreement.
  3. Any derivative works created using this model must acknowledge the original author.

Model Description

  • Model type: A fine-tuned sentence-level classifier that predicts whether a given sentence contains self-disclosure.
  • Language(s) (NLP): English
  • License: Creative Commons Attribution-NonCommercial
  • Finetuned from model: FacebookAI/roberta-large

Example Code

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig

model_id = "douy/roberta-large-self-disclosure-sentence-classification"

# Use the first GPU when one is available, otherwise fall back to CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"

config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForSequenceClassification.from_pretrained(model_id, config=config).to(device).eval()

sentences = [
    "I am a 23-year-old who is currently going through the last leg of undergraduate school.",
    "There is a joke in the design industry about that.",
    "My husband and I live in US.",
    "I was messing with advanced voice the other day and I was like, 'Oh, I can do this.'",
]

inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True).to(model.device)

with torch.no_grad():
    logits = model(**inputs).logits

# The predicted class is the argmax over each row of logits
predicted_class = logits.argmax(dim=-1)

# 1 means the sentence contains self-disclosure
# 0 means the sentence does not contain self-disclosure

# predicted_class: tensor([1, 0, 1, 0], device='cuda:0')
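
The example code returns only the class index. If a confidence score is also useful, the two logits per sentence can be passed through a softmax. The following is a minimal sketch, not part of the original model card: it assumes the logits have been converted to plain Python lists (for instance with logits.tolist()), and the names LABEL_NAMES, softmax, and describe are hypothetical helpers introduced here for illustration. The logit values are made up, not real model output.

```python
import math

# Label convention from this model card: 1 = self-disclosure, 0 = no self-disclosure.
# The display strings are hypothetical, chosen only for readability.
LABEL_NAMES = {0: "no self-disclosure", 1: "self-disclosure"}


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def describe(logit_rows):
    """Return (label_id, label_name, probability) for each row of logits."""
    results = []
    for row in logit_rows:
        probs = softmax(row)
        # Index of the largest probability, equivalent to argmax over the row
        label_id = max(range(len(probs)), key=probs.__getitem__)
        results.append((label_id, LABEL_NAMES[label_id], probs[label_id]))
    return results


# Made-up logits for illustration; in practice pass logits.tolist()
example_logits = [[-2.0, 3.0], [1.5, -1.0]]
for label_id, name, prob in describe(example_logits):
    print(label_id, name, round(prob, 3))
```

Softmax probabilities from a fine-tuned classifier are not necessarily well calibrated, so any threshold applied to them should be checked on held-out data.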

Evaluation

The model achieves 88.6% accuracy.

Citation

@article{dou2023reducing,
  title={Reducing Privacy Risks in Online Self-Disclosures with Language Models},
  author={Dou, Yao and Krsek, Isadora and Naous, Tarek and Kabra, Anubha and Das, Sauvik and Ritter, Alan and Xu, Wei},
  journal={arXiv preprint arXiv:2311.09538},
  year={2023}
}