metadata
license: mit
language:
  - en
metrics:
  - f1
  - accuracy
pipeline_tag: text-classification
tags:
  - social science
  - covid
widget:
  - text: >-
      We consistently found that participants selectively chose to learn that
      bad (good) things happened to bad (good) people (Studies 1 to 7) that is,
      they selectively exposed themselves to deserved outcomes.

SCORE Claim Identification

This model detects claims in abstracts of social science publications. It takes an abstract, splits it into sentences, and predicts a claim probability for each sentence. The model was trained on the SCORE dataset and achieves the following results on the test set:

  • Accuracy: 0.931597
  • Precision: 0.764563
  • Recall: 0.722477
  • F1: 0.742925
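As a quick consistency check on the numbers above, F1 can be recomputed as the harmonic mean of precision and recall. This is a minimal sketch that uses only the values reported in this card; the tolerance is an assumption that accounts for the inputs being rounded to six decimals.

```python
import math

# Test-set precision, recall, and F1 as reported in this card.
precision = 0.764563
recall = 0.722477
reported_f1 = 0.742925

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Allow a small tolerance because the reported inputs are rounded.
matches = math.isclose(f1, reported_f1, abs_tol=1e-5)
print(f"recomputed F1 = {f1:.6f}, matches reported value: {matches}")
```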

Model Usage

You can load the model with the Hugging Face transformers library as follows:

import spacy
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification

nlp = spacy.load("en_core_web_lg")
model_name = "biodatlab/score-claim-identification"
tokenizer_name = "allenai/scibert_scivocab_uncased"

tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def inference(abstract: str):
    """
    Split an abstract into sentences and perform claim identification.
    """
    if abstract.strip() == "":
        return "Please provide an abstract as an input."
    claims = []
    sents = [sent.text for sent in nlp(abstract).sents]  # a list of sentences
    inputs = tokenizer(
        sents,
        return_tensors="pt",
        truncation=True,
        padding="longest"
    )
    logits = model(**inputs).logits
    preds = logits.argmax(dim=1).tolist()  # predicted class per sentence (1 = claim)
    claims = [sent for sent, pred in zip(sents, preds) if pred == 1]
    if len(claims) > 0:
        return "\n".join(claims)  # sentences keep their own punctuation
    else:
        return "No claims found from a given abstract."

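# Example input taken from this card; replace it with your own abstract.
abstract = (
    "We consistently found that participants selectively chose to learn that "
    "bad (good) things happened to bad (good) people (Studies 1 to 7) that is, "
    "they selectively exposed themselves to deserved outcomes."
)
claims = inference(abstract)  # a string with one identified claim per line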
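The function above returns only the sentences whose predicted class is 1. If a claim probability per sentence is wanted instead, the logits can be passed through a softmax. The sketch below is an illustration, not part of the released code: it works on plain nested lists (for example, the result of `logits.tolist()`), the helper name `claim_probabilities` is hypothetical, the logit values are made up, and it assumes index 1 is the Claim class as described in this card.

```python
import math

def claim_probabilities(logits, claim_index=1):
    """Softmax each row of [null_logit, claim_logit] and return P(claim)."""
    probs = []
    for row in logits:
        shift = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(x - shift) for x in row]
        probs.append(exps[claim_index] / sum(exps))
    return probs

# Made-up logits for three sentences, standing in for `logits.tolist()`.
example_logits = [[2.0, -1.0], [-0.5, 1.5], [0.0, 0.0]]
probs = claim_probabilities(example_logits)

# With two classes, thresholding at 0.5 matches the argmax decision
# except for exact ties.
is_claim = [p > 0.5 for p in probs]
```

A threshold other than 0.5 trades precision against recall; the card does not state which operating point produced the reported metrics.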

Intended usage

The model takes a statement and classifies it as Claim (1) or Null (0). Some examples:

| Statement | Label |
| --- | --- |
| We consistently found that participants selectively chose to learn that bad (good) things happened to bad (good) people (Studies 1 to 7) that is, they selectively exposed themselves to deserved outcomes. | 1 (Claim) |
| Members of higher status groups generalize characteristics of their ingroup to superordinate categories that serve as a frame of reference for comparisons with outgroups (ingroup projection). | 0 (Null) |
| Motivational Interviewing helped the goal progress of those participants who, at pre-screening, reported engaging in many individual pro-environmental behaviors, but the more directive approach worked better for those participants who were less ready to change. | 1 (Claim) |
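To keep every sentence together with its label rather than only the claims, the integer predictions can be mapped to the label names used above. This is a small sketch; the helper `label_sentences` and the example sentences and predictions are hypothetical, and only the 1 = Claim, 0 = Null mapping comes from this card.

```python
# Label ids as described in this card: 1 = Claim, 0 = Null.
ID2LABEL = {0: "Null", 1: "Claim"}

def label_sentences(sentences, predictions):
    """Pair each sentence with its label name, keeping Null sentences too."""
    if len(sentences) != len(predictions):
        raise ValueError("sentences and predictions must have the same length")
    return [(sent, ID2LABEL[int(pred)]) for sent, pred in zip(sentences, predictions)]

# Hypothetical inputs: placeholder sentences and made-up predictions,
# standing in for the output of `logits.argmax(dim=1).tolist()`.
sents = ["We found that X increases Y.", "Prior work studied X."]
preds = [1, 0]
labeled = label_sentences(sents, preds)
```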

Training procedure

Training Hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • n_epochs: 6
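For readers who want to reproduce a similar setup with the transformers `Trainer`, the listed values could be translated to `TrainingArguments` keyword names roughly as follows. This is a sketch under assumptions: the card does not state that `Trainer` was used, the batch sizes are assumed to be per device, and the `output_dir` value in the comment is a placeholder.

```python
# Hyperparameters exactly as listed in this card.
card_hparams = {
    "learning_rate": 3e-05,
    "train_batch_size": 32,
    "eval_batch_size": 32,
    "n_epochs": 6,
}

# Assumed mapping from the card's names to TrainingArguments keyword names.
name_map = {
    "learning_rate": "learning_rate",
    "train_batch_size": "per_device_train_batch_size",
    "eval_batch_size": "per_device_eval_batch_size",
    "n_epochs": "num_train_epochs",
}

training_kwargs = {name_map[key]: value for key, value in card_hparams.items()}

# With transformers installed, these could be passed on like this:
# from transformers import TrainingArguments
# args = TrainingArguments(output_dir="score-claim-output", **training_kwargs)
```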

Training results

| Training Loss | Step | Validation Loss | Accuracy | F1 | Precision | Recall |
| --- | --- | --- | --- | --- | --- | --- |
| 0.038000 | 3996 | 0.007086 | 0.997964 | 0.993499 | 0.995656 | 0.991350 |

Framework versions

  • transformers 4.28.0
  • sentence-transformers 2.2.2
  • accelerate 0.19.0
  • datasets 2.12.0
  • spacy 3.5.3
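To compare a local environment against the versions listed above, the installed package metadata can be queried from the standard library. This is a minimal sketch; the helper name `compare_versions` is hypothetical, and a mismatch does not necessarily mean the model will fail to load.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card.
PINNED = {
    "transformers": "4.28.0",
    "sentence-transformers": "2.2.2",
    "accelerate": "0.19.0",
    "datasets": "2.12.0",
    "spacy": "3.5.3",
}

def compare_versions(pinned):
    """Return {package: (pinned, installed or None)} for every pinned package."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report

report = compare_versions(PINNED)
for name, (wanted, installed) in report.items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: card={wanted} installed={installed} ({status})")
```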

For an interactive demo, see the Gradio application in the biodatlab Space.