SCORE Claim Identification

This model card describes a model that detects claims in the abstracts of social science publications. The model takes an abstract, splits it into sentences, and predicts a claim probability for each sentence. It was trained on a SCORE dataset and achieves the following results on the test set:

  • Accuracy: 0.931597
  • Precision: 0.764563
  • Recall: 0.722477
  • F1: 0.742925
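
As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported value can be recomputed from the two figures above. The sketch below is illustrative only; the helper name `f1_from_precision_recall` is hypothetical and not part of the model's code.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported for the test set in this model card.
f1 = f1_from_precision_recall(0.764563, 0.722477)
print(round(f1, 4))  # 0.7429
```

The recomputed value agrees with the reported F1 up to rounding of the inputs.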

Model Usage

You can load the model with Hugging Face's transformers library as follows:

import spacy
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification

nlp = spacy.load("en_core_web_lg")
model_name = "biodatlab/score-claim-identification"
tokenizer_name = "allenai/scibert_scivocab_uncased"

tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def inference(abstract: str):
    """
    Split an abstract into sentences and perform claim identification.
    """
    if abstract.strip() == "":
        return "Please provide an abstract as an input."
    sents = [sent.text for sent in nlp(abstract).sents]  # a list of sentences
    inputs = tokenizer(
        sents,
        return_tensors="pt",
        truncation=True,
        padding="longest"
    )
    logits = model(**inputs).logits
    preds = logits.argmax(dim=1)  # convert logits to predictions
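    # keep only the sentences predicted as claims (label 1)
    claims = [sent for sent, pred in zip(sents, preds) if pred == 1]
    if len(claims) > 0:
        # spaCy sentences keep their final punctuation, so join with "\n" only
        return "\n".join(claims)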
    else:
        return "No claims found from a given abstract."

# `abstract` is the abstract text to analyze (a str); the result is a
# single string with one claim per line
claims = inference(abstract)
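
The function above returns hard labels via `argmax`. If per-sentence claim probabilities are needed instead, the logits can be passed through a softmax. The following is a minimal pure-Python sketch, assuming the logits have been converted to nested lists (for example with `logits.tolist()`) and that index 1 corresponds to the Claim label; `claim_probabilities` is a hypothetical helper, not part of the released code, and the example logits are made up.

```python
import math
from typing import List


def claim_probabilities(logits: List[List[float]]) -> List[float]:
    """Return P(label == 1) for each row of two-class logits via softmax."""
    probs = []
    for row in logits:
        # Subtract the row maximum before exponentiating for numerical stability.
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        probs.append(exps[1] / sum(exps))
    return probs


# Hypothetical logits for three sentences: [Null logit, Claim logit].
example_logits = [[2.0, -1.0], [0.0, 0.0], [-1.5, 3.0]]
for p in claim_probabilities(example_logits):
    print(f"{p:.3f}")
```

With PyTorch tensors, `torch.softmax(logits, dim=1)[:, 1]` computes the same quantity directly.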

Intended usage

The model takes a statement and classifies it as Claim (1) or Null (0). Some examples:

| Statement | Label |
|---|---|
| We consistently found that participants selectively chose to learn that bad (good) things happened to bad (good) people (Studies 1 to 7), that is, they selectively exposed themselves to deserved outcomes. | 1 (Claim) |
| Members of higher status groups generalize characteristics of their ingroup to superordinate categories that serve as a frame of reference for comparisons with outgroups (ingroup projection). | 0 (Null) |
| Motivational Interviewing helped the goal progress of those participants who, at pre-screening, reported engaging in many individual pro-environmental behaviors, but the more directive approach worked better for those participants who were less ready to change. | 1 (Claim) |

Training procedure

Training Hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • n_epochs: 6
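
For readers who want to reproduce a similar setup, the listed values could map onto the keyword arguments of `transformers.TrainingArguments` roughly as follows. This is a sketch under assumptions: the card does not state that the `Trainer` API was used, the batch sizes are assumed to be per-device, and `output_dir` is a placeholder path.

```python
# Hyperparameters from this card, keyed by the corresponding
# transformers.TrainingArguments parameter names (assumed mapping).
training_kwargs = {
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 32,  # card: train_batch_size
    "per_device_eval_batch_size": 32,   # card: eval_batch_size
    "num_train_epochs": 6,              # card: n_epochs
}

# With transformers installed, these could be passed on like this:
# from transformers import TrainingArguments
# args = TrainingArguments(output_dir="./outputs", **training_kwargs)  # placeholder path
print(training_kwargs)
```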

Training results

| Training Loss | Step | Validation Loss | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|---|---|
| 0.038000 | 3996 | 0.007086 | 0.997964 | 0.993499 | 0.995656 | 0.991350 |

Framework versions

  • transformers 4.28.0
  • sentence-transformers 2.2.2
  • accelerate 0.19.0
  • datasets 2.12.0
  • spacy 3.5.3
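
To see how a local environment compares with the versions listed above, the standard library's `importlib.metadata` can report what is installed. This is a minimal sketch; `EXPECTED_VERSIONS` and `compare_versions` are hypothetical names, and a mismatch does not necessarily mean the model will fail to load.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions listed in this card.
EXPECTED_VERSIONS = {
    "transformers": "4.28.0",
    "sentence-transformers": "2.2.2",
    "accelerate": "0.19.0",
    "datasets": "2.12.0",
    "spacy": "3.5.3",
}


def compare_versions(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package to (expected version, installed version or None)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions(EXPECTED_VERSIONS).items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: expected {wanted}, installed {installed} ({status})")
```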

More is available in the Gradio application on the biodatlab Space.
