---
license: mit
language:
- en
metrics:
- f1
- accuracy
pipeline_tag: text-classification
tags:
- social science
- covid
widget:
- text: We consistently found that participants selectively chose to learn that bad (good) things happened to bad (good) people (Studies 1 to 7) that is, they selectively exposed themselves to deserved outcomes.
---

# SCORE Claim Identification

This model card describes a model for detecting claims in abstracts of social science publications.
The pipeline takes an abstract, splits it into sentences, and predicts a claim probability for each sentence.
The model was trained on a [SCORE](https://www.cos.io/score) dataset.
It achieves the following results on the test set:

- Accuracy: 0.931597
- Precision: 0.764563
- Recall: 0.722477
- F1: 0.742925

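As a quick consistency check on the reported numbers, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded precision and recall listed above; the variable names are illustrative, and agreement is expected only up to rounding.

```python
precision = 0.764563
recall = 0.722477

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Compare against the F1 reported in this card, allowing for rounding.
reported_f1 = 0.742925
matches_reported = abs(f1 - reported_f1) < 1e-5
```
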
## Model Usage

You can load the model with Hugging Face's `transformers` and split abstracts into sentences with `spacy` as follows:

```py
import spacy
import torch
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification

nlp = spacy.load("en_core_web_lg")
model_name = "biodatlab/score-claim-identification"
tokenizer_name = "allenai/scibert_scivocab_uncased"

tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)


def inference(abstract: str) -> str:
    """
    Split an abstract into sentences and perform claim identification.
    """
    if abstract.strip() == "":
        return "Please provide an abstract as an input."
    sents = [sent.text for sent in nlp(abstract).sents]  # a list of sentences
    inputs = tokenizer(
        sents,
        return_tensors="pt",
        truncation=True,
        padding="longest"
    )
    with torch.no_grad():  # inference only, so gradients are not needed
        logits = model(**inputs).logits
    preds = logits.argmax(dim=1).tolist()  # convert logits to predicted labels
    claims = [sent for sent, pred in zip(sents, preds) if pred == 1]  # 1 = Claim
    if len(claims) > 0:
        return "\n".join(claims)
    else:
        return "No claims found from a given abstract."


# Example input: the widget sentence from this model card.
abstract = (
    "We consistently found that participants selectively chose to learn that "
    "bad (good) things happened to bad (good) people (Studies 1 to 7) that is, "
    "they selectively exposed themselves to deserved outcomes."
)
claims = inference(abstract)  # claim sentences joined with "\n"
```

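The usage code returns only the hard 0/1 decision from `argmax`. Since the model is described as predicting a claim probability for each sentence, a softmax over the two logits would recover that probability. The following is a minimal sketch in plain Python, assuming that label index 1 means Claim (as in the usage code) and using made-up logit values; `claim_probabilities` is a hypothetical helper, not part of the released code.

```python
import math
from typing import List


def claim_probabilities(logits: List[List[float]]) -> List[float]:
    """Return P(Claim) for each row of [null_logit, claim_logit] scores."""
    probs = []
    for row in logits:
        shift = max(row)  # subtract the max for numerical stability
        exps = [math.exp(x - shift) for x in row]
        probs.append(exps[1] / sum(exps))  # index 1 is assumed to be Claim
    return probs


# Made-up logits for three sentences, for illustration only.
example_logits = [[2.0, -1.0], [0.0, 0.0], [-1.5, 3.0]]
example_probs = claim_probabilities(example_logits)
```

With the model loaded as in the usage code, `logits.tolist()` could be passed to this helper, or `torch.softmax(logits, dim=1)[:, 1]` should give the same values directly. Keeping the probabilities makes it possible to apply a threshold other than the implicit 0.5 of `argmax`.
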
## Intended usage

The model takes a statement and classifies it as Claim (1) or Null (0).
Some examples:

| Statement | Label |
|:------------------------------------------------------------------------------------------------------------:|:----------:|
| We consistently found that participants selectively chose to learn that bad (good) things happened to <br>bad (good) people (Studies 1 to 7) that is, they selectively exposed themselves to deserved outcomes. | 1 (Claim) |
| Members of higher status groups generalize characteristics of their ingroup to superordinate categories<br> that serve as a frame of reference for comparisons with outgroups (ingroup projection). | 0 (Null) |
| Motivational Interviewing helped the goal progress of those participants who, at pre-screening, reported<br> engaging in many individual pro-environmental behaviors, but the more directive approach <br> worked better for those participants who were less ready to change. | 1 (Claim) |

## Training procedure

### Training Hyperparameters

The following hyperparameters were used during training:

- learning_rate: 3e-05
- train_batch_size: 32
- eval_batch_size: 32
- n_epochs: 6

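If training were reproduced with the `transformers` `Trainer` (an assumption; this card does not state which training loop was used), the hyperparameters above could be expressed as keyword arguments for `TrainingArguments`. The sketch maps `train_batch_size` and `eval_batch_size` to the per-device arguments, which assumes a single device; `training_kwargs` and the output directory are illustrative names.

```python
# Hyperparameters from the list above, keyed by TrainingArguments parameter names.
# Mapping the batch sizes to per-device values assumes training on one device.
training_kwargs = {
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 32,
    "per_device_eval_batch_size": 32,
    "num_train_epochs": 6,
}

# With transformers installed, these could be passed as:
#   from transformers import TrainingArguments
#   args = TrainingArguments(output_dir="score-claim-output", **training_kwargs)
# where "score-claim-output" is a placeholder directory name.
```
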
### Training results

| Training Loss | Step | Validation Loss | Accuracy | F1 | Precision | Recall |
|:-------------:|:----:|:---------------:|:--------:|:--------:|:---------:|:--------:|
| 0.038000 | 3996 | 0.007086 | 0.997964 | 0.993499 | 0.995656 | 0.991350 |

### Framework versions

- transformers 4.28.0
- sentence-transformers 2.2.2
- accelerate 0.19.0
- datasets 2.12.0
- spacy 3.5.3

For more, see the `gradio` application in the `biodatlab` space.