---
license: cc-by-sa-4.0
language:
- 'no'
pipeline_tag: text-classification
---

The Entailment model is a fine-tuned classifier that produces entailment scores for fact verification.

Specifically, we fine-tune NorBERT on a machine-translated version of the [VitaminC](https://huggingface.co/datasets/tals/vitaminc) dataset. VitaminC is designed for deciding whether a piece of evidence supports a claim, which makes it suitable for training a model to judge whether a given context entails a generated text. We then use the fine-tuned model as our Entailment model.

Prompt format:
```
{article}[SEP]{positive_sample}
```
Inference format:
```
{article}[SEP]{generated_text}
```

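The two templates above differ only in what follows the separator. As a minimal sketch, the helpers below assemble such input strings. The names `build_input` and `build_inputs` are hypothetical and not part of the released code, and stripping surrounding whitespace is an assumption made here for tidiness.

```python
SEP_TOKEN = "[SEP]"  # separator shown in the formats above


def build_input(article: str, candidate: str, sep: str = SEP_TOKEN) -> str:
    """Join an article and a candidate text with the separator token."""
    # Hypothetical helper: strip stray whitespace so the separator sits
    # directly between the two segments, as in the templates above.
    return f"{article.strip()}{sep}{candidate.strip()}"


def build_inputs(articles, candidates, sep: str = SEP_TOKEN):
    """Build one input string per (article, candidate) pair."""
    if len(articles) != len(candidates):
        raise ValueError("articles and candidates must have the same length")
    return [build_input(a, c, sep) for a, c in zip(articles, candidates)]
```

The inference example in this card pads the separator with spaces (`' [SEP] '`); passing `sep=' [SEP] '` reproduces that variant.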
## Run the Model
```python
import torch
from transformers import AutoTokenizer, BertForSequenceClassification

model_id = "NorGLM/Entailment"
# Load the fast (Rust-backed) tokenizer when one is available
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
tokenizer.add_special_tokens({'pad_token': '[PAD]'})

model = BertForSequenceClassification.from_pretrained(model_id)

# Use a GPU when available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
```

## Inference Example
```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import TensorDataset, DataLoader


def entailment_score(texts, references, generated_texts):
    # Label ids: Entailment: 1, Contradict: 0, Neutral: 2
    # `references` is accepted for interface compatibility but is not used here.
    # Concatenate news articles and generated summaries as input
    input_texts = [t + ' [SEP] ' + g for t, g in zip(texts, generated_texts)]
    # Set the maximum sequence length according to the NorBERT config.
    MAX_LEN = 512
    batch_size = 16
    # Run on whichever device the model already lives on
    device = next(model.parameters()).device

    test_inputs = tokenizer(text=input_texts, add_special_tokens=True, return_attention_mask=True, return_tensors="pt", padding=True, truncation=True, max_length=MAX_LEN)
    validation_data = TensorDataset(test_inputs['input_ids'], test_inputs['attention_mask'])
    validation_dataloader = DataLoader(validation_data, batch_size=batch_size)

    model.eval()

    results = []
    for batch in validation_dataloader:
        # Move the batch to the model's device
        batch = tuple(t.to(device) for t in batch)
        # Unpack the inputs from our dataloader
        b_input_ids, b_input_mask = batch
        # Do not compute or store gradients; this saves memory and speeds up evaluation
        with torch.no_grad():
            # Forward pass, calculate logit predictions
            outputs = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask)

        # Move logits to CPU and pick the highest-scoring label per example
        logits = outputs[0].to('cpu').numpy()
        pred_flat = np.argmax(logits, axis=1).flatten()

        # Store plain Python ints so that list.count() works as expected
        results.extend(pred_flat.tolist())

    ent_ratio = results.count(1) / float(len(results))
    neu_ratio = results.count(2) / float(len(results))
    con_ratio = results.count(0) / float(len(results))
    print("Entailment ratio: {}; Neutral ratio: {}; Contradict ratio: {}.".format(ent_ratio, neu_ratio, con_ratio))
    return ent_ratio, neu_ratio, con_ratio


# Load evaluation text
eva_file_name = "<input csv file for evaluation>"  # placeholder: path to your CSV file
eval_df = pd.read_csv(eva_file_name)

# Blank out cells that contain only the tokenizer warning, then drop those rows
remove_str = 'Token indices sequence length is longer than 2048.'
eval_df = eval_df[eval_df != remove_str]
eval_df = eval_df.dropna()
references = eval_df['positive_sample'].to_list()
hypo_list = eval_df['generated_text'].to_list()
articles = eval_df['article'].to_list()
ent_ratio, neu_ratio, con_ratio = entailment_score(articles, references, hypo_list)
```

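The inference example reports only the share of each predicted label. If a per-example score is wanted instead, one option is to apply a softmax to the three logits and read off the entailment probability. The sketch below is an illustration under assumptions, not the authors' method: the function names are hypothetical, the label ids follow the comment in the inference example (Contradict: 0, Entailment: 1, Neutral: 2), and the logits are made-up numbers.

```python
import math

# Label ids as given in the comment of the inference example.
LABELS = {0: "contradiction", 1: "entailment", 2: "neutral"}


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entailment_probability(logits, entailment_id=1):
    """Probability assigned to the entailment class for one example."""
    return softmax(logits)[entailment_id]


def label_ratios(predictions):
    """Fraction of predictions per label name; empty input yields zeros."""
    n = len(predictions)
    return {
        name: (sum(1 for p in predictions if p == idx) / n if n else 0.0)
        for idx, name in LABELS.items()
    }


# Made-up logits for a single article/generated-text pair.
example_logits = [0.2, 2.5, -1.0]
print(round(entailment_probability(example_logits), 3))
print(label_ratios([1, 1, 0, 2]))
```

Unlike the ratio computation in the inference example, `label_ratios` returns zeros for an empty prediction list instead of raising a division error.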
## Citation Information
If you find our work helpful, please cite our paper:

```
@article{liu2023nlebench+,
  title={NLEBench+ NorGLM: A Comprehensive Empirical Analysis and Benchmark Dataset for Generative Language Models in Norwegian},
  author={Liu, Peng and Zhang, Lemei and Farup, Terje Nissen and Lauvrak, Even W and Ingvaldsen, Jon Espen and Eide, Simen and Gulla, Jon Atle and Yang, Zhirong},
  journal={arXiv preprint arXiv:2312.01314},
  year={2023}
}
```