bert-base-finnish-cased-v1 for QA

This is the bert-base-finnish-cased-v1 model, fine-tuned using an automatically translated Finnish version of the SQuAD2.0 dataset in combination with the Finnish partition of the TyDi-QA dataset. It's been trained on question-answer pairs, including unanswerable questions, for the task of question answering.

When the model classifies a question as unanswerable, it outputs "[CLS]". A QA model that does not try to identify unanswerable questions is also available: bert-base-finnish-cased-squad1-fi.

Overview

Language model: bert-base-finnish-cased-v1
Language: Finnish
Downstream-task: Extractive QA
Training data: Finnish SQuAD 2.0 + Finnish partition of TyDi-QA
Eval data: Finnish SQuAD 2.0 + Finnish partition of TyDi-QA

Usage

In Transformers

from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

model_name = "ilmariky/bert-base-finnish-cased-squad2-fi"

# a) Get predictions
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)
QA_input = {
    'question': 'Mikä tämä on?',
    'context': 'Tämä on testi.'
}
res = nlp(QA_input)

# b) Load model & tokenizer
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
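
When the model and tokenizer are loaded directly as in b), the caller has to turn the start and end logits into an answer and handle the "[CLS]" output for unanswerable questions. The sketch below is one minimal way to do that, not the author's method: `decode_answer` is a hypothetical helper, it assumes greedy decoding (independent argmax of start and end), and it uses hand-written toy logits in place of real model output so that it runs without downloading the model.

```python
from typing import Optional, Sequence


def decode_answer(
    tokens: Sequence[str],
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    cls_token: str = "[CLS]",
) -> Optional[str]:
    """Return the predicted span as text, or None when the span is just [CLS]."""
    # Greedy decoding: pick the best start and end positions independently.
    start = max(range(len(start_logits)), key=start_logits.__getitem__)
    end = max(range(len(end_logits)), key=end_logits.__getitem__)
    if end < start:
        # Assumption: an inverted span is treated as "no answer".
        return None
    span = list(tokens[start:end + 1])
    if span == [cls_token]:
        # The model pointed at [CLS], i.e. it considers the question unanswerable.
        return None
    # Simplification: joins tokens with spaces and ignores word-piece merging.
    return " ".join(span)


# Toy input: tokens for the question/context pair used above (illustrative only).
tokens = ["[CLS]", "Mikä", "tämä", "on", "?", "[SEP]",
          "Tämä", "on", "testi", ".", "[SEP]"]

# Hand-written logits that peak on "testi" (index 8).
answerable = [0.0] * len(tokens)
answerable[8] = 5.0

# Hand-written logits that peak on "[CLS]" (index 0).
unanswerable = [0.0] * len(tokens)
unanswerable[0] = 5.0

found = decode_answer(tokens, answerable, answerable)
missing = decode_answer(tokens, unanswerable, unanswerable)
print(found)
print(missing)
```

With real model output, the token list could come from `tokenizer.convert_ids_to_tokens(...)` and the logits from the `start_logits` and `end_logits` fields of the model output converted to lists; `tokenizer.decode` on the selected input ids would give cleaner text than the space join used here.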

Performance

Evaluated with a slightly modified version of the official eval script.

{
  "exact": 55.53157042633567,
  "f1": 61.869335312255835,
  "total": 7412,
  "HasAns_exact": 51.26503525508088,
  "HasAns_f1": 61.006950090095565,
  "HasAns_total": 4822,
  "NoAns_exact": 63.47490347490348,
  "NoAns_f1": 63.47490347490348,
  "NoAns_total": 2590
}
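
The overall "exact" and "f1" values are the example-weighted averages of the HasAns and NoAns splits. The sketch below recomputes them from the figures reported above; the `weighted` helper is introduced here only for illustration.

```python
# Scores copied from the evaluation output above.
report = {
    "exact": 55.53157042633567,
    "f1": 61.869335312255835,
    "total": 7412,
    "HasAns_exact": 51.26503525508088,
    "HasAns_f1": 61.006950090095565,
    "HasAns_total": 4822,
    "NoAns_exact": 63.47490347490348,
    "NoAns_f1": 63.47490347490348,
    "NoAns_total": 2590,
}


def weighted(metric: str) -> float:
    """Combine the HasAns and NoAns scores, weighting each by its example count."""
    has_ans = report[f"HasAns_{metric}"] * report["HasAns_total"]
    no_ans = report[f"NoAns_{metric}"] * report["NoAns_total"]
    return (has_ans + no_ans) / report["total"]


overall_exact = weighted("exact")
overall_f1 = weighted("f1")
print(round(overall_exact, 2), round(overall_f1, 2))
```

NoAns_exact and NoAns_f1 are identical because an unanswerable question is scored as either fully right or fully wrong under both metrics.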