indo-dpr-ctx_encoder-single-squad-base

An Indonesian Dense Passage Retrieval (DPR) model trained on the SQuAD v2.0 dataset, translated into Indonesian and converted to DPR format.

Evaluation

Class          Precision  Recall  F1-Score  Support
hard_negative     0.9963  0.9963    0.9963   183090
positive          0.8849  0.8849    0.8849     5910

Metric            Value
Accuracy          0.9928
Macro Average     0.9406
Weighted Average  0.9928

Note: These metrics were computed on the dev set after 12000 batches.
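
The macro and weighted averages can be re-derived from the per-class rows, which is a useful sanity check because the dev set is heavily imbalanced (positives are about 3% of the examples). The sketch below is not part of the original evaluation code; it only redoes the arithmetic on the reported numbers. Since precision, recall, and F1 are identical within each class, the same averages hold for all three metrics.

```python
# Per-class scores and supports copied from the evaluation report.
report = {
    "hard_negative": {"f1": 0.9963, "support": 183090},
    "positive": {"f1": 0.8849, "support": 5910},
}

total_support = sum(row["support"] for row in report.values())

# Macro average: every class counts equally, regardless of its size.
macro_f1 = sum(row["f1"] for row in report.values()) / len(report)

# Weighted average: every class counts in proportion to its support.
weighted_f1 = sum(row["f1"] * row["support"] for row in report.values()) / total_support

# Share of positive examples in the dev set.
positive_share = report["positive"]["support"] / total_support

print(f"macro F1:       {macro_f1:.4f}")
print(f"weighted F1:    {weighted_f1:.4f}")
print(f"positive share: {positive_share:.2%}")
```

The weighted average sits close to the hard_negative score because that class dominates the support, so the positive-class row and the macro average are the more informative figures here.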

Usage

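from transformers import DPRContextEncoder, DPRContextEncoderTokenizer

# Load the tokenizer and the context encoder from the Hugging Face Hub
tokenizer = DPRContextEncoderTokenizer.from_pretrained('firqaaa/indo-dpr-ctx_encoder-single-squad-base')
model = DPRContextEncoder.from_pretrained('firqaaa/indo-dpr-ctx_encoder-single-squad-base')

# Tokenize the input text (English: "Where is the capital of Indonesia located?")
input_ids = tokenizer("Ibukota Indonesia terletak dimana?", return_tensors='pt')["input_ids"]

# pooler_output holds one dense embedding vector per input sequence
embeddings = model(input_ids).pooler_output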
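
In DPR, retrieval works by scoring each passage embedding against the question embedding with a dot product and keeping the highest-scoring passages. The following is a minimal sketch of that ranking step only. The helper names `dot` and `rank_passages` are hypothetical, and the short toy vectors stand in for real `pooler_output` rows, which are much longer.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_passages(question_vec, passage_vecs):
    """Return (passage_index, score) pairs, highest score first."""
    scores = [dot(question_vec, p) for p in passage_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]


# Toy vectors for illustration only; real embeddings come from the encoders.
question_vec = [0.2, 0.9, 0.1]
passage_vecs = [
    [0.1, 0.8, 0.0],  # points in a similar direction to the question
    [0.9, 0.1, 0.3],
    [0.0, 0.2, 0.9],
]

ranking = rank_passages(question_vec, passage_vecs)
for index, score in ranking:
    print(f"passage {index}: score {score:.2f}")
```

With real models, the question vector would come from a question encoder and the passage vectors from this context encoder, converted to lists with `embeddings.tolist()`.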

You can also use the model with Haystack as follows:

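# Uses the Haystack 1.x API (haystack.nodes)
from haystack.nodes import DensePassageRetriever
from haystack.document_stores import InMemoryDocumentStore

# Queries are embedded with the question encoder and passages with the context encoder.
# The question encoder ID below assumes the companion model is published under that name.
retriever = DensePassageRetriever(document_store=InMemoryDocumentStore(),
                                  query_embedding_model="firqaaa/indo-dpr-question_encoder-single-squad-base",
                                  passage_embedding_model="firqaaa/indo-dpr-ctx_encoder-single-squad-base",
                                  max_seq_len_query=64,     # maximum tokens per query
                                  max_seq_len_passage=256,  # maximum tokens per passage
                                  batch_size=16,
                                  use_gpu=True,
                                  embed_title=True,         # include the document title in the passage embedding
                                  use_fast_tokenizers=True)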