File size: 4,360 Bytes
5719859
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
4db8058
5719859
 
 
 
 
 
 
04c190f
5719859
04c190f
5719859
04c190f
 
5719859
 
04c190f
 
5719859
0a56089
5719859
04c190f
 
5719859
 
04c190f
 
5719859
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
99d016f
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
---
language: de
datasets:
- deepset/germandpr
license: mit
---

## Overview
**Language model:** gbert-base-germandpr-reranking  
**Language:** German  
**Training data:** GermanDPR train set (~ 56MB)  
**Eval data:** GermanDPR test set (~ 6MB)   
**Infrastructure**: 1x V100 GPU  
**Published**: June 3rd, 2021

## Details
- We trained a text pair classification model in FARM, which can be used for reranking in document retrieval tasks. To this end, the classifier calculates the similarity of the query and each retrieved top k document (e.g., k=10). The top k documents are then sorted by their similarity scores. The document most similar to the query is the best.

## Hyperparameters
```
batch_size = 16
n_epochs = 2
max_seq_len = 512 tokens for question and passage concatenated
learning_rate = 2e-5
lr_schedule = LinearWarmup
embeds_dropout_prob = 0.1
```
## Performance
We use the GermanDPR test dataset as ground truth labels and run two experiments to compare how a BM25 retriever performs with or without reranking with our model. The first experiment runs retrieval on the full German Wikipedia (more than 2 million passages) and second experiment runs retrieval on the GermanDPR dataset only (not more than 5000 passages). Both experiments use 1025 queries. Note that the second experiment is evaluating on a much simpler task because of the smaller dataset size, which explains strong BM25 retrieval performance.

### Full German Wikipedia (more than 2 million passages):
BM25 Retriever without Reranking
- recall@3: 0.4088 (419 / 1025)
- mean_reciprocal_rank@3: 0.3322

BM25 Retriever with Reranking Top 10 Documents
- recall@3: 0.5200 (533 / 1025)
- mean_reciprocal_rank@3: 0.4800

### GermanDPR Test Dataset only (not more than 5000 passages):
BM25 Retriever without Reranking
- recall@3: 0.9102 (933 / 1025)
- mean_reciprocal_rank@3: 0.8528

BM25 Retriever with Reranking Top 10 Documents
- recall@3: 0.9298 (953 / 1025)
- mean_reciprocal_rank@3: 0.8813



## Usage
### In haystack
You can load the model in [haystack](https://github.com/deepset-ai/haystack/) for reranking the documents returned by a Retriever:
```python
...
retriever = ElasticsearchRetriever(document_store=document_store)
ranker = FARMRanker(model_name_or_path="deepset/gbert-base-germandpr-reranking")
...
p = Pipeline()
p.add_node(component=retriever, name="ESRetriever", inputs=["Query"])
p.add_node(component=ranker, name="Ranker", inputs=["ESRetriever"])
)
```

## About us

<div class="grid lg:grid-cols-2 gap-x-4 gap-y-3">
    <div class="w-full h-40 object-cover mb-2 rounded-lg flex items-center justify-center">
         <img alt="" src="https://raw.githubusercontent.com/deepset-ai/.github/main/deepset-logo-colored.png" class="w-40"/>
     </div>
     <div class="w-full h-40 object-cover mb-2 rounded-lg flex items-center justify-center">
         <img alt="" src="https://raw.githubusercontent.com/deepset-ai/.github/main/haystack-logo-colored.png" class="w-40"/>
     </div>
</div>

[deepset](http://deepset.ai/) is the company behind the production-ready open-source AI framework [Haystack](https://haystack.deepset.ai/).

Some of our other work: 
- [Distilled roberta-base-squad2 (aka "tinyroberta-squad2")](https://huggingface.co/deepset/tinyroberta-squad2)
- [German BERT](https://deepset.ai/german-bert), [GermanQuAD and GermanDPR](https://deepset.ai/germanquad), [German embedding model](https://huggingface.co/mixedbread-ai/deepset-mxbai-embed-de-large-v1)
- [deepset Cloud](https://www.deepset.ai/deepset-cloud-product), [deepset Studio](https://www.deepset.ai/deepset-studio)

## Get in touch and join the Haystack community

<p>For more info on Haystack, visit our <strong><a href="https://github.com/deepset-ai/haystack">GitHub</a></strong> repo and <strong><a href="https://docs.haystack.deepset.ai">Documentation</a></strong>. 

We also have a <strong><a class="h-7" href="https://haystack.deepset.ai/community">Discord community open to everyone!</a></strong></p>

[Twitter](https://twitter.com/Haystack_AI) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://haystack.deepset.ai/) | [YouTube](https://www.youtube.com/@deepset_ai)

By the way: [we're hiring!](http://www.deepset.ai/jobs)