# Monolingual Dutch Models for Zero-Shot Text Classification
The models in this family were fine-tuned on combined data from the (translated) [SNLI](https://nlp.stanford.edu/projects/snli/) and [SICK-NL](https://github.com/gijswijnholds/sick_nl) datasets. They are intended for zero-shot text classification in Dutch through the Hugging Face zero-shot classification pipeline.
## The Models
| Base model | Hugging Face id (fine-tuned) |
|-------------------|---------------------|
| [BERTje](https://huggingface.co/GroNLP/bert-base-dutch-cased) | LoicDL/bert-base-dutch-cased-finetuned-snli |
| [RobBERT V2](http://github.com/iPieter/robbert) | LoicDL/robbert-v2-dutch-finetuned-snli |
| [RobBERTje](https://github.com/iPieter/robbertje) | this model |
## How to use
While this family of models can be used to evaluate (monolingual) NLI datasets, its primary intended use is zero-shot text classification in Dutch. In this setting, a classification task is recast as an NLI problem. Consider the following sentence pair, which simulates a sentiment classification problem:
- Premise: The food in this place was horrendous
- Hypothesis: This is a negative review

For more information on using Natural Language Inference models for zero-shot text classification, we refer to [this paper](https://arxiv.org/abs/1909.00161).
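As a rough illustration of what the zero-shot pipeline does under the hood, the sketch below scores a single premise/hypothesis pair directly with the fine-tuned NLI model; the hypothesis is simply the sentiment template filled in with one candidate label, and the class names are read from the model's own config rather than hard-coded:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Minimal sketch (not the official evaluation code): score one premise/hypothesis pair.
model_name = "LoicDL/robbertje-dutch-finetuned-snli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = "Het eten in dit restaurant is heel lekker."        # "The food in this restaurant is very tasty."
hypothesis = "Het sentiment van deze review is positief"      # template filled with one candidate label

inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Map class indices to their names via the model config (entailment/neutral/contradiction for NLI models).
probs = torch.softmax(logits, dim=-1)[0]
print({model.config.id2label[i]: round(p.item(), 3) for i, p in enumerate(probs)})
```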
By default, all our models are fully compatible with the Hugging Face pipeline for zero-shot classification. They can be downloaded and used through the following code:
```python
from transformers import pipeline

# Load the fine-tuned NLI model as a zero-shot classifier
classifier = pipeline(
    task="zero-shot-classification",
    model="LoicDL/robbertje-dutch-finetuned-snli"
)

text_piece = "Het eten in dit restaurant is heel lekker."  # "The food in this restaurant is very tasty."
labels = ["positief", "negatief", "neutraal"]
template = "Het sentiment van deze review is {}"           # "The sentiment of this review is {}"

predictions = classifier(
    text_piece,
    labels,
    multi_label=False,           # labels are mutually exclusive ("multi_class" in older transformers versions)
    hypothesis_template=template
)
```
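The pipeline returns the candidate labels ranked by score; a small usage sketch follows (the scores in the comment are illustrative, not actual model output):

```python
# `predictions` is a dict of the form:
# {'sequence': 'Het eten in dit restaurant is heel lekker.',
#  'labels': ['positief', 'neutraal', 'negatief'],   # sorted from most to least likely
#  'scores': [0.95, 0.03, 0.02]}                      # illustrative values
print(predictions["labels"][0])  # highest-scoring label
```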
## Model Performance
### Performance on the NLI task
| Model | Accuracy [%] | F1 [%] |
|-------------------|--------------------------|--------------|
| bert-base-dutch-cased-finetuned-snli | 86.21 | 86.42 |
| robbert-v2-dutch-finetuned-snli | **87.61** | **88.02** |
| robbertje-dutch-finetuned-snli | 83.28 | 84.11 |
## Credits and citation
TBD