# DanskBERT

This is DanskBERT, a Danish language model. Note that you should not prepend the mask token with a space when using it directly!
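The mask-handling note above can be sketched with the Hugging Face `transformers` fill-mask pipeline (a minimal example; the model id `vesteinn/DanskBERT` and the Danish prompt are assumptions, not stated in this README):

```python
# Masked-word prediction with DanskBERT (sketch).
MASK = "<mask>"

def masked_input(prefix: str) -> str:
    # Attach the mask token directly to the preceding text:
    # per the note above, no space should precede the mask.
    return prefix.rstrip() + MASK

if __name__ == "__main__":
    # Hypothetical usage; requires `transformers`, and the model id
    # "vesteinn/DanskBERT" is an assumption.
    from transformers import pipeline
    fill = pipeline("fill-mask", model="vesteinn/DanskBERT")
    for pred in fill(masked_input("Der bor mange mennesker i ") + "."):
        print(pred["token_str"], round(pred["score"], 3))
```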
The model is the best-performing base-size model on the [ScandEval benchmark for Danish](https://scandeval.github.io/nlu-benchmark/).
DanskBERT was trained on the Danish Gigaword Corpus (Strømberg-Derczynski et al., 2021).
DanskBERT was trained with fairseq using the RoBERTa-base configuration and a batch size of 2k sequences. It was trained to convergence for 500k steps on 16 V100 cards over approximately two weeks.
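A run matching the setup above could look roughly like the following fairseq masked-LM invocation (a hedged sketch only: the data path, learning rate, warmup, and per-GPU batch layout are assumptions; only the RoBERTa-base architecture, the ~2k effective batch size, and the 500k updates come from this README):

```shell
# Sketch of a fairseq RoBERTa-base pretraining run (assumed flag values).
# 16 GPUs x 16 sequences x update-freq 8 = 2048 sequences per effective batch.
fairseq-train /path/to/data-bin \
    --task masked_lm --criterion masked_lm \
    --arch roberta_base \
    --sample-break-mode complete --tokens-per-sample 512 \
    --optimizer adam --adam-betas '(0.9,0.98)' --adam-eps 1e-6 \
    --lr-scheduler polynomial_decay --lr 0.0005 --warmup-updates 24000 \
    --batch-size 16 --update-freq 8 \
    --max-update 500000
```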
If you find this model useful, please cite:
```bibtex
@inproceedings{snaebjarnarson-etal-2023-transfer,
    title = "{T}ransfer to a Low-Resource Language via Close Relatives: The Case Study on Faroese",
    author = "Snæbjarnarson, Vésteinn and
      Simonsen, Annika and
      Glavaš, Goran and
      Vulić, Ivan",
    booktitle = "Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)",
    month = "may 22--24",
    year = "2023",
    address = "Tórshavn, Faroe Islands",
    publisher = {Link{\"o}ping University Electronic Press, Sweden},
}
```