spyrosbriakos
committed on
Update README.md
README.md
@@ -1,3 +1,21 @@
---
license: mit
---

This model was produced as part of the B.Sc. thesis [**NLP Tasks with GreekLegalBERT v2**](https://pergamos.lib.uoa.gr/uoa/dl/frontend/el/browse/2971631#contents).

As far as we can discern, two distinct BERT models exist for Greek NLP to date: the general-purpose **Greek-BERT** model and the domain-specific **Greek-Legal-BERT-v1** model. This thesis focuses on producing and presenting the second version of Greek-Legal-BERT, namely **GreekLegalBERT v2**, which was pretrained on more legal data than the first version.

The combined dataset used to pretrain this model comprises:

1. The **Raptarchis** dataset, also known as RAPTARCHIS47k, a comprehensive collection of Greek legislation consisting of approximately 47 thousand legal resources, dating from the founding of the Greek state in 1834 through 2015.
2. **Nomothesi@**, a platform that makes Greek legislation available on the Web as linked open data.
3. **EuroParl**, a parallel corpus collected by Philipp Koehn's team in Edinburgh from European Parliament sessions in 11 European Union languages, including Greek.
4. **EUR-LEX**, which provides official and comprehensive online access to European Union (EU) legal documents; this dataset contains 57 thousand Greek EU legislative documents from the EUR-LEX portal.
5. **Hellenic Parliament Sessions**, all the available minutes of the plenary sessions of the Hellenic (Greek) Parliament, from 3 July 1989 to 24 August 2021.
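
Combining several sources into one pretraining corpus can be sketched as follows. This is a minimal illustration only, not the pipeline used in the thesis: the one-document-per-line file layout, the whitespace/Unicode normalization, and the exact-duplicate filter are all assumptions, and the helper names (`normalize`, `merge_corpora`, `read_documents`) are hypothetical.

```python
import hashlib
import unicodedata
from pathlib import Path
from typing import Iterable, Iterator


def normalize(text: str) -> str:
    """NFC-normalize and collapse whitespace so trivially different copies compare equal."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def merge_corpora(sources: Iterable[Iterable[str]]) -> Iterator[str]:
    """Yield each non-empty document once, in first-seen order across all sources."""
    seen = set()
    for source in sources:
        for doc in source:
            clean = normalize(doc)
            if not clean:
                continue  # skip empty or whitespace-only documents
            digest = hashlib.sha1(clean.encode("utf-8")).hexdigest()
            if digest not in seen:
                seen.add(digest)
                yield clean


def read_documents(path: Path) -> Iterator[str]:
    """Assumed layout (hypothetical): one document per line in a UTF-8 text file."""
    with path.open(encoding="utf-8") as handle:
        yield from handle
```

Each source can be any iterable of strings, for example `merge_corpora(read_documents(p) for p in paths)`, where `paths` is a placeholder list of corpus files. Hashing the normalized text keeps memory use modest compared with storing every document in the `seen` set.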

The goal of the thesis is to compare the three distinct BERT-based Greek NLP models across different downstream NLP tasks, notably *Named Entity Recognition*, *Natural Language Inference* and *Multiclass Classification* on the Raptarchis dataset.