Commit e06fba0 by spyrosbriakos (verified) · 1 parent: 1c13263

Update README.md

Files changed (1): README.md (+18 -0)

---
license: mit
---

This model was produced as part of the B.Sc. thesis [**NLP Tasks with GreekLegalBERT v2**](https://pergamos.lib.uoa.gr/uoa/dl/frontend/el/browse/2971631#contents).

As far as we can tell, there are two distinct BERT-based models for Greek NLP: the general-purpose **Greek-BERT** model and the domain-specific **Greek-Legal-BERT-v1** model. In this thesis, we focus on the development and presentation of the second version of Greek-Legal-BERT, namely **GreekLegalBERT v2**, which was pretrained on more legal data than the first version.

The combined dataset used to pretrain this model comprises:

1. **Raptarchis** (also known as RAPTARCHIS47k): approximately 47 thousand legal resources forming a comprehensive collection of Greek legislation, from the founding of the Greek state in 1834 through 2015.
2. **Nomothesi@**: a platform that makes Greek legislation available on the Web as linked open data.
3. **EuroParl**: a parallel corpus collected by Philipp Koehn's team in Edinburgh from the proceedings of the European Parliament in 11 languages of the European Union, including Greek.
4. **EUR-LEX**: the portal that provides official and comprehensive online access to European Union (EU) legal documents; this part of the corpus contains 57 thousand Greek EU legislative documents retrieved from it.
5. **Hellenic Parliament Sessions**: all available minutes of the plenary sessions of the Greek (Hellenic) Parliament, from 3 July 1989 to 24 August 2021.

The goal of the thesis is to compare these three distinct BERT-based Greek NLP models (Greek-BERT, Greek-Legal-BERT-v1 and GreekLegalBERT v2) on different downstream NLP tasks, notably *Named Entity Recognition*, *Natural Language Inference* and *Multiclass Classification* on the Raptarchis dataset.