---
license: mit
---

This model was produced as part of the B.Sc. thesis [**NLP Tasks with GreekLegalBERT v2**](https://pergamos.lib.uoa.gr/uoa/dl/frontend/el/browse/2971631#contents).


As far as we can tell, there are two distinct BERT-based models for Greek NLP: the general-purpose **Greek-BERT** model and the domain-specific **Greek-Legal-BERT-v1** model. In this
thesis, we focus on the creation and presentation of the second version of Greek-Legal-BERT, namely **GreekLegalBERT v2**, which was pretrained on more legal data than the first version.

The combined dataset used to pretrain the current model consists of:

1. The **Raptarchis** dataset, also known as RAPTARCHIS47k, consisting of approximately 47 thousand legal resources, is a comprehensive collection of Greek legislation dating from the founding of the Greek state in 1834 through 2015.
2. **Nomothesi@**, a platform that makes Greek legislation available on the Web as linked open data.
3. **EuroParl**, a parallel corpus collected by Philipp Koehn’s team in Edinburgh from the proceedings of the European Parliament in 11 European Union languages, including Greek.
4. **EUR-LEX**, the portal that provides official and comprehensive online access to European Union (EU) legal documents; the corpus contains 57 thousand Greek EU legislative documents from the EUR-LEX portal.
5. **Hellenic Parliament Sessions**, all available minutes of the plenary sessions of the Greek (Hellenic) Parliament from 3 July 1989 to 24 August 2021.


The goal of this thesis is to compare the three distinct BERT-based Greek NLP models on different downstream NLP tasks, namely *Named Entity Recognition*, *Natural Language Inference* and *Multiclass Classification on the Raptarchis dataset*.