---
language: 
  - grc
base_model:
  - pranaydeeps/Ancient-Greek-BERT
tags:
  - token-classification
inference:
  parameters:
    aggregation_strategy: first
widget:
  - text: ταῦτα εἴπας ὁ Ἀλέξανδρος παρίζει Πέρσῃ ἀνδρὶ ἄνδρα Μακεδόνα ὡς γυναῖκα τῷ λόγῳ · οἳ δέ , ἐπείτε σφέων οἱ Πέρσαι ψαύειν ἐπειρῶντο , διεργάζοντο αὐτούς .
---
# Named Entity Recognition for Ancient Greek 

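A pretrained named entity recognition (NER) model for Ancient Greek, fine-tuned from `pranaydeeps/Ancient-Greek-BERT`. It tags persons (PER), locations (LOC), nationalities and other groups (NORP), and miscellaneous entities (MISC).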

# Data

We trained the models on the annotated corpora available for Ancient Greek.
Only two sizeable annotated datasets exist for Ancient Greek, and both are currently being released. The first, by Berti 2023,
is a fully annotated text of Athenaeus’ Deipnosophists, developed in the context of the Digital Athenaeus project.
The second, by Foka et al. 2020, is a fully annotated text of Pausanias’ Periegesis Hellados, developed in the context of the
Digital Periegesis project. In addition, we used smaller corpora annotated by students and scholars on Recogito:
the Odyssey, annotated by Kemp 2021; a mixed corpus of excerpts from the Library attributed to Apollodorus and from Strabo’s Geography,
annotated by Chiara Palladino; Book 1 of Xenophon’s Anabasis, created by Thomas Visser; and Demosthenes’ Against Neaira,
created by Rachel Milio.

### Training Dataset
|                | **Person**       | **Location**      | **NORP**          | **MISC**          |
|----------------|------------------|-------------------|-------------------|-------------------|
| Odyssey        | 2,469            | 698               | 0                 | 0                 |
| Deipnosophists | 14,921           | 2,699             | 5,110             | 3,060             |
| Pausanias      | 10,205           | 8,670             | 4,972             | 0                 |
| Other Datasets | 3,283            | 2,040             | 1,089             | 0                 |
| **Total**      | **30,878**       | **14,107**        | **11,171**        | **3,060**         |

---
### Validation Dataset
|                | **Person**       | **Location**      | **NORP**          | **MISC**          |
|----------------|------------------|-------------------|-------------------|-------------------|
| Xenophon       | 1,190            | 796               | 857               | 0                 |


# Results
| Class   | Metric | Test | Validation |
|---------|-----------|--------|--------|
| **LOC**     | precision | 82.92% | 87.10% |
|         | recall    | 81.30% | 87.10% |
|         | f1        | 82.11% | 87.10% |
| **MISC**    | precision | 80.43% | 0      |
|         | recall    | 70.04% | 0      |
|         | f1        | 74.87% | 0      |
| **NORP**    | precision | 87.10% | 92.82% |
|         | recall    | 90.81% | 93.42% |
|         | f1        | 88.92% | 93.12% |
| **PER**     | precision | 92.61% | 95.52% |
|         | recall    | 92.94% | 95.21% |
|         | f1        | 92.77% | 95.37% |
| **Overall** | precision | 88.92% | 92.63% |
|         | recall    | 88.82% | 92.79% |
|         | f1        | 88.87% | 92.71% |
|         | Accuracy  | 97.28% | 98.42% |
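
The F1 rows above are consistent with the usual harmonic mean of precision and recall. The sketch below recomputes the test-set F1 from the table's precision and recall values. It assumes the standard definition F1 = 2PR / (P + R); this card does not state which evaluation tool produced the scores, and `f1_score` is a helper written for this illustration. Because the inputs are already rounded to two decimals, a recomputed value can differ from the reported one in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (output in the same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Test-set (precision, recall) pairs in percent, copied from the table above
test_scores = {
    "LOC": (82.92, 81.30),
    "MISC": (80.43, 70.04),
    "NORP": (87.10, 90.81),
    "PER": (92.61, 92.94),
    "Overall": (88.92, 88.82),
}

for label, (p, r) in test_scores.items():
    print(f"{label}: F1 = {f1_score(p, r):.2f}%")
```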


# Usage

```python
from transformers import pipeline

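# Create the NER pipeline. With aggregation_strategy="first", subword tokens are
# merged into whole words and each word takes the tag of its first token.
ner = pipeline("ner", model="UGARIT/grc-ner-bert", aggregation_strategy="first")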
ner("ταῦτα εἴπας ὁ Ἀλέξανδρος παρίζει Πέρσῃ ἀνδρὶ ἄνδρα Μακεδόνα ὡς γυναῖκα τῷ λόγῳ · οἳ δέ , ἐπείτε σφέων οἱ Πέρσαι ψαύειν ἐπειρῶντο , διεργάζοντο αὐτούς .")
```
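Output (`word` is returned in lowercase without diacritics; `start` and `end` are character offsets into the input string):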
```
[{'entity_group': 'PER',
  'score': 0.9999349,
  'word': 'αλεξανδρος',
  'start': 14,
  'end': 24},
 {'entity_group': 'NORP',
  'score': 0.9369563,
  'word': 'περση',
  'start': 33,
  'end': 38},
 {'entity_group': 'NORP',
  'score': 0.60742134,
  'word': 'μακεδονα',
  'start': 51,
  'end': 59},
 {'entity_group': 'NORP',
  'score': 0.9900457,
  'word': 'περσαι',
  'start': 105,
  'end': 111}]
```
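
Since `word` in the output is lowercased and stripped of diacritics, it can be useful to recover the original spelling from the `start` and `end` offsets, and to drop low-confidence predictions such as the 0.61 score for `μακεδονα` above. The following is a minimal sketch of that post-processing step. `surface_forms` and `min_score` are names introduced here for illustration and are not part of the model or of `transformers`, and the sample entity list is hand-written to mirror the pipeline's output format rather than produced by the model.

```python
import unicodedata


def surface_forms(text, entities, min_score=0.0):
    """Return (label, original_span, score) for each entity scoring at least min_score."""
    results = []
    for ent in entities:
        if ent["score"] < min_score:
            continue
        # Slice the input string with the character offsets reported by the pipeline
        span = text[ent["start"]:ent["end"]]
        results.append((ent["entity_group"], span, ent["score"]))
    return results


# NFC-normalize so each accented letter is a single code point and the
# hand-written offsets below line up with the string.
text = unicodedata.normalize("NFC", "ὁ Ἀλέξανδρος")

# Hand-written sample in the same shape as the pipeline output (not real model output)
sample = [
    {"entity_group": "PER", "score": 0.99, "word": "αλεξανδρος", "start": 2, "end": 12},
]

print(surface_forms(text, sample, min_score=0.5))
```

With the real pipeline, pass the same string that was given to `ner(...)` as `text`, so that the offsets refer to the right characters.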