---
language:
- grc
base_model:
- pranaydeeps/Ancient-Greek-BERT
tags:
- token-classification
inference:
  parameters:
    aggregation_strategy: first
widget:
- text: ταῦτα εἴπας ὁ Ἀλέξανδρος παρίζει Πέρσῃ ἀνδρὶ ἄνδρα Μακεδόνα ὡς γυναῖκα τῷ λόγῳ · οἳ δέ , ἐπείτε σφέων οἱ Πέρσαι ψαύειν ἐπειρῶντο , διεργάζοντο αὐτούς .
---

# Named Entity Recognition for Ancient Greek

Pretrained named entity recognition (NER) tagging model for Ancient Greek, built on the `pranaydeeps/Ancient-Greek-BERT` base model.

# Data

We trained the model on the available annotated corpora in Ancient Greek.
Only two sizeable annotated datasets exist for Ancient Greek, and both are currently under release. The first, by Berti 2023,
is a fully annotated text of Athenaeus’ Deipnosophists, developed in the context of the Digital Athenaeus project.
The second, by Foka et al. 2020, is a fully annotated text of Pausanias’ Periegesis Hellados, developed in the context of the
Digital Periegesis project. In addition, we used smaller corpora annotated by students and scholars on Recogito:
the Odyssey, annotated by Kemp 2021; a mixed corpus including excerpts from the Library attributed to Apollodorus and from Strabo’s Geography,
annotated by Chiara Palladino; Book 1 of Xenophon’s Anabasis, created by Thomas Visser; and Demosthenes’ Against Neaira,
created by Rachel Milio.

### Training Dataset

|                | **Person** | **Location** | **NORP**   | **MISC**  |
|----------------|------------|--------------|------------|-----------|
| Odyssey        | 2,469      | 698          | 0          | 0         |
| Deipnosophists | 14,921     | 2,699        | 5,110      | 3,060     |
| Pausanias      | 10,205     | 8,670        | 4,972      | 0         |
| Other Datasets | 3,283      | 2,040        | 1,089      | 0         |
| **Total**      | **30,878** | **14,107**   | **11,171** | **3,060** |

---

### Validation Dataset

|          | **Person** | **Location** | **NORP** | **MISC** |
|----------|------------|--------------|----------|----------|
| Xenophon | 1,190      | 796          | 857      | 0        |

# Results

| Class       | Metric    | Test   | Validation |
|-------------|-----------|--------|------------|
| **LOC**     | precision | 82.92% | 87.10%     |
|             | recall    | 81.30% | 87.10%     |
|             | f1        | 82.11% | 87.10%     |
| **MISC**    | precision | 80.43% | 0          |
|             | recall    | 70.04% | 0          |
|             | f1        | 74.87% | 0          |
| **NORP**    | precision | 87.10% | 92.82%     |
|             | recall    | 90.81% | 93.42%     |
|             | f1        | 88.92% | 93.12%     |
| **PER**     | precision | 92.61% | 95.52%     |
|             | recall    | 92.94% | 95.21%     |
|             | f1        | 92.77% | 95.37%     |
| **Overall** | precision | 88.92% | 92.63%     |
|             | recall    | 88.82% | 92.79%     |
|             | f1        | 88.87% | 92.71%     |
|             | Accuracy  | 97.28% | 98.42%     |

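
For readers comparing the rows of the results table, F1 is the harmonic mean of precision and recall. The short sketch below recomputes it for the PER row on the test set. The helper name `f1_score` is hypothetical, and the sketch assumes the reported scores are standard precision/recall/F1 values, which this card does not state explicitly. Because the inputs are the rounded percentages from the table, a recomputed value can differ from the reported one in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        # Avoid division by zero when a class has no correct predictions
        return 0.0
    return 2 * precision * recall / (precision + recall)


# PER on the test set, values copied from the results table
per_f1 = f1_score(0.9261, 0.9294)
print(f"PER F1 (test): {per_f1:.2%}")
```
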
# Usage

```python
from transformers import pipeline

# Create the NER pipeline; "first" aggregation labels each word with the tag of its first sub-token
ner = pipeline("ner", model="UGARIT/grc-ner-bert", aggregation_strategy="first")

text = "ταῦτα εἴπας ὁ Ἀλέξανδρος παρίζει Πέρσῃ ἀνδρὶ ἄνδρα Μακεδόνα ὡς γυναῖκα τῷ λόγῳ · οἳ δέ , ἐπείτε σφέων οἱ Πέρσαι ψαύειν ἐπειρῶντο , διεργάζοντο αὐτούς ."
ner(text)
```

Output:

```
[{'entity_group': 'PER',
  'score': 0.9999349,
  'word': 'αλεξανδρος',
  'start': 14,
  'end': 24},
 {'entity_group': 'NORP',
  'score': 0.9369563,
  'word': 'περση',
  'start': 33,
  'end': 38},
 {'entity_group': 'NORP',
  'score': 0.60742134,
  'word': 'μακεδονα',
  'start': 51,
  'end': 59},
 {'entity_group': 'NORP',
  'score': 0.9900457,
  'word': 'περσαι',
  'start': 105,
  'end': 111}]
```
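
In the output shown here, the `word` field is lowercased and stripped of diacritics, so it does not match the input text exactly. One way to recover the original surface forms is to slice the input string with the `start` and `end` offsets, as in the minimal sketch below. The helper `surface_forms` and its `min_score` threshold are hypothetical and are not part of the model or of `transformers`. The entity list is copied from the sample output so the snippet runs without downloading the model, and the text is NFC-normalized on the assumption that the offsets count precomposed characters.

```python
import unicodedata


def surface_forms(text, entities, min_score=0.0):
    """Return (label, original substring) pairs for entities scoring at least min_score."""
    return [
        (ent["entity_group"], text[ent["start"]:ent["end"]])
        for ent in entities
        if ent["score"] >= min_score
    ]


# Same sentence as in the usage example; NFC keeps each accented letter as one character (assumption)
text = unicodedata.normalize(
    "NFC",
    "ταῦτα εἴπας ὁ Ἀλέξανδρος παρίζει Πέρσῃ ἀνδρὶ ἄνδρα Μακεδόνα ὡς γυναῖκα τῷ λόγῳ · "
    "οἳ δέ , ἐπείτε σφέων οἱ Πέρσαι ψαύειν ἐπειρῶντο , διεργάζοντο αὐτούς .",
)

# Entities copied from the sample output, not produced by running the model here
entities = [
    {"entity_group": "PER", "score": 0.9999349, "word": "αλεξανδρος", "start": 14, "end": 24},
    {"entity_group": "NORP", "score": 0.9369563, "word": "περση", "start": 33, "end": 38},
    {"entity_group": "NORP", "score": 0.60742134, "word": "μακεδονα", "start": 51, "end": 59},
    {"entity_group": "NORP", "score": 0.9900457, "word": "περσαι", "start": 105, "end": 111},
]

# All entities with their accented, capitalized forms
print(surface_forms(text, entities))

# Keep only higher-confidence entities; 0.9 is an arbitrary illustrative threshold
print(surface_forms(text, entities, min_score=0.9))
```

Slicing must use the exact string that was passed to the pipeline, since the offsets index into that string.
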