---
language: 
  - grc
base_model:
  - pranaydeeps/Ancient-Greek-BERT
tags:
  - token-classification
inference:
  parameters:
    aggregation_strategy: first
widget:
  - text: ταῦτα εἴπας ὁ Ἀλέξανδρος παρίζει Πέρσῃ ἀνδρὶ ἄνδρα Μακεδόνα ὡς γυναῖκα τῷ λόγῳ · οἳ δέ , ἐπείτε σφέων οἱ Πέρσαι ψαύειν ἐπειρῶντο , διεργάζοντο αὐτούς .
---
# Named Entity Recognition for Ancient Greek 

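A pretrained named entity recognition (NER) tagging model for Ancient Greek, built on `pranaydeeps/Ancient-Greek-BERT`. It tags persons (PER), locations (LOC), nationalities and other groups (NORP), and miscellaneous entities (MISC).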

# Data

We trained the models on the available annotated corpora in Ancient Greek. 
Only two sizeable annotated datasets exist for Ancient Greek, and both are currently under release. The first, by Berti 2023, 
is a fully annotated text of Athenaeus’ Deipnosophists, developed in the context of the Digital Athenaeus project. 
The second, by Foka et al. 2020, is a fully annotated text of Pausanias’ Periegesis Hellados, developed in the context of the 
Digital Periegesis project. In addition, we used smaller corpora annotated by students and scholars on Recogito: 
the Odyssey, annotated by Kemp 2021; a mixed corpus of excerpts from the Library attributed to Apollodorus and from Strabo’s Geography, 
annotated by Chiara Palladino; Book 1 of Xenophon’s Anabasis, created by Thomas Visser; and Demosthenes’ Against Neaira, 
created by Rachel Milio.

### Training Dataset
|                | **Person**       | **Location**      | **NORP**          | **MISC**          |
|----------------|------------------|-------------------|-------------------|-------------------|
| Odyssey        | 2,469            | 698               | 0                 | 0                 |
| Deipnosophists | 14,921           | 2,699             | 5,110             | 3,060             |
| Pausanias      | 10,205           | 8,670             | 4,972             | 0                 |
| Other Datasets | 3,283            | 2,040             | 1,089             | 0                 |
| **Total**      | **30,878**       | **14,107**        | **11,171**        | **3,060**         |

---
### Validation Dataset
|                | **Person**       | **Location**      | **NORP**          | **MISC**          |
|----------------|------------------|-------------------|-------------------|-------------------|
| Xenophon       | 1,190            | 796               | 857               | 0                 |



# Results
| Class   | Metric | Test | Validation |
|---------|-----------|--------|--------|
| **LOC**     | precision | 82.92% | 87.10% |
|         | recall    | 81.30% | 87.10% |
|         | f1        | 82.11% | 87.10% |
| **MISC**    | precision | 80.43% | 0      |
|         | recall    | 70.04% | 0      |
|         | f1        | 74.87% | 0      |
| **NORP**    | precision | 87.10% | 92.82% |
|         | recall    | 90.81% | 93.42% |
|         | f1        | 88.92% | 93.12% |
| **PER**     | precision | 92.61% | 95.52% |
|         | recall    | 92.94% | 95.21% |
|         | f1        | 92.77% | 95.37% |
| **Overall** | precision | 88.92% | 92.63% |
|         | recall    | 88.82% | 92.79% |
|         | f1        | 88.87% | 92.71% |
|         | accuracy  | 97.28% | 98.42% |
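
The F1 rows in the table can be sanity-checked from the precision and recall rows. The sketch below assumes F1 is the usual harmonic mean of precision and recall; the card does not state which evaluation library was used, so treat this as an illustration, not a description of the authors' evaluation code. The helper name `f1_score` and the `TEST_ROWS` dictionary are hypothetical; the numbers are copied from the Test column above.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) from the Test column of the table.
TEST_ROWS = {
    "LOC": (82.92, 81.30, 82.11),
    "MISC": (80.43, 70.04, 74.87),
    "NORP": (87.10, 90.81, 88.92),
    "PER": (92.61, 92.94, 92.77),
    "Overall": (88.92, 88.82, 88.87),
}

for label, (precision, recall, reported) in TEST_ROWS.items():
    computed = f1_score(precision, recall)
    print(f"{label}: computed {computed:.2f}, reported {reported:.2f}")
```

Because the inputs are already rounded to two decimals, the recomputed values can differ from the reported ones in the last digit.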



# Usage
This [colab notebook](https://colab.research.google.com/drive/1Z7-c5j0FZvzFPlkS0DavOzA3UI5PXfjP?usp=sharing) contains the necessary code to use the model.
```python
from transformers import pipeline

# Create the NER pipeline. aggregation_strategy="first" groups subword tokens
# into whole words and labels each word with the tag of its first token.
ner = pipeline("ner", model="UGARIT/grc-ner-bert", aggregation_strategy="first")
ner("ταῦτα εἴπας ὁ Ἀλέξανδρος παρίζει Πέρσῃ ἀνδρὶ ἄνδρα Μακεδόνα ὡς γυναῖκα τῷ λόγῳ · οἳ δέ , ἐπείτε σφέων οἱ Πέρσαι ψαύειν ἐπειρῶντο , διεργάζοντο αὐτούς .")
```
Output:
```
[{'entity_group': 'PER',
  'score': 0.9999349,
  'word': 'αλεξανδρος',
  'start': 14,
  'end': 24},
 {'entity_group': 'NORP',
  'score': 0.9369563,
  'word': 'περση',
  'start': 33,
  'end': 38},
 {'entity_group': 'NORP',
  'score': 0.60742134,
  'word': 'μακεδονα',
  'start': 51,
  'end': 59},
 {'entity_group': 'NORP',
  'score': 0.9900457,
  'word': 'περσαι',
  'start': 105,
  'end': 111}]
```
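
The `word` field in the output is lowercased and stripped of diacritics, while `start` and `end` are character offsets into the input string. The following sketch shows one way to recover the original surface forms and to drop low-confidence predictions. It works on a hard-coded copy of the output above, so it runs without downloading the model. The helper names `surface_form` and `confident_entities` and the 0.9 threshold are illustrative assumptions, and the text is normalized to NFC on the assumption that the offsets refer to precomposed characters.

```python
import unicodedata

# Input sentence from the usage example, normalized to NFC (assumption: the
# offsets in the pipeline output count precomposed characters).
TEXT = unicodedata.normalize(
    "NFC",
    "ταῦτα εἴπας ὁ Ἀλέξανδρος παρίζει Πέρσῃ ἀνδρὶ ἄνδρα Μακεδόνα ὡς γυναῖκα "
    "τῷ λόγῳ · οἳ δέ , ἐπείτε σφέων οἱ Πέρσαι ψαύειν ἐπειρῶντο , "
    "διεργάζοντο αὐτούς .",
)

# Pipeline output copied from the example above.
ENTITIES = [
    {"entity_group": "PER", "score": 0.9999349, "word": "αλεξανδρος", "start": 14, "end": 24},
    {"entity_group": "NORP", "score": 0.9369563, "word": "περση", "start": 33, "end": 38},
    {"entity_group": "NORP", "score": 0.60742134, "word": "μακεδονα", "start": 51, "end": 59},
    {"entity_group": "NORP", "score": 0.9900457, "word": "περσαι", "start": 105, "end": 111},
]


def surface_form(text: str, entity: dict) -> str:
    """Return the entity exactly as written in the input text."""
    return text[entity["start"]:entity["end"]]


def confident_entities(entities: list, min_score: float = 0.9) -> list:
    """Keep only entities whose score is at least min_score."""
    return [entity for entity in entities if entity["score"] >= min_score]


for entity in confident_entities(ENTITIES):
    print(entity["entity_group"], surface_form(TEXT, entity), round(entity["score"], 3))
```

With a threshold of 0.9, the `μακεδονα` prediction (score 0.61) is filtered out and the other three entities are kept. The right threshold depends on whether precision or recall matters more for the task at hand.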