---
language:
- grc
base_model:
- pranaydeeps/Ancient-Greek-BERT
tags:
- token-classification
inference:
  parameters:
    aggregation_strategy: first
widget:
- text: ταῦτα εἴπας ὁ Ἀλέξανδρος παρίζει Πέρσῃ ἀνδρὶ ἄνδρα Μακεδόνα ὡς γυναῖκα τῷ λόγῳ · οἳ δέ , ἐπείτε σφέων οἱ Πέρσαι ψαύειν ἐπειρῶντο , διεργάζοντο αὐτούς .
---
# Named Entity Recognition for Ancient Greek
A pretrained named entity recognition (NER) tagging model for Ancient Greek, based on `pranaydeeps/Ancient-Greek-BERT`.
# Data
We trained the models on the available annotated corpora in Ancient Greek.
Only two sizeable annotated datasets exist for Ancient Greek, and both are currently being released. The first, by Berti 2023,
consists of a fully annotated text of Athenaeus’ Deipnosophists, developed in the context of the Digital Athenaeus project.
The second, by Foka et al. 2020, is a fully annotated text of Pausanias’ Periegesis Hellados, developed in the context of the
Digital Periegesis project. In addition, we used smaller corpora annotated by students and scholars on Recogito:
the Odyssey, annotated by Kemp 2021; a mixed corpus including excerpts from the Library attributed to Apollodorus and from Strabo’s Geography,
annotated by Chiara Palladino; Book 1 of Xenophon’s Anabasis, created by Thomas Visser; and Demosthenes’ Against Neaira,
created by Rachel Milio.
### Training Dataset
| | **Person** | **Location** | **NORP** | **MISC** |
|----------------|------------------|-------------------|-------------------|-------------------|
| Odyssey        | 2,469            | 698               | 0                 | 0                 |
| Deipnosophists | 14,921           | 2,699             | 5,110             | 3,060             |
| Pausanias      | 10,205           | 8,670             | 4,972             | 0                 |
| Other Datasets | 3,283            | 2,040             | 1,089             | 0                 |
| **Total**      | **30,878**       | **14,107**        | **11,171**        | **3,060**         |
---
### Validation Dataset
| | **Person** | **Location** | **NORP** | **MISC** |
|----------------|------------------|-------------------|-------------------|-------------------|
| Xenophon       | 1,190            | 796               | 857               | 0                 |
# Results
| Class | Metric | Test | Validation |
|---------|-----------|--------|--------|
| **LOC** | precision | 82.92% | 87.10% |
| | recall | 81.30% | 87.10% |
| | f1 | 82.11% | 87.10% |
| **MISC** | precision | 80.43% | 0 |
| | recall | 70.04% | 0 |
| | f1 | 74.87% | 0 |
| **NORP** | precision | 87.10% | 92.82% |
| | recall | 90.81% | 93.42% |
| | f1 | 88.92% | 93.12% |
| **PER** | precision | 92.61% | 95.52% |
| | recall | 92.94% | 95.21% |
| | f1 | 92.77% | 95.37% |
| **Overall** | precision | 88.92% | 92.63% |
| | recall | 88.82% | 92.79% |
| | f1 | 88.87% | 92.71% |
| | Accuracy | 97.28% | 98.42% |
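
As a quick sanity check on the table, the snippet below recomputes the test-set F1 scores from the reported precision and recall. It is a minimal sketch that assumes F1 here is the standard harmonic mean of precision and recall; the model card does not say which evaluation tooling produced the numbers, and small differences are expected because the table values are already rounded. The helper name `f1` and the `test_rows` dictionary are illustrative, not part of the model's code.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f1) for the test set, copied from the table above
test_rows = {
    "LOC": (82.92, 81.30, 82.11),
    "MISC": (80.43, 70.04, 74.87),
    "NORP": (87.10, 90.81, 88.92),
    "PER": (92.61, 92.94, 92.77),
    "Overall": (88.92, 88.82, 88.87),
}

for name, (p, r, reported) in test_rows.items():
    print(f"{name}: recomputed {f1(p, r):.2f}, reported {reported:.2f}")
```

Under that assumption, every recomputed value should land within a couple of hundredths of a percentage point of the reported one.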
# Usage
This [colab notebook](https://colab.research.google.com/drive/1Z7-c5j0FZvzFPlkS0DavOzA3UI5PXfjP?usp=sharing) contains the necessary code to use the model.
```python
from transformers import pipeline

# Create the NER pipeline; with "first", each word takes the tag of its first sub-token
ner = pipeline("ner", model="UGARIT/grc-ner-bert", aggregation_strategy="first")

text = "ταῦτα εἴπας ὁ Ἀλέξανδρος παρίζει Πέρσῃ ἀνδρὶ ἄνδρα Μακεδόνα ὡς γυναῖκα τῷ λόγῳ · οἳ δέ , ἐπείτε σφέων οἱ Πέρσαι ψαύειν ἐπειρῶντο , διεργάζοντο αὐτούς ."
ner(text)
```
Output
```
[{'entity_group': 'PER',
'score': 0.9999349,
'word': 'αλεξανδρος',
'start': 14,
'end': 24},
{'entity_group': 'NORP',
'score': 0.9369563,
'word': 'περση',
'start': 33,
'end': 38},
{'entity_group': 'NORP',
'score': 0.60742134,
'word': 'μακεδονα',
'start': 51,
'end': 59},
{'entity_group': 'NORP',
'score': 0.9900457,
'word': 'περσαι',
'start': 105,
'end': 111}]
```
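
In the sample output above, the `word` field is lowercased and stripped of diacritics, while `start` and `end` are character offsets into the input string. The sketch below uses those offsets to recover the original polytonic forms and, optionally, to drop low-confidence entities. It runs on the sample output copied by hand, so the model is not needed. Two assumptions are made here: the input is normalized to NFC so that each accented letter is a single code point and the offsets line up, and the `min_score` threshold of 0.9 is purely illustrative. The helper `surface_forms` is hypothetical and not part of the model or of `transformers`.

```python
import unicodedata

# Input sentence from the usage example; NFC normalization is an assumption
# made so that the character offsets in the sample output line up.
text = unicodedata.normalize(
    "NFC",
    "ταῦτα εἴπας ὁ Ἀλέξανδρος παρίζει Πέρσῃ ἀνδρὶ ἄνδρα Μακεδόνα ὡς γυναῖκα τῷ λόγῳ · οἳ δέ , ἐπείτε σφέων οἱ Πέρσαι ψαύειν ἐπειρῶντο , διεργάζοντο αὐτούς .",
)

# Entities copied from the sample output above
entities = [
    {"entity_group": "PER", "score": 0.9999349, "word": "αλεξανδρος", "start": 14, "end": 24},
    {"entity_group": "NORP", "score": 0.9369563, "word": "περση", "start": 33, "end": 38},
    {"entity_group": "NORP", "score": 0.60742134, "word": "μακεδονα", "start": 51, "end": 59},
    {"entity_group": "NORP", "score": 0.9900457, "word": "περσαι", "start": 105, "end": 111},
]


def surface_forms(text, entities, min_score=0.0):
    """Return (label, original substring, score) for entities scoring at least min_score."""
    return [
        (e["entity_group"], text[e["start"]:e["end"]], e["score"])
        for e in entities
        if e["score"] >= min_score
    ]


# All entities, with their original spelling restored from the input text
for label, form, score in surface_forms(text, entities):
    print(f"{label}\t{form}\t{score:.3f}")

# Only entities above the illustrative confidence threshold
confident = surface_forms(text, entities, min_score=0.9)
```

Slicing the input rather than using `word` keeps accents and capitalization intact, which matters if the entities are later matched against a gazetteer or linked back to the edition.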