---
language:
- grc
base_model:
- pranaydeeps/Ancient-Greek-BERT
tags:
- token-classification
inference:
  parameters:
    aggregation_strategy: first
widget:
- text: ταῦτα εἴπας Ἀλέξανδρος παρίζει Πέρσῃ ἀνδρὶ ἄνδρα Μακεδόνα ὡς γυναῖκα τῷ λόγῳ · οἳ δέ , ἐπείτε σφέων οἱ Πέρσαι ψαύειν ἐπειρῶντο , διεργάζοντο αὐτούς .
---
# Named Entity Recognition for Ancient Greek
A pretrained named entity recognition (NER) tagging model for Ancient Greek, based on `pranaydeeps/Ancient-Greek-BERT`.
# Data
We trained the models on the available annotated corpora in Ancient Greek.
Only two sizeable annotated datasets exist for Ancient Greek, and both are currently being released. The first, by Berti 2023,
is a fully annotated text of Athenaeus’ Deipnosophists, developed in the context of the Digital Athenaeus project.
The second, by Foka et al. 2020, is a fully annotated text of Pausanias’ Periegesis Hellados, developed in the context of the
Digital Periegesis project. In addition, we used smaller corpora annotated by students and scholars on Recogito:
the Odyssey, annotated by Kemp 2021; a mixed corpus including excerpts from the Library attributed to Apollodorus and from Strabo’s Geography,
annotated by Chiara Palladino; Book 1 of Xenophon’s Anabasis, created by Thomas Visser; and Demosthenes’ Against Neaira,
created by Rachel Milio.
### Training Dataset
| | **Person** | **Location** | **NORP** | **MISC** |
|----------------|------------------|-------------------|-------------------|-------------------|
| Odyssey | 2,469 | 698 | 0 | 0 |
| Deipnosophists | 14,921 | 2,699 | 5,110 | 3,060 |
| Pausanias | 10,205 | 8,670 | 4,972 | 0 |
| Other Datasets | 3,283 | 2,040 | 1,089 | 0 |
| **Total** | **30,878** | **14,107** | **11,171** | **3,060** |
---
### Validation Dataset
| | **Person** | **Location** | **NORP** | **MISC** |
|----------------|------------------|-------------------|-------------------|-------------------|
| Xenophon | 1,190 | 796 | 857 | 0 |
# Results
| Class | Metric | Test | Validation |
|---------|-----------|--------|--------|
| **LOC** | precision | 82.92% | 87.10% |
| | recall | 81.30% | 87.10% |
| | f1 | 82.11% | 87.10% |
| **MISC** | precision | 80.43% | 0 |
| | recall | 70.04% | 0 |
| | f1 | 74.87% | 0 |
| **NORP** | precision | 87.10% | 92.82% |
| | recall | 90.81% | 93.42% |
| | f1 | 88.92% | 93.12% |
| **PER** | precision | 92.61% | 95.52% |
| | recall | 92.94% | 95.21% |
| | f1 | 92.77% | 95.37% |
| **Overall** | precision | 88.92% | 92.63% |
| | recall | 88.82% | 92.79% |
| | f1 | 88.87% | 92.71% |
| | accuracy | 97.28% | 98.42% |
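
As a quick sanity check on the table, the sketch below recomputes the test-set f1 values from the listed precision and recall. It assumes the reported f1 is the standard harmonic mean of precision and recall. The helper name `f1_score` and the `reported` dictionary are illustrative and not part of the model's code. Because the inputs are already rounded to two decimals, the recomputed values can differ from the table in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Test-set (precision, recall, reported f1) in percent, copied from the table.
reported = {
    "LOC": (82.92, 81.30, 82.11),
    "MISC": (80.43, 70.04, 74.87),
    "NORP": (87.10, 90.81, 88.92),
    "PER": (92.61, 92.94, 92.77),
    "Overall": (88.92, 88.82, 88.87),
}

for label, (precision, recall, f1) in reported.items():
    recomputed = f1_score(precision, recall)
    # Print the recomputed value next to the reported one for comparison.
    print(f"{label}: recomputed {recomputed:.2f}, reported {f1:.2f}")
```

The zero-division guard covers a class with no predictions and no gold entities. This is one plausible reading of the `0` entries for MISC on the validation set, which contains no MISC annotations.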
# Usage
```python
from transformers import pipeline
# Create the NER pipeline; "first" tags each word with the label of its first sub-token.
ner = pipeline("ner", model="UGARIT/grc-ner-bert", aggregation_strategy="first")
ner("ταῦτα εἴπας ὁ Ἀλέξανδρος παρίζει Πέρσῃ ἀνδρὶ ἄνδρα Μακεδόνα ὡς γυναῖκα τῷ λόγῳ · οἳ δέ , ἐπείτε σφέων οἱ Πέρσαι ψαύειν ἐπειρῶντο , διεργάζοντο αὐτούς .")
```
Output
```
[{'entity_group': 'PER',
  'score': 0.9999349,
  'word': 'αλεξανδρος',
  'start': 14,
  'end': 24},
 {'entity_group': 'NORP',
  'score': 0.9369563,
  'word': 'περση',
  'start': 33,
  'end': 38},
 {'entity_group': 'NORP',
  'score': 0.60742134,
  'word': 'μακεδονα',
  'start': 51,
  'end': 59},
 {'entity_group': 'NORP',
  'score': 0.9900457,
  'word': 'περσαι',
  'start': 105,
  'end': 111}]
```
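The `word` field in the output above is lowercased and stripped of diacritics (for example `αλεξανδρος`), so the `start` and `end` character offsets are the more reliable way to recover the original spelling from the input. The following is a minimal post-processing sketch, not part of the model itself. The helper `group_entities`, its `min_score` parameter, and the 0.9 threshold are illustrative assumptions. The entity list is copied from the sample output so the snippet runs without downloading the model.

```python
from collections import defaultdict

text = "ταῦτα εἴπας ὁ Ἀλέξανδρος παρίζει Πέρσῃ ἀνδρὶ ἄνδρα Μακεδόνα ὡς γυναῖκα τῷ λόγῳ · οἳ δέ , ἐπείτε σφέων οἱ Πέρσαι ψαύειν ἐπειρῶντο , διεργάζοντο αὐτούς ."

# Sample pipeline output, copied from the example above.
entities = [
    {"entity_group": "PER", "score": 0.9999349, "word": "αλεξανδρος", "start": 14, "end": 24},
    {"entity_group": "NORP", "score": 0.9369563, "word": "περση", "start": 33, "end": 38},
    {"entity_group": "NORP", "score": 0.60742134, "word": "μακεδονα", "start": 51, "end": 59},
    {"entity_group": "NORP", "score": 0.9900457, "word": "περσαι", "start": 105, "end": 111},
]


def group_entities(entities, text=None, min_score=0.0):
    """Group entities by label, skipping those scored below min_score.

    If `text` is given, the original surface form is sliced out using the
    character offsets; otherwise the normalized `word` field is used.
    """
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] < min_score:
            continue
        if text is not None:
            surface = text[ent["start"]:ent["end"]]
        else:
            surface = ent["word"]
        grouped[ent["entity_group"]].append(surface)
    return dict(grouped)


# Keep only confident predictions and show them with their original spelling.
print(group_entities(entities, text=text, min_score=0.9))
```

With a threshold of 0.9, the lower-scoring `μακεδονα` prediction (about 0.61) is dropped. Whether that trade-off is desirable depends on the application, so the threshold is best tuned on held-out data.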