---
language:
- grc
base_model:
- pranaydeeps/Ancient-Greek-BERT
tags:
- token-classification
inference:
  parameters:
    aggregation_strategy: first
widget:
- text: ταῦτα εἴπας ὁ Ἀλέξανδρος παρίζει Πέρσῃ ἀνδρὶ ἄνδρα Μακεδόνα ὡς γυναῖκα τῷ λόγῳ · οἳ δέ , ἐπείτε σφέων οἱ Πέρσαι ψαύειν ἐπειρῶντο , διεργάζοντο αὐτούς .
---

# Named Entity Recognition for Ancient Greek

Pretrained NER tagging model for Ancient Greek.

# Data

We trained the models on the available annotated corpora in Ancient Greek. Only two sizeable annotated datasets exist for Ancient Greek, and both are currently being released. The first, by Berti 2023, is a fully annotated text of Athenaeus’ Deipnosophists, developed in the context of the Digital Athenaeus project. The second, by Foka et al. 2020, is a fully annotated text of Pausanias’ Periegesis Hellados, developed in the context of the Digital Periegesis project.

In addition, we used smaller corpora annotated by students and scholars on Recogito: the Odyssey, annotated by Kemp 2021; a mixed corpus including excerpts from the Library attributed to Apollodorus and from Strabo’s Geography, annotated by Chiara Palladino; Book 1 of Xenophon’s Anabasis, created by Thomas Visser; and Demosthenes’ Against Neaira, created by Rachel Milio.
### Training Dataset

|                | **Person** | **Location** | **NORP**   | **MISC**  |
|----------------|------------|--------------|------------|-----------|
| Odyssey        | 2,469      | 698          | 0          | 0         |
| Deipnosophists | 14,921     | 2,699        | 5,110      | 3,060     |
| Pausanias      | 10,205     | 8,670        | 4,972      | 0         |
| Other Datasets | 3,283      | 2,040        | 1,089      | 0         |
| **Total**      | **30,878** | **14,107**   | **11,171** | **3,060** |

---

### Validation Dataset

|          | **Person** | **Location** | **NORP** | **MISC** |
|----------|------------|--------------|----------|----------|
| Xenophon | 1,190      | 796          | 857      | 0        |

# Results

| Class       | Metric    | Test   | Validation |
|-------------|-----------|--------|------------|
| **LOC**     | precision | 82.92% | 87.10%     |
|             | recall    | 81.30% | 87.10%     |
|             | f1        | 82.11% | 87.10%     |
| **MISC**    | precision | 80.43% | 0          |
|             | recall    | 70.04% | 0          |
|             | f1        | 74.87% | 0          |
| **NORP**    | precision | 87.10% | 92.82%     |
|             | recall    | 90.81% | 93.42%     |
|             | f1        | 88.92% | 93.12%     |
| **PER**     | precision | 92.61% | 95.52%     |
|             | recall    | 92.94% | 95.21%     |
|             | f1        | 92.77% | 95.37%     |
| **Overall** | precision | 88.92% | 92.63%     |
|             | recall    | 88.82% | 92.79%     |
|             | f1        | 88.87% | 92.71%     |
|             | Accuracy  | 97.28% | 98.42%     |

# Usage

```python
from transformers import pipeline

# create pipeline for NER
ner = pipeline("ner", model="UGARIT/grc-ner-bert", aggregation_strategy="first")
ner("ταῦτα εἴπας ὁ Ἀλέξανδρος παρίζει Πέρσῃ ἀνδρὶ ἄνδρα Μακεδόνα ὡς γυναῖκα τῷ λόγῳ · οἳ δέ , ἐπείτε σφέων οἱ Πέρσαι ψαύειν ἐπειρῶντο , διεργάζοντο αὐτούς .")
```

Output

```
[{'entity_group': 'PER', 'score': 0.9999349, 'word': 'αλεξανδρος', 'start': 14, 'end': 24},
 {'entity_group': 'NORP', 'score': 0.9369563, 'word': 'περση', 'start': 33, 'end': 38},
 {'entity_group': 'NORP', 'score': 0.60742134, 'word': 'μακεδονα', 'start': 51, 'end': 59},
 {'entity_group': 'NORP', 'score': 0.9900457, 'word': 'περσαι', 'start': 105, 'end': 111}]
```
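
The pipeline returns a flat list of dictionaries, so downstream code often needs to drop low-confidence spans and collect the rest by label. The following is a minimal sketch of one way to do that. The helper name `group_entities` and the `min_score` threshold of 0.9 are illustrative assumptions, not part of the model or of `transformers`; the sample data is the output shown above, pasted in so the snippet runs without downloading the model.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.9):
    """Group NER pipeline output by label, keeping spans scored >= min_score.

    `group_entities` and `min_score` are hypothetical names for this sketch.
    """
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Output copied from the usage example above.
sample_output = [
    {"entity_group": "PER", "score": 0.9999349, "word": "αλεξανδρος", "start": 14, "end": 24},
    {"entity_group": "NORP", "score": 0.9369563, "word": "περση", "start": 33, "end": 38},
    {"entity_group": "NORP", "score": 0.60742134, "word": "μακεδονα", "start": 51, "end": 59},
    {"entity_group": "NORP", "score": 0.9900457, "word": "περσαι", "start": 105, "end": 111},
]

grouped = group_entities(sample_output, min_score=0.9)
print(grouped)
# {'PER': ['αλεξανδρος'], 'NORP': ['περση', 'περσαι']}
```

With this threshold the `μακεδονα` span (score about 0.61) is dropped; a suitable cut-off depends on the use case. The `word` field appears lowercased and without diacritics in the output above, so if the original surface form is needed, the `start` and `end` character offsets can be used to slice it out of the input string.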