metadata
language:
  - grc
base_model:
  - pranaydeeps/Ancient-Greek-BERT
tags:
  - token-classification
inference:
  parameters:
    aggregation_strategy: first
widget:
  - text: >-
      ταῦτα εἴπας ὁ Ἀλέξανδρος παρίζει Πέρσῃ ἀνδρὶ ἄνδρα Μακεδόνα ὡς γυναῖκα τῷ
      λόγῳ · οἳ δέ , ἐπείτε σφέων οἱ Πέρσαι ψαύειν ἐπειρῶντο , διεργάζοντο
      αὐτούς .

Named Entity Recognition for Ancient Greek

A pretrained named entity recognition (NER) tagger for Ancient Greek.

Data

We trained the models on the annotated corpora available for Ancient Greek. Only two sizeable annotated datasets exist, both of which are currently being released. The first, by Berti 2023, is a fully annotated text of Athenaeus’ Deipnosophists, developed in the context of the Digital Athenaeus project. The second, by Foka et al. 2020, is a fully annotated text of Pausanias’ Periegesis Hellados, developed in the context of the Digital Periegesis project. In addition, we used smaller corpora annotated by students and scholars on Recogito: the Odyssey, annotated by Kemp 2021; a mixed corpus of excerpts from the Library attributed to Apollodorus and from Strabo’s Geography, annotated by Chiara Palladino; Book 1 of Xenophon’s Anabasis, created by Thomas Visser; and Demosthenes’ Against Neaira, created by Rachel Milio.

Training Dataset

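| Dataset | Person | Location | NORP | MISC |
|---|---:|---:|---:|---:|
| Odyssey | 2,469 | 698 | 0 | 0 |
| Deipnosophists | 14,921 | 2,699 | 5,110 | 3,060 |
| Pausanias | 10,205 | 8,670 | 4,972 | 0 |
| Other Datasets | 3,283 | 2,040 | 1,089 | 0 |
| Total | 30,878 | 14,107 | 11,171 | 3,060 |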

Validation Dataset

| Dataset | Person | Location | NORP | MISC |
|---|---:|---:|---:|---:|
| Xenophon | 1,190 | 796 | 857 | 0 |

Results

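| Class | Metric | Test | Validation |
|---|---|---:|---:|
| LOC | precision | 82.92% | 87.10% |
| LOC | recall | 81.30% | 87.10% |
| LOC | f1 | 82.11% | 87.10% |
| MISC | precision | 80.43% | 0 |
| MISC | recall | 70.04% | 0 |
| MISC | f1 | 74.87% | 0 |
| NORP | precision | 87.10% | 92.82% |
| NORP | recall | 90.81% | 93.42% |
| NORP | f1 | 88.92% | 93.12% |
| PER | precision | 92.61% | 95.52% |
| PER | recall | 92.94% | 95.21% |
| PER | f1 | 92.77% | 95.37% |
| Overall | precision | 88.92% | 92.63% |
| Overall | recall | 88.82% | 92.79% |
| Overall | f1 | 88.87% | 92.71% |
| Accuracy | | 97.28% | 98.42% |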
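
The README does not say how the scores were computed. Assuming the reported F1 is the usual harmonic mean of precision and recall, the following minimal sketch reproduces one row of the table. The helper name `f1_score` is illustrative and not part of the model's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent here)."""
    if precision + recall == 0:
        # Avoid division by zero when a class has no predictions and no gold entities
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Test-set precision and recall for PER, copied from the results table
per_f1 = f1_score(92.61, 92.94)
print(f"{per_f1:.2f}")  # 92.77
```

This matches the reported PER test F1 of 92.77%. Other rows can differ in the last digit, because the published precision and recall values are already rounded.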

Usage

This colab notebook contains the necessary code to use the model.

```python
from transformers import pipeline

# Create the NER pipeline. With aggregation_strategy="first", sub-word tokens are
# grouped into words and each word takes the tag of its first token.
ner = pipeline("ner", model="UGARIT/grc-ner-bert", aggregation_strategy="first")

text = "ταῦτα εἴπας ὁ Ἀλέξανδρος παρίζει Πέρσῃ ἀνδρὶ ἄνδρα Μακεδόνα ὡς γυναῖκα τῷ λόγῳ · οἳ δέ , ἐπείτε σφέων οἱ Πέρσαι ψαύειν ἐπειρῶντο , διεργάζοντο αὐτούς ."
ner(text)
```

Output

```python
[{'entity_group': 'PER',
  'score': 0.9999349,
  'word': 'αλεξανδρος',
  'start': 14,
  'end': 24},
 {'entity_group': 'NORP',
  'score': 0.9369563,
  'word': 'περση',
  'start': 33,
  'end': 38},
 {'entity_group': 'NORP',
  'score': 0.60742134,
  'word': 'μακεδονα',
  'start': 51,
  'end': 59},
 {'entity_group': 'NORP',
  'score': 0.9900457,
  'word': 'περσαι',
  'start': 105,
  'end': 111}]
```
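
The `word` field in the output is lowercased and stripped of diacritics, while `start` and `end` are character offsets into the input string. The sketch below shows one way to recover the original accented forms by slicing the input, and optionally to drop low-confidence predictions. The helper `surface_entities` and its `min_score` parameter are hypothetical, not part of the model or of `transformers`. The entities are hard-coded from the output above, so the model is not needed to run it. It assumes the input is in precomposed (NFC) form, which is the form the offsets above line up with.

```python
import unicodedata


def surface_entities(text, entities, min_score=0.0):
    """Return (label, original surface form, score) for each entity at or above min_score."""
    results = []
    for ent in entities:
        if ent["score"] < min_score:
            continue  # skip predictions below the chosen confidence threshold
        # start/end index into the string that was passed to the pipeline
        results.append((ent["entity_group"], text[ent["start"]:ent["end"]], ent["score"]))
    return results


# Normalize to NFC so that each accented letter is a single code point
text = unicodedata.normalize(
    "NFC",
    "ταῦτα εἴπας ὁ Ἀλέξανδρος παρίζει Πέρσῃ ἀνδρὶ ἄνδρα Μακεδόνα ὡς γυναῖκα τῷ "
    "λόγῳ · οἳ δέ , ἐπείτε σφέων οἱ Πέρσαι ψαύειν ἐπειρῶντο , διεργάζοντο αὐτούς .",
)

# Entities copied from the pipeline output above
entities = [
    {"entity_group": "PER", "score": 0.9999349, "word": "αλεξανδρος", "start": 14, "end": 24},
    {"entity_group": "NORP", "score": 0.9369563, "word": "περση", "start": 33, "end": 38},
    {"entity_group": "NORP", "score": 0.60742134, "word": "μακεδονα", "start": 51, "end": 59},
    {"entity_group": "NORP", "score": 0.9900457, "word": "περσαι", "start": 105, "end": 111},
]

for label, form, score in surface_entities(text, entities, min_score=0.9):
    print(label, form, round(score, 3))
```

The threshold of 0.9 is only an example. With it, the lower-scoring `μακεδονα` prediction (0.607) is filtered out; a suitable cut-off depends on whether precision or recall matters more for the task.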