	---
	language:
	- grc
	base_model:
	- pranaydeeps/Ancient-Greek-BERT
	tags:
	- token-classification
	inference:
	  parameters:
	    aggregation_strategy: first
	widget:
	- text: ταῦτα εἴπας ὁ Ἀλέξανδρος παρίζει Πέρσῃ ἀνδρὶ ἄνδρα Μακεδόνα ὡς γυναῖκα τῷ λόγῳ · οἳ δέ , ἐπείτε σφέων οἱ Πέρσαι ψαύειν ἐπειρῶντο , διεργάζοντο αὐτούς .
	---
	# Named Entity Recognition for Ancient Greek

	A pretrained named entity recognition (NER) tagging model for Ancient Greek.

	# Data

	We trained the models on the available annotated corpora in Ancient Greek.
	There are only two sizeable annotated datasets in Ancient Greek, both of which are currently being released. The first, by Berti 2023,
	consists of a fully annotated text of Athenaeus’ Deipnosophists, developed in the context of the Digital Athenaeus project.
	The second, by Foka et al. 2020, is a fully annotated text of Pausanias’ Periegesis Hellados, developed in the context of the
	Digital Periegesis project. In addition, we used smaller corpora annotated by students and scholars on Recogito:
	the Odyssey, annotated by Kemp 2021; a mixed corpus including excerpts from the Library attributed to Apollodorus and from Strabo’s Geography,
	annotated by Chiara Palladino; Book 1 of Xenophon’s Anabasis, created by Thomas Visser; and Demosthenes’ Against Neaira,
	created by Rachel Milio.

	### Training Dataset
	| Corpus | Person | Location | NORP | MISC |
	|----------------|--------|----------|--------|-------|
	| Odyssey | 2,469 | 698 | 0 | 0 |
	| Deipnosophists | 14,921 | 2,699 | 5,110 | 3,060 |
	| Pausanias | 10,205 | 8,670 | 4,972 | 0 |
	| Other Datasets | 3,283 | 2,040 | 1,089 | 0 |
	| Total | 30,878 | 14,107 | 11,171 | 3,060 |
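
The totals row can be re-derived from the per-corpus counts, and the same numbers show how unevenly the classes are represented. The following is a minimal sketch, not part of the original training code. The name `TRAIN_COUNTS` and both helper functions are hypothetical, and the MISC count for "Other Datasets" is assumed to be 0 because the MISC total already equals the Deipnosophists count.

```python
# Entity counts per corpus, copied from the training table.
# Assumption: the MISC cell for "Other Datasets" is read as 0, since the
# MISC total already equals the Deipnosophists count.
TRAIN_COUNTS = {
    "Odyssey":        {"PER": 2469,  "LOC": 698,  "NORP": 0,    "MISC": 0},
    "Deipnosophists": {"PER": 14921, "LOC": 2699, "NORP": 5110, "MISC": 3060},
    "Pausanias":      {"PER": 10205, "LOC": 8670, "NORP": 4972, "MISC": 0},
    "Other Datasets": {"PER": 3283,  "LOC": 2040, "NORP": 1089, "MISC": 0},
}


def column_totals(counts):
    """Sum the entity counts of every corpus, per class."""
    totals = {}
    for row in counts.values():
        for label, n in row.items():
            totals[label] = totals.get(label, 0) + n
    return totals


def class_shares(totals):
    """Return each class's fraction of all annotated entities."""
    grand_total = sum(totals.values())
    return {label: n / grand_total for label, n in totals.items()}


totals = column_totals(TRAIN_COUNTS)
shares = class_shares(totals)
for label, n in totals.items():
    print(f"{label}: {n} entities ({shares[label]:.1%})")
```

Since MISC annotations come from a single corpus, it is plausible that MISC predictions are less reliable on texts that differ from the Deipnosophists.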

	---
	### Validation Dataset
	| Corpus | Person | Location | NORP | MISC |
	|----------|--------|----------|------|------|
	| Xenophon | 1,190 | 796 | 857 | 0 |


	# Results
	| Class | Metric | Test | Validation |
	|---------|-----------|--------|--------|
	| LOC | precision | 82.92% | 87.10% |
	| | recall | 81.30% | 87.10% |
	| | f1 | 82.11% | 87.10% |
	| MISC | precision | 80.43% | 0 |
	| | recall | 70.04% | 0 |
	| | f1 | 74.87% | 0 |
	| NORP | precision | 87.10% | 92.82% |
	| | recall | 90.81% | 93.42% |
	| | f1 | 88.92% | 93.12% |
	| PER | precision | 92.61% | 95.52% |
	| | recall | 92.94% | 95.21% |
	| | f1 | 92.77% | 95.37% |
	| Overall | precision | 88.92% | 92.63% |
	| | recall | 88.82% | 92.79% |
	| | f1 | 88.87% | 92.71% |
	| | accuracy | 97.28% | 98.42% |
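
The card does not state how the metrics were computed. Assuming the f1 rows use the standard harmonic mean of precision and recall, they can be re-derived from the other two rows. The sketch below checks this for the test column; `f1_score` and `TEST_SCORES` are hypothetical names, and small differences are expected because the published precision and recall values are already rounded.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f1) for the test column, in percent.
TEST_SCORES = {
    "LOC":     (82.92, 81.30, 82.11),
    "MISC":    (80.43, 70.04, 74.87),
    "NORP":    (87.10, 90.81, 88.92),
    "PER":     (92.61, 92.94, 92.77),
    "Overall": (88.92, 88.82, 88.87),
}

for label, (p, r, reported) in TEST_SCORES.items():
    recomputed = f1_score(p, r)
    print(f"{label}: recomputed {recomputed:.2f}, reported {reported:.2f}")
```

The zero guard also covers the MISC validation rows, where precision and recall are both 0 because the validation set contains no MISC entities.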



	# Usage

	```python
	from transformers import pipeline

	# Create a NER pipeline; aggregation_strategy="first" merges sub-word tokens
	# into whole words and labels each word by its first token.
	ner = pipeline("ner", model="UGARIT/grc-ner-bert", aggregation_strategy="first")
	ner("ταῦτα εἴπας ὁ Ἀλέξανδρος παρίζει Πέρσῃ ἀνδρὶ ἄνδρα Μακεδόνα ὡς γυναῖκα τῷ λόγῳ · οἳ δέ , ἐπείτε σφέων οἱ Πέρσαι ψαύειν ἐπειρῶντο , διεργάζοντο αὐτούς .")
	```
	Output:
	```
	[{'entity_group': 'PER',
	'score': 0.9999349,
	'word': 'αλεξανδρος',
	'start': 14,
	'end': 24},
	{'entity_group': 'NORP',
	'score': 0.9369563,
	'word': 'περση',
	'start': 33,
	'end': 38},
	{'entity_group': 'NORP',
	'score': 0.60742134,
	'word': 'μακεδονα',
	'start': 51,
	'end': 59},
	{'entity_group': 'NORP',
	'score': 0.9900457,
	'word': 'περσαι',
	'start': 105,
	'end': 111}]
	```
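
In the output above, the `word` field is lowercased and stripped of diacritics, so it does not match the input text. The `start` and `end` offsets can be used to recover the original spelling, and `score` can be used to drop low-confidence entities. The following is a minimal post-processing sketch that works on a copy of the first three entities shown above, so it runs without downloading the model. The helper `surface_forms` and the `min_score` value of 0.9 are illustrative assumptions, and the NFC normalization is a precaution that assumes the offsets count precomposed characters, as they do in this example.

```python
import unicodedata


def surface_forms(text, entities, min_score=0.0):
    """Return (surface form, label, score) for entities at or above min_score."""
    # Normalize to precomposed characters so the offsets line up with the text.
    text = unicodedata.normalize("NFC", text)
    results = []
    for ent in entities:
        if ent["score"] < min_score:
            continue
        surface = text[ent["start"]:ent["end"]]
        results.append((surface, ent["entity_group"], ent["score"]))
    return results


# Beginning of the example sentence and the first three entities from the output.
text = "ταῦτα εἴπας ὁ Ἀλέξανδρος παρίζει Πέρσῃ ἀνδρὶ ἄνδρα Μακεδόνα"
entities = [
    {"entity_group": "PER", "score": 0.9999349, "word": "αλεξανδρος", "start": 14, "end": 24},
    {"entity_group": "NORP", "score": 0.9369563, "word": "περση", "start": 33, "end": 38},
    {"entity_group": "NORP", "score": 0.60742134, "word": "μακεδονα", "start": 51, "end": 59},
]

# Keep only confident predictions; 0.9 is an arbitrary illustrative threshold.
kept = surface_forms(text, entities, min_score=0.9)
for surface, label, score in kept:
    print(f"{surface}\t{label}\t{score:.3f}")
```

With this threshold, the lower-scoring Μακεδόνα entity is filtered out. A suitable cutoff depends on the text and on how much recall can be traded for precision.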