---
license: apache-2.0
language:
- en
- de
- fr
- fi
- sv
- nl
- nb
- nn
- 'no'
---

# hmTEAMS

[![🤗](https://github.com/stefan-it/hmTEAMS/raw/main/logo.jpeg "🤗")](https://github.com/stefan-it/hmTEAMS)

Historic Multilingual and Monolingual [TEAMS](https://aclanthology.org/2021.findings-acl.219/) Models.
The following languages are covered:

* English (British Library Corpus - Books)
* German (Europeana Newspaper)
* French (Europeana Newspaper)
* Finnish (Europeana Newspaper, Digilib)
* Swedish (Europeana Newspaper, Digilib)
* Dutch (Delpher Corpus)
* Norwegian (NCC Corpus)

# Architecture

We pretrain a model using the "Training ELECTRA Augmented with Multi-word Selection"
([TEAMS](https://aclanthology.org/2021.findings-acl.219/)) approach:

![hmTEAMS Overview](https://github.com/stefan-it/hmTEAMS/raw/main/hmteams_overview.svg)

# Results

We perform experiments on various historic NER datasets, such as HIPE-2022 and ICDAR-Europeana.
All details, including hyper-parameters, can be found [here](https://github.com/stefan-it/hmTEAMS/tree/main/bench).

## Small Benchmark

We test our pretrained language models on various datasets from HIPE-2020, HIPE-2022 and Europeana.
The following table gives an overview of the datasets used.

| Language | Dataset                                                                                          | Additional Dataset                                                                |
|----------|--------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------|
| English  | [AjMC](https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-ajmc.md)       | -                                                                                 |
| German   | [AjMC](https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-ajmc.md)       | -                                                                                 |
| French   | [AjMC](https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-ajmc.md)       | [ICDAR-Europeana](https://github.com/stefan-it/historic-domain-adaptation-icdar)  |
| Finnish  | [NewsEye](https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-newseye.md) | -                                                                                 |
| Swedish  | [NewsEye](https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-newseye.md) | -                                                                                 |
| Dutch    | [ICDAR-Europeana](https://github.com/stefan-it/historic-domain-adaptation-icdar)                 | -                                                                                 |
|

## Results

| Model | English AjMC | German AjMC | French AjMC | Finnish NewsEye | Swedish NewsEye | Dutch ICDAR | French ICDAR | Avg. |
|-------|--------------|-------------|-------------|-----------------|-----------------|-------------|--------------|------|
| hmBERT (32k) [Schweter et al.](https://ceur-ws.org/Vol-3180/paper-87.pdf) | 85.36 ± 0.94 | 89.08 ± 0.09 | 85.10 ± 0.60 | 77.28 ± 0.37 | 82.85 ± 0.83 | 82.11 ± 0.61 | 77.21 ± 0.16 | 82.71 |
| hmTEAMS (Ours) | 86.41 ± 0.36 | 88.64 ± 0.42 | 85.41 ± 0.67 | 79.27 ± 1.88 | 82.78 ± 0.60 | 88.21 ± 0.39 | 78.03 ± 0.39 | **84.11** |
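The Avg. column is the unweighted mean of the seven per-dataset F1 scores. A quick sketch to reproduce it from the numbers above:

```python
# Per-dataset F1 scores (in %) copied from the table, in column order:
# English AjMC, German AjMC, French AjMC, Finnish NewsEye, Swedish NewsEye,
# Dutch ICDAR, French ICDAR.
scores = {
    "hmBERT (32k)": [85.36, 89.08, 85.10, 77.28, 82.85, 82.11, 77.21],
    "hmTEAMS":      [86.41, 88.64, 85.41, 79.27, 82.78, 88.21, 78.03],
}

# Unweighted mean over all seven benchmarks, rounded to two decimals.
averages = {model: round(sum(f1) / len(f1), 2) for model, f1 in scores.items()}
print(averages)  # {'hmBERT (32k)': 82.71, 'hmTEAMS': 84.11}
```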
|

# Release

Our pretrained hmTEAMS models can be obtained from the Hugging Face Model Hub:

* [hmTEAMS Discriminator (**this model**)](https://huggingface.co/hmteams/teams-base-historic-multilingual-discriminator)
* [hmTEAMS Generator](https://huggingface.co/hmteams/teams-base-historic-multilingual-generator)
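As a minimal usage sketch (the model identifier matches the Hub link above; loading via the Transformers `AutoTokenizer`/`AutoModel` classes is an assumption, not an official recipe), the discriminator could be loaded like this:

```python
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "hmteams/teams-base-historic-multilingual-discriminator"

def load_discriminator(model_id: str = MODEL_ID):
    """Download tokenizer and discriminator weights from the Hugging Face Hub."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    return tokenizer, model
```

For downstream tasks such as NER, the discriminator checkpoint is typically the one to fine-tune; the generator is mainly of interest for continued pretraining.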
|

# Acknowledgements

We thank [Luisa März](https://github.com/LuisaMaerz), [Katharina Schmid](https://github.com/schmika) and
[Erion Çano](https://github.com/erionc) for their fruitful discussions about Historic Language Models.

Research supported with Cloud TPUs from Google's [TPU Research Cloud](https://sites.research.google/trc/about/) (TRC).
Many thanks for providing access to the TPUs ❤️