team-data-ktzh committed
Update README.md

README.md CHANGED
@@ -44,13 +44,12 @@ entities = model.predict("Hans Meier aus Dielsdorf vertritt im Kantonsrat die FD
 - **Encoder:** [deepset/gelectra-large](https://huggingface.co/deepset/gelectra-large) (ELECTRA Large)
 - **Maximum Sequence Length:** 256 tokens
 - **Maximum Entity Length:** 8 words
-- **Training Dataset:** see https:// TODO
 - **Language:** de
 - **License:** MIT
 
 ### Model Sources
-- **Training
-- **SpanMarker:** [
+- **Training data:** [GitHub](https://github.com/machinelearningZH/named-entity-recognition_staatsarchiv/tree/main/data/training_data)
+- **SpanMarker:** [GitHub](https://github.com/tomaarsen/SpanMarkerNER)
 
 ### Model Labels
 | Label | Examples |
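The hunk's context line, `entities = model.predict(...)`, comes from the README's usage snippet. For reference, a minimal sketch of that usage, assuming SpanMarker's standard `from_pretrained`/`predict` API; the model id below is a placeholder (not this repository's actual id), and the example sentence completes the one truncated in the hunk header above:

```python
# Minimal usage sketch, assuming the SpanMarker library's standard API.
# NOTE: "your-org/this-model" is a placeholder, not the actual repository id.
from span_marker import SpanMarkerModel

# Load the fine-tuned model (gelectra-large encoder, 256-token context,
# entity spans of up to 8 words) from the Hugging Face Hub.
model = SpanMarkerModel.from_pretrained("your-org/this-model")

# predict() takes a string (or a list of strings) and returns one dict per
# detected entity, with its span text, label, confidence score, and offsets.
entities = model.predict("Hans Meier aus Dielsdorf vertritt im Kantonsrat die FDP.")
for entity in entities:
    print(entity["span"], entity["label"], round(entity["score"], 3))
```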
@@ -96,7 +95,7 @@ Please note that this is released strictly as a task-bound model for the purpose
 
 ### Recommendations
 
-The original XML documents of the training set can be found here
+The original XML documents of the training set can be found [here](https://github.com/machinelearningZH/named-entity-recognition_staatsarchiv/tree/main/data/training_data). The annotations may be freely modified to tailor the model to an alternative use case. Note that [a modified TEI Publisher](https://github.com/machinelearningZH/named-entity-recognition_staatsarchiv/tree/main/ner_tei-publisher-app) and [this Jupyter notebook](https://github.com/machinelearningZH/named-entity-recognition_staatsarchiv/tree/main/notebooks/get_training_data) are required to generate a Huggingface Dataset.
 
 ## Training Details
 
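The added Recommendations paragraph defers to the linked notebook for turning the annotated TEI XML into a Hugging Face `Dataset`. As a rough illustration only (the notebook is authoritative; the TEI tag names and label mapping below are assumptions, not taken from the actual training data):

```python
# Illustrative sketch only: the notebook linked above is the real pipeline.
# Assumes TEI XML with entities marked inline by tags such as <persName>,
# <placeName>, and <orgName> (hypothetical -- check the training XML).
from xml.etree import ElementTree as ET
from datasets import Dataset

TEI_NS = "{http://www.tei-c.org/ns/1.0}"
# Hypothetical tag-to-label mapping, not the model's actual label set.
TAG_TO_LABEL = {"persName": "PER", "placeName": "LOC", "orgName": "ORG"}

def paragraph_to_example(p):
    """Flatten one TEI <p> element into whitespace tokens with IOB2 labels."""
    tokens, labels = [], []

    def add(text, label):
        for i, tok in enumerate(text.split()):
            tokens.append(tok)
            labels.append("O" if label == "O" else ("B-" if i == 0 else "I-") + label)

    add(p.text or "", "O")            # text before the first inline tag
    for child in p:                   # one pass over inline entity elements
        tag = child.tag.removeprefix(TEI_NS)
        add("".join(child.itertext()), TAG_TO_LABEL.get(tag, "O"))
        add(child.tail or "", "O")    # text between/after inline tags
    return {"tokens": tokens, "ner_tags": labels}

tree = ET.parse("document.xml")       # one TEI file from the training data
examples = [paragraph_to_example(p) for p in tree.iter(f"{TEI_NS}p")]
dataset = Dataset.from_list(examples)
print(dataset)
```

A real training set would additionally need the string labels mapped to integer ids (e.g. via a `ClassLabel` feature), which the linked notebook presumably handles.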