team-data-ktzh committed
Update README.md

README.md CHANGED
@@ -44,13 +44,12 @@ entities = model.predict("Hans Meier aus Dielsdorf vertritt im Kantonsrat die FD
 - **Encoder:** [deepset/gelectra-large](https://huggingface.co/deepset/gelectra-large) (ELECTRA Large)
 - **Maximum Sequence Length:** 256 tokens
 - **Maximum Entity Length:** 8 words
-- **Training Dataset:** see https:// TODO
 - **Language:** de
 - **License:** MIT
 
 ### Model Sources
-- **Training
-- **SpanMarker:** [
+- **Training data:** [GitHub](https://github.com/machinelearningZH/named-entity-recognition_staatsarchiv/tree/main/data/training_data)
+- **SpanMarker:** [GitHub](https://github.com/tomaarsen/SpanMarkerNER)
 
 ### Model Labels
 | Label | Examples |
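The hunk's context line, `entities = model.predict(...)`, comes from the README's usage snippet. For reference, a minimal sketch of that usage, assuming SpanMarker's standard `from_pretrained`/`predict` API; the model id below is a placeholder (not this repository's actual id), and the example sentence completes the one truncated in the hunk header above:

```python
# Minimal usage sketch, assuming the SpanMarker library's standard API.
# NOTE: "your-org/this-model" is a placeholder, not the actual repository id.
from span_marker import SpanMarkerModel

# Load the fine-tuned model (gelectra-large encoder, 256-token context,
# entity spans of up to 8 words) from the Hugging Face Hub.
model = SpanMarkerModel.from_pretrained("your-org/this-model")

# predict() takes a string (or a list of strings) and returns one dict per
# detected entity, with its span text, label, confidence score, and offsets.
entities = model.predict("Hans Meier aus Dielsdorf vertritt im Kantonsrat die FDP.")
for entity in entities:
    print(entity["span"], entity["label"], round(entity["score"], 3))
```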
@@ -96,7 +95,7 @@ Please note that this is released strictly as a task-bound model for the purpose
 
 ### Recommendations
 
-The original XML documents of the training set can be found here
+The original XML documents of the training set can be found [here](https://github.com/machinelearningZH/named-entity-recognition_staatsarchiv/tree/main/data/training_data). The annotations may be freely modified to tailor the model to an alternative use case. Note that [a modified TEI Publisher](https://github.com/machinelearningZH/named-entity-recognition_staatsarchiv/tree/main/ner_tei-publisher-app) and [this Jupyter notebook](https://github.com/machinelearningZH/named-entity-recognition_staatsarchiv/tree/main/notebooks/get_training_data) are required to generate a Huggingface Dataset.
 
 ## Training Details
 
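The added Recommendations paragraph defers to the linked notebook for turning the annotated TEI XML into a Hugging Face `Dataset`. As a rough illustration only (the notebook is authoritative; the TEI tag names and label mapping below are assumptions, not taken from the actual training data):

```python
# Illustrative sketch only: the notebook linked above is the real pipeline.
# Assumes TEI XML with entities marked inline by tags such as <persName>,
# <placeName>, and <orgName> (hypothetical -- check the training XML).
from xml.etree import ElementTree as ET
from datasets import Dataset

TEI_NS = "{http://www.tei-c.org/ns/1.0}"
# Hypothetical tag-to-label mapping, not the model's actual label set.
TAG_TO_LABEL = {"persName": "PER", "placeName": "LOC", "orgName": "ORG"}

def paragraph_to_example(p):
    """Flatten one TEI <p> element into whitespace tokens with IOB2 labels."""
    tokens, labels = [], []

    def add(text, label):
        for i, tok in enumerate(text.split()):
            tokens.append(tok)
            labels.append("O" if label == "O" else ("B-" if i == 0 else "I-") + label)

    add(p.text or "", "O")            # text before the first inline tag
    for child in p:                   # one pass over inline entity elements
        tag = child.tag.removeprefix(TEI_NS)
        add("".join(child.itertext()), TAG_TO_LABEL.get(tag, "O"))
        add(child.tail or "", "O")    # text between/after inline tags
    return {"tokens": tokens, "ner_tags": labels}

tree = ET.parse("document.xml")       # one TEI file from the training data
examples = [paragraph_to_example(p) for p in tree.iter(f"{TEI_NS}p")]
dataset = Dataset.from_list(examples)
print(dataset)
```

A real training set would additionally need the string labels mapped to integer ids (e.g. via a `ClassLabel` feature), which the linked notebook presumably handles.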