BERTić-COMtext-SR-legal-NER-ijekavica
BERTić-COMtext-SR-legal-NER-ijekavica is a variant of the BERTić model, fine-tuned on the task of named entity recognition in Serbian legal texts written in the Ijekavian pronunciation. The model was fine-tuned for 20 epochs on the Ijekavian variant of the COMtext.SR.legal dataset.
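As a rough illustration of how a model like this is typically used, the sketch below loads it with the Hugging Face `transformers` token-classification pipeline. The Hub identifier is taken from the model tree listing on this page and is assumed to be the correct repository name; `load_tagger` and `summarize` are hypothetical helper names, and the example sentence in the comment is made up for illustration.

```python
# Assumed Hub identifier, copied from the model tree listing on this page.
MODEL_ID = "ICEF-NLP/bcms-bertic-comtext-sr-legal-ner-ijekavica"


def load_tagger(model_id=MODEL_ID):
    """Build a token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so that summarize() can be used without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" groups sub-word pieces into entity spans.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def summarize(predictions):
    """Reduce the pipeline's output dicts to (text, entity type) pairs."""
    return [(p["word"], p["entity_group"]) for p in predictions]


# Usage (requires network access; the sentence is a hypothetical example):
# tagger = load_tagger()
# print(summarize(tagger("Osnovni sud u Nikšiću donio je presudu 12. maja 2021. godine.")))
```

With aggregation enabled, the pipeline reports entity types without the "B-" and "I-" prefixes, which roughly corresponds to the default evaluation setting described under Benchmarking.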
Benchmarking
This model was evaluated on the task of named entity recognition in Serbian legal texts. The model uses a newly developed named entity schema consisting of 21 entity types, tailored for the domain of Serbian legal texts, and encoded according to the IOB2 standard. The full entity list is available on the COMtext.SR GitHub repository.
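For readers unfamiliar with IOB2, the sketch below shows how entity spans map to per-token tags: the first token of an entity gets a "B-" tag, the following tokens get "I-" tags, and everything else is "O". The function name `to_iob2`, the sentence, and its annotation are hypothetical and are not taken from the COMtext.SR.legal data.

```python
def to_iob2(tokens, spans):
    """Convert entity spans to IOB2 tags.

    spans: list of (start_token, end_token_exclusive, entity_type) tuples.
    """
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"          # remaining tokens of the entity
    return tags


# Hypothetical example sentence and annotation.
tokens = ["Osnovni", "sud", "u", "Nikšiću", "donio", "je", "presudu", "."]
spans = [(0, 4, "COURT")]
tags = to_iob2(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```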
This model was compared with SrBERTa, a model specially trained on Serbian legal texts, which was likewise fine-tuned for 20 epochs for named entity recognition on the Ijekavian variant of the COMtext.SR.legal corpus. Token-level accuracy and F1 (macro-averaged and per-class) were used as evaluation metrics, and gold-tokenized text was taken as input.
Two evaluation settings for both models were considered:
- Default - only the entity type portion of the NE tag is considered, effectively ignoring the "B-" and "I-" prefixes
- Strict - the entire NE tag is considered
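The difference between the two settings can be sketched as follows: in the default setting the IOB2 prefix is stripped from both gold and predicted tags before comparison, while in the strict setting the tags are compared as they are. This is a minimal illustration with made-up tags, not the evaluation code used for the results reported here; `entity_type` and `token_accuracy` are hypothetical names.

```python
def entity_type(tag):
    """Strip the IOB2 prefix: 'B-PER' -> 'PER', 'I-PER' -> 'PER', 'O' -> 'O'."""
    return tag.split("-", 1)[1] if tag.startswith(("B-", "I-")) else tag


def token_accuracy(gold, pred, strict):
    """Token-level accuracy under the strict or the default setting."""
    if not strict:
        gold = [entity_type(t) for t in gold]
        pred = [entity_type(t) for t in pred]
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Made-up tags: the second token has the right type but the wrong prefix.
gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "B-PER", "O", "B-LOC"]

default_acc = token_accuracy(gold, pred, strict=False)  # 1.0
strict_acc = token_accuracy(gold, pred, strict=True)    # 0.75
print(default_acc, strict_acc)
```

A prefix error such as predicting "B-PER" where the gold tag is "I-PER" is counted as correct in the default setting and as an error in the strict one, which is why strict scores can be lower.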
For the strict setting, per-class results are given separately for each B-CLASS and I-CLASS tag. In addition, macro-averaged F1 scores are presented in two variants - one where the O (outside) class is ignored, and another where it is treated equally to other named entity classes.
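The two macro-averaging variants can be illustrated with a small token-level sketch: per-class F1 is computed for every label, and the macro average is then taken either over all labels or over all labels except O. The labels and predictions are made up, the helper names are hypothetical, and the handling of edge cases (for example classes with no gold or predicted tokens) is an assumption that may differ from the authors' evaluation code.

```python
def per_class_f1(gold, pred):
    """Token-level F1 for every label that occurs in gold or pred."""
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(scores, include_o):
    """Unweighted mean of per-class F1, optionally leaving out the O class."""
    kept = [f1 for label, f1 in scores.items() if include_o or label != "O"]
    return sum(kept) / len(kept)


# Made-up labels in the default setting (prefixes already stripped).
gold = ["PER", "PER", "O", "O", "O", "LOC"]
pred = ["PER", "O", "O", "O", "O", "LOC"]

scores = per_class_f1(gold, pred)
print(scores)
print("with O:", macro_f1(scores, include_o=True))
print("without O:", macro_f1(scores, include_o=False))
```

Because O is by far the most frequent and usually the easiest class, including it tends to raise the macro average, which matches the pattern in the results table.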
BERTić-COMtext-SR-legal-NER-ijekavica and SrBERTa were fine-tuned and evaluated on the COMtext.SR.legal.ijekavica corpus using 10-fold cross-validation.
The code and data to run these experiments are available on the COMtext.SR GitHub repository.
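For orientation, the sketch below shows the general shape of a 10-fold cross-validation split: the items are shuffled once, divided into ten folds, and each fold serves as the test set exactly once while the other nine are used for fine-tuning. The unit being split (documents or sentences), the shuffling, and the seed are assumptions made for illustration; the actual splits are defined by the code in the COMtext.SR repository.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical corpus of 25 items split into 10 folds.
for fold_no, (train, test) in enumerate(k_fold_indices(25, k=10), start=1):
    print(f"fold {fold_no}: {len(train)} train items, {len(test)} test items")
```

Scores from the individual folds are then combined into the figures reported for the whole corpus; how they are combined here (averaged per fold or pooled) is not stated on this page.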
Results
Metrics | BERTić-COMtext-SR-legal-NER-ijekavica (default) | BERTić-COMtext-SR-legal-NER-ijekavica (strict) | SrBERTa (default) | SrBERTa (strict) |
---|---|---|---|---|
Accuracy | 0.9839 | 0.9828 | 0.9688 | 0.9672 |
Macro F1 (with O) | 0.8563 | 0.8474 | 0.7479 | 0.7225 |
Macro F1 (without O) | 0.8403 | 0.8396 | 0.7328 | 0.7128 |
Per-class F1 | | | | |
PER | 0.9856 | 0.9780 / 0.9765 | 0.8720 | 0.8177 / 0.9068 |
LOC | 0.8933 | 0.9003 / 0.8134 | 0.6670 | 0.6666 / 0.5995 |
ADR | 0.9253 | 0.9132 / 0.9161 | 0.8554 | 0.7806 / 0.8393 |
COURT | 0.9427 | 0.9515 / 0.9340 | 0.8488 | 0.8417 / 0.8524 |
INST | 0.8044 | 0.8152 / 0.8261 | 0.6793 | 0.6376 / 0.6420 |
COM | 0.7225 | 0.7326 / 0.6782 | 0.4815 | 0.3632 / 0.4767 |
OTHORG | 0.4670 | 0.3436 / 0.6080 | 0.2557 | 0.0609 / 0.3664 |
LAW | 0.9523 | 0.9463 / 0.9511 | 0.9147 | 0.8868 / 0.9128 |
REF | 0.8125 | 0.7602 / 0.7939 | 0.7564 | 0.6246 / 0.7485 |
IDPER | 1.0000 | 1.0000 / N/A | 1.0000 | 1.0000 / N/A |
IDCOM | 0.9722 | 0.9722 / N/A | 0.9667 | 0.9667 / N/A |
IDTAX | 1.0000 | 1.0000 / N/A | 0.9815 | 0.9815 / N/A |
NUMACC | 1.0000 | 1.0000 / N/A | 0.6667 | 0.6667 / N/A |
NUMDOC | 0.8148 | 0.8148 / N/A | 0.3333 | 0.3333 / N/A |
NUMCAR | 0.6222 | 0.5397 / 0.5000 | 0.4545 | 0.5000 / 0.0000 |
NUMPLOT | 0.7088 | 0.7088 / N/A | 0.5479 | 0.5479 / N/A |
IDOTH | 0.5949 | 0.5949 / N/A | 0.4776 | 0.4776 / N/A |
CONTACT | 0.8000 | 0.8000 / N/A | 0.0000 | 0.0000 / N/A |
DATE | 0.9664 | 0.9378 / 0.9615 | 0.9547 | 0.9104 / 0.9480 |
MONEY | 0.9741 | 0.9613 / 0.9715 | 0.8825 | 0.8854 / 0.8851 |
MISC | 0.4183 | 0.4213 / 0.3874 | 0.1814 | 0.1492 / 0.1694 |
O | 0.9942 | 0.9942 | 0.9872 | 0.9872 |
Model tree for ICEF-NLP/bcms-bertic-comtext-sr-legal-ner-ijekavica
Base model
classla/bcms-bertic