yeshpanovrustem committed · Commit 05f13d5 · 1 Parent(s): 4c952a9
Update README.md
README.md CHANGED
@@ -27,61 +27,39 @@ datasets:
 # A Named Entity Recognition Model for Kazakh
 - The model was inspired by the [LREC 2022](https://lrec2022.lrec-conf.org/en/) paper [*KazNERD: Kazakh Named Entity Recognition Dataset*](https://aclanthology.org/2022.lrec-1.44).
 - The original repository for the paper can be found at *https://github.com/IS2AI/KazNERD*.
-## […]
-[old lines 31-34: content truncated in source]
 | :---: | :---: | :---: | :---: | :---: |
-[old lines 36-63: content truncated in source; several rows begin with `| **`]
-| **CARDINAL** | 23,135 (21.8%) | 2,878 (21.82%) | 2,789 (21.34%) | 28,802 (21.75%) |
-| **CONTACT** | 159 (0.15%) | 18 (0.14%) | 20 (0.15%) | 197 (0.15%) |
-| **DATE** | 20,006 (18.85%) | 2,603 (19.74%) | 2,584 (19.77%) | 25,193 (19.03%) |
-| **DISEASE** | 1,022 (0.96%) | 121 (0.92%) | 119 (0.91%) | 1,262 (0.95%) |
-| **EVENT** | 1,331 (1.25%) | 154 (1.17%) | 154 (1.18%) | 1,639 (1.24%) |
-| **FACILITY** | 1,723 (1.62%) | 178 (1.35%) | 197 (1.51%) | 2,098 (1.58%) |
-| **GPE** | 13,625 (12.84%) | 1,656 (12.56%) | 1,691 (12.94%) | 16,972 (12.82%) |
-| **LANGUAGE** | 350 (0.33%) | 47 (0.36%) | 41 (0.31%) | 438 (0.33%) |
-| **LAW** | 419 (0.39%) | 56 (0.42%) | 55 (0.42%) | 530 (0.40%) |
-| **LOCATION** | 1,736 (1.64%) | 210 (1.59%) | 208 (1.59%) | 2,154 (1.63%) |
-| **MISCELLANEOUS** | 191 (0.18%) | 26 (0.2%) | 26 (0.2%) | 243 (0.18%) |
-| **MONEY** | 3,652 (3.44%) | 455 (3.45%) | 427 (3.27%) | 4,534 (3.42%) |
-| **NON_HUMAN** | 6 (0.01%) | 1 (0.01%) | 1 (0.01%) | 8 (0.01%) |
-| **NORP** | 2,929 (2.76%) | 374 (2.84%) | 368 (2.82%) | 3,671 (2.77%) |
-| **ORDINAL** | 3,054 (2.88%) | 385 (2.92%) | 382 (2.92%) | 3,821 (2.89%) |
-| **ORGANISATION** | 5,956 (5.61%) | 753 (5.71%) | 718 (5.49%) | 7,427 (5.61%) |
-| **PERCENTAGE** | 3,357 (3.16%) | 437 (3.31%) | 462 (3.53%) | 4,256 (3.21%) |
-| **PERSON** | 9,817 (9.25%) | 1,175 (8.91%) | 1,151 (8.81%) | 12,143 (9.17%) |
-| **POSITION** | 4,844 (4.56%) | 587 (4.45%) | 597 (4.57%) | 6,028 (4.55%) |
-| **PRODUCT** | 586 (0.55%) | 73 (0.55%) | 75 (0.57%) | 734 (0.55%) |
-| **PROJECT** | 1,681 (1.58%) | 209 (1.58%) | 206 (1.58%) | 2,096 (1.58%) |
-| **QUANTITY** | 3,063 (2.89%) | 411 (3.12%) | 403 (3.08%) | 3,877 (2.93%) |
-| **TIME** | 1,820 (1.71%) | 208 (1.58%) | 220 (1.68%) | 2,248 (1.70%) |
-| **Total** | **106,148 (100%)** | **13,189 (100%)** | **13,072 (100%)** | **132,409 (100%)** |
+## Evaluation results on the validation and test sets
+| | Validation set | | | Test set| |
+|:---:| :---: | :---: | :---: | :---: | :---: |
+| **Precision** | **Recall** | **F<sub>1</sub>-score** | **Precision** | **Recall** | **F<sub>1</sub>-score** |
+| 96.58% | 96.66% | 96.62% | 96.49% | 96.86% | 96.67% |
+## Model performance for the NE classes of the validation set
+| NE Class | Precision | Recall | F<sub>1</sub>-score | Support |
 | :---: | :---: | :---: | :---: | :---: |
+| **ADAGE** | 90.00% | 47.37% | 62.07% | 19 |
+| **ART** | 91.36% | 95.48% | 93.38% | 155 |
+| **CARDINAL** | 98.44% | 98.37% | 98.40% | 2,878 |
+| **CONTACT** | 100.00% | 83.33% | 90.91% | 18 |
+| **DATE** | 97.38% | 97.27% | 97.33% | 2,603 |
+| **DISEASE** | 96.72% | 97.52% | 97.12% | 121 |
+| **EVENT** | 83.24% | 93.51% | 88.07% | 154 |
+| **FACILITY** | 68.95% | 84.83% | 76.07% | 178 |
+| **GPE** | 98.46% | 96.50% | 97.47% | 1,656 |
+| **LANGUAGE** | 95.45% | 89.36% | 92.31% | 47 |
+| **LAW** | 87.50% | 87.50% | 87.50% | 56 |
+| **LOCATION** | 92.49% | 93.81% | 93.14% | 210 |
+| **MISCELLANEOUS** | 100.00% | 76.92% | 86.96% | 26 |
+| **MONEY** | 99.56% | 100.00% | 99.78% | 455 |
+| **NON_HUMAN** | 0.00% | 0.00% | 0.00% | 1 |
+| **NORP** | 95.71% | 95.45% | 95.58% | 374 |
+| **ORDINAL** | 98.14% | 95.84% | 96.98% | 385 |
+| **ORGANISATION** | 92.19% | 90.97% | 91.58% | 753 |
+| **PERCENTAGE** | 99.08% | 99.08% | 99.08% | 437 |
+| **PERSON** | 98.47% | 98.72% | 98.60% | 1,175 |
+| **POSITION** | 96.15% | 97.79% | 96.96% | 587 |
+| **PRODUCT** | 89.06% | 78.08% | 83.21% | 73 |
+| **PROJECT** | 92.13% | 95.22% | 93.65% | 209 |
+| **QUANTITY** | 97.58% | 98.30% | 97.94% | 411 |
+| **TIME** | 94.81% | 96.63% | 95.71% | 208 |
+| **micro avg** | **96.58%** | **96.66%** | **96.62%** | **13,189** |
+| **macro avg** | **90.12%** | **87.51%** | **88.39%** | **13,189** |
+| **weighted avg** | **96.67%** | **96.66%** | **96.63%** | **13,189** |
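
The averaged rows of the validation table can be cross-checked from the per-class rows. The README does not say how the averages were computed, so the sketch below assumes the conventional definitions: F<sub>1</sub> as the harmonic mean of precision and recall, `macro avg` as the unweighted mean over classes, and `weighted avg` as the support-weighted mean. The names `ROWS`, `f1_score`, `macro_average`, and `weighted_average` are illustrative and do not come from the model's code.

```python
# Per-class (F1 in percent, support) pairs copied from the validation-set table.
ROWS = {
    "ADAGE": (62.07, 19),
    "ART": (93.38, 155),
    "CARDINAL": (98.40, 2878),
    "CONTACT": (90.91, 18),
    "DATE": (97.33, 2603),
    "DISEASE": (97.12, 121),
    "EVENT": (88.07, 154),
    "FACILITY": (76.07, 178),
    "GPE": (97.47, 1656),
    "LANGUAGE": (92.31, 47),
    "LAW": (87.50, 56),
    "LOCATION": (93.14, 210),
    "MISCELLANEOUS": (86.96, 26),
    "MONEY": (99.78, 455),
    "NON_HUMAN": (0.00, 1),
    "NORP": (95.58, 374),
    "ORDINAL": (96.98, 385),
    "ORGANISATION": (91.58, 753),
    "PERCENTAGE": (99.08, 437),
    "PERSON": (98.60, 1175),
    "POSITION": (96.96, 587),
    "PRODUCT": (83.21, 73),
    "PROJECT": (93.65, 209),
    "QUANTITY": (97.94, 411),
    "TIME": (95.71, 208),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (assumed definition of F1)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero, e.g. for the NON_HUMAN row
    return 2 * precision * recall / (precision + recall)


def macro_average(rows):
    """Unweighted mean of per-class F1: every class counts equally."""
    return sum(f1 for f1, _ in rows.values()) / len(rows)


def weighted_average(rows):
    """Mean of per-class F1 weighted by each class's support."""
    total_support = sum(support for _, support in rows.values())
    return sum(f1 * support for f1, support in rows.values()) / total_support


if __name__ == "__main__":
    # Overall precision/recall pairs from the evaluation-results table.
    print(f"validation F1: {f1_score(96.58, 96.66):.2f}%")
    print(f"test F1:       {f1_score(96.49, 96.86):.2f}%")
    print(f"macro avg F1:    {macro_average(ROWS):.2f}%")
    print(f"weighted avg F1: {weighted_average(ROWS):.2f}%")
```

Under these assumptions the reported 88.39% macro and 96.63% weighted F<sub>1</sub>-scores are reproduced to two decimals. The macro figure sits well below the weighted one because low-support classes with weak scores, such as NON_HUMAN (support 1) and ADAGE (support 19), count as much as CARDINAL or DATE in an unweighted mean.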