kimsiun committed on
Commit 7124c28 · verified · 1 Parent(s): e688cc4

Update README.md

Files changed (1)
  1. README.md +83 -15
README.md CHANGED
@@ -1,15 +1,83 @@
-
- ---
- language: ko
- tags:
- - bert
- - korean-english
- - clinical nlp
- - pharmacovigilance
- - adverse events
- license: mit
- ---
-
- # KAERS-BERT
-
- [... rest of the model card content remains the same ...]
+ ---
+ language: ko
+ tags:
+ - bert
+ - korean
+ - korean-english
+ - clinical nlp
+ - pharmacovigilance
+ - adverse events
+ license: mit
+ ---
+
+ # KAERS-BERT
+
+ ## Model Description
+
+ KAERS-BERT is a domain-specific Korean BERT model for clinical text analysis, particularly the processing of adverse drug event (ADE) narratives. It was developed by further pretraining KoBERT (from SK Telecom) on 1.2 million ADE narratives reported through the Korea Adverse Event Reporting System (KAERS) between January 2015 and December 2019.
+
+ The model is designed for clinical texts in which Korean-English code-switching is frequent, making it particularly effective at processing medical terms and abbreviations in a bilingual context.
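+
+ For illustration, the snippet below sketches how such a checkpoint could be loaded for masked-token prediction with Hugging Face Transformers. The repository id `kimsiun/kaers-bert` is an assumption rather than a confirmed path, and the checkpoint is assumed to be stored in the standard Transformers format.
+
+ ```python
+ # Minimal loading sketch; the repository id is an assumption, adjust to the actual Hub path.
+ from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline
+
+ model_id = "kimsiun/kaers-bert"  # assumed id, not confirmed by this card
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForMaskedLM.from_pretrained(model_id)
+
+ # Korean clinical sentence with an English drug name and one masked token.
+ fill = pipeline("fill-mask", model=model, tokenizer=tokenizer)
+ print(fill(f"환자는 aspirin 복용 후 {tokenizer.mask_token}이 발생하였다."))
+ ```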
+
+ ## Key Features
+
+ - Specialized in clinical and pharmaceutical domain text
+ - Handles the Korean-English code-switching common in medical texts
+ - Optimized for processing adverse drug event narratives
+ - Built on the KoBERT architecture with domain-specific pretraining
+
+ ## Training Data
+
+ The model was pretrained on 1.2 million ADE narratives from KAERS, drawn from the 'disease history in detail' and 'adverse event in detail' report sections. Pretraining used the following setup (see the sketch below):
+ - Masked language modeling with a 15% token masking rate
+ - A maximum sequence length of 200 tokens
+ - A learning rate of 5e-5
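+
+ The reported setup can be approximated with the Hugging Face Transformers `Trainer`, as in the schematic sketch below. The base KoBERT checkpoint id, data file path, batch size, and epoch count are illustrative assumptions rather than details from this card, and the official KoBERT tokenizer may require its own tokenizer package.
+
+ ```python
+ # Schematic sketch of masked-language-model pretraining on ADE narratives.
+ # Checkpoint id, data path, batch size, and epoch count are assumptions.
+ from datasets import load_dataset
+ from transformers import (AutoModelForMaskedLM, AutoTokenizer,
+                           DataCollatorForLanguageModeling, Trainer, TrainingArguments)
+
+ base_model = "skt/kobert-base-v1"  # assumed KoBERT starting checkpoint
+ tokenizer = AutoTokenizer.from_pretrained(base_model)
+ model = AutoModelForMaskedLM.from_pretrained(base_model)
+
+ # One narrative per line in a plain-text file (placeholder path).
+ narratives = load_dataset("text", data_files={"train": "kaers_narratives.txt"})["train"]
+ narratives = narratives.map(
+     lambda batch: tokenizer(batch["text"], truncation=True, max_length=200),
+     batched=True, remove_columns=["text"])
+
+ # Mask 15% of tokens, as reported for KAERS-BERT pretraining.
+ collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)
+
+ trainer = Trainer(
+     model=model,
+     args=TrainingArguments(output_dir="kaers-bert-mlm", learning_rate=5e-5,
+                            per_device_train_batch_size=32, num_train_epochs=3),
+     train_dataset=narratives,
+     data_collator=collator,
+ )
+ trainer.train()
+ ```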
+
+ ## Performance
+
+ The model demonstrated strong performance on NLP tasks related to drug safety information extraction:
+ - Named Entity Recognition (NER): 83.81% F1-score
+ - Sentence Extraction: 76.62% F1-score
+ - Relation Extraction: 64.37% F1-score (weighted)
+ - Label Classification:
+   - 'Occurred' label: 81.33% F1-score
+   - 'Concerned' label: 77.62% F1-score
+
+ When applied to the KAERS database, the model increased the completeness of structured data fields by an average of 3.24%.
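+
+ For the downstream tasks above, the pretrained backbone would be paired with task-specific heads. The sketch below shows one way to attach a token-classification head for the NER task; the repository id and entity label set are placeholders, not the authors' annotation scheme.
+
+ ```python
+ # Illustrative NER fine-tuning setup (placeholder labels and repository id).
+ from transformers import AutoModelForTokenClassification, AutoTokenizer
+
+ labels = ["O", "B-DRUG", "I-DRUG", "B-ADE", "I-ADE"]  # placeholder label set
+ model_id = "kimsiun/kaers-bert"                       # assumed repository id
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForTokenClassification.from_pretrained(
+     model_id,
+     num_labels=len(labels),
+     id2label=dict(enumerate(labels)),
+     label2id={label: i for i, label in enumerate(labels)},
+ )
+ # Fine-tuning then proceeds with Trainer on a token-labelled KAERS corpus.
+ ```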
+
+ ## Intended Use
+
+ This model is designed for:
+ - Extracting drug safety information from clinical narratives
+ - Processing Korean medical texts containing English medical terminology
+ - Supporting pharmacovigilance activities
+ - Improving data quality in adverse event reporting systems
+
+ ## Limitations
+
+ - The model is trained specifically on adverse event narratives and may not generalize well to other clinical domains
+ - Performance may vary on texts that differ substantially from KAERS narratives
+ - The model works best on Korean clinical texts containing English medical terminology
+
+ ## Citation
+
+ ```bibtex
+ @article{kim2023automatic,
+   title={Automatic Extraction of Comprehensive Drug Safety Information from Adverse Drug Event Narratives in the Korea Adverse Event Reporting System Using Natural Language Processing Techniques},
+   author={Kim, Siun and Kang, Taegwan and Chung, Tae Kyu and Choi, Yoona and Hong, YeSol and Jung, Kyomin and Lee, Howard},
+   journal={Drug Safety},
+   volume={46},
+   pages={781--795},
+   year={2023}
+ }
+ ```