---
language: en
tags:
- bert
- medical
- clinical
- text-classification
- transformers
- diagnosis
thumbnail: https://core.app.datexis.com/static/paper.png
inference: true
widget:
- text: Patient with hypertension presents to ICU.
---

# CORe Model - Clinical Diagnosis Prediction

## Model description

The CORe (_Clinical Outcome Representations_) model is introduced in the paper [Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration](https://www.aclweb.org/anthology/2021.eacl-main.75.pdf).
It is based on BioBERT and further pre-trained on clinical notes, disease descriptions and medical articles with a specialised _Clinical Outcome Pre-Training_ objective.

This model checkpoint is **fine-tuned on the task of diagnosis prediction**.
The model expects patient admission notes as input and outputs multi-label ICD-9 code predictions.

#### Model Predictions
The model makes predictions on a total of 9237 labels. These comprise 3- and 4-digit ICD-9 codes as well as textual descriptions of these codes. The 4-digit codes and textual descriptions help to incorporate further topical and hierarchical information into the model during training (see Section 4.2 _ICD+: Incorporation of ICD Hierarchy_ in our paper). We recommend using only the **3-digit code predictions at inference time**, because only those have been evaluated in our work.
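
As a minimal sketch of that recommendation, the helper below keeps only 3-digit codes from a list of predicted label names. It assumes, without confirmation from this card, that the codes in `model.config.id2label` are stored as undotted strings such as `"428"` or `"4280"`, and it treats ICD-9 V codes (`V45`) and E codes (`E880`) as 3-digit categories. The function name `keep_three_digit_codes` is hypothetical; inspect the checkpoint's actual label names before relying on the pattern.

```python
import re

# Hypothetical pattern for 3-digit ICD-9 categories, assuming undotted label strings:
# numeric categories ("401"), V codes ("V45") and E codes ("E880").
# 4-digit codes ("4019") and textual descriptions do not match.
THREE_DIGIT_ICD9 = re.compile(r"\d{3}|V\d{2}|E\d{3}")


def keep_three_digit_codes(labels):
    """Return only the labels that look like 3-digit ICD-9 codes, preserving order."""
    return [label for label in labels if THREE_DIGIT_ICD9.fullmatch(label)]


print(keep_three_digit_codes(["428", "4280", "Congestive heart failure", "V45", "401"]))
```

Applied to `predicted_labels` from the inference example, this would discard the 4-digit codes and textual descriptions that are only meant to support training.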

#### How to use CORe Diagnosis Prediction

You can load the model via the transformers library:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")
model = AutoModelForSequenceClassification.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")
```

The following code shows an inference example:

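```python
import torch

text = "CHIEF COMPLAINT: Headaches\n\nPRESENT ILLNESS: 58yo man w/ hx of hypertension, AFib on coumadin presented to ED with the worst headache of his life."

# Tokenize the admission note; truncate to BERT's 512-token input limit
tokenized_input = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Run inference without tracking gradients
with torch.no_grad():
    output = model(**tokenized_input)

# Multi-label setting: an independent sigmoid is applied to each label's logit
predictions = torch.sigmoid(output.logits)

# Collect every label whose probability exceeds the threshold (single input, so batch index 0)
predicted_labels = [model.config.id2label[_id] for _id in (predictions > 0.3).nonzero()[:, 1].tolist()]
```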
Note: For the best performance, we recommend determining the threshold (0.3 in this example) individually for each label.
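
The card does not specify how the per-label thresholds should be chosen. One common approach, sketched below as an assumption rather than the authors' procedure, is to pick for each label the threshold from a candidate grid that maximizes F1 on a held-out validation set. The functions `tune_thresholds` and `f1_score` and the default grid are hypothetical; inputs are plain nested lists of sigmoid probabilities and 0/1 gold labels, one row per admission note.

```python
def f1_score(tp, fp, fn):
    """F1 from raw counts; returns 1.0 when a label has no gold positives and no predictions."""
    denominator = 2 * tp + fp + fn
    return 1.0 if denominator == 0 else 2 * tp / denominator


def tune_thresholds(probs, gold, grid=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)):
    """Pick, for each label, the grid threshold with the highest validation F1.

    probs: list of rows, each a list of per-label sigmoid probabilities.
    gold:  list of rows of the same shape with 0/1 gold labels.
    Ties are broken in favour of the first (lowest) threshold in the grid.
    """
    num_labels = len(probs[0])
    thresholds = []
    for j in range(num_labels):
        best_t, best_f1 = grid[0], -1.0
        for t in grid:
            tp = fp = fn = 0
            for row_probs, row_gold in zip(probs, gold):
                predicted = row_probs[j] > t
                actual = row_gold[j] == 1
                tp += int(predicted and actual)
                fp += int(predicted and not actual)
                fn += int(actual and not predicted)
            f1 = f1_score(tp, fp, fn)
            if f1 > best_f1:  # strict comparison keeps the first best threshold
                best_t, best_f1 = t, f1
        thresholds.append(best_t)
    return thresholds
```

The resulting list could replace the single 0.3 in the inference example, e.g. `predictions > torch.tensor(thresholds)`, which broadcasts one threshold per label across the model's output.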


### More Information

For further details about CORe and contact information, please visit [CORe.app.datexis.com](http://core.app.datexis.com/).

### Cite

```bibtex
@inproceedings{vanaken21,
  author    = {Betty van Aken and
               Jens-Michalis Papaioannou and
               Manuel Mayrdorfer and
               Klemens Budde and
               Felix A. Gers and
               Alexander Löser},
  title     = {Clinical Outcome Prediction from Admission Notes using Self-Supervised
               Knowledge Integration},
  booktitle = {Proceedings of the 16th Conference of the European Chapter of the
               Association for Computational Linguistics: Main Volume, {EACL} 2021,
               Online, April 19 - 23, 2021},
  publisher = {Association for Computational Linguistics},
  year      = {2021},
}
```