---
license: cc-by-nc-4.0
language: "en"
tags:
- longformer
- clinical
- biomedical
---

**KEPTlongformer** is a medical-knowledge-enhanced version of Longformer that was further pre-trained using [contrastive learning](https://arxiv.org/pdf/2210.03304.pdf).

### Pre-training
We initialized this model from RoBERTa-base-PM-M3-Voc-distill from Facebook [bio-lm](https://github.com/facebookresearch/bio-lm/).

We then pre-trained it with Hierarchical Self-Alignment Pretrain (HSAP) using the UMLS knowledge graph. HSAP covers three kinds of relations: (a) hierarchy, (b) synonym, and (c) abbreviation. For more information, see Section 3.3 of the [paper](https://arxiv.org/pdf/2210.03304.pdf).

The learning rate was 5e-5, weight decay was 0.01, and Adam epsilon was 1e-5.

### Usage
Try the following sentence with the Fill-Mask task on the right. The sentence masks the token "cardiac".

74F with HTN, HLD, DM2, newly diagnosed atrial fibrillation in October who was transferred to hospital for `<mask>` catheterization after presentation there with syncopal episode.

Or load the model directly from Transformers:

```
from transformers import AutoConfig, AutoTokenizer, AutoModelForMaskedLM

# Download the tokenizer, configuration, and masked-LM weights from the Hub.
tokenizer = AutoTokenizer.from_pretrained("whaleloops/KEPTlongformer-PMM3")
config = AutoConfig.from_pretrained("whaleloops/KEPTlongformer-PMM3")
model = AutoModelForMaskedLM.from_pretrained("whaleloops/KEPTlongformer-PMM3", config=config)
```

See our [GitHub repository](https://github.com/whaleloops/KEPT/tree/rerank300) for how to use this model with prompts for automatic ICD coding.

This gives the following results:

| Metric | Score |
| ------------- | ------------- |
| rec_micro | 0.5844294992252652 |
| rec_macro | 0.12471916602840005 |
| rec_at_8 | 0.4138093882408751 |
| rec_at_75 | 0.8581874197033126 |
| rec_at_50 | 0.8109877644497351 |
| rec_at_5 | 0.2923155353947738 |
| rec_at_15 | 0.586890060777621 |
| prec_micro | 0.6537291416981642 |
| prec_macro | 0.1382069689951297 |
| prec_at_8 | 0.7835112692763938 |
| prec_at_75 | 0.20033214709371291 |
| prec_at_50 | 0.2810260972716489 |
| prec_at_5 | 0.8551008303677343 |
| prec_at_15 | 0.6288256227758008 |
| f1_micro | 0.6171399726721254 |
| f1_macro | 0.13111711325953157 |
| f1_at_8 | 0.54158310388029 |
| f1_at_75 | 0.324835806140454 |
| f1_at_50 | 0.4174099512237087 |
| f1_at_5 | 0.4356905906241822 |
| f1_at_15 | 0.6071345676658747 |
| auc_micro | 0.9653561390964384 |
| auc_macro | 0.8572490224880879 |
| acc_micro | 0.4462779749767132 |
| acc_macro | 0.09732882850157536 |
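
To help read the table, here is a small sketch that recomputes each F1 score from the reported precision and recall. It assumes F1 is the usual harmonic mean, 2PR / (P + R); the card does not state this, and the helper names `f1_from` and `max_f1_gap` are made up for illustration. The reported numbers appear consistent with that assumption, including `f1_macro`, which seems to be derived from `prec_macro` and `rec_macro` rather than averaged per label. This is an observation about the numbers, not a description of the evaluation script.

```python
# Precision, recall, and F1 values copied from the results table.
REPORTED = {
    "micro": {"prec": 0.6537291416981642, "rec": 0.5844294992252652, "f1": 0.6171399726721254},
    "macro": {"prec": 0.1382069689951297, "rec": 0.12471916602840005, "f1": 0.13111711325953157},
    "at_5": {"prec": 0.8551008303677343, "rec": 0.2923155353947738, "f1": 0.4356905906241822},
    "at_8": {"prec": 0.7835112692763938, "rec": 0.4138093882408751, "f1": 0.54158310388029},
    "at_15": {"prec": 0.6288256227758008, "rec": 0.586890060777621, "f1": 0.6071345676658747},
    "at_50": {"prec": 0.2810260972716489, "rec": 0.8109877644497351, "f1": 0.4174099512237087},
    "at_75": {"prec": 0.20033214709371291, "rec": 0.8581874197033126, "f1": 0.324835806140454},
}


def f1_from(prec: float, rec: float) -> float:
    """Harmonic mean of precision and recall (assumed F1 definition)."""
    if prec + rec == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * prec * rec / (prec + rec)


def max_f1_gap(reported: dict = REPORTED) -> float:
    """Largest absolute difference between recomputed and reported F1."""
    return max(abs(f1_from(r["prec"], r["rec"]) - r["f1"]) for r in reported.values())


if __name__ == "__main__":
    for name, row in REPORTED.items():
        recomputed = f1_from(row["prec"], row["rec"])
        print(f"f1_{name}: reported={row['f1']:.6f} recomputed={recomputed:.6f}")
    print(f"largest gap: {max_f1_gap():.2e}")
```

The `at_k` rows show the expected trade-off: as k grows from 5 to 75, recall rises and precision falls, with F1 peaking at k = 15 among the reported cutoffs.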