Ozan Oktay committed on
Commit
896d61e
·
1 Parent(s): 6fcad0b

Update Readme -- Add information about BioViL Resnet50.

Files changed (1)
  1. README.md +12 -9
README.md CHANGED
@@ -27,16 +27,22 @@ First, we pretrain [**CXR-BERT-general**](https://huggingface.co/microsoft/Biome
27
  | CXR-BERT-general | [microsoft/BiomedVLP-CXR-BERT-general](https://huggingface.co/microsoft/BiomedVLP-CXR-BERT-general) | PubMed & MIMIC | Pretrained for biomedical literature and clinical domains |
28
  | CXR-BERT-specialized (after multi-modal training) | [microsoft/BiomedVLP-CXR-BERT-specialized](https://huggingface.co/microsoft/BiomedVLP-CXR-BERT-specialized) | PubMed & MIMIC | Pretrained for chest X-ray domain |
29
 
 
 
 
 
30
  ## Citation
31
 
32
- ```
 
 
33
  @misc{https://doi.org/10.48550/arxiv.2204.09817,
34
- title = {Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing},
 
35
  author = {Boecking, Benedikt and Usuyama, Naoto and Bannur, Shruthi and Castro, Daniel C. and Schwaighofer, Anton and Hyland, Stephanie and Wetscherek, Maria and Naumann, Tristan and Nori, Aditya and Alvarez-Valle, Javier and Poon, Hoifung and Oktay, Ozan},
 
36
  publisher = {arXiv},
37
  year = {2022},
38
- url = {https://arxiv.org/abs/2204.09817},
39
- doi = {10.48550/ARXIV.2204.09817},
40
  }
41
  ```
42
 
@@ -127,9 +133,6 @@ This model was developed using English corpora, and thus can be considered Engli
127
 
128
  ## Further information
129
 
130
- Please refer to the corresponding paper, [Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing](https://arxiv.org/abs/2204.09817) for additional details on the model training and evaluation.
131
-
132
- For additional inference pipelines with CXR-BERT, please refer to the [HI-ML-Multimodal GitHub](https://hi-ml.readthedocs.io/en/latest/multimodal.html) repository. The associated source files will soon be accessible through this link.
133
-
134
-
135
 
 
 
27
  | CXR-BERT-general | [microsoft/BiomedVLP-CXR-BERT-general](https://huggingface.co/microsoft/BiomedVLP-CXR-BERT-general) | PubMed & MIMIC | Pretrained for biomedical literature and clinical domains |
28
  | CXR-BERT-specialized (after multi-modal training) | [microsoft/BiomedVLP-CXR-BERT-specialized](https://huggingface.co/microsoft/BiomedVLP-CXR-BERT-specialized) | PubMed & MIMIC | Pretrained for chest X-ray domain |
29
 
30
+ ## Image model
31
+
32
+ **CXR-BERT-specialized** is jointly trained with a ResNet-50 image model in a multi-modal contrastive learning framework. Prior to multi-modal learning, the image model is pre-trained on the same set of images in MIMIC-CXR using [SimCLR](https://arxiv.org/abs/2002.05709). The corresponding model definition and its loading functions are available in our [HI-ML-Multimodal](https://github.com/microsoft/hi-ml/blob/main/hi-ml-multimodal/src/health_multimodal/image/model/model.py) GitHub repository. The joint image and text model, [BioViL](https://arxiv.org/abs/2204.09817), can be used in phrase grounding applications, as shown in this Python notebook [example](https://mybinder.org/v2/gh/microsoft/hi-ml/HEAD?labpath=hi-ml-multimodal%2Fnotebooks%2Fphrase_grounding.ipynb). Additionally, please see the [MS-CXR benchmark](https://physionet.org/content/ms-cxr/0.1/) for a more systematic evaluation of joint image and text models on phrase grounding tasks.
33
+
34
  ## Citation
35
 
36
+ The corresponding manuscript has been accepted for presentation at the [**European Conference on Computer Vision (ECCV) 2022**](https://eccv2022.ecva.net/).
37
+
38
+ ```bibtex
39
  @misc{https://doi.org/10.48550/arxiv.2204.09817,
40
+ doi = {10.48550/ARXIV.2204.09817},
41
+ url = {https://arxiv.org/abs/2204.09817},
42
  author = {Boecking, Benedikt and Usuyama, Naoto and Bannur, Shruthi and Castro, Daniel C. and Schwaighofer, Anton and Hyland, Stephanie and Wetscherek, Maria and Naumann, Tristan and Nori, Aditya and Alvarez-Valle, Javier and Poon, Hoifung and Oktay, Ozan},
43
+ title = {Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing},
44
  publisher = {arXiv},
45
  year = {2022},
 
 
46
  }
47
  ```
48
 
 
133
 
134
  ## Further information
135
 
136
+ Please refer to the corresponding paper, ["Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing", ECCV'22](https://arxiv.org/abs/2204.09817), for additional details on model training and evaluation.
 
 
 
 
137
 
138
+ For additional inference pipelines with CXR-BERT, please refer to the [HI-ML-Multimodal GitHub](https://hi-ml.readthedocs.io/en/latest/multimodal.html) repository.
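
The new "Image model" section mentions phrase grounding with the joint image and text model. As a rough illustration only, the sketch below shows one common way such grounding can be computed: score every image patch embedding against a text embedding with cosine similarity and take the best-matching patch. It does not use the HI-ML-Multimodal API. The function names, the grid size, the embedding dimension, and the random vectors standing in for real model outputs are all assumptions made for this example.

```python
import math
import random


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def similarity_map(patch_grid, text_embedding):
    """Score each patch embedding in an H x W grid against the text embedding."""
    return [
        [cosine_similarity(patch, text_embedding) for patch in row]
        for row in patch_grid
    ]


def best_patch(sim_map):
    """Return the (row, col) index of the highest-scoring patch."""
    positions = (
        (r, c) for r, row in enumerate(sim_map) for c in range(len(row))
    )
    return max(positions, key=lambda rc: sim_map[rc[0]][rc[1]])


# Hypothetical stand-ins for model outputs: a 4 x 4 grid of 32-dimensional
# patch embeddings and a text embedding. A real pipeline would obtain these
# from the image and text encoders after projection into the joint space.
random.seed(0)
HEIGHT, WIDTH, DIM = 4, 4, 32
patch_grid = [
    [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(WIDTH)]
    for _ in range(HEIGHT)
]

# Build a query that is a slightly perturbed copy of the patch at (2, 1),
# so the grounding result is known in advance.
text_embedding = [v + random.gauss(0, 0.01) for v in patch_grid[2][1]]

sim_map = similarity_map(patch_grid, text_embedding)
print(best_patch(sim_map))
```

Here the query is built from the patch at row 2, column 1, so that patch receives the highest score. With real encoders, the similarity map would typically be upsampled to the image resolution before being compared with a bounding box such as those in MS-CXR.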