---
license: cc-by-nc-nd-4.0
language:
- en
tags:
- histology
- pathology
- vision
- pytorch
- self-supervised
- vit
extra_gated_prompt: >-
  This model and associated code are released under the CC-BY-NC-ND 4.0 license and may only be used for non-commercial, academic research purposes with proper attribution.
  Any commercial use, sale, or other monetization of the TITAN model and its derivatives, which include models trained on outputs from the TITAN model or datasets created from the TITAN model, is prohibited and requires prior approval.
  Please note that the primary email used to sign up for your Hugging Face account must match your institutional email to receive approval.
  By downloading the model, you attest that all information (affiliation, research use) is correct and up-to-date.
  Downloading the model requires prior registration on Hugging Face and agreeing to the terms of use.
  By downloading this model, you agree not to distribute, publish or reproduce a copy of the model.
  If another user within your organization wishes to use the TITAN model, they must register as an individual user and agree to comply with the terms of use.
  Users may not attempt to re-identify the deidentified data used to develop the underlying model.
  If you are a commercial entity, please contact the corresponding author.
extra_gated_fields:
  Full name (first and last): text
  Current affiliation (no abbreviations): text
  Type of Affiliation:
    type: select
    options:
      - Academia
      - Industry
      - label: Other
        value: other
  Current and official institutional email (**this must match your primary email in your Hugging Face account, @gmail/@hotmail/@qq email domains will be denied**): text
  Please explain your intended research use: text
  I agree to all terms outlined above: checkbox
  I agree to use this model for non-commercial, academic purposes only: checkbox
  I agree not to distribute the model, if another user within your organization wishes to use the TITAN model, they must register as an individual user: checkbox
metrics:
- accuracy
pipeline_tag: image-feature-extraction
---

# Model Card for TITAN-preview

\[[Preprint](https://arxiv.org/abs/2411.19666)\] | \[[Github Repo](https://github.com/mahmoodlab/TITAN)\] | \[[Cite](#bibtex)\]

## What is TITAN?

**TITAN** (**T**ransformer-based pathology **I**mage and **T**ext **A**lignment **N**etwork) is a multimodal whole-slide foundation model pre-trained using visual self-supervised learning and vision-language alignment. It leverages 335,645 whole-slide images (WSIs) from a diverse set of internally collected neoplastic, infectious, and inflammatory cases at Mass General Brigham. Additionally, TITAN utilizes over 182,000 pathology reports and more than 423,000 synthetic captions generated by [PathChat](https://www.nature.com/articles/s41586-024-07618-3), our pathology co-pilot. TITAN's slide embeddings achieve state-of-the-art performance on diverse downstream tasks, including linear probing, few-shot and zero-shot classification, rare cancer retrieval, cross-modal retrieval, and pathology report generation.

This is a preview and we will bring you further updates and improvements.

![](https://huggingface.co/MahmoodLab/TITAN/resolve/main/titan.jpg)

## Requesting Access

As mentioned in the gated prompt, you must agree to the outlined terms of use, _**with the primary email for your Hugging Face account matching your institutional email**_. If your primary email is a personal email (@gmail/@hotmail/@qq) **your request will be denied**. To fix this, you can: (1) add your official institutional email to your HF account, and confirm your email address to verify, and (2) set your institutional email as your primary email in your HF account. Other reasons for an access request being denied include mistakes in the submitted form, for example: the full name includes abbreviations, the affiliation is not spelled out, the described research use is not sufficient, or the email domain is not recognized.

## Model Description

- **Developed by:** Mahmood Lab AI for Pathology @ Harvard/BWH
- **Model type:** Pretrained vision-language encoders
- **Pretraining dataset:** Mass-340K, sourced from private histology collections (BWH / MGH), in addition to slides from the public GTEx consortium.
- **Repository:** https://github.com/mahmoodlab/TITAN
- **Preprint:** https://arxiv.org/abs/2411.19666
- **License:** CC-BY-NC-ND-4.0

### Requirements

```
torch==2.0.1
timm==1.0.3
einops==0.6.1
einops-exts==0.0.4
transformers==4.46.0
```

### Model Usage

TITAN-preview is a vision-language model trained on CONCH v1.5 patch features with a patch size of 512x512 pixels at 20x magnification.
Following authentication (using `huggingface_hub`), both TITAN-preview (slide and language encoders) and CONCH v1.5 (patch encoder) can be loaded using the commands below:

```python
from huggingface_hub import login
from transformers import AutoModel

login()  # login with your User Access Token, found at https://huggingface.co/settings/tokens

# load TITAN-preview (slide and language encoders)
titan = AutoModel.from_pretrained('MahmoodLab/TITAN', trust_remote_code=True)
# load CONCH v1.5 (patch encoder) and its evaluation transform
conch, eval_transform = titan.return_conch()
```

You can directly use TITAN-preview for slide-level feature extraction. TITAN builds a feature grid from CONCH v1.5 patch features using the coordinates of the patches and the distance between them. As patch coordinates are always saved at the slide's level 0 magnification, TITAN takes `patch_size_lv0`, which represents the distance between two adjacent patches at level 0 magnification. It is 1024 if the slide is scanned at 40x, or 512 if it is scanned at 20x. This value is stored in our demo TCGA features.

Slide-level feature extraction can be done in the following way:

```python
import h5py
import torch
from transformers import AutoModel

# load model
titan = AutoModel.from_pretrained('MahmoodLab/TITAN', trust_remote_code=True)

# load CONCH v1.5 demo features
h5_path = 'TCGA_demo_features/TCGA-RM-A68W-01Z-00-DX1.4E62E4F4-415C-46EB-A6C8-45BA14E82708.h5'
with h5py.File(h5_path, 'r') as file:
    features = torch.from_numpy(file['features'][:])
    coords = torch.from_numpy(file['coords'][:])
    # distance between adjacent patches at level 0 magnification
    patch_size_lv0 = file['coords'].attrs['patch_size_level0']

# extract slide embedding
with torch.autocast('cuda', torch.float16), torch.inference_mode():
    slide_embedding = titan.encode_slide_from_patch_features(features, coords, patch_size_lv0)
```

These pre-extracted features can then be used for slide-level classification (via linear probing), retrieval (via L2 distance), and other machine learning settings, without task-specific finetuning. We have also released all TCGA TITAN-preview features in `TCGA_TITAN_features.pkl`.
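
The two conventions described above, the level-0 patch spacing and retrieval by L2 distance, can be sketched as follows. This is a minimal illustration and not part of the TITAN API: `patch_size_at_level0` and `retrieve_l2` are hypothetical helper names, the 512-pixel and 20x defaults are taken from the description above, and the embeddings are random placeholders of arbitrary dimension rather than real slide features.

```python
import numpy as np


def patch_size_at_level0(slide_magnification: int,
                         patch_size: int = 512,
                         target_magnification: int = 20) -> int:
    """Distance in level-0 pixels between two adjacent patches.

    Hypothetical helper: a patch of `patch_size` pixels taken at
    `target_magnification` spans proportionally more level-0 pixels
    when the slide was scanned at a higher magnification.
    """
    if slide_magnification % target_magnification != 0:
        raise ValueError("slide magnification must be a multiple of the target")
    return patch_size * slide_magnification // target_magnification


def retrieve_l2(query: np.ndarray, database: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k database embeddings closest to `query` in L2 distance."""
    distances = np.linalg.norm(database - query[None, :], axis=1)
    return np.argsort(distances)[:k]


# Toy data standing in for slide embeddings (random, not real TITAN outputs).
rng = np.random.default_rng(0)
database = rng.normal(size=(100, 768)).astype(np.float32)
# A slightly perturbed copy of entry 42 serves as the query.
query = database[42] + 0.01 * rng.normal(size=768).astype(np.float32)

print(patch_size_at_level0(40))  # 1024, the 40x case
print(patch_size_at_level0(20))  # 512, the 20x case
print(retrieve_l2(query, database, k=3)[0])  # 42, the perturbed entry is nearest
```

With real data, `database` would hold one slide embedding per row (for example, stacked outputs of `encode_slide_from_patch_features`), and `query` would be the embedding of the slide to look up.
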
More detailed linear probe and zero-shot evaluations are demonstrated in the notebooks in our [GitHub repository](https://github.com/mahmoodlab/TITAN/tree/main/notebooks).

## License and Terms of Use

This model and associated code are released under the CC-BY-NC-ND 4.0 license and may only be used for non-commercial, academic research purposes with proper attribution. Any commercial use, sale, or other monetization of the TITAN model and its derivatives, which include models trained on outputs from the TITAN model or datasets created from the TITAN model, is prohibited and requires prior approval. Downloading the model requires prior registration on Hugging Face and agreeing to the terms of use. By downloading this model, you agree not to distribute, publish or reproduce a copy of the model. If another user within your organization wishes to use the TITAN model, they must register as an individual user and agree to comply with the terms of use. Users may not attempt to re-identify the deidentified data used to develop the underlying model. If you are a commercial entity, please contact the corresponding author.

## Contact

For any additional questions or comments, contact Faisal Mahmood (`faisalmahmood@bwh.harvard.edu`), \
Tong Ding (`tong_ding@g.harvard.edu`), \
Sophia J. Wagner (`sophia.wagner@helmholtz-munich.de`), \
Andrew H. Song (`asong@bwh.harvard.edu`), \
or Richard J. Chen (`richardchen@g.harvard.edu`).

## Acknowledgements

The project was built on top of amazing repositories such as [ViT](https://github.com/google-research/big_vision), [iBOT](https://github.com/bytedance/ibot/tree/main), [OpenClip](https://github.com/mlfoundations/open_clip), [LGSSL](https://github.com/mbanani/lgssl), and [Timm](https://github.com/huggingface/pytorch-image-models/) (ViT model implementation). We thank the authors and developers for their contributions.

## BibTeX

If you found our work useful in your research, please consider citing it:

Ding, T.\*, Wagner S.J.\*, Song, A.H.\*, Chen, R.J.\* et al.
Multimodal Whole Slide Foundation Model for Pathology, arXiv, 2024

```
@misc{ding2024multimodalslidefoundationmodel,
      title={Multimodal Whole Slide Foundation Model for Pathology},
      author={Tong Ding and Sophia J. Wagner and Andrew H. Song and Richard J. Chen and Ming Y. Lu and Andrew Zhang and Anurag J. Vaidya and Guillaume Jaume and Muhammad Shaban and Ahrong Kim and Drew F. K. Williamson and Bowen Chen and Cristina Almagro-Perez and Paul Doucet and Sharifa Sahai and Chengkuan Chen and Daisuke Komura and Akihiro Kawabe and Shumpei Ishikawa and Georg Gerber and Tingying Peng and Long Phi Le and Faisal Mahmood},
      year={2024},
      eprint={2411.19666},
      archivePrefix={arXiv},
      primaryClass={eess.IV},
      url={https://arxiv.org/abs/2411.19666},
}
```