--- license: mit --- # SimMIM: A Simple Framework for Masked Image Modeling This repository is primarily used for storing [SimMIM](https://arxiv.org/abs/2111.09886) pretrained [Swin-V2](https://arxiv.org/abs/2111.09883) models, which are utilized in the ["On Data Scaling in Masked Image Modeling"](https://arxiv.org/abs/2206.04664) study. If you have any questions about SimMIM or the Data Scaling study, please file an issue in this repository or contact [`xie.zda@outlook.com`](mailto:xie.zda@outlook.com) directly. Please note that the [SimMIM](https://github.com/microsoft/SimMIM) and [Swin-Transformer](https://github.com/microsoft/Swin-Transformer) repositories managed by Microsoft are no longer within my scope. ## SimMIM Pretrained Swin-V2 Models You can use the direct link below to download the checkpoints, or use the [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/guides/download) library to download checkpoints using Python. - **Model size** only includes the backbone weights and excludes weights in the decoders/classification heads. - **Batch size** for all models is set to 2048. - **Validation loss** is calculated on the ImageNet-1K validation set. - **Fine-tuned acc@1** refers to the top-1 accuracy on the ImageNet-1K validation set after fine-tuning. | name | model size | pre-train dataset | pre-train iterations | validation loss | fine-tuned acc@1 | pre-trained model | fine-tuned model | | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | SwinV2-Small | 49M | ImageNet-1K 10% | 125k | 0.4820 | 82.69 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_small_1kper10_125k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_small_1kper10_125k.pth?download=true) | | SwinV2-Small | 49M | ImageNet-1K 10% | 250k | 0.4961 | 83.11 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_small_1kper10_250k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_small_1kper10_250k.pth?download=true) | | SwinV2-Small | 49M | ImageNet-1K 10% | 500k | 0.5115 | 83.17 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_small_1kper10_500k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_small_1kper10_500k.pth?download=true) | | SwinV2-Small | 49M | ImageNet-1K 20% | 125k | 0.4751 | 83.05 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_small_1kper20_125k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_small_1kper20_125k.pth?download=true) | | SwinV2-Small | 49M | ImageNet-1K 20% | 250k | 0.4722 | 83.56 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_small_1kper20_250k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_small_1kper20_250k.pth?download=true) | | SwinV2-Small | 49M | ImageNet-1K 20% | 500k | 0.4734 | 83.75 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_small_1kper20_500k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_small_1kper20_500k.pth?download=true) | | SwinV2-Small | 49M | ImageNet-1K 50% | 125k | 0.4732 | 83.04 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_small_1kper50_125k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_small_1kper50_125k.pth?download=true) | | SwinV2-Small | 49M | ImageNet-1K 50% | 250k | 0.4681 | 83.67 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_small_1kper50_250k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_small_1kper50_250k.pth?download=true) | | SwinV2-Small | 49M | ImageNet-1K 50% | 500k | 0.4646 | 83.96 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_small_1kper50_500k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_small_1kper50_500k.pth?download=true) | | SwinV2-Small | 49M | ImageNet-1K | 125k | 0.4728 | 82.92 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_small_1k_125k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_small_1k_125k.pth?download=true) | | SwinV2-Small | 49M | ImageNet-1K | 250k | 0.4674 | 83.66 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_small_1k_250k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_small_1k_250k.pth?download=true) | | SwinV2-Small | 49M | ImageNet-1K | 500k | 0.4641 | 84.08 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_small_1k_500k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_small_1k_500k.pth?download=true) | | SwinV2-Base | 87M | ImageNet-1K 10% | 125k | 0.4822 | 83.33 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_base_1kper10_125k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_base_1kper10_125k.pth?download=true) | | SwinV2-Base | 87M | ImageNet-1K 10% | 250k | 0.4997 | 83.60 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_base_1kper10_250k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_base_1kper10_250k.pth?download=true) | | SwinV2-Base | 87M | ImageNet-1K 10% | 500k | 0.5112 | 83.41 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_base_1kper10_500k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_base_1kper10_500k.pth?download=true) | | SwinV2-Base | 87M | ImageNet-1K 20% | 125k | 0.4703 | 83.86 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_base_1kper20_125k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_base_1kper20_125k.pth?download=true) | | SwinV2-Base | 87M | ImageNet-1K 20% | 250k | 0.4679 | 84.37 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_base_1kper20_250k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_base_1kper20_250k.pth?download=true) | | SwinV2-Base | 87M | ImageNet-1K 20% | 500k | 0.4711 | 84.61 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_base_1kper20_500k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_base_1kper20_500k.pth?download=true) | | SwinV2-Base | 87M | ImageNet-1K 50% | 125k | 0.4683 | 84.04 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_base_1kper50_125k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_base_1kper50_125k.pth?download=true) | | SwinV2-Base | 87M | ImageNet-1K 50% | 250k | 0.4633 | 84.57 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_base_1kper50_250k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_base_1kper50_250k.pth?download=true) | | SwinV2-Base | 87M | ImageNet-1K 50% | 500k | 0.4598 | 84.95 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_base_1kper50_500k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_base_1kper50_500k.pth?download=true) | | SwinV2-Base | 87M | ImageNet-1K | 125k | 0.4680 | 84.13 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_base_1k_125k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_base_1k_125k.pth?download=true) | | SwinV2-Base | 87M | ImageNet-1K | 250k | 0.4626 | 84.65 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_base_1k_250k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_base_1k_250k.pth?download=true) | | SwinV2-Base | 87M | ImageNet-1K | 500k | 0.4588 | 85.04 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_base_1k_500k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_base_1k_500k.pth?download=true) | | SwinV2-Base | 87M | ImageNet-22K | 125k | 0.4695 | 84.11 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_base_22k_125k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_base_22k_125k.pth?download=true) | | SwinV2-Base | 87M | ImageNet-22K | 250k | 0.4649 | 84.57 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_base_22k_250k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_base_22k_250k.pth?download=true) | | SwinV2-Base | 87M | ImageNet-22K | 500k | 0.4614 | 85.11 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_base_22k_500k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_base_22k_500k.pth?download=true) | | SwinV2-Large | 195M | ImageNet-1K 10% | 125k | 0.4995 | 83.69 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_large_1kper10_125k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_large_1kper10_125k.pth?download=true) | | SwinV2-Large | 195M | ImageNet-1K 10% | 250k | 0.5140 | 83.66 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_large_1kper10_250k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_large_1kper10_250k.pth?download=true) | | SwinV2-Large | 195M | ImageNet-1K 10% | 500k | 0.5150 | 83.50 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_large_1kper10_500k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_large_1kper10_500k.pth?download=true) | | SwinV2-Large | 195M | ImageNet-1K 20% | 125k | 0.4675 | 84.38 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_large_1kper20_125k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_large_1kper20_125k.pth?download=true) | | SwinV2-Large | 195M | ImageNet-1K 20% | 250k | 0.4746 | 84.71 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_large_1kper20_250k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_large_1kper20_250k.pth?download=true) | | SwinV2-Large | 195M | ImageNet-1K 20% | 500k | 0.4960 | 84.59 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_large_1kper20_500k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_large_1kper20_500k.pth?download=true) | | SwinV2-Large | 195M | ImageNet-1K 50% | 125k | 0.4622 | 84.78 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_large_1kper50_125k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_large_1kper50_125k.pth?download=true) | | SwinV2-Large | 195M | ImageNet-1K 50% | 250k | 0.4566 | 85.38 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_large_1kper50_250k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_large_1kper50_250k.pth?download=true) | | SwinV2-Large | 195M | ImageNet-1K 50% | 500k | 0.4530 | 85.80 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_large_1kper50_500k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_large_1kper50_500k.pth?download=true) | | SwinV2-Large | 195M | ImageNet-1K | 125k | 0.4611 | 84.98 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_large_1k_125k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_large_1k_125k.pth?download=true) | | SwinV2-Large | 195M | ImageNet-1K | 250k | 0.4552 | 85.45 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_large_1k_250k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_large_1k_250k.pth?download=true) | | SwinV2-Large | 195M | ImageNet-1K | 500k | 0.4507 | 85.91 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_large_1k_500k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_large_1k_500k.pth?download=true) | | SwinV2-Large | 195M | ImageNet-22K | 125k | 0.4649 | 84.61 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_large_22k_125k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_large_22k_125k.pth?download=true) | | SwinV2-Large | 195M | ImageNet-22K | 250k | 0.4586 | 85.39 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_large_22k_250k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_large_22k_250k.pth?download=true) | | SwinV2-Large | 195M | ImageNet-22K | 500k | 0.4536 | 85.81 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_large_22k_500k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_large_22k_500k.pth?download=true) | | SwinV2-Huge | 655M | ImageNet-1K 20% | 125k | 0.4789 | 84.35 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_huge_1kper20_125k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_huge_1kper20_125k.pth?download=true) | | SwinV2-Huge | 655M | ImageNet-1K 20% | 250k | 0.5038 | 84.16 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_huge_1kper20_250k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_huge_1kper20_250k.pth?download=true) | | SwinV2-Huge | 655M | ImageNet-1K 20% | 500k | 0.5071 | 83.44 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_huge_1kper20_500k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_huge_1kper20_500k.pth?download=true) | | SwinV2-Huge | 655M | ImageNet-1K 50% | 125k | 0.4549 | 85.09 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_huge_1kper50_125k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_huge_1kper50_125k.pth?download=true) | | SwinV2-Huge | 655M | ImageNet-1K 50% | 250k | 0.4511 | 85.64 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_huge_1kper50_250k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_huge_1kper50_250k.pth?download=true) | | SwinV2-Huge | 655M | ImageNet-1K 50% | 500k | 0.4559 | 85.69 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_huge_1kper50_500k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_huge_1kper50_500k.pth?download=true) | | SwinV2-Huge | 655M | ImageNet-1K | 125k | 0.4531 | 85.23 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_huge_1k_125k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_huge_1k_125k.pth?download=true) | | SwinV2-Huge | 655M | ImageNet-1K | 250k | 0.4464 | 85.90 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_huge_1k_250k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_huge_1k_250k.pth?download=true) | | SwinV2-Huge | 655M | ImageNet-1K | 500k | 0.4416 | 86.34 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_huge_1k_500k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_huge_1k_500k.pth?download=true) | | SwinV2-Huge | 655M | ImageNet-22K | 125k | 0.4564 | 85.14 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_huge_22k_125k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_huge_22k_125k.pth?download=true) | | SwinV2-Huge | 655M | ImageNet-22K | 250k | 0.4499 | 85.86 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_huge_22k_250k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_huge_22k_250k.pth?download=true) | | SwinV2-Huge | 655M | ImageNet-22K | 500k | 0.4444 | 86.27 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_huge_22k_500k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_huge_22k_500k.pth?download=true) | | SwinV2-giant | 1.06B | ImageNet-1K 50% | 125k | 0.4534 | 85.44 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_giant_1kper50_125k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_giant_1kper50_125k.pth?download=true) | | SwinV2-giant | 1.06B | ImageNet-1K 50% | 250k | 0.4515 | 85.76 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_giant_1kper50_250k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_giant_1kper50_250k.pth?download=true) | | SwinV2-giant | 1.06B | ImageNet-1K 50% | 500k | 0.4719 | 85.51 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_giant_1kper50_500k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_giant_1kper50_500k.pth?download=true) | | SwinV2-giant | 1.06B | ImageNet-1K | 125k | 0.4513 | 85.57 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_giant_1k_125k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_giant_1k_125k.pth?download=true) | | SwinV2-giant | 1.06B | ImageNet-1K | 250k | 0.4442 | 86.12 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_giant_1k_250k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_giant_1k_250k.pth?download=true) | | SwinV2-giant | 1.06B | ImageNet-1K | 500k | 0.4395 | 86.46 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_giant_1k_500k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_giant_1k_500k.pth?download=true) | | SwinV2-giant | 1.06B | ImageNet-22K | 125k | 0.4544 | 85.39 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_giant_22k_125k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_giant_22k_125k.pth?download=true) | | SwinV2-giant | 1.06B | ImageNet-22K | 250k | 0.4475 | 85.96 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_giant_22k_250k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_giant_22k_250k.pth?download=true) | | SwinV2-giant | 1.06B | ImageNet-22K | 500k | 0.4416 | 86.53 | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_pretrain_models/swinv2_giant_22k_500k.pth?download=true) | [huggingface](https://huggingface.co/zdaxie/SimMIM/resolve/main/simmim_swinv2_finetune_models/finetune_swinv2_giant_22k_500k.pth?download=true) | ## Citations ### Citing SimMIM ``` @inproceedings{xie2021simmim, title={SimMIM: A Simple Framework for Masked Image Modeling}, author={Xie, Zhenda and Zhang, Zheng and Cao, Yue and Lin, Yutong and Bao, Jianmin and Yao, Zhuliang and Dai, Qi and Hu, Han}, booktitle={International Conference on Computer Vision and Pattern Recognition (CVPR)}, year={2022} } ``` ### Citing "On Data Scaling in Masked Image Modeling" ``` @article{xie2022data, title={On Data Scaling in Masked Image Modeling}, author={Xie, Zhenda and Zhang, Zheng and Cao, Yue and Lin, Yutong and Wei, Yixuan and Dai, Qi and Hu, Han}, journal={arXiv preprint arXiv:2206.04664}, year={2022} } ``` ### Citing Swin V2 ``` @inproceedings{liu2021swinv2, title={Swin Transformer V2: Scaling Up Capacity and Resolution}, author={Ze Liu and Han Hu and Yutong Lin and Zhuliang Yao and Zhenda Xie and Yixuan Wei and Jia Ning and Yue Cao and Zheng Zhang and Li Dong and Furu Wei and Baining Guo}, booktitle={International Conference on Computer Vision and Pattern Recognition (CVPR)}, year={2022} } ```