rwightman (HF staff) committed · Commit 67c9d29 · 1 Parent(s): 37f73d2

Update model config and README

Files changed (2)
  1. README.md +108 -2
  2. model.safetensors +3 -0
README.md CHANGED
@@ -2,6 +2,112 @@
  tags:
  - image-classification
  - timm
- library_tag: timm
+ library_name: timm
+ license: apache-2.0
+ datasets:
+ - imagenet-21k
  ---
- # Model card for flexivit_base.patch30_in21k
+ # Model card for flexivit_base.patch30_in21k
+
+ A FlexiViT image classification model. Trained on ImageNet-21k in JAX by the paper authors, ported to PyTorch by Ross Wightman.
+
+
+ ## Model Details
+ - **Model Type:** Image classification / feature backbone
+ - **Model Stats:**
+   - Params (M): 102.6
+   - GMACs: 19.4
+   - Activations (M): 18.9
+   - Image size: 240 x 240
+ - **Papers:**
+   - FlexiViT: One Model for All Patch Sizes: https://arxiv.org/abs/2212.08013
+   - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: https://arxiv.org/abs/2010.11929v2
+ - **Dataset:** ImageNet-21k
+ - **Original:** https://github.com/google-research/big_vision
+
+ ## Model Usage
+ ### Image Classification
+ ```python
+ from urllib.request import urlopen
+ from PIL import Image
+ import timm
+ import torch  # needed for torch.topk below
+
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))
+
+ model = timm.create_model('flexivit_base.patch30_in21k', pretrained=True)
+ model = model.eval()
+
+ # get model specific transforms (normalization, resize)
+ data_config = timm.data.resolve_model_data_config(model)
+ transforms = timm.data.create_transform(**data_config, is_training=False)
+
+ output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1
+
+ top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
+ ```
+
+ ### Image Embeddings
+ ```python
+ from urllib.request import urlopen
+ from PIL import Image
+ import timm
+
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))
+
+ model = timm.create_model(
+     'flexivit_base.patch30_in21k',
+     pretrained=True,
+     num_classes=0,  # remove classifier nn.Linear
+ )
+ model = model.eval()
+
+ # get model specific transforms (normalization, resize)
+ data_config = timm.data.resolve_model_data_config(model)
+ transforms = timm.data.create_transform(**data_config, is_training=False)
+
+ output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor
+
+ # or equivalently (without needing to set num_classes=0)
+
+ output = model.forward_features(transforms(img).unsqueeze(0))
+ # output is unpooled, a (1, 226, 768) shaped tensor
+
+ output = model.forward_head(output, pre_logits=True)
+ # output is a (1, num_features) shaped tensor
+ ```
+
+ ## Model Comparison
+ Explore the dataset and runtime metrics of this model in timm [model results](https://github.com/huggingface/pytorch-image-models/tree/main/results).
+
+ ## Citation
+ ```bibtex
+ @article{beyer2022flexivit,
+   title={FlexiViT: One Model for All Patch Sizes},
+   author={Beyer, Lucas and Izmailov, Pavel and Kolesnikov, Alexander and Caron, Mathilde and Kornblith, Simon and Zhai, Xiaohua and Minderer, Matthias and Tschannen, Michael and Alabdulmohsin, Ibrahim and Pavetic, Filip},
+   journal={arXiv preprint arXiv:2212.08013},
+   year={2022}
+ }
+ ```
+ ```bibtex
+ @article{dosovitskiy2020vit,
+   title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
+   author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
+   journal={ICLR},
+   year={2021}
+ }
+ ```
+ ```bibtex
+ @misc{rw2019timm,
+   author = {Ross Wightman},
+   title = {PyTorch Image Models},
+   year = {2019},
+   publisher = {GitHub},
+   journal = {GitHub repository},
+   doi = {10.5281/zenodo.4414861},
+   howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
+ }
+ ```
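
Reviewer note on the `(1, 226, 768)` shape quoted in the embeddings example: the token count is consistent with a 240 x 240 input split into 16-pixel patches plus one class token. The sketch below only does that arithmetic. The 16-pixel patch size and the single prefix token are assumptions inferred from the quoted shape, not stated in the card, and `token_count` is a hypothetical helper rather than a timm function.

```python
def token_count(image_size: int, patch_size: int, num_prefix_tokens: int = 1) -> int:
    """Number of tokens a ViT-style model emits for a square image.

    Assumes non-overlapping square patches and `num_prefix_tokens`
    extra tokens (for example a class token) prepended to the sequence.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    grid = image_size // patch_size  # patches per side
    return grid * grid + num_prefix_tokens


print(token_count(240, 16))  # 15 * 15 patches + 1 prefix token -> 226
print(token_count(240, 30))  # 8 * 8 patches + 1 prefix token -> 65
```

With the 30-pixel patch size from the model name, the same arithmetic gives 65 tokens, so the quoted 226 suggests the default configuration runs with a smaller patch grid than the name implies. Treat that reading as a guess to confirm against the loaded model.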
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2ccf05ed75b304e8cb891785561d3b475cd10f6cf73221686a18c27d83c099bd
+ size 410483802
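
The three lines added for `model.safetensors` are a Git LFS pointer, not the weights themselves. A minimal sketch of how a downloaded file could be checked against that pointer follows. `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers written for this note, and the parameter estimate assumes 4-byte float32 weights with a negligible file header, which the commit does not state.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's byte size and digest both match the pointer."""
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as f:
        # Hash in chunks so a ~400 MB file is never fully held in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:2ccf05ed75b304e8cb891785561d3b475cd10f6cf73221686a18c27d83c099bd
size 410483802
"""

pointer = parse_lfs_pointer(POINTER)
# Rough estimate only: assumes float32 weights and ignores the safetensors header.
approx_params_m = pointer["size"] / 4 / 1e6
print(f"{approx_params_m:.1f}M parameters (approx.)")
```

Given a local copy of the weights, `matches_pointer(Path("model.safetensors"), pointer)` would confirm the download; the path here is a placeholder. Under the float32 assumption the estimate comes out near the 102.6 M parameters listed in the README.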