---
pipeline_tag: image-classification
tags:
- arxiv:2010.07611
- arxiv:2104.00298
license: cc-by-nc-4.0
---
To be clear, this model is tailored to my image and video classification tasks, not to ImageNet.
I built EfficientNetV2.5 s to outperform the existing EfficientNet b0 to b4, EfficientNet b1 to b4 pruned (I pruned b4), and EfficientNetV2 t to l models, whether trained using TensorFlow or PyTorch, in terms of top-1 accuracy, efficiency, and robustness on my dataset and the [CMAD benchmark](https://huggingface.co/datasets/aistrova/CMAD).
## Model Details
- **Model tasks:** Image classification / video classification / feature backbone
- **Model stats:**
  - Params: 16.64 M
  - Multiply-Add Operations: 5.32 G
  - Image size: train = 299x299 / 304x304, test = 304x304
  - Classification layer: defaults to 1,000 classes
- **Papers:**
  - EfficientNetV2: Smaller Models and Faster Training: https://arxiv.org/abs/2104.00298
  - Layer-adaptive sparsity for the Magnitude-based Pruning: https://arxiv.org/abs/2010.07611
- **Dataset:** ImageNet-1k
- **Pretrained:** Yes, but requires more pretraining
- **Original:** This model architecture is original
<br>
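
For a rough sense of how much of the model sits in the classification layer, the arithmetic below counts the weights and biases of a linear head. It assumes the head takes 1,984 input features, which is the `in_features` value used in the training example in this card. `linear_head_params` is a hypothetical helper written for illustration, not part of any library.

```python
def linear_head_params(in_features: int, num_classes: int, bias: bool = True) -> int:
    """Number of learnable parameters in a fully connected layer."""
    return in_features * num_classes + (num_classes if bias else 0)


IN_FEATURES = 1984  # assumption: the `in_features` value used elsewhere in this card

default_head = linear_head_params(IN_FEATURES, 1000)  # default 1,000-class head
small_head = linear_head_params(IN_FEATURES, 3)       # e.g. a 3-class dataset

print(f"1,000-class head: {default_head:,} parameters")
print(f"3-class head:     {small_head:,} parameters")
```

Swapping the default head for a small one therefore removes most of the head's parameters, while the backbone is unchanged.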
### Prepare Model for Training
To change the number of classes, replace the linear classification layer.
Here's an example of how to convert the architecture into a trainable model.
```bash
pip install ptflops timm
```
```python
from ptflops import get_model_complexity_info
import torch
import urllib.request
nclass = 3 # number of classes in your dataset
input_size = (3, 304, 304) # recommended image input size
print_layer_stats = True # prints the statistics for each layer of the model
verbose = True # prints additional info about the MAC calculation
# Download the model. Skip this step if already downloaded
base_model = "efficientnetv2.5_base_in1k"
url = f"https://huggingface.co/FredZhang7/efficientnetv2.5_rw_s/resolve/main/{base_model}.pth"
file_name = f"./{base_model}.pth"
urllib.request.urlretrieve(url, file_name)
shape = (2,) + input_size
example_inputs = torch.randn(shape)
example_inputs = (example_inputs - example_inputs.min()) / (example_inputs.max() - example_inputs.min())
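# The checkpoint is a pickled model object rather than a state_dict, so recent
# PyTorch versions need weights_only=False. Only load files you trust.
model = torch.load(file_name, weights_only=False)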
model.classifier = torch.nn.Linear(in_features=1984, out_features=nclass, bias=True)
macs, nparams = get_model_complexity_info(model, input_size, as_strings=False, print_per_layer_stat=print_layer_stats, verbose=verbose)
traced_model = torch.jit.trace(model, example_inputs)
model_name = f'{base_model}_{"{:.2f}".format(nparams / 1e6)}M_{"{:.2f}".format(macs / 1e9)}G.pth'
traced_model.save(model_name)
# Load the trainable model (it was saved as TorchScript, so use torch.jit.load)
model = torch.jit.load(model_name)
```
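
Once the head is replaced, fine-tuning follows the usual PyTorch loop. The sketch below shows a single optimization step under stated assumptions: `StandInNet` is a hypothetical tiny network used only so the snippet runs without downloading the checkpoint, and the optimizer, learning rate, and random batch are placeholders rather than the settings used to train this model. In practice, substitute the model prepared above.

```python
import torch
from torch import nn

nclass = 3                  # number of classes in your dataset
input_size = (3, 304, 304)  # recommended image input size


class StandInNet(nn.Module):
    """Hypothetical placeholder with a `classifier` head; not the real architecture."""

    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, stride=4, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.classifier = nn.Linear(8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


model = StandInNet(nclass)  # replace with the model prepared above
model.train()

# Placeholder hyperparameters (assumptions); tune them for your task.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Random batch standing in for real images scaled to [0, 1] and integer labels.
images = torch.rand((2,) + input_size)
labels = torch.randint(0, nclass, (2,))

optimizer.zero_grad()
logits = model(images)            # shape: (batch, nclass)
loss = criterion(logits, labels)  # scalar cross-entropy loss
loss.backward()
optimizer.step()

print(logits.shape, loss.item())
```

A real run would wrap these steps in a loop over a `DataLoader` and add validation, but the order of calls stays the same.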
### Top-1 Accuracy Comparisons
I finetuned the existing models at 299x299, 304x304, 320x320, or 384x384 resolution, depending on the input size used during pretraining and the VRAM usage.
Among EfficientNetV2.5 small and all the existing EfficientNet models that my RTX 3090 (24 GB VRAM) could handle, `efficientnet_b3_pruned` achieved the second-highest top-1 accuracy and the highest epoch-1 training accuracy on my task.
I will publish the detailed report in another model repository, including the link to the GVNS benchmarks.
This repository contains only the base model, which is pretrained on ImageNet, not on my task.
### Carbon Emissions
Comparing all models and testing my new architectures cost roughly 504 GPU hours over a span of 27 days.
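
To put that figure in perspective, the arithmetic below converts it into average daily GPU use and a rough energy estimate for the GPU alone. The 350 W value is an assumption (the rated board power of an RTX 3090, treated as fully loaded the whole time); actual power draw was not measured, and no emissions figure is derived because that depends on the local electricity grid.

```python
gpu_hours = 504           # from the text above
days = 27                 # from the text above
assumed_power_kw = 0.350  # assumption: RTX 3090 rated board power, not a measurement

avg_hours_per_day = gpu_hours / days
energy_estimate_kwh = gpu_hours * assumed_power_kw

print(f"Average GPU use: {avg_hours_per_day:.1f} hours/day")
print(f"Rough GPU energy estimate: {energy_estimate_kwh:.1f} kWh")
```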