---
pipeline_tag: image-classification
tags:
- arxiv:2010.07611
- arxiv:2104.00298
license: cc-by-nc-4.0
---
To be clear, this model is tailored to my image and video classification tasks, not to ImageNet.
I built EfficientNetV2.5 s to outperform the existing EfficientNet b0 to b4, EfficientNet b1 to b4 pruned (I pruned b4), and EfficientNetV2 t to l models, whether trained with TensorFlow or PyTorch, in terms of top-1 accuracy, efficiency, and robustness on my dataset and the [CMAD benchmark](https://huggingface.co/datasets/aistrova/CMAD).
## Model Details

- **Model tasks:** Image classification / video classification / feature backbone
- **Model stats:**
  - Params: 16.64 M
  - Multiply-Add Operations: 5.32 G
  - Image size: train = 299x299 / 304x304, test = 304x304
  - Classification layer: defaults to 1,000 classes
- **Papers:**
  - EfficientNetV2: Smaller Models and Faster Training: https://arxiv.org/abs/2104.00298
  - Layer-adaptive sparsity for the Magnitude-based Pruning: https://arxiv.org/abs/2010.07611
- **Dataset:** ImageNet-1k
- **Pretrained:** Yes, but requires more pretraining
- **Original:** This model architecture is original

<br>

### Prepare Model for Training

To change the number of classes, replace the linear classification layer.
Here's an example of how to convert the architecture into a trainable model.

```bash
pip install ptflops timm
```
```python
import os
import urllib.request

import torch
from ptflops import get_model_complexity_info

nclass = 3                    # number of classes in your dataset
input_size = (3, 304, 304)    # recommended image input size
print_layer_stats = True      # prints the statistics for each layer of the model
verbose = True                # prints additional info about the MAC calculation

# Download the model. Skipped if the file is already present.
base_model = "efficientnetv2.5_base_in1k"
url = f"https://huggingface.co/FredZhang7/efficientnetv2.5_rw_s/resolve/main/{base_model}.pth"
file_name = f"./{base_model}.pth"
if not os.path.exists(file_name):
    urllib.request.urlretrieve(url, file_name)

# Batch of 2 random images, min-max scaled to [0, 1], used only as tracing input
shape = (2,) + input_size
example_inputs = torch.randn(shape)
example_inputs = (example_inputs - example_inputs.min()) / (example_inputs.max() - example_inputs.min())

# The checkpoint is loaded as a full pickled model object, not a state dict.
# Recent PyTorch releases default to weights_only=True, which rejects such files,
# so pass weights_only=False (only for checkpoints you trust).
model = torch.load(file_name, weights_only=False)
model.classifier = torch.nn.Linear(in_features=1984, out_features=nclass, bias=True)
macs, nparams = get_model_complexity_info(model, input_size, as_strings=False, print_per_layer_stat=print_layer_stats, verbose=verbose)
traced_model = torch.jit.trace(model, example_inputs)

# Encode the parameter count (millions) and MACs (billions) in the file name
model_name = f"{base_model}_{nparams / 1e6:.2f}M_{macs / 1e9:.2f}G.pth"
traced_model.save(model_name)

# Load the trainable model. The saved file is a TorchScript archive, so use torch.jit.load.
model = torch.jit.load(model_name)
```
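Swapping the classification layer also changes the parameter count that ends up in the saved file name. The sketch below estimates that change by hand. It assumes the reported 16.64 M parameters include the default 1,000-class linear head with bias, and that the head takes 1984 input features as in the snippet above. Both the helper names and the resulting totals are illustrative, not measured values.

```python
IN_FEATURES = 1984           # classifier input width, taken from the snippet above
TOTAL_PARAMS_1K = 16.64e6    # reported total; assumed to include the 1,000-class head


def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Number of learnable parameters in a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


def estimated_total_params(nclass: int) -> float:
    """Estimate the total parameter count after swapping in an nclass-way head."""
    backbone = TOTAL_PARAMS_1K - linear_params(IN_FEATURES, 1000)
    return backbone + linear_params(IN_FEATURES, nclass)


if __name__ == "__main__":
    for n in (3, 10, 1000):
        print(f"{n:>4} classes: ~{estimated_total_params(n) / 1e6:.2f} M parameters")
```

Under these assumptions, a 3-class head gives roughly 14.66 M parameters, which is a quick sanity check against the figure that `ptflops` reports.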
### Top-1 Accuracy Comparisons

I finetuned the existing models at 299x299, 304x304, 320x320, or 384x384 resolution, depending on the input size used during pretraining and the VRAM usage.

`efficientnet_b3_pruned` achieved the second-highest top-1 accuracy, as well as the highest epoch-1 training accuracy, on my task among EfficientNetV2.5 small and all existing EfficientNet models that my RTX 3090 (24 GB VRAM) could handle.

I will publish the detailed report in another model repository, including the link to the GVNS benchmarks.
This repository is only for the base model, pretrained on ImageNet, not on my task.
### Carbon Emissions

Comparing all models and testing my new architectures cost roughly 504 GPU hours over a span of 27 days.
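
The GPU-hour figure can be turned into a rough energy and emissions estimate. The sketch below is back-of-the-envelope arithmetic, not a measurement. It assumes the GPU drew the RTX 3090's rated board power of 350 W for all 504 hours, which makes the energy figure an upper bound. The grid carbon intensity of 0.4 kg CO2e/kWh is a hypothetical placeholder and should be replaced with the value for the local grid.

```python
GPU_HOURS = 504               # reported above
DAYS = 27                     # reported above
BOARD_POWER_W = 350           # assumption: RTX 3090 rated board power, an upper bound on average draw
GRID_KG_CO2E_PER_KWH = 0.4    # hypothetical placeholder; substitute your grid's value


def energy_kwh(gpu_hours: float, power_w: float) -> float:
    """Energy in kWh if the GPU drew power_w watts for gpu_hours hours."""
    return gpu_hours * power_w / 1000


def emissions_kg(kwh: float, kg_co2e_per_kwh: float) -> float:
    """Emissions in kg CO2e for a given energy use and grid carbon intensity."""
    return kwh * kg_co2e_per_kwh


if __name__ == "__main__":
    kwh = energy_kwh(GPU_HOURS, BOARD_POWER_W)
    print(f"Average load: {GPU_HOURS / DAYS:.1f} GPU hours per day")
    print(f"Energy (upper bound): {kwh:.1f} kWh")
    print(f"Emissions (placeholder intensity): {emissions_kg(kwh, GRID_KG_CO2E_PER_KWH):.1f} kg CO2e")
```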