pipeline_tag: image-classification
tags:
  - arxiv:2010.07611
  - arxiv:2104.00298
license: cc-by-nc-4.0

To be clear, this model is tailored to my own image and video classification tasks, not to ImageNet. I built EfficientNetV2.5 s to outperform the existing EfficientNet b0 to b4, the pruned EfficientNet b1 to b4 (I pruned b4 myself), and EfficientNetV2 t to l models, whether trained with TensorFlow or PyTorch, in top-1 accuracy, efficiency, and robustness on my dataset and the CMAD benchmark.

Model Details

  • Model tasks: Image classification / video classification / feature backbone
  • Model stats:
    • Params: 16.64 M
    • Multiply-Add Operations: 5.32 G
    • Image size: train = 299x299 / 304x304, test = 304x304
    • Classification layer: defaults to 1,000 classes
  • Papers:
    • arXiv:2010.07611
    • arXiv:2104.00298
  • Dataset: ImageNet-1k
  • Pretrained: Yes, but requires more pretraining
  • Original: This model architecture is original
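
As a rough sanity check on the stats above, the parameter count and multiply-add count can be converted into a weight footprint and an approximate FLOP count. The sketch below assumes 32-bit floats (4 bytes per parameter) and the common convention that one multiply-add counts as two floating-point operations; neither assumption is stated in this card, so treat the results as estimates.

```python
# Back-of-the-envelope conversions from the stats listed above.
PARAMS = 16_640_000       # 16.64 M parameters (from this card)
MACS = 5_320_000_000      # 5.32 G multiply-adds (from this card)

BYTES_PER_PARAM = 4       # assumption: weights stored as float32
FLOPS_PER_MAC = 2         # assumption: 1 multiply-add = 2 FLOPs


def weight_megabytes(params: int, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Approximate size of the weights alone, in decimal megabytes."""
    return params * bytes_per_param / 1e6


def gflops(macs: int, flops_per_mac: int = FLOPS_PER_MAC) -> float:
    """Approximate forward-pass cost, in GFLOPs."""
    return macs * flops_per_mac / 1e9


print(f"weights: ~{weight_megabytes(PARAMS):.2f} MB")
print(f"forward pass: ~{gflops(MACS):.2f} GFLOPs")
```

The actual checkpoint file may be larger than the weight estimate, since a pickled model also stores buffers and structure.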

Prepare Model for Training

To change the number of classes, replace the linear classification layer. Here's an example of how to convert the architecture into a trainable model.

# Install dependencies first: pip install ptflops timm
from ptflops import get_model_complexity_info
import torch
import urllib.request

nclass = 3                  # number of classes in your dataset
input_size = (3, 304, 304)  # recommended image input size
print_layer_stats = True    # prints the statistics for each layer of the model
verbose = True              # prints additional info about the MAC calculation

# Download the model. Skip this step if already downloaded
base_model = "efficientnetv2.5_base_in1k"
url = f"https://huggingface.co/FredZhang7/efficientnetv2.5_rw_s/resolve/main/{base_model}.pth"
file_name = f"./{base_model}.pth"
urllib.request.urlretrieve(url, file_name)

# Random batch of 2 images, rescaled to [0, 1]; used only as the tracing input
shape = (2,) + input_size
example_inputs = torch.randn(shape)
example_inputs = (example_inputs - example_inputs.min()) / (example_inputs.max() - example_inputs.min())

# The file holds a whole pickled model rather than a state_dict. Recent PyTorch
# releases default to weights_only=True, which rejects such files, so disable it
# here (the argument requires PyTorch >= 1.13). Only do this for files you trust.
model = torch.load(file_name, weights_only=False)
# Replace the default 1,000-class head with one sized for your dataset
model.classifier = torch.nn.Linear(in_features=1984, out_features=nclass, bias=True)
macs, nparams = get_model_complexity_info(model, input_size, as_strings=False, print_per_layer_stat=print_layer_stats, verbose=verbose)
traced_model = torch.jit.trace(model, example_inputs)

# Encode the parameter count (M) and MACs (G) in the file name
model_name = f"{base_model}_{nparams / 1e6:.2f}M_{macs / 1e9:.2f}G.pth"
traced_model.save(model_name)

# Load the trainable model. The saved file is a TorchScript archive, so use torch.jit.load
model = torch.jit.load(model_name)
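
Replacing the classification layer also changes the parameter count that ends up in the saved file name. The sketch below estimates that change using only figures from this card (1984 input features, a default 1,000-class head, 16.64 M total parameters). It assumes the 16.64 M figure includes the default head, which the card does not state explicitly, and the helper names are illustrative rather than part of any library.

```python
# Estimate how swapping the classification head changes the parameter count.
IN_FEATURES = 1984           # classifier input width, from the snippet above
DEFAULT_CLASSES = 1000       # default head size, from Model Details
TOTAL_PARAMS = 16_640_000    # 16.64 M; assumed to include the default head


def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Parameters in a fully connected layer: weights plus optional biases."""
    return in_features * out_features + (out_features if bias else 0)


def params_after_head_swap(nclass: int) -> int:
    """Estimated total once the default head is replaced by an nclass head."""
    old_head = linear_params(IN_FEATURES, DEFAULT_CLASSES)
    new_head = linear_params(IN_FEATURES, nclass)
    return TOTAL_PARAMS - old_head + new_head


for nclass in (3, 10, 100):
    total = params_after_head_swap(nclass)
    print(f"{nclass:>4} classes: ~{total / 1e6:.2f} M parameters")
```

For `nclass = 3` this comes to roughly 14.66 M. The value reported by ptflops may differ slightly, because the 16.64 M starting point is itself rounded.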

Top-1 Accuracy Comparisons

I fine-tuned the existing models at 299x299, 304x304, 320x320, or 384x384 resolution, depending on the input size each model was pretrained with and on its VRAM usage.
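
One reason resolution and VRAM are linked is that the pixel count, and with it the size of convolutional feature maps, grows with the square of the side length. The sketch below compares the four resolutions against 304x304. Treating activation memory as proportional to pixel count is only a first-order approximation and an assumption of this example, not a measurement from my runs.

```python
# Relative pixel counts for the fine-tuning resolutions mentioned above.
RESOLUTIONS = (299, 304, 320, 384)   # square side lengths, in pixels
REFERENCE = 304                      # recommended input size for this model


def relative_pixels(side: int, reference: int = REFERENCE) -> float:
    """Pixel count of a side x side image relative to the reference size."""
    return (side * side) / (reference * reference)


for side in RESOLUTIONS:
    print(f"{side}x{side}: {relative_pixels(side):.2f}x the pixels of {REFERENCE}x{REFERENCE}")
```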

Among EfficientNetV2.5 small and all the existing EfficientNet models that fit on my 24 GB RTX 3090, efficientnet_b3_pruned achieved the second-highest top-1 accuracy and the highest epoch-1 training accuracy on my task.

I will publish the detailed report in another model repository, including the link to the GVNS benchmarks. This repository holds only the base model, which is pretrained on ImageNet rather than on my task.

Carbon Emissions

Comparing all models and testing my new architectures cost roughly 504 GPU hours over a span of 27 days.
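
GPU hours can be turned into an energy and emissions estimate once a power draw and a grid carbon intensity are chosen. This card reports neither, so both values below are assumptions: 350 W is the RTX 3090's rated board power (real draw during training is usually different), and the carbon intensity is a hypothetical placeholder to replace with a figure for your region.

```python
# Rough energy and emissions estimate from the GPU hours reported above.
GPU_HOURS = 504            # from this card
DAYS = 27                  # from this card
GPU_POWER_WATTS = 350      # assumption: rated board power, drawn continuously
KG_CO2E_PER_KWH = 0.4      # hypothetical grid intensity; replace with a local value


def energy_kwh(gpu_hours: float, watts: float) -> float:
    """Energy used by one GPU running at a constant power draw."""
    return gpu_hours * watts / 1000


def emissions_kg(kwh: float, kg_per_kwh: float) -> float:
    """Emissions for a given energy use and grid carbon intensity."""
    return kwh * kg_per_kwh


kwh = energy_kwh(GPU_HOURS, GPU_POWER_WATTS)
print(f"average load: {GPU_HOURS / DAYS:.1f} GPU hours per day")
print(f"energy: ~{kwh:.1f} kWh (GPU only, under the assumed power draw)")
print(f"emissions: ~{emissions_kg(kwh, KG_CO2E_PER_KWH):.1f} kg CO2e (under the assumed intensity)")
```

The estimate covers the GPU alone and ignores the CPU, cooling, and other system overhead.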