Model Colorization Autoencoder

Model Description

This autoencoder model is designed for image colorization. It takes grayscale images as input and outputs colorized versions of those images. The model architecture consists of an encoder-decoder structure, where the encoder compresses the input image into a latent representation, and the decoder reconstructs the image in color.

Architecture

  • Encoder: The encoder stacks three convolutional blocks, each a 3x3 convolution followed by 2x2 max pooling, a ReLU activation, and batch normalization. A flattening layer and a fully connected layer then produce a 4000-dimensional latent vector.
  • Decoder: The decoder mirrors the encoder. A fully connected layer with a ReLU activation expands the latent vector back to a 16x45x45 feature map, and three transposed convolutions upsample it, with ReLU and batch normalization after the first two. The final layer outputs a three-channel color image through a sigmoid activation.

The architecture details are as follows:

import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin


class ModelColorization(nn.Module, PyTorchModelHubMixin):
    def __init__(self):
        super(ModelColorization, self).__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.Conv2d(64, 32, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.Conv2d(32, 16, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.Flatten(),
            # 16 channels on a 45x45 map, e.g. a 360x360 input halved by three 2x2 poolings
            nn.Linear(16*45*45, 4000),
        )
        self.decoder = nn.Sequential(
            nn.Linear(4000, 16 * 45 * 45),
            nn.ReLU(),
            nn.Unflatten(1, (16, 45, 45)),
            nn.ConvTranspose2d(16, 32, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.ConvTranspose2d(32, 64, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.ConvTranspose2d(64, 3, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid()
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x
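
The flattened size 16*45*45 ties the network to a specific input resolution. The following is a minimal sketch of that arithmetic, assuming the standard PyTorch output-size formulas for Conv2d, MaxPool2d, and ConvTranspose2d with dilation 1. The helper names conv_out, conv_transpose_out, and trace_spatial_size are illustrative and not part of the model's code.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of Conv2d or MaxPool2d along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride=1, padding=0, output_padding=0):
    """Output size of ConvTranspose2d along one spatial dimension (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def trace_spatial_size(size):
    """Return (latent feature-map side, decoder output side) for a square input."""
    # Encoder: three stages of 3x3 conv (padding 1) followed by 2x2 max pooling.
    for _ in range(3):
        size = conv_out(size, kernel=3, stride=1, padding=1)
        size = conv_out(size, kernel=2, stride=2)
    latent_side = size
    # Decoder: three 3x3 transposed convs with stride 2, padding 1, output_padding 1.
    for _ in range(3):
        size = conv_transpose_out(size, kernel=3, stride=2, padding=1, output_padding=1)
    return latent_side, size


latent_side, output_side = trace_spatial_size(360)
print(latent_side, output_side)  # 45 360
```

Because pooling rounds down, inputs from 360 to 367 pixels per side would all reach a 45x45 map, but only 360 makes the decoder output match the input size. This suggests the model expects 360x360 grayscale inputs.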

Training Details

The model was trained using PyTorch for 5 epochs. Here are the training and validation losses observed during the training:

Epoch 1: Training Loss: 0.0063, Validation Loss: 0.0042

Epoch 2: Training Loss: 0.0036, Validation Loss: 0.0035

Epoch 3: Training Loss: 0.0032, Validation Loss: 0.0032

Epoch 4: Training Loss: 0.0030, Validation Loss: 0.0030

Epoch 5: Training Loss: 0.0029, Validation Loss: 0.0030

Training loss decreased in every epoch, from 0.0063 to 0.0029. Validation loss fell from 0.0042 to 0.0030 and then plateaued, showing no further improvement between epochs 4 and 5.
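
To make the trend concrete, the changes can be computed directly from the figures listed above. This is a small sketch: the values are copied from the reported losses, and the helper names relative_drop and first_plateau_epoch are illustrative.

```python
train_loss = [0.0063, 0.0036, 0.0032, 0.0030, 0.0029]
val_loss = [0.0042, 0.0035, 0.0032, 0.0030, 0.0030]


def relative_drop(losses):
    """Percentage reduction from the first to the last epoch."""
    return round(100 * (losses[0] - losses[-1]) / losses[0], 1)


def first_plateau_epoch(losses):
    """1-based epoch at which the loss first fails to improve, or None."""
    for i in range(1, len(losses)):
        if losses[i] >= losses[i - 1]:
            return i + 1
    return None


print(relative_drop(train_loss))        # 54.0
print(relative_drop(val_loss))          # 28.6
print(first_plateau_epoch(train_loss))  # None
print(first_plateau_epoch(val_loss))    # 5
```

With only five epochs and four-decimal precision, the flat validation loss at epoch 5 is a hint rather than proof that training has converged.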

Usage

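You can load the model from the Hugging Face Hub. Because the class inherits from PyTorchModelHubMixin, load it through the ModelColorization class defined above rather than through transformers.AutoModel:

# Install the dependencies first (shell command):
#   pip install torch torchvision huggingface_hub

# ModelColorization must be defined or imported before this call
model = ModelColorization.from_pretrained("sebastiansarasti/AutoEncoderImageColorization")
model.eval()  # switch BatchNorm layers to inference mode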
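
Once a model is loaded, inference amounts to adding batch and channel dimensions and running a forward pass without gradients. The sketch below is a hedged illustration: the colorize helper is hypothetical, and scaling inputs to the range [0, 1] is an assumption, since the card does not state how training images were normalized. A tiny stand-in network replaces the trained model so the snippet runs without downloading weights.

```python
import torch
import torch.nn as nn


def colorize(model: nn.Module, gray: torch.Tensor) -> torch.Tensor:
    """Run a colorization model on one grayscale image.

    gray: float tensor of shape (H, W), assumed to be scaled to [0, 1].
    Returns a float tensor of shape (3, H_out, W_out).
    """
    model.eval()  # BatchNorm uses running statistics instead of batch statistics
    with torch.no_grad():
        batch = gray.unsqueeze(0).unsqueeze(0)  # (H, W) -> (1, 1, H, W)
        rgb = model(batch)                      # (1, 3, H_out, W_out)
    return rgb.squeeze(0)


# Stand-in network (NOT the trained model), used only to exercise the helper.
stand_in = nn.Sequential(nn.Conv2d(1, 3, kernel_size=1), nn.Sigmoid())
gray = torch.rand(360, 360)  # random values standing in for a 360x360 image
rgb = colorize(stand_in, gray)
print(rgb.shape)  # torch.Size([3, 360, 360])
```

With the real network, pass the loaded model in place of stand_in. The sigmoid output lies in [0, 1], so multiplying by 255 and converting to 8-bit integers is one way to obtain a displayable image.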

Model size: 259M params, stored as F32 tensors in Safetensors format.
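
The 259M figure can be cross-checked by hand from the layer definitions. This is a rough sketch, assuming the usual counts: weights plus biases for convolutional and linear layers, and a learnable scale and shift per channel for batch normalization. The helper names are illustrative.

```python
def conv_params(c_in, c_out, k=3):
    """Weights plus biases of a k x k Conv2d or ConvTranspose2d layer."""
    return c_in * c_out * k * k + c_out


def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def bn_params(channels):
    """Learnable scale and shift of BatchNorm2d (running statistics excluded)."""
    return 2 * channels


flat = 16 * 45 * 45  # size of the flattened 16x45x45 feature map

encoder = (conv_params(1, 64) + bn_params(64)
           + conv_params(64, 32) + bn_params(32)
           + conv_params(32, 16) + bn_params(16)
           + linear_params(flat, 4000))
decoder = (linear_params(4000, flat)
           + conv_params(16, 32) + bn_params(32)
           + conv_params(32, 64) + bn_params(64)
           + conv_params(64, 3))

total = encoder + decoder
linear_share = (linear_params(flat, 4000) + linear_params(4000, flat)) / total

print(total)  # 259285411
print(linear_share > 0.999)  # True
```

Under these assumptions the two fully connected layers hold more than 99.9% of the parameters, so the size of the latent vector is the main lever on model size.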