# Model Colorization Autoencoder

## Model Description
This autoencoder model is designed for image colorization. It takes grayscale images as input and outputs colorized versions of those images. The model architecture consists of an encoder-decoder structure, where the encoder compresses the input image into a latent representation, and the decoder reconstructs the image in color.
## Architecture
- **Encoder:** Three convolutional layers, each followed by max pooling, a ReLU activation, and batch normalization, ending with a flattening layer and a fully connected layer that produces a 4000-dimensional latent vector.
- **Decoder:** Mirrors the encoder, using a linear layer followed by transposed convolutional layers with ReLU activations and batch normalization. The final layer outputs a 3-channel color image through a sigmoid activation.
The architecture details are as follows:
```python
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

class ModelColorization(nn.Module, PyTorchModelHubMixin):
    def __init__(self):
        super().__init__()
        # Encoder: three conv blocks (conv -> max pool -> ReLU -> batch norm),
        # then flatten into a 4000-dimensional latent vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.Conv2d(64, 32, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.Conv2d(32, 16, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.Flatten(),
            nn.Linear(16 * 45 * 45, 4000),
        )
        # Decoder: project the latent vector back to a 16x45x45 feature map,
        # then upsample with three stride-2 transposed convolutions.
        self.decoder = nn.Sequential(
            nn.Linear(4000, 16 * 45 * 45),
            nn.ReLU(),
            nn.Unflatten(1, (16, 45, 45)),
            nn.ConvTranspose2d(16, 32, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.ConvTranspose2d(32, 64, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.ConvTranspose2d(64, 3, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid()  # constrain RGB outputs to [0, 1]
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x
```
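Since the encoder flattens a 16×45×45 feature map after three 2× downsampling stages, the model expects 360×360 single-channel inputs (45 × 2³ = 360), and the three stride-2 transposed convolutions restore that resolution. A quick shape check:

```python
import torch

model = ModelColorization()
model.eval()

# Dummy batch: one 360x360 grayscale image.
x = torch.randn(1, 1, 360, 360)
with torch.no_grad():
    y = model(x)
print(y.shape)  # torch.Size([1, 3, 360, 360])
```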
## Training Details
The model was trained using PyTorch for 5 epochs. The training and validation losses observed per epoch were:

| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | 0.0063        | 0.0042          |
| 2     | 0.0036        | 0.0035          |
| 3     | 0.0032        | 0.0032          |
| 4     | 0.0030        | 0.0030          |
| 5     | 0.0029        | 0.0030          |
Training loss decreased steadily across all five epochs, while validation loss plateaued at 0.0030 from epoch 4 onward, suggesting the model was approaching convergence without overfitting.
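The card does not state the loss function or optimizer; given loss magnitudes around 0.003 on sigmoid outputs in [0, 1], a pixel-wise MSE objective is plausible. Below is a minimal training-loop sketch under that assumption; the `train_loader` DataLoader, which yields (grayscale, RGB) image pairs, is hypothetical:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ModelColorization().to(device)

# Assumed objective and optimizer -- the card does not state these.
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(5):
    model.train()
    running_loss = 0.0
    for gray, color in train_loader:  # hypothetical DataLoader of (grayscale, RGB) pairs
        gray, color = gray.to(device), color.to(device)
        optimizer.zero_grad()
        loss = criterion(model(gray), color)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * gray.size(0)
    print(f"Epoch {epoch + 1}: training loss {running_loss / len(train_loader.dataset):.4f}")
```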
## Usage
You can load the trained weights from the Hugging Face Hub. Because `ModelColorization` mixes in `PyTorchModelHubMixin`, the class itself provides a `from_pretrained` method; the `transformers` `AutoModel` loader is not needed and does not recognize this custom architecture.

```bash
# Ensure you have the necessary dependencies installed:
pip install torch torchvision huggingface_hub
```

```python
# Uses the ModelColorization class defined above.
model = ModelColorization.from_pretrained("sebastiansarasti/AutoEncoderImageColorization")
```
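A sketch of end-to-end inference, assuming the 360×360 grayscale input size implied by the architecture; the file paths are placeholders:

```python
import torch
from PIL import Image
from torchvision import transforms

model = ModelColorization.from_pretrained("sebastiansarasti/AutoEncoderImageColorization")
model.eval()

# Preprocess: single-channel grayscale, resized to the 360x360 input the encoder expects.
preprocess = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.Resize((360, 360)),
    transforms.ToTensor(),  # scales pixel values to [0, 1]
])

image = Image.open("example.jpg")  # placeholder path
x = preprocess(image).unsqueeze(0)  # add batch dimension -> (1, 1, 360, 360)

with torch.no_grad():
    colorized = model(x)

# Convert the (1, 3, 360, 360) output back to a PIL image.
result = transforms.ToPILImage()(colorized.squeeze(0))
result.save("colorized.jpg")
```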