---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
---

# Model Card: Time-Conditioned U-Net for MNIST

## Model Details

- **Architecture**: Time-Conditioned U-Net
- **Dataset**: MNIST (28x28 grayscale digits)
- **Batch Size**: 256
- **Image Size**: 28x28
- **Loss Function**: Mean Squared Error (MSE)
- **Optimizer**: Adam (learning rate = 1e-4)

## Model Architecture

This model is a U-Net-based neural network that incorporates time conditioning through sinusoidal embeddings refined by an MLP. The architecture is designed for small grayscale images (e.g., MNIST) and consists of:

### **Encoder (Contracting Path)**:
- **Downsampling** via three `DoubleConv` blocks with 32, 64, and 128 channels, respectively.
- The time embedding is added inside each convolution block.
- **Max pooling** reduces spatial dimensions between blocks.

### **Decoder (Expanding Path)**:
- **Upsampling** via bilinear interpolation.
- Skip connections concatenate encoder features with the corresponding decoder features.
- Two `DoubleConv` blocks with 128+64 and 64+32 input channels, respectively.
- A final `1x1` convolution maps to the output channels.

### **Time Embedding**:
- A sinusoidal positional encoding represents the timestep.
- An MLP refines the embedding before it is passed to the convolution blocks.

## Implementation

### **Generator (U-Net)**

```python
import torch
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin


class UNet(nn.Module, PyTorchModelHubMixin):
    def __init__(self, in_channels=1, out_channels=1, time_embedding_dim=32):
        super().__init__()

        # Time embedding layer
        self.time_embedding = TimeEmbedding(time_embedding_dim)

        # Encoder
        self.down_conv1 = DoubleConv(in_channels, 32, time_embedding_dim)
        self.down_conv2 = DoubleConv(32, 64, time_embedding_dim)
        self.down_conv3 = DoubleConv(64, 128, time_embedding_dim)

        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.upsample = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=True)

        # Decoder
        self.up_conv2 = DoubleConv(128 + 64, 64, time_embedding_dim)
        self.up_conv1 = DoubleConv(64 + 32, 32, time_embedding_dim)
        self.final_conv = nn.Conv2d(32, out_channels, kernel_size=1)

    def forward(self, x, timesteps):
        t = self.time_embedding(timesteps)

        # Encoder: 28x28 -> 14x14 -> 7x7
        x1 = self.down_conv1(x, t)
        x2 = self.down_conv2(self.maxpool(x1), t)
        x3 = self.down_conv3(self.maxpool(x2), t)

        # Decoder with skip connections: 7x7 -> 14x14 -> 28x28
        x = self.upsample(x3)
        x = torch.cat([x2, x], dim=1)
        x = self.up_conv2(x, t)

        x = self.upsample(x)
        x = torch.cat([x1, x], dim=1)
        x = self.up_conv1(x, t)

        return self.final_conv(x)
```

The `DoubleConv` block referenced above is not defined in this card; one plausible implementation is sketched at the end.

### **Time Embedding**

```python
import math

import torch
import torch.nn as nn


class TimeEmbedding(nn.Module):
    def __init__(self, embedding_dim):
        super().__init__()
        self.embedding_dim = embedding_dim  # needed in forward()
        self.mlp = nn.Sequential(
            nn.SiLU(),
            nn.Linear(embedding_dim, embedding_dim),
        )

    def forward(self, t):
        # Standard sinusoidal encoding over half the embedding dimension;
        # the sin and cos halves are concatenated to the full dimension.
        half_dim = self.embedding_dim // 2
        embeddings = torch.exp(
            torch.arange(half_dim, device=t.device)
            * -(math.log(10000.0) / (half_dim - 1))
        )
        embeddings = t[:, None] * embeddings[None, :]
        embeddings = torch.cat((embeddings.sin(), embeddings.cos()), dim=-1)
        return self.mlp(embeddings)
```

## Training Configuration

- **Batch Size**: 256
- **Image Size**: 28x28
- **Loss Function**: Mean Squared Error (MSE)
- **Optimizer**: Adam (learning rate = 1e-4)

This model has been pushed to the Hub using the [PyTorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
- Library: [More Information Needed]
- Docs: [More Information Needed]
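
## `DoubleConv` Block (Sketch)

The `DoubleConv` block is used throughout the U-Net code but is not shown in this card. The sketch below is a minimal implementation, assuming the constructor signature `DoubleConv(in_channels, out_channels, time_embedding_dim)` and the `block(x, t)` call convention seen in `UNet.forward`; the actual block may differ (e.g., normalization layers or a different activation).

```python
import torch
import torch.nn as nn


class DoubleConv(nn.Module):
    """Two 3x3 convolutions with a time-embedding bias added in between.

    Sketch only: not the card's original implementation.
    """

    def __init__(self, in_channels, out_channels, time_embedding_dim):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        self.act = nn.ReLU(inplace=True)
        # Project the time embedding to one bias value per output channel.
        self.time_proj = nn.Linear(time_embedding_dim, out_channels)

    def forward(self, x, t):
        x = self.act(self.conv1(x))
        # Broadcast the projected time embedding over the spatial dimensions.
        x = x + self.time_proj(t)[:, :, None, None]
        return self.act(self.conv2(x))
```

Adding the time signal between the two convolutions lets each block modulate its features per timestep, which matches the card's note that the time embedding is added at each convolution block.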
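
## Example Training Step (Sketch)

The card lists MSE loss and Adam with a learning rate of 1e-4 but does not include the training loop. The sketch below only shows how those pieces wire together; the random tensors, the timestep range of 1000, and the regression target are placeholders, since the actual objective is not specified in this card.

```python
import torch
import torch.nn as nn

model = UNet(in_channels=1, out_channels=1, time_embedding_dim=32)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr from the card
criterion = nn.MSELoss()                                   # loss from the card

# Placeholder batch matching the card's configuration (256 x 1 x 28 x 28).
x = torch.randn(256, 1, 28, 28)
timesteps = torch.randint(0, 1000, (256,))  # 1000 steps is an assumption
target = torch.randn(256, 1, 28, 28)        # placeholder regression target

optimizer.zero_grad()
loss = criterion(model(x, timesteps), target)
loss.backward()
optimizer.step()
```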
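
## Loading from the Hub

Because the class inherits from `PyTorchModelHubMixin`, the weights can be loaded back with `from_pretrained`. The repository id below is a placeholder; substitute the repository this card belongs to.

```python
# Placeholder repo id; replace with the actual Hub repository.
model = UNet.from_pretrained("your-username/time-conditioned-unet-mnist")
model.eval()
```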