Model Card for samanthajmichael/siamese_model.h5

This Siamese neural network model detects match cuts in video sequences by analyzing the visual similarity between frame pairs using optical flow features.

Model Details

Model Description

The model uses a Siamese architecture to compare pairs of video frames and determine whether they constitute a match cut, a film editing technique in which visually similar frames create a seamless transition between scenes. The model processes optical flow representations of video frames so that it focuses on motion patterns rather than raw pixel values.

Developed by: samanthajmichael
Model type: Siamese Neural Network
Language(s): Not applicable (Computer Vision)
License: Not specified
Finetuned from model: EfficientNetB0 (used for initial feature extraction)

Model Sources

Repository: https://github.com/lasyaEd/ml_project
Demo: Available as a Streamlit application for analyzing YouTube videos

Uses

Direct Use

The model can be used to:

Detect match cuts in video sequences
Find visually similar sections within videos
Analyze motion patterns between frame pairs
Support video editing and content analysis tasks

Downstream Use

The model can be integrated into:

Video editing software for automated transition detection
Content analysis tools for finding visual patterns
YouTube video analysis applications (as demonstrated in the provided Streamlit app)
Film studies tools for analyzing editing techniques

Out-of-Scope Use

This model is not designed for:

Real-time video processing
General object detection or recognition
Scene classification without motion analysis
Processing single frames in isolation

Bias, Risks, and Limitations

Performance depends on the quality of optical flow extraction
Results may be sensitive to video resolution and frame rate
Performance may vary with video content type and editing style
The model is not optimized for real-time processing of high-resolution videos

Recommendations

Users should:

Resize input frames to 224x224 and normalize pixel values to [0, 1] before inference
Use high-quality video sources for best results
Consider the model's confidence scores when making final decisions
Validate results in the context of their specific use case
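
As one way to act on the confidence scores, the sketch below turns sigmoid outputs into yes/no decisions with an adjustable cutoff. The function name `decide_match_cuts` and the threshold values are illustrative assumptions; this card does not specify a decision threshold.

```python
import numpy as np

def decide_match_cuts(scores, threshold=0.5):
    """Turn sigmoid similarity scores into boolean match-cut decisions.

    `threshold` is a hypothetical, user-chosen cutoff; the model card
    does not specify one.
    """
    scores = np.asarray(scores, dtype=np.float32).reshape(-1)
    return scores >= threshold

# Example scores shaped like the (batch, 1) output of a sigmoid layer
example_scores = np.array([[0.97], [0.51], [0.08]])
lenient = decide_match_cuts(example_scores)                 # default cutoff
strict = decide_match_cuts(example_scores, threshold=0.9)   # fewer false positives
```

Raising the cutoff trades recall for precision, so the right value depends on how costly a missed or spurious match cut is in a given workflow.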

How to Get Started with the Model

from huggingface_hub import from_pretrained_keras
import numpy as np
import tensorflow as tf

# Load the model
model = from_pretrained_keras("samanthajmichael/siamese_model.h5")

# Preprocess your frame pairs (ensure 224x224 resolution)
# Frames should be normalized to [0, 1]
# preprocess_frame is a user-supplied helper; it is not defined in this card
frame1 = preprocess_frame(frame1)  # Shape: (224, 224, 3)
frame2 = preprocess_frame(frame2)  # Shape: (224, 224, 3)

# Get the similarity prediction; np.array([frame]) adds a batch dimension of 1
prediction = model.predict([np.array([frame1]), np.array([frame2])])
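
The snippet above calls `preprocess_frame` without defining it. A minimal sketch of such a helper follows, assuming the input is an (H, W, 3) uint8 array. It uses nearest-neighbour resizing in plain NumPy only to stay dependency-free. It does not cover the optical flow step this card describes, and the preprocessing used in the original project may differ.

```python
import numpy as np

def preprocess_frame(frame, size=224):
    """Hypothetical helper: resize an (H, W, 3) uint8 frame to
    (size, size, 3) with nearest-neighbour sampling and scale to [0, 1]."""
    frame = np.asarray(frame)
    height, width = frame.shape[:2]
    # Pick the source row and column for every output pixel
    rows = np.arange(size) * height // size
    cols = np.arange(size) * width // size
    resized = frame[rows[:, None], cols[None, :]]
    return resized.astype(np.float32) / 255.0

# Dummy 360x640 RGB frame standing in for a decoded video frame
dummy = np.random.randint(0, 256, size=(360, 640, 3), dtype=np.uint8)
processed = preprocess_frame(dummy)
batch = np.array([processed])  # add the batch dimension expected by predict
```

An image library with proper interpolation would usually give smoother results than nearest-neighbour sampling.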

Training Details

Training Data

Training set: 14,264 frame pairs
Test set: 3,566 frame pairs
Data derived from video frames with optical flow features
Labels generated based on visual similarity thresholds

Training Procedure

Training Hyperparameters

Training regime: fp32
Optimizer: Adam
Loss function: Binary Cross-Entropy
Batch size: 64
Early stopping patience: 3
Input shape: (224, 224, 3)

Model Architecture

Base network:
- Conv2D (32 filters) + ReLU + MaxPooling2D
- Conv2D (64 filters) + ReLU + MaxPooling2D
- Conv2D (128 filters) + ReLU + MaxPooling2D
- Flatten
- Dense (128 units)
Similarity computed as the element-wise absolute difference between the two branch embeddings
Final dense layer (1 unit) with sigmoid activation
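
The comparison step can be illustrated in NumPy as below. This is a sketch of the absolute-difference head only, with random placeholder weights rather than the trained ones. The name `siamese_head` and the variables are illustrative assumptions.

```python
import numpy as np

def siamese_head(embedding_a, embedding_b, weights, bias):
    """Score a pair of embeddings: |a - b| -> dense(1) -> sigmoid."""
    distance = np.abs(embedding_a - embedding_b)   # element-wise, shape (batch, 128)
    logits = distance @ weights + bias             # shape (batch, 1)
    return 1.0 / (1.0 + np.exp(-logits))           # sigmoid

rng = np.random.default_rng(0)
emb_a = rng.normal(size=(2, 128))      # stand-ins for base-network outputs
emb_b = rng.normal(size=(2, 128))
w = rng.normal(size=(128, 1)) * 0.1    # placeholder weights, not the trained ones
b = np.zeros(1)

scores = siamese_head(emb_a, emb_b, w, b)
swapped = siamese_head(emb_b, emb_a, w, b)
```

Because the head only sees the absolute difference, the score does not depend on the order of the two frames.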

Evaluation

Testing Data, Factors & Metrics

Evaluation performed on 3,566 frame pairs
Balanced dataset of match and non-match pairs
Primary metric: Binary classification accuracy
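
For reference, the two reported quantities (binary cross-entropy loss and accuracy) can be computed from predicted probabilities as sketched below. The labels and probabilities are made-up toy values, not model outputs, and the 0.5 accuracy threshold is an assumption.

```python
import numpy as np

def binary_cross_entropy(labels, probabilities, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y = np.asarray(labels, dtype=np.float64)
    p = np.clip(np.asarray(probabilities, dtype=np.float64), eps, 1.0 - eps)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))

def binary_accuracy(labels, probabilities, threshold=0.5):
    """Fraction of pairs whose thresholded prediction equals the label."""
    y = np.asarray(labels).astype(bool)
    predicted = np.asarray(probabilities) >= threshold
    return float(np.mean(predicted == y))

# Toy values for illustration only
toy_labels = [1, 0, 1, 0]
toy_probs = [0.9, 0.2, 0.6, 0.7]
toy_loss = binary_cross_entropy(toy_labels, toy_probs)
toy_acc = binary_accuracy(toy_labels, toy_probs)
```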

Results

Test accuracy: 95.60%
Test loss: 0.1675
On this test set, the model reliably distinguishes match cuts from non-matches

Environmental Impact

Trained on Google Colab
Training completed in 4 epochs with early stopping
Relatively lightweight model with 12.9M parameters

Technical Specifications

Compute Infrastructure

Training platform: Google Colab
GPU requirements: Standard GPU runtime
Inference can be performed on CPU for smaller workloads

Model Architecture and Objective

Total parameters: 12,938,561 (49.36 MB)
All parameters are trainable
Model objective: Binary classification of frame pair similarity
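
The reported total can be cross-checked against the layer list in the Model Architecture section. The sketch below assumes 3x3 kernels, "same" padding, and 2x2 max pooling. None of these is stated in this card, but under these assumptions the count matches the reported figure.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights plus one bias per unit for a Dense layer."""
    return inputs * units + units

# Assumptions (not stated in the card): 3x3 kernels, "same" padding,
# 2x2 max pooling after each convolution.
side, channels = 224, 3
total = 0
for filters in (32, 64, 128):
    total += conv2d_params(3, channels, filters)
    channels = filters
    side //= 2  # "same" padding keeps the size; 2x2 pooling halves it

flattened = side * side * channels       # 28 * 28 * 128
total += dense_params(flattened, 128)    # embedding layer, shared by both branches
total += dense_params(128, 1)            # sigmoid output on |a - b|

size_mb = total * 4 / 1024**2            # fp32: 4 bytes per parameter
print(total, round(size_mb, 2))
```

The base network's weights are shared between the two branches, so they are counted once. Most of the parameters sit in the Dense (128 units) layer that follows Flatten.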
