Model Card for samanthajmichael/siamese_model.h5
This Siamese neural network model detects match cuts in video sequences by analyzing the visual similarity between frame pairs using optical flow features.
Model Details
Model Description
The model uses a Siamese architecture to compare pairs of video frames and determine whether they constitute a match cut, a film editing technique in which visually similar frames create a seamless transition between scenes. The model processes optical flow representations of the frames so that it focuses on motion patterns rather than raw pixel values.
- Developed by: samanthajmichael
- Model type: Siamese Neural Network
- Language(s): Not applicable (Computer Vision)
- License: Not specified
- Finetuned from model: EfficientNetB0 (used for initial feature extraction; the Siamese base network is the custom CNN listed under Model Architecture)
Model Sources
- Repository: https://github.com/lasyaEd/ml_project
- Demo: Available as a Streamlit application for analyzing YouTube videos
Uses
Direct Use
The model can be used to:
- Detect match cuts in video sequences
- Find visually similar sections within videos
- Analyze motion patterns between frame pairs
- Support video editing and content analysis tasks
Downstream Use
The model can be integrated into:
- Video editing software for automated transition detection
- Content analysis tools for finding visual patterns
- YouTube video analysis applications (as demonstrated in the provided Streamlit app)
- Film studies tools for analyzing editing techniques
Out-of-Scope Use
This model is not designed for:
- Real-time video processing
- General object detection or recognition
- Scene classification without motion analysis
- Processing single frames in isolation
Bias, Risks, and Limitations
- The model's performance depends on the quality of optical flow extraction
- May be sensitive to video resolution and frame rate
- Performance may vary based on video content type and editing style
- Not optimized for real-time processing of high-resolution videos
Recommendations
Users should:
- Ensure input frames are properly preprocessed to 224x224 resolution
- Use high-quality video sources for best results
- Consider the model's confidence scores when making final decisions
- Validate results in the context of their specific use case
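The recommendation about confidence scores can be made concrete with a small thresholding helper. This is a minimal sketch: the function name `label_pairs`, the 0.5 default cutoff, and the example scores are illustrative assumptions, not values published with the model.

```python
import numpy as np

def label_pairs(scores, threshold=0.5):
    """Turn sigmoid similarity scores into match / non-match labels.

    threshold=0.5 is an assumed default, not a value published with the
    model; tune it on your own validation data.
    """
    scores = np.asarray(scores, dtype=np.float32).reshape(-1)
    return scores >= threshold

# Hypothetical scores, shaped like the (N, 1) output of model.predict(...)
example_scores = [[0.92], [0.48], [0.07]]
labels = label_pairs(example_scores)
strict_labels = label_pairs(example_scores, threshold=0.95)
```

Raising the threshold trades recall for precision, which may suit editing workflows where a false match cut is more costly than a missed one.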
How to Get Started with the Model
import numpy as np
import tensorflow as tf  # TensorFlow must be installed for from_pretrained_keras
from huggingface_hub import from_pretrained_keras
# Load the model
model = from_pretrained_keras("samanthajmichael/siamese_model.h5")
# preprocess_frame is a placeholder for your own preprocessing function; it is
# not defined in this card. It should return a (224, 224, 3) array in [0, 1].
frame1 = preprocess_frame(frame1)  # Shape: (224, 224, 3)
frame2 = preprocess_frame(frame2)  # Shape: (224, 224, 3)
# Add a batch dimension to each frame and get the similarity prediction
prediction = model.predict([np.array([frame1]), np.array([frame2])])
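The snippet above calls `preprocess_frame` without defining it. One possible stand-in is sketched below, assuming frames arrive as uint8 RGB arrays of shape (H, W, 3). The resizing method is an assumption chosen to keep the example dependency-free, and the sketch does not reproduce the optical flow step, which this card does not specify.

```python
import numpy as np

def preprocess_frame(frame, size=(224, 224)):
    """Resize an (H, W, 3) uint8 frame to `size` and scale it to [0, 1].

    Hypothetical helper: uses simple index-based (nearest-neighbour style)
    resizing and does not compute optical flow.
    """
    frame = np.asarray(frame)
    height, width = frame.shape[:2]
    # Map each target row/column back to a source index (floor mapping).
    rows = np.arange(size[0]) * height // size[0]
    cols = np.arange(size[1]) * width // size[1]
    resized = frame[rows][:, cols]
    return resized.astype(np.float32) / 255.0

# Dummy 480x640 frame standing in for a decoded video frame
dummy = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
processed = preprocess_frame(dummy)
batch = np.array([processed])  # adds the batch dimension: (1, 224, 224, 3)
```

For production use, an interpolating resize from an image library would likely give smoother inputs than this index-based version.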
Training Details
Training Data
- Training set: 14,264 frame pairs
- Test set: 3,566 frame pairs
- Data derived from video frames with optical flow features
- Labels generated based on visual similarity thresholds
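The two set sizes above imply an 80/20 train/test split. The card does not state the split ratio explicitly, so the short check below simply derives it from the reported counts.

```python
train_pairs = 14_264
test_pairs = 3_566

total_pairs = train_pairs + test_pairs
test_fraction = test_pairs / total_pairs

print(f"total pairs: {total_pairs}")
print(f"test fraction: {test_fraction:.2%}")
```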
Training Procedure
Training Hyperparameters
- Training regime: fp32
- Optimizer: Adam
- Loss function: Binary Cross-Entropy
- Batch size: 64
- Early stopping patience: 3
- Input shape: (224, 224, 3)
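To show what these settings mean in practice, the sketch below computes the number of optimizer steps per epoch and illustrates patience-based early stopping. It assumes all 14,264 training pairs are used in every epoch and that stopping is triggered by a monitored loss that fails to improve; the card does not say which quantity was monitored, and the loss values in the example are made up.

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)

def stopping_epoch(monitored_losses, patience):
    """Return the 1-based epoch at which patience-based early stopping halts."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(monitored_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(monitored_losses)

steps = steps_per_epoch(14_264, 64)
last_batch = 14_264 % 64  # size of the final, partial batch

# Made-up loss values, for illustration only
stop = stopping_epoch([0.20, 0.24, 0.22, 0.21, 0.19], patience=3)
```

With a patience of 3, a run whose best monitored value occurs in the first epoch stops after the fourth, which is one way a short training run can arise.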
Model Architecture
- Base network:
  - Conv2D (32 filters) + ReLU + MaxPooling2D
  - Conv2D (64 filters) + ReLU + MaxPooling2D
  - Conv2D (128 filters) + ReLU + MaxPooling2D
  - Flatten
  - Dense (128 units)
- Similarity computed using absolute difference
- Final dense layer with sigmoid activation
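The layers listed above could be assembled in Keras roughly as follows. This is a sketch, not the author's training code: the 3x3 kernels, "same" padding, and 2x2 pooling are assumptions chosen because they reproduce the 12,938,561 total parameters reported in this card, the Dense(128) layer is left without an activation because none is stated, and the `AbsoluteDifference` class name is hypothetical.

```python
import tensorflow as tf
from tensorflow.keras import Model, layers

class AbsoluteDifference(layers.Layer):
    """Element-wise |a - b| between the two branch embeddings."""

    def call(self, inputs):
        first, second = inputs
        return tf.abs(first - second)

def build_base_network(input_shape=(224, 224, 3)):
    inputs = layers.Input(shape=input_shape)
    x = inputs
    for filters in (32, 64, 128):
        # Assumed: 3x3 kernels, "same" padding, 2x2 pooling
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
    x = layers.Flatten()(x)
    outputs = layers.Dense(128)(x)  # activation not stated in the card
    return Model(inputs, outputs, name="base_network")

def build_siamese(input_shape=(224, 224, 3)):
    base = build_base_network(input_shape)  # one network, shared by both inputs
    frame_a = layers.Input(shape=input_shape)
    frame_b = layers.Input(shape=input_shape)
    distance = AbsoluteDifference()([base(frame_a), base(frame_b)])
    score = layers.Dense(1, activation="sigmoid")(distance)
    return Model([frame_a, frame_b], score, name="siamese_sketch")

model = build_siamese()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

Because both inputs pass through the same `base` instance, the two branches share weights, which is what makes the comparison symmetric in the two frames.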
Evaluation
Testing Data, Factors & Metrics
- Evaluation performed on 3,566 frame pairs
- Balanced dataset of match and non-match pairs
- Primary metric: Binary classification accuracy
Results
- Test accuracy: 95.60%
- Test loss: 0.1675
- On this balanced test set, the accuracy is well above the 50% chance level for distinguishing match cuts from non-matches
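The reported figures can be translated into more tangible quantities. The sketch below assumes the accuracy was computed over all 3,566 test pairs and that the loss is the mean binary cross-entropy with natural logarithms (the Keras default); neither detail is stated explicitly in this card.

```python
import math

test_pairs = 3_566
reported_accuracy = 0.9560
reported_loss = 0.1675

# Number of correctly classified pairs implied by the rounded accuracy
correct = round(reported_accuracy * test_pairs)
errors = test_pairs - correct

# exp(-mean BCE) is the geometric mean probability assigned to the true label
mean_true_label_prob = math.exp(-reported_loss)

print(f"correct: {correct}, errors: {errors}")
print(f"geometric mean true-label probability: {mean_true_label_prob:.3f}")
```

Under these assumptions, the accuracy corresponds to roughly 157 misclassified pairs out of 3,566.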
Environmental Impact
- Trained on Google Colab
- Training completed in 4 epochs with early stopping
- Relatively lightweight model with 12.9M parameters
Technical Specifications
Compute Infrastructure
- Training platform: Google Colab
- GPU requirements: Standard GPU runtime
- Inference can be performed on CPU for smaller workloads
Model Architecture and Objective
- Total parameters: 12,938,561 (49.36 MB)
- All parameters are trainable
- Model objective: Binary classification of frame pair similarity
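The parameter total and its size can be checked by hand. The arithmetic below assumes 3x3 convolution kernels, "same" padding, 2x2 pooling, and a single-unit output layer; these details are not stated in the card but are consistent with it, since they reproduce the reported total exactly. The size figure assumes 4 bytes per fp32 parameter.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units

side = 224      # input height and width
channels = 3    # input channels
total = 0
for filters in (32, 64, 128):
    total += conv2d_params(3, channels, filters)  # assumed 3x3 kernels
    channels = filters
    side //= 2  # "same" padding keeps the size; 2x2 pooling halves it

flattened = side * side * channels
total += dense_params(flattened, 128)  # Dense (128 units)
total += dense_params(128, 1)          # final sigmoid layer

size_mib = total * 4 / 1024**2         # fp32: 4 bytes per parameter
print(f"{total:,} parameters, {size_mib:.2f} MB")
```

Almost all of the parameters sit in the Dense (128 units) layer that follows Flatten, so shrinking the flattened feature map would be the most direct way to reduce model size.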
Model Card Contact
For questions about the model, please contact samanthajmichael through GitHub or Hugging Face.
Evaluation results
- Test accuracy (self-reported): 0.956