---
language: en
license: other
library_name: tensorflow
tags:
- computer-vision
- video-processing
- siamese-network
- match-cut-detection
datasets:
- custom
metrics:
- accuracy
model-index:
- name: siamese_model
results:
- task:
type: image-similarity
subtype: match-cut-detection
metrics:
- type: accuracy
value: 0.956
name: Test Accuracy
---
# Model Card for samanthajmichael/siamese_model.h5
This Siamese neural network model detects match cuts in video sequences by analyzing the visual similarity between frame pairs using optical flow features.
## Model Details
### Model Description
The model uses a Siamese architecture to compare pairs of video frames and determine whether they constitute a match cut: a film editing technique in which visually similar frames create a seamless transition between scenes. The model processes optical flow representations of video frames to focus on motion patterns rather than raw pixel values.
- **Developed by:** samanthajmichael
- **Model type:** Siamese Neural Network
- **Language(s):** Not applicable (Computer Vision)
- **License:** Other (no specific license text provided)
- **Finetuned from model:** EfficientNetB0 (used for initial feature extraction)
### Model Sources
- **Repository:** https://github.com/lasyaEd/ml_project
- **Demo:** Available as a Streamlit application for analyzing YouTube videos
## Uses
### Direct Use
The model can be used to:
1. Detect match cuts in video sequences
2. Find visually similar sections within videos
3. Analyze motion patterns between frame pairs
4. Support video editing and content analysis tasks
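The card does not spell out how the optical-flow inputs are produced. A minimal sketch of one common approach, assuming Farneback dense flow rendered as an HSV flow image (the `flow_image` helper is hypothetical; the original pipeline's exact preprocessing may differ). Two such flow images from different parts of a video would then form a candidate match-cut pair for the model:

```python
import cv2
import numpy as np

def flow_image(prev_bgr, next_bgr, size=(224, 224)):
    # Dense Farneback optical flow between two consecutive frames
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_bgr, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0
    )
    # Standard HSV rendering: hue encodes direction, value encodes magnitude
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros_like(prev_bgr)
    hsv[..., 0] = ang * 180 / np.pi / 2
    hsv[..., 1] = 255
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)
    bgr = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
    return cv2.resize(bgr, size).astype("float32") / 255.0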
### Downstream Use
The model can be integrated into:
- Video editing software for automated transition detection
- Content analysis tools for finding visual patterns
- YouTube video analysis applications (as demonstrated in the provided Streamlit app)
- Film studies tools for analyzing editing techniques
### Out-of-Scope Use
This model is not designed for:
- Real-time video processing
- General object detection or recognition
- Scene classification without motion analysis
- Processing single frames in isolation
## Bias, Risks, and Limitations
- The model's performance depends on the quality of optical flow extraction
- May be sensitive to video resolution and frame rate
- Performance may vary based on video content type and editing style
- Not optimized for real-time processing of high-resolution videos
### Recommendations
Users should:
- Ensure input frames are properly preprocessed to 224x224 resolution
- Use high-quality video sources for best results
- Consider the model's confidence scores when making final decisions
- Validate results in the context of their specific use case
## How to Get Started with the Model
```python
import numpy as np
import tensorflow as tf
from huggingface_hub import from_pretrained_keras

# Load the model from the Hugging Face Hub
model = from_pretrained_keras("samanthajmichael/siamese_model.h5")

def preprocess_frame(frame):
    # Illustrative helper: resize to 224x224 and normalize a uint8 frame to [0, 1]
    frame = tf.image.resize(frame, (224, 224)).numpy()
    return frame.astype("float32") / 255.0

# frame1 and frame2 are HxWx3 frames extracted from your video
frame1 = preprocess_frame(frame1)  # Shape: (224, 224, 3)
frame2 = preprocess_frame(frame2)  # Shape: (224, 224, 3)

# Get similarity prediction (sigmoid score in [0, 1])
prediction = model.predict([np.array([frame1]), np.array([frame2])])
```
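The output is a sigmoid score in [0, 1] that can be treated as a confidence value, as the recommendations above suggest. The cutoff below is illustrative, not from the card:

```python
score = float(prediction[0][0])
if score >= 0.5:  # illustrative threshold; tune for your use case
    print(f"Likely match cut (confidence {score:.2f})")
else:
    print(f"Likely non-match (confidence {score:.2f})")
```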
## Training Details
### Training Data
- Training set: 14,264 frame pairs
- Test set: 3,566 frame pairs
- Data derived from video frames with optical flow features
- Labels generated based on visual similarity thresholds
### Training Procedure
#### Training Hyperparameters
- **Training regime:** fp32
- Optimizer: Adam
- Loss function: Binary Cross-Entropy
- Batch size: 64
- Early stopping patience: 3
- Input shape: (224, 224, 3)
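A minimal sketch of how these settings map onto Keras. `siamese` refers to the model built in the architecture sketch in the next section; the data arrays and `validation_split` are illustrative assumptions:

```python
import numpy as np
import tensorflow as tf

# Illustrative stand-ins for the real 14,264 optical-flow frame pairs
train_a = np.random.rand(64, 224, 224, 3).astype("float32")
train_b = np.random.rand(64, 224, 224, 3).astype("float32")
train_y = np.random.randint(0, 2, size=(64, 1)).astype("float32")

siamese.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
siamese.fit(
    [train_a, train_b],
    train_y,
    batch_size=64,
    epochs=20,  # the card reports early stopping ended training after 4 epochs
    validation_split=0.2,  # assumption; the card does not specify a split
    callbacks=[tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3)],
)
```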
### Model Architecture
- Base network:
- Conv2D (32 filters) + ReLU + MaxPooling2D
- Conv2D (64 filters) + ReLU + MaxPooling2D
- Conv2D (128 filters) + ReLU + MaxPooling2D
- Flatten
- Dense (128 units)
- Similarity computed using absolute difference
- Final dense layer with sigmoid activation
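A Keras sketch of this architecture. The 3x3 kernels, `same` padding, and 2x2 pooling are assumptions, but they reproduce the 12,938,561-parameter count reported under Technical Specifications:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_base_network(input_shape=(224, 224, 3)):
    # Shared embedding network applied to both frames of a pair
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    return Model(inputs, layers.Dense(128)(x), name="base_network")

base = build_base_network()
frame_a = layers.Input(shape=(224, 224, 3))
frame_b = layers.Input(shape=(224, 224, 3))

# Similarity head: elementwise absolute difference of the two embeddings
diff = layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([base(frame_a), base(frame_b)])
output = layers.Dense(1, activation="sigmoid")(diff)
siamese = Model([frame_a, frame_b], output, name="siamese")
siamese.summary()  # 12,938,561 trainable parameters under these assumptions
```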
## Evaluation
### Testing Data, Factors & Metrics
- Evaluation performed on 3,566 frame pairs
- Balanced dataset of match and non-match pairs
- Primary metric: Binary classification accuracy
### Results
- Test accuracy: 95.60%
- Test loss: 0.1675
- Model shows strong performance in distinguishing match cuts from non-matches
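A minimal sketch of reproducing this evaluation, assuming the compiled `siamese` model from the sketches above and stand-in arrays in place of the real 3,566 test pairs:

```python
import numpy as np

# Illustrative stand-ins for the real test pairs
test_a = np.random.rand(32, 224, 224, 3).astype("float32")
test_b = np.random.rand(32, 224, 224, 3).astype("float32")
test_y = np.random.randint(0, 2, size=(32, 1)).astype("float32")

loss, accuracy = siamese.evaluate([test_a, test_b], test_y, batch_size=64)
print(f"loss={loss:.4f} accuracy={accuracy:.4f}")
```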
## Environmental Impact
- Trained on Google Colab
- Training completed in 4 epochs with early stopping
- Relatively lightweight model with 12.9M parameters
## Technical Specifications
### Compute Infrastructure
- Training platform: Google Colab
- GPU requirements: Standard GPU runtime
- Inference can be performed on CPU for smaller workloads
### Model Architecture and Objective
Total parameters: 12,938,561 (49.36 MB)
- All parameters are trainable
- Model objective: Binary classification of frame pair similarity
## Model Card Contact
For questions about the model, please contact samanthajmichael through GitHub or Hugging Face.