|
---
language: en
license: other
library_name: tensorflow
tags:
- computer-vision
- video-processing
- siamese-network
- match-cut-detection
datasets:
- custom
metrics:
- accuracy
model-index:
- name: siamese_model
  results:
  - task:
      type: image-similarity
      subtype: match-cut-detection
    metrics:
    - type: accuracy
      value: 0.956
      name: Test Accuracy
|
--- |
|
|
|
# Model Card for samanthajmichael/siamese_model.h5 |
|
|
|
This Siamese neural network model detects match cuts in video sequences by analyzing the visual similarity between frame pairs using optical flow features. |
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
The model uses a Siamese architecture to compare pairs of video frames and determine whether they constitute a match cut, a film editing technique in which visually similar frames create a seamless transition between scenes. Rather than operating on raw pixel values, the model processes optical flow representations of the frames, focusing on motion patterns.
|
|
|
- **Developed by:** samanthajmichael |
|
- **Model type:** Siamese Neural Network |
|
- **Language(s):** Not applicable (Computer Vision) |
|
- **License:** Not specified |
|
- **Finetuned from model:** EfficientNetB0 (used for initial feature extraction) |
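
The card notes that the model consumes optical flow representations rather than raw pixels, but does not document the exact encoding. A common convention, shown below purely as an assumption rather than the project's actual pipeline, is OpenCV's Farneback dense flow rendered as an HSV image whose hue encodes motion direction and value encodes magnitude:

```python
import cv2
import numpy as np

def flow_image(frame1, frame2):
    """Encode dense optical flow between two BGR frames as an RGB image.
    Farneback flow + HSV rendering is a common convention, assumed here."""
    gray1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
    gray2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(gray1, gray2, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros((*gray1.shape, 3), dtype=np.uint8)
    hsv[..., 0] = ang * 180 / np.pi / 2  # hue encodes flow direction
    hsv[..., 1] = 255                    # full saturation
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)  # value encodes magnitude
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)
```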
|
|
|
### Model Sources |
|
- **Repository:** https://github.com/lasyaEd/ml_project |
|
- **Demo:** Available as a Streamlit application for analyzing YouTube videos |
|
|
|
## Uses |
|
|
|
### Direct Use |
|
|
|
The model can be used to: |
|
1. Detect match cuts in video sequences (see the sketch after this list)
|
2. Find visually similar sections within videos |
|
3. Analyze motion patterns between frame pairs |
|
4. Support video editing and content analysis tasks |
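
As a hedged sketch of the first use case, the helper below scores frame pairs sampled from a video at a fixed stride. The function name, stride, and threshold are illustrative choices, and the raw-frame preprocessing is an assumption; since the card says the model consumes optical flow representations, you may need to substitute a flow encoding (such as the one sketched under Model Description) for the resize-and-normalize step shown here.

```python
import cv2
import numpy as np

def scan_for_match_cuts(video_path, model, stride=5, threshold=0.5):
    """Score frame pairs `stride` frames apart; stride, threshold, and the
    raw-frame preprocessing are illustrative assumptions."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    ok, frame = cap.read()
    while ok:
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # OpenCV decodes as BGR
        frames.append(cv2.resize(rgb, (224, 224)).astype("float32") / 255.0)
        ok, frame = cap.read()
    cap.release()

    candidates = []
    for i in range(0, len(frames) - stride, stride):
        pair = [np.array([frames[i]]), np.array([frames[i + stride]])]
        score = float(model.predict(pair, verbose=0)[0][0])
        if score >= threshold:
            candidates.append((i, i + stride, score))
    return candidates
```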
|
|
|
### Downstream Use |
|
|
|
The model can be integrated into: |
|
- Video editing software for automated transition detection |
|
- Content analysis tools for finding visual patterns |
|
- YouTube video analysis applications (as demonstrated in the provided Streamlit app) |
|
- Film studies tools for analyzing editing techniques |
|
|
|
### Out-of-Scope Use |
|
|
|
This model is not designed for: |
|
- Real-time video processing |
|
- General object detection or recognition |
|
- Scene classification without motion analysis |
|
- Processing single frames in isolation |
|
|
|
## Bias, Risks, and Limitations |
|
|
|
- The model's performance depends on the quality of optical flow extraction |
|
- May be sensitive to video resolution and frame rate |
|
- Performance may vary based on video content type and editing style |
|
- Not optimized for real-time processing of high-resolution videos |
|
|
|
### Recommendations |
|
|
|
Users should: |
|
- Ensure input frames are properly preprocessed to 224x224 resolution |
|
- Use high-quality video sources for best results |
|
- Consider the model's confidence scores when making final decisions |
|
- Validate results in the context of their specific use case |
|
|
|
## How to Get Started with the Model |
|
|
|
```python
from huggingface_hub import from_pretrained_keras
import numpy as np
import tensorflow as tf

# Load the model
model = from_pretrained_keras("samanthajmichael/siamese_model.h5")

def preprocess_frame(frame):
    """Resize a frame to 224x224 and normalize pixel values to [0, 1]."""
    frame = tf.image.resize(frame, (224, 224))
    return (tf.cast(frame, tf.float32) / 255.0).numpy()

# Preprocess your frame pair; each result has shape (224, 224, 3)
frame1 = preprocess_frame(frame1)
frame2 = preprocess_frame(frame2)

# Get the similarity prediction: a sigmoid score in [0, 1],
# where values near 1 indicate a likely match cut
prediction = model.predict([np.array([frame1]), np.array([frame2])])
```
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
- Training set: 14,264 frame pairs |
|
- Test set: 3,566 frame pairs |
|
- Data derived from video frames with optical flow features |
|
- Labels generated based on visual similarity thresholds |
|
|
|
### Training Procedure |
|
|
|
#### Training Hyperparameters |
|
|
|
- **Training regime:** fp32 |
|
- Optimizer: Adam |
|
- Loss function: Binary Cross-Entropy |
|
- Batch size: 64 |
|
- Early stopping patience: 3 |
|
- Input shape: (224, 224, 3) |
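
A minimal sketch of how these settings could be wired together in Keras. The training arrays (`train_a`, `train_b`, `train_labels`, and the validation splits) are hypothetical placeholders, and the epoch cap is arbitrary since the reported run stopped early:

```python
from tensorflow.keras.callbacks import EarlyStopping

# `model` is the Siamese network; train_a/train_b hold the two frames of each
# pair and train_labels the 0/1 match labels (hypothetical arrays)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(
    [train_a, train_b], train_labels,
    validation_data=([val_a, val_b], val_labels),
    batch_size=64,
    epochs=20,  # the reported run stopped after 4 epochs via early stopping
    callbacks=[EarlyStopping(patience=3, restore_best_weights=True)],
)
```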
|
|
|
### Model Architecture |
|
|
|
- Base network: |
|
- Conv2D (32 filters) + ReLU + MaxPooling2D |
|
- Conv2D (64 filters) + ReLU + MaxPooling2D |
|
- Conv2D (128 filters) + ReLU + MaxPooling2D |
|
- Flatten |
|
- Dense (128 units) |
|
- Similarity computed using absolute difference |
|
- Final dense layer with sigmoid activation (reconstructed in the sketch below)
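
The exact kernel and pooling sizes are not stated in the card; the sketch below assumes 3x3 convolutions with "same" padding and 2x2 pooling, which reproduces the reported total of 12,938,561 parameters:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_base_network(input_shape=(224, 224, 3)):
    """Shared embedding network applied to both frames of a pair."""
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(128, (3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Flatten()(x)
    x = layers.Dense(128)(x)
    return Model(inputs, x, name="base_network")

base = build_base_network()
frame_a = layers.Input(shape=(224, 224, 3))
frame_b = layers.Input(shape=(224, 224, 3))

# Similarity head: absolute difference of the two embeddings, then sigmoid
diff = layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([base(frame_a), base(frame_b)])
output = layers.Dense(1, activation="sigmoid")(diff)
siamese = Model([frame_a, frame_b], output)  # ~12.94M parameters
```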
|
|
|
## Evaluation |
|
|
|
### Testing Data, Factors & Metrics |
|
|
|
- Evaluation performed on 3,566 frame pairs |
|
- Balanced dataset of match and non-match pairs |
|
- Primary metric: Binary classification accuracy |
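
A minimal evaluation sketch, assuming the held-out pairs are stacked into arrays (`test_a`, `test_b`, and `test_labels` are hypothetical names) and that the loaded model retains its compile state:

```python
# Reproduce the reported metrics on the 3,566 held-out pairs
loss, accuracy = model.evaluate([test_a, test_b], test_labels, batch_size=64)
print(f"test loss: {loss:.4f}, test accuracy: {accuracy:.4f}")
```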
|
|
|
### Results |
|
|
|
- Test accuracy: 95.60% |
|
- Test loss: 0.1675 |
|
- Model shows strong performance in distinguishing match cuts from non-matches |
|
|
|
## Environmental Impact |
|
|
|
- Trained on Google Colab |
|
- Training completed in 4 epochs with early stopping |
|
- Relatively lightweight model with 12.9M parameters |
|
|
|
## Technical Specifications |
|
|
|
### Compute Infrastructure |
|
|
|
- Training platform: Google Colab |
|
- GPU requirements: Standard GPU runtime |
|
- Inference can be performed on CPU for smaller workloads |
|
|
|
### Model Architecture and Objective |
|
|
|
- Total parameters: 12,938,561 (49.36 MB)
|
- All parameters are trainable |
|
- Model objective: Binary classification of frame pair similarity |
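
After loading, the reported total can be checked directly:

```python
print(f"{model.count_params():,} parameters")  # expected: 12,938,561
```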
|
|
|
## Model Card Contact |
|
|
|
For questions about the model, please contact samanthajmichael through GitHub or Hugging Face. |
|