---
language: en
license: other
library_name: tensorflow
tags:
- computer-vision
- video-processing
- siamese-network
- match-cut-detection
datasets:
- custom
metrics:
- accuracy
model-index:
- name: siamese_model
  results:
  - task:
      type: image-similarity
      subtype: match-cut-detection
    metrics:
    - type: accuracy
      value: 0.956
      name: Test Accuracy
---

# Model Card for samanthajmichael/siamese_model.h5

This Siamese neural network model detects match cuts in video sequences by analyzing the visual similarity between frame pairs using optical flow features.

## Model Details

### Model Description

The model uses a Siamese architecture to compare pairs of video frames and determine whether they constitute a match cut, a film editing technique in which visually similar frames create a seamless transition between scenes. The model processes optical flow representations of video frames to focus on motion patterns rather than raw pixel values.

- **Developed by:** samanthajmichael
- **Model type:** Siamese Neural Network
- **Language(s):** Not applicable (computer vision)
- **License:** Not specified (listed as `other` in the metadata)
- **Finetuned from model:** EfficientNetB0 (used for initial feature extraction)

### Model Sources

- **Repository:** https://github.com/lasyaEd/ml_project
- **Demo:** Available as a Streamlit application for analyzing YouTube videos

## Uses

### Direct Use

The model can be used to:

1. Detect match cuts in video sequences
2. Find visually similar sections within videos
3. Analyze motion patterns between frame pairs
4. Support video editing and content analysis tasks

### Downstream Use

The model can be integrated into:

- Video editing software for automated transition detection
- Content analysis tools for finding visual patterns
- YouTube video analysis applications (as demonstrated in the provided Streamlit app)
- Film studies tools for analyzing editing techniques

### Out-of-Scope Use

This model is not designed for:

- Real-time video processing
- General object detection or recognition
- Scene classification without motion analysis
- Processing single frames in isolation

## Bias, Risks, and Limitations

- The model's performance depends on the quality of optical flow extraction
- May be sensitive to video resolution and frame rate
- Performance may vary based on video content type and editing style
- Not optimized for real-time processing of high-resolution videos

### Recommendations

Users should:

- Ensure input frames are properly preprocessed to 224x224 resolution
- Use high-quality video sources for best results
- Consider the model's confidence scores when making final decisions
- Validate results in the context of their specific use case

## How to Get Started with the Model

The snippet below loads the model and scores a pair of frames. The `preprocess_frame` helper is a minimal example (resize and rescale); adapt it to match the optical-flow preprocessing used in training.

```python
import numpy as np
import tensorflow as tf
from huggingface_hub import from_pretrained_keras

# Load the model
model = from_pretrained_keras("samanthajmichael/siamese_model.h5")

def preprocess_frame(frame):
    # Minimal example: resize to 224x224 and scale pixel values to [0, 1];
    # replace with the optical-flow preprocessing used in training
    frame = tf.image.resize(frame, (224, 224))
    return frame / 255.0

# frame1 and frame2 are raw (H, W, 3) frames extracted from your video
frame1 = preprocess_frame(frame1)  # Shape: (224, 224, 3)
frame2 = preprocess_frame(frame2)  # Shape: (224, 224, 3)

# Get similarity prediction (match probability from the sigmoid head)
prediction = model.predict([np.array([frame1]), np.array([frame2])])
```

## Training Details

### Training Data

- Training set: 14,264 frame pairs
- Test set: 3,566 frame pairs
- Data derived from video frames with optical flow features
- Labels generated based on visual similarity thresholds

### Training Procedure

#### Training Hyperparameters

- **Training regime:** fp32
- Optimizer: Adam
- Loss function: Binary Cross-Entropy
- Batch size: 64
- Early stopping patience: 3
- Input shape: (224, 224, 3)

### Model Architecture

- Base network:
  - Conv2D (32 filters) + ReLU + MaxPooling2D
  - Conv2D (64 filters) + ReLU + MaxPooling2D
  - Conv2D (128 filters) + ReLU + MaxPooling2D
  - Flatten
  - Dense (128 units)
- Similarity computed as the absolute difference between the two embeddings
- Final dense layer with sigmoid activation
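The card does not include architecture code, so the following is a minimal Keras sketch consistent with the description and hyperparameters above. Kernel sizes, pooling windows, and names such as `build_base_network` are assumptions, so the exact parameter count of the released model may differ.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_base_network(input_shape=(224, 224, 3)):
    """Shared embedding network; kernel and pooling sizes are assumed."""
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, 3, activation="relu")(inputs)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(128, 3, activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    outputs = layers.Dense(128)(x)
    return tf.keras.Model(inputs, outputs)

# Both frames pass through the same base network (shared weights)
frame_a = layers.Input(shape=(224, 224, 3))
frame_b = layers.Input(shape=(224, 224, 3))
base = build_base_network()
emb_a, emb_b = base(frame_a), base(frame_b)

# Similarity head: element-wise absolute difference, then a sigmoid match score
diff = layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([emb_a, emb_b])
match_score = layers.Dense(1, activation="sigmoid")(diff)

siamese = tf.keras.Model(inputs=[frame_a, frame_b], outputs=match_score)
siamese.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Training as described above: batch size 64, early stopping with patience 3
# siamese.fit([x1, x2], y, batch_size=64,
#             callbacks=[tf.keras.callbacks.EarlyStopping(patience=3)])
```

Taking the absolute difference of the two embeddings makes the head symmetric in its inputs, so the match score does not depend on the order of the frames.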
## Evaluation

### Testing Data, Factors & Metrics

- Evaluation performed on 3,566 frame pairs
- Balanced dataset of match and non-match pairs
- Primary metric: binary classification accuracy

### Results

- Test accuracy: 95.60%
- Test loss: 0.1675
- The model reliably distinguishes match cuts from non-matches

## Environmental Impact

- Trained on Google Colab
- Training completed in 4 epochs with early stopping
- Relatively lightweight model with 12.9M parameters

## Technical Specifications

### Compute Infrastructure

- Training platform: Google Colab
- GPU requirements: standard GPU runtime
- Inference can be performed on CPU for smaller workloads

### Model Architecture and Objective

- Total parameters: 12,938,561 (49.36 MB)
- All parameters are trainable
- Model objective: binary classification of frame-pair similarity

## Model Card Contact

For questions about the model, please contact samanthajmichael through GitHub or Hugging Face.