metadata
language: en
license: other
library_name: tensorflow
tags:
  - computer-vision
  - video-processing
  - siamese-network
  - match-cut-detection
datasets:
  - custom
metrics:
  - accuracy
model-index:
  - name: siamese_model
    results:
      - task:
          type: image-similarity
          subtype: match-cut-detection
        metrics:
          - type: accuracy
            value: 0.956
            name: Test Accuracy

Model Card for samanthajmichael/siamese_model.h5

This Siamese neural network model detects match cuts in video sequences by analyzing the visual similarity between frame pairs using optical flow features.

Model Details

Model Description

The model uses a Siamese architecture to compare pairs of video frames and determine whether they constitute a match cut, a film editing technique in which visually similar frames create a seamless transition between scenes. The model processes optical flow representations of video frames so that it focuses on motion patterns rather than raw pixel values.

  • Developed by: samanthajmichael
  • Model type: Siamese Neural Network
  • Language(s): Not applicable (Computer Vision)
  • License: Other (no specific license stated)
  • Finetuned from model: EfficientNetB0 (used for initial feature extraction)

Uses

Direct Use

The model can be used to:

  1. Detect match cuts in video sequences
  2. Find visually similar sections within videos
  3. Analyze motion patterns between frame pairs
  4. Support video editing and content analysis tasks
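A minimal sketch of how the first two uses might be wired up: enumerate candidate frame pairs, score them in one batch, and rank the results. The names `rank_frame_pairs`, `min_gap`, `top_k`, and `toy_score` are hypothetical and do not come from this repository; `toy_score` is only a stand-in so the example runs without the trained model.

```python
from itertools import combinations

import numpy as np


def rank_frame_pairs(frames, score_fn, min_gap=1, top_k=5):
    """Score every frame pair at least `min_gap` frames apart and return the best.

    frames:   array of shape (N, 224, 224, 3), already preprocessed
    score_fn: callable(left_batch, right_batch) -> scores of shape (M,) or (M, 1)
    """
    pairs = [(i, j) for i, j in combinations(range(len(frames)), 2) if j - i >= min_gap]
    if not pairs:
        return []
    left = np.stack([frames[i] for i, _ in pairs])
    right = np.stack([frames[j] for _, j in pairs])
    scores = np.asarray(score_fn(left, right)).reshape(-1)
    order = np.argsort(scores)[::-1][:top_k]  # highest scores first
    return [(pairs[k][0], pairs[k][1], float(scores[k])) for k in order]


def toy_score(left, right):
    # Stand-in scorer (assumption): 1 minus the mean absolute pixel difference.
    return 1.0 - np.abs(left - right).mean(axis=(1, 2, 3))


rng = np.random.default_rng(0)
frames = rng.random((6, 224, 224, 3)).astype(np.float32)
frames[4] = frames[1]  # make frames 1 and 4 identical for the demo
best = rank_frame_pairs(frames, toy_score, min_gap=2, top_k=3)
```

With the trained model loaded, `score_fn` could presumably be `lambda left, right: model.predict([left, right])`. Scoring all pairs grows quadratically with the number of frames, so sampling frames first is likely necessary for longer videos.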

Downstream Use

The model can be integrated into:

  • Video editing software for automated transition detection
  • Content analysis tools for finding visual patterns
  • YouTube video analysis applications (as demonstrated in the provided Streamlit app)
  • Film studies tools for analyzing editing techniques

Out-of-Scope Use

This model is not designed for:

  • Real-time video processing
  • General object detection or recognition
  • Scene classification without motion analysis
  • Processing single frames in isolation

Bias, Risks, and Limitations

  • The model's performance depends on the quality of optical flow extraction
  • May be sensitive to video resolution and frame rate
  • Performance may vary based on video content type and editing style
  • Not optimized for real-time processing of high-resolution videos

Recommendations

Users should:

  • Resize input frames to 224x224 and normalize pixel values to [0, 1] before inference
  • Use high-quality video sources for best results
  • Treat the model's sigmoid output as a confidence score rather than a hard label when making final decisions
  • Validate results in the context of their specific use case
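As an illustration of using confidence scores, the sketch below turns raw sigmoid outputs into labels and flags pairs that fall close to the decision boundary. The function name `label_pairs`, the 0.5 threshold, and the 0.1 review margin are assumptions chosen for the example, not values taken from this model card.

```python
import numpy as np


def label_pairs(scores, threshold=0.5, margin=0.1):
    """Convert sigmoid scores into labels, confidences, and a review flag.

    threshold and margin are illustrative defaults (assumptions).
    """
    scores = np.asarray(scores, dtype=np.float32).reshape(-1)
    is_match = scores >= threshold
    # Confidence in the predicted label: the score itself for matches,
    # its complement for non-matches.
    confidence = np.where(is_match, scores, 1.0 - scores)
    # Pairs scored near the threshold deserve a manual check.
    needs_review = np.abs(scores - threshold) < margin
    return is_match, confidence, needs_review


# Scores shaped (batch, 1), as a single-unit sigmoid layer would return them.
scores = np.array([[0.97], [0.52], [0.08]])
is_match, confidence, needs_review = label_pairs(scores)
```

Tuning the threshold on a held-out set from the target video domain is usually safer than relying on a fixed default.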

How to Get Started with the Model

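from huggingface_hub import from_pretrained_keras
import numpy as np
import tensorflow as tf  # TensorFlow/Keras must be installed to load the model

# Load the model from the Hugging Face Hub
model = from_pretrained_keras("samanthajmichael/siamese_model.h5")

# Preprocess your frame pairs: resize to 224x224 and normalize to [0, 1].
# preprocess_frame is a placeholder for your own preprocessing function;
# it is not defined in this snippet.
frame1 = preprocess_frame(frame1)  # Shape: (224, 224, 3)
frame2 = preprocess_frame(frame2)  # Shape: (224, 224, 3)

# Add a batch dimension to each frame and get the similarity prediction
# (a sigmoid score in [0, 1])
prediction = model.predict([np.array([frame1]), np.array([frame2])])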
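The snippet above calls `preprocess_frame` without defining it. One possible, dependency-free version is sketched below. It covers only the resize and [0, 1] scaling mentioned in the comments; nearest-neighbour resizing is an assumption made to avoid extra libraries, and the optical flow step described in this card is not reproduced here.

```python
import numpy as np


def preprocess_frame(frame: np.ndarray, size: int = 224) -> np.ndarray:
    """Resize an H x W x 3 uint8 frame to size x size and scale it to [0, 1].

    Nearest-neighbour resizing is an assumption, not the author's method.
    """
    if frame.ndim != 3 or frame.shape[2] != 3:
        raise ValueError("expected an H x W x 3 image")
    height, width = frame.shape[:2]
    # Map every output row and column to its nearest source index.
    rows = np.arange(size) * height // size
    cols = np.arange(size) * width // size
    resized = frame[rows[:, None], cols[None, :]]
    return resized.astype(np.float32) / 255.0


# A synthetic frame stands in for a decoded video frame.
dummy = np.random.default_rng(0).integers(0, 256, size=(360, 640, 3), dtype=np.uint8)
processed = preprocess_frame(dummy)
batch = processed[np.newaxis, ...]  # add the batch dimension expected by predict
```

For production use, an interpolating resize from an image library would likely give smoother inputs than nearest-neighbour sampling.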

Training Details

Training Data

  • Training set: 14,264 frame pairs (80% of the 17,830 pairs in total)
  • Test set: 3,566 frame pairs (20%)
  • Data derived from video frames with optical flow features
  • Labels generated based on visual similarity thresholds

Training Procedure

Training Hyperparameters

  • Training regime: fp32
  • Optimizer: Adam
  • Loss function: Binary Cross-Entropy
  • Batch size: 64
  • Early stopping patience: 3
  • Input shape: (224, 224, 3)

Model Architecture

  • Base network:
    • Conv2D (32 filters) + ReLU + MaxPooling2D
    • Conv2D (64 filters) + ReLU + MaxPooling2D
    • Conv2D (128 filters) + ReLU + MaxPooling2D
    • Flatten
    • Dense (128 units)
  • Similarity computed using absolute difference
  • Final dense layer with sigmoid activation
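The architecture and hyperparameters listed above could be expressed in Keras roughly as follows. This is a reconstruction, not the author's training script: the 3x3 kernels, "same" padding, and 2x2 pooling are assumptions, chosen because they reproduce the 12,938,561 total parameters reported in this card, and the activation of the 128-unit dense layer is not stated, so it is left linear.

```python
import tensorflow as tf
from tensorflow.keras import Model, layers


def build_base_network(input_shape=(224, 224, 3)):
    """Shared convolutional encoder applied to both frames."""
    inputs = layers.Input(shape=input_shape)
    x = inputs
    for filters in (32, 64, 128):
        # Kernel size 3 and "same" padding are assumptions (see lead-in).
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
    x = layers.Flatten()(x)
    outputs = layers.Dense(128)(x)  # activation not stated in the card
    return Model(inputs, outputs, name="base_network")


def build_siamese_model(input_shape=(224, 224, 3)):
    base = build_base_network(input_shape)
    frame_a = layers.Input(shape=input_shape, name="frame_a")
    frame_b = layers.Input(shape=input_shape, name="frame_b")
    # The same weights encode both frames.
    diff = layers.Subtract()([base(frame_a), base(frame_b)])
    abs_diff = layers.Lambda(lambda t: tf.abs(t))(diff)
    score = layers.Dense(1, activation="sigmoid")(abs_diff)
    return Model([frame_a, frame_b], score, name="siamese_sketch")


model = build_siamese_model()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
early_stop = tf.keras.callbacks.EarlyStopping(patience=3)
# Training would then pass batch_size=64 and callbacks=[early_stop] to model.fit.
```

Because the base network is shared, its weights are counted once, which is why the total stays near 12.9M despite there being two inputs.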

Evaluation

Testing Data, Factors & Metrics

  • Evaluation performed on 3,566 frame pairs
  • Balanced dataset of match and non-match pairs
  • Primary metric: Binary classification accuracy

Results

  • Test accuracy: 95.60%
  • Test loss: 0.1675
  • The model separates match cuts from non-matches well on this test set; because the classes are balanced, chance-level accuracy would be 50%
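For readers who want to reproduce these metrics on their own predictions, here is a small NumPy sketch of binary accuracy and binary cross-entropy, plus a rough standard error for an accuracy measured on 3,566 pairs. The 0.5 decision threshold and the normal-approximation standard error are assumptions added for illustration, and the toy labels are made up.

```python
import math

import numpy as np


def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of pairs whose thresholded score matches the label."""
    y_true = np.asarray(y_true).reshape(-1)
    y_pred = (np.asarray(y_prob).reshape(-1) >= threshold).astype(int)
    return float((y_pred == y_true).mean())


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy, clipping scores to avoid log(0)."""
    y_true = np.asarray(y_true, dtype=np.float64).reshape(-1)
    p = np.clip(np.asarray(y_prob, dtype=np.float64).reshape(-1), eps, 1 - eps)
    return float(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)).mean())


def accuracy_standard_error(accuracy, n):
    """Normal-approximation standard error of an accuracy over n samples."""
    return math.sqrt(accuracy * (1 - accuracy) / n)


# Toy labels and scores, for illustration only.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.1]
acc = binary_accuracy(y_true, y_prob)
loss = binary_cross_entropy(y_true, y_prob)

# Rough uncertainty of the reported test accuracy.
se = accuracy_standard_error(0.956, 3566)
```

Under this approximation the standard error is roughly 0.3 percentage points, so differences of a few tenths of a percent between runs should not be over-interpreted.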

Environmental Impact

  • Trained on Google Colab
  • Training completed in 4 epochs with early stopping
  • Relatively lightweight model with 12.9M parameters

Technical Specifications

Compute Infrastructure

  • Training platform: Google Colab
  • GPU requirements: Standard GPU runtime
  • Inference can be performed on CPU for smaller workloads

Model Architecture and Objective

Total parameters: 12,938,561 (49.36 MB)

  • All parameters are trainable
  • Model objective: Binary classification of frame pair similarity
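The reported total can be checked by hand. The sketch below assumes 3x3 convolution kernels, "same" padding, and 2x2 max pooling; none of these are stated in this card, but with them the arithmetic lands exactly on 12,938,561 parameters and 49.36 MB at 4 bytes per fp32 weight.

```python
def conv2d_params(in_channels, filters, kernel=3):
    """Weights plus one bias per filter (kernel size is an assumption)."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return in_units * out_units + out_units


side, channels, total = 224, 3, 0
for filters in (32, 64, 128):
    total += conv2d_params(channels, filters)
    channels = filters
    side //= 2  # "same" padding keeps the size; 2x2 pooling halves it

flattened = side * side * channels    # 28 * 28 * 128 = 100,352
total += dense_params(flattened, 128)  # shared base network, counted once
total += dense_params(128, 1)          # final sigmoid layer

size_mib = total * 4 / 2**20           # fp32 weights, 4 bytes each
print(total, round(size_mib, 2))       # prints: 12938561 49.36
```

Almost all of the parameters sit in the 128-unit dense layer after flattening, so reducing the flattened feature map would be the most direct way to shrink the model.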

Model Card Contact

For questions about the model, please contact samanthajmichael through GitHub or Hugging Face.
