|
---
language: en
license: other
library_name: tensorflow
tags:
- computer-vision
- video-processing
- siamese-network
- match-cut-detection
datasets:
- custom
metrics:
- accuracy
model-index:
- name: siamese_model
  results:
  - task:
      type: image-similarity
      subtype: match-cut-detection
    metrics:
    - type: accuracy
      value: 0.956
      name: Test Accuracy
|
--- |
|
|
|
# Model Card for samanthajmichael/siamese_model.h5 |
|
|
|
This Siamese neural network model detects match cuts in video sequences by analyzing the visual similarity between frame pairs using optical flow features. |
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
The model uses a Siamese architecture to compare pairs of video frames and determine whether they constitute a match cut, a film editing technique in which visually similar frames create a seamless transition between scenes. Rather than operating on raw pixel values, the model processes optical flow representations of the frames, focusing on motion patterns.
|
|
|
- **Developed by:** samanthajmichael |
|
- **Model type:** Siamese Neural Network |
|
- **Language(s):** Not applicable (Computer Vision) |
|
- **License:** Not specified |
|
- **Finetuned from model:** EfficientNetB0 (used for initial feature extraction) |
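
The card notes that the model consumes optical flow representations rather than raw pixels, but does not document the exact encoding. A common convention, shown below purely as an assumption rather than the project's actual pipeline, is OpenCV's Farneback dense flow rendered as an HSV image whose hue encodes motion direction and value encodes magnitude:

```python
import cv2
import numpy as np

def flow_image(frame1, frame2):
    """Encode dense optical flow between two BGR frames as an RGB image.
    Farneback flow + HSV rendering is a common convention, assumed here."""
    gray1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
    gray2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(gray1, gray2, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros((*gray1.shape, 3), dtype=np.uint8)
    hsv[..., 0] = ang * 180 / np.pi / 2  # hue encodes flow direction
    hsv[..., 1] = 255                    # full saturation
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)  # value encodes magnitude
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)
```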
|
|
|
### Model Sources |
|
- **Repository:** https://github.com/lasyaEd/ml_project |
|
- **Demo:** Available as a Streamlit application for analyzing YouTube videos |
|
|
|
## Uses |
|
|
|
### Direct Use |
|
|
|
The model can be used to: |
|
1. Detect match cuts in video sequences (see the sketch after this list)
|
2. Find visually similar sections within videos |
|
3. Analyze motion patterns between frame pairs |
|
4. Support video editing and content analysis tasks |
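
As a hedged sketch of the first use case, the helper below scores frame pairs sampled from a video at a fixed stride. The function name, stride, and threshold are illustrative choices, and the raw-frame preprocessing is an assumption; since the card says the model consumes optical flow representations, you may need to substitute a flow encoding (such as the one sketched under Model Description) for the resize-and-normalize step shown here.

```python
import cv2
import numpy as np

def scan_for_match_cuts(video_path, model, stride=5, threshold=0.5):
    """Score frame pairs `stride` frames apart; stride, threshold, and the
    raw-frame preprocessing are illustrative assumptions."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    ok, frame = cap.read()
    while ok:
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # OpenCV decodes as BGR
        frames.append(cv2.resize(rgb, (224, 224)).astype("float32") / 255.0)
        ok, frame = cap.read()
    cap.release()

    candidates = []
    for i in range(0, len(frames) - stride, stride):
        pair = [np.array([frames[i]]), np.array([frames[i + stride]])]
        score = float(model.predict(pair, verbose=0)[0][0])
        if score >= threshold:
            candidates.append((i, i + stride, score))
    return candidates
```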
|
|
|
### Downstream Use |
|
|
|
The model can be integrated into: |
|
- Video editing software for automated transition detection |
|
- Content analysis tools for finding visual patterns |
|
- YouTube video analysis applications (as demonstrated in the provided Streamlit app) |
|
- Film studies tools for analyzing editing techniques |
|
|
|
### Out-of-Scope Use |
|
|
|
This model is not designed for: |
|
- Real-time video processing |
|
- General object detection or recognition |
|
- Scene classification without motion analysis |
|
- Processing single frames in isolation |
|
|
|
## Bias, Risks, and Limitations |
|
|
|
- The model's performance depends on the quality of optical flow extraction |
|
- May be sensitive to video resolution and frame rate |
|
- Performance may vary based on video content type and editing style |
|
- Not optimized for real-time processing of high-resolution videos |
|
|
|
### Recommendations |
|
|
|
Users should: |
|
- Ensure input frames are properly preprocessed to 224x224 resolution |
|
- Use high-quality video sources for best results |
|
- Consider the model's confidence scores when making final decisions |
|
- Validate results in the context of their specific use case |
|
|
|
## How to Get Started with the Model |
|
|
|
```python
from huggingface_hub import from_pretrained_keras
import numpy as np
import tensorflow as tf

# Load the model
model = from_pretrained_keras("samanthajmichael/siamese_model.h5")

def preprocess_frame(frame):
    """Resize a frame to 224x224 and normalize pixel values to [0, 1]."""
    frame = tf.image.resize(frame, (224, 224))
    return (tf.cast(frame, tf.float32) / 255.0).numpy()

# Preprocess your frame pair; each result has shape (224, 224, 3)
frame1 = preprocess_frame(frame1)
frame2 = preprocess_frame(frame2)

# Get the similarity prediction: a sigmoid score in [0, 1],
# where values near 1 indicate a likely match cut
prediction = model.predict([np.array([frame1]), np.array([frame2])])
```
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
- Training set: 14,264 frame pairs |
|
- Test set: 3,566 frame pairs |
|
- Data derived from video frames with optical flow features |
|
- Labels generated based on visual similarity thresholds |
|
|
|
### Training Procedure |
|
|
|
#### Training Hyperparameters |
|
|
|
- **Training regime:** fp32 |
|
- Optimizer: Adam |
|
- Loss function: Binary Cross-Entropy |
|
- Batch size: 64 |
|
- Early stopping patience: 3 |
|
- Input shape: (224, 224, 3) |
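
A minimal sketch of how these settings could be wired together in Keras. The training arrays (`train_a`, `train_b`, `train_labels`, and the validation splits) are hypothetical placeholders, and the epoch cap is arbitrary since the reported run stopped early:

```python
from tensorflow.keras.callbacks import EarlyStopping

# `model` is the Siamese network; train_a/train_b hold the two frames of each
# pair and train_labels the 0/1 match labels (hypothetical arrays)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(
    [train_a, train_b], train_labels,
    validation_data=([val_a, val_b], val_labels),
    batch_size=64,
    epochs=20,  # the reported run stopped after 4 epochs via early stopping
    callbacks=[EarlyStopping(patience=3, restore_best_weights=True)],
)
```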
|
|
|
### Model Architecture |
|
|
|
- Base network: |
|
- Conv2D (32 filters) + ReLU + MaxPooling2D |
|
- Conv2D (64 filters) + ReLU + MaxPooling2D |
|
- Conv2D (128 filters) + ReLU + MaxPooling2D |
|
- Flatten |
|
- Dense (128 units) |
|
- Similarity computed using absolute difference |
|
- Final dense layer with sigmoid activation (reconstructed in the sketch below)
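
The exact kernel and pooling sizes are not stated in the card; the sketch below assumes 3x3 convolutions with "same" padding and 2x2 pooling, which reproduces the reported total of 12,938,561 parameters:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_base_network(input_shape=(224, 224, 3)):
    """Shared embedding network applied to both frames of a pair."""
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(128, (3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Flatten()(x)
    x = layers.Dense(128)(x)
    return Model(inputs, x, name="base_network")

base = build_base_network()
frame_a = layers.Input(shape=(224, 224, 3))
frame_b = layers.Input(shape=(224, 224, 3))

# Similarity head: absolute difference of the two embeddings, then sigmoid
diff = layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([base(frame_a), base(frame_b)])
output = layers.Dense(1, activation="sigmoid")(diff)
siamese = Model([frame_a, frame_b], output)  # ~12.94M parameters
```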
|
|
|
## Evaluation |
|
|
|
### Testing Data, Factors & Metrics |
|
|
|
- Evaluation performed on 3,566 frame pairs |
|
- Balanced dataset of match and non-match pairs |
|
- Primary metric: Binary classification accuracy |
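
A minimal evaluation sketch, assuming the held-out pairs are stacked into arrays (`test_a`, `test_b`, and `test_labels` are hypothetical names) and that the loaded model retains its compile state:

```python
# Reproduce the reported metrics on the 3,566 held-out pairs
loss, accuracy = model.evaluate([test_a, test_b], test_labels, batch_size=64)
print(f"test loss: {loss:.4f}, test accuracy: {accuracy:.4f}")
```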
|
|
|
### Results |
|
|
|
- Test accuracy: 95.60% |
|
- Test loss: 0.1675 |
|
- Model shows strong performance in distinguishing match cuts from non-matches |
|
|
|
## Environmental Impact |
|
|
|
- Trained on Google Colab |
|
- Training completed in 4 epochs with early stopping |
|
- Relatively lightweight model with 12.9M parameters |
|
|
|
## Technical Specifications |
|
|
|
### Compute Infrastructure |
|
|
|
- Training platform: Google Colab |
|
- GPU requirements: Standard GPU runtime |
|
- Inference can be performed on CPU for smaller workloads |
|
|
|
### Model Architecture and Objective |
|
|
|
- Total parameters: 12,938,561 (49.36 MB)
|
- All parameters are trainable |
|
- Model objective: Binary classification of frame pair similarity |
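
After loading, the reported total can be checked directly:

```python
print(f"{model.count_params():,} parameters")  # expected: 12,938,561
```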
|
|
|
## Model Card Contact |
|
|
|
For questions about the model, please contact samanthajmichael through GitHub or Hugging Face. |
|