bawolf
/

breaking-vision-clip-classifier

Video Classification

English

dance

vision

breaking

Model card Files Files and versions Community

bawolf commited on Nov 13, 2024

Commit

d9a7c37

1 Parent(s): 6a617e3

model card updates

Browse files

Files changed (1) hide show

model-card.md +0 -136

model-card.md DELETED Viewed

@@ -1,136 +0,0 @@
----
-language: en
-tags:
-- clip
-- breakdance
-- video-classification
-- dance
-- pytorch
-- vision-encoder
-license: MIT
-datasets:
-- custom
-library_name: transformers
-base_model: openai/clip-vit-large-patch14
-pipeline_tag: video-classification
-model-index:
-- name: CLIP-Based Break Dance Move Classifier
-  results:
-  - task:
-      type: video-classification
-    dataset:
-      name: custom_breakdance
-      type: custom
-    metrics:
-    - name: Overall Accuracy
-      type: accuracy
-      value: [specify %]
-    - name: Windmill Precision
-      type: precision
-      value: [specify %]
-    - name: Halo Precision
-      type: precision
-      value: [specify %]
-    - name: Swipe Precision
-      type: precision
-      value: [specify %]
----
-# CLIP-Based Break Dance Move Classifier
-This model is a fine-tuned version of CLIP (ViT-Large/14) specialized in classifying break dance power moves from video frames, including windmills, halos, and swipes.
-## Model Description
-- **Model Type:** Custom CLIP-based architecture (VariableLengthCLIP)
-- **Base Model:** CLIP ViT-Large/14 (for feature extraction)
-- **Architecture:**
-  - Uses CLIP's vision encoder for frame-level feature extraction
-  - Processes multiple frames from a video
-  - Averages frame features
-  - Projects to 3 classes via a learned linear layer
-- **Task:** Video Classification
-- **Training Data:** Custom break dance video dataset
-- **Output:** 3 classes of break dance moves (windmill, halo, swipe)
-## Usage
-```python
-import torch
-from transformers import CLIPProcessor
-from PIL import Image
-import cv2
-import numpy as np
-from src.models.model import create_model
-# Load model and processor
-model = create_model(num_classes=3, pretrained_model_name="openai/clip-vit-large-patch14")
-state_dict = torch.load("model.pth")
-model.load_state_dict(state_dict)
-processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")
-# Process video
-def process_video(video_path, model, processor):
-    video = cv2.VideoCapture(video_path)
-    frames = []
-    while video.isOpened():
-        ret, frame = video.read()
-        if not ret:
-            break
-        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
-        frame_pil = Image.fromarray(frame_rgb)
-        processed = processor(images=frame_pil, return_tensors="pt")
-        frames.append(processed.pixel_values)
-    video.release()
-    # Stack frames and process
-    frames_tensor = torch.cat(frames, dim=0)
-    with torch.no_grad():
-        predictions = model(frames_tensor.unsqueeze(0))
-    return predictions
-```
-## Limitations
-- Model performance may vary with video quality and lighting conditions
-- Best results are achieved with clear, centered shots of the dance moves
-- May have difficulty distinguishing between similar power moves
-- Performance may be affected by unusual camera angles or partial views
-- Currently only supports three specific power moves (windmills, halos, and swipes)
-## Training Procedure
-- Fine-tuned on CLIP ViT-Large/14 architecture
-- Training dataset: Custom dataset of break dance videos
-- Dataset size: [specify number] frames from [specify number] different videos
-- Training epochs: [specify number]
-- Learning rate: [specify rate]
-- Batch size: [specify size]
-- Hardware used: [specify GPU/CPU details]
-## Evaluation Results
-- Overall accuracy: [specify %]
-  Per-class performance:
-- Windmills: [specify precision/recall]
-- Halos: [specify precision/recall]
-- Swipes: [specify precision/recall]
-## Citation
-If you use this model in your research or project, please cite:
-```bibtex
-@misc{clip-breakdance-classifier,
-  author = {Bryant Wolf},
-  title = {CLIP-Based Break Dance Move Classifier},
-  year = {2024},
-  publisher = {Hugging Face},
-  journal = {Hugging Face Model Hub},
-  howpublished = {\url{https://huggingface.co/bawolf/clip-breakdance-classifier}}
-}
-```