---
language: en
tags:
- clip
- breakdance
- video-classification
- dance
- pytorch
- vision-encoder
license: mit
datasets:
- custom
library_name: transformers
base_model: openai/clip-vit-large-patch14
pipeline_tag: video-classification
model-index:
- name: CLIP-Based Break Dance Move Classifier
  results:
  - task:
      type: video-classification
    dataset:
      name: custom_breakdance
      type: custom
    metrics:
    - name: Overall Accuracy
      type: accuracy
      value: [specify %]
    - name: Windmill Precision
      type: precision
      value: [specify %]
    - name: Halo Precision
      type: precision
      value: [specify %]
    - name: Swipe Precision
      type: precision
      value: [specify %]
---

# CLIP-Based Break Dance Move Classifier

This model is a fine-tuned version of CLIP (ViT-Large/14) that classifies three break dance power moves from video frames: windmills, halos, and swipes.

## Model Description

- **Model Type:** Custom CLIP-based architecture (`VariableLengthCLIP`)
- **Base Model:** CLIP ViT-Large/14 (for feature extraction)
- **Architecture:**
  - Uses CLIP's vision encoder for frame-level feature extraction
  - Processes multiple frames from a video
  - Averages the frame features
  - Projects the averaged feature to 3 classes via a learned linear layer
- **Task:** Video classification
- **Training Data:** Custom break dance video dataset
- **Output:** 3 classes of break dance moves (windmill, halo, swipe)
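
The architecture bullets above can be sketched in a few lines of PyTorch. This is only an illustration of the encode, average, project pattern, not the actual `VariableLengthCLIP` code: the class name `MeanPooledFrameClassifier`, the toy encoder, and the tensor sizes are all assumptions chosen so the example runs without downloading CLIP weights.

```python
import torch
from torch import nn


class MeanPooledFrameClassifier(nn.Module):
    """Hypothetical stand-in: encode each frame, average, then classify."""

    def __init__(self, frame_encoder, feature_dim, num_classes=3):
        super().__init__()
        # frame_encoder maps (N, C, H, W) images to (N, feature_dim) features.
        # In the real model this role is played by CLIP's vision encoder.
        self.frame_encoder = frame_encoder
        self.classifier = nn.Linear(feature_dim, num_classes)

    def forward(self, frames):
        # frames: (batch, num_frames, channels, height, width)
        b, t, c, h, w = frames.shape
        feats = self.frame_encoder(frames.reshape(b * t, c, h, w))
        # Average over the frame axis to get one feature per video
        video_feats = feats.reshape(b, t, -1).mean(dim=1)
        return self.classifier(video_feats)  # (batch, num_classes)


# Toy encoder (assumption): flattens tiny 8x8 frames into 16-dim features
toy_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 16))
model = MeanPooledFrameClassifier(toy_encoder, feature_dim=16)
logits = model(torch.randn(2, 5, 3, 8, 8))  # 2 videos, 5 frames each
print(logits.shape)  # torch.Size([2, 3])
```

Because the frame features are averaged before the linear layer, the same weights work for any number of frames per video.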

## Usage

```python
import cv2
import torch
from PIL import Image
from transformers import CLIPProcessor

# `create_model` comes from this project's own `src.models.model` module,
# not from `transformers`
from src.models.model import create_model

# Load the model, its fine-tuned weights, and the matching CLIP processor
model = create_model(num_classes=3, pretrained_model_name="openai/clip-vit-large-patch14")
state_dict = torch.load("model.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()  # switch off dropout and other training-only behavior
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")


def process_video(video_path, model, processor):
    """Preprocess every frame of a video and return the model's predictions."""
    video = cv2.VideoCapture(video_path)
    frames = []

    while video.isOpened():
        ret, frame = video.read()
        if not ret:
            break

        # OpenCV decodes frames as BGR; the CLIP processor expects RGB
        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frame_pil = Image.fromarray(frame_rgb)
        processed = processor(images=frame_pil, return_tensors="pt")
        frames.append(processed.pixel_values)

    video.release()

    if not frames:
        raise ValueError(f"No frames could be read from {video_path}")

    # Concatenate to (num_frames, 3, H, W), then add a batch dimension
    frames_tensor = torch.cat(frames, dim=0)
    with torch.no_grad():
        predictions = model(frames_tensor.unsqueeze(0))

    return predictions
```
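
To turn the returned scores into a readable label, something like the sketch below may help. It assumes that `process_video` returns raw logits of shape `(1, 3)` and that the class indices follow the order windmill, halo, swipe; neither is confirmed by this card, so check the training code for the real mapping. The helper name `logits_to_prediction` is hypothetical.

```python
import math

# Assumed class order; verify against the training code before relying on it.
CLASS_NAMES = ["windmill", "halo", "swipe"]


def logits_to_prediction(logits, class_names=CLASS_NAMES):
    """Convert one row of raw scores into (label, probabilities) via softmax."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], dict(zip(class_names, probs))


# Example scores only; with a real video you would pass something like
# process_video(path, model, processor).squeeze(0).tolist()
label, probs = logits_to_prediction([2.0, 0.5, -1.0])
print(label, probs)
```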

## Limitations

- Model performance may vary with video quality and lighting conditions
- Best results are achieved with clear, centered shots of the dance moves
- May have difficulty distinguishing between similar power moves
- Performance may be affected by unusual camera angles or partial views
- Currently supports only three specific power moves (windmills, halos, and swipes)

## Training Procedure

- Base model: fine-tuned from CLIP ViT-Large/14
- Training dataset: Custom dataset of break dance videos
- Dataset size: [specify number] frames from [specify number] different videos
- Training epochs: [specify number]
- Learning rate: [specify rate]
- Batch size: [specify size]
- Hardware used: [specify GPU/CPU details]

## Evaluation Results

- Overall accuracy: [specify %]
- Per-class performance:
  - Windmills: [specify precision/recall]
  - Halos: [specify precision/recall]
  - Swipes: [specify precision/recall]
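
For reference, the metrics listed above can be computed from true and predicted labels as in the following sketch. The function name `classification_report` and the sample labels are made up purely to exercise the code; they are not results for this model.

```python
from collections import Counter


def classification_report(y_true, y_pred, labels):
    """Return overall accuracy and per-class precision/recall."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed an example of the true class

    accuracy = sum(tp.values()) / len(y_true)
    per_class = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        per_class[label] = {
            "precision": tp[label] / predicted if predicted else 0.0,
            "recall": tp[label] / actual if actual else 0.0,
        }
    return accuracy, per_class


# Made-up labels for illustration only
y_true = ["windmill", "windmill", "halo", "swipe"]
y_pred = ["windmill", "halo", "halo", "swipe"]
accuracy, per_class = classification_report(
    y_true, y_pred, ["windmill", "halo", "swipe"]
)
print(accuracy)   # 0.75
print(per_class)
```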

## Citation

If you use this model in your research or project, please cite:

```bibtex
@misc{clip-breakdance-classifier,
  author = {Bryant Wolf},
  title = {CLIP-Based Break Dance Move Classifier},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/bawolf/clip-breakdance-classifier}}
}
```