bawolf committed
Commit d9a7c37 · 1 Parent(s): 6a617e3

model card updates

Files changed (1):
  1. model-card.md +0 -136
model-card.md DELETED
@@ -1,136 +0,0 @@
- ---
- language: en
- tags:
- - clip
- - breakdance
- - video-classification
- - dance
- - pytorch
- - vision-encoder
- license: mit
- datasets:
- - custom
- library_name: transformers
- base_model: openai/clip-vit-large-patch14
- pipeline_tag: video-classification
- model-index:
- - name: CLIP-Based Break Dance Move Classifier
-   results:
-   - task:
-       type: video-classification
-     dataset:
-       name: custom_breakdance
-       type: custom
-     metrics:
-     - name: Overall Accuracy
-       type: accuracy
-       value: [specify %]
-     - name: Windmill Precision
-       type: precision
-       value: [specify %]
-     - name: Halo Precision
-       type: precision
-       value: [specify %]
-     - name: Swipe Precision
-       type: precision
-       value: [specify %]
- ---
-
- # CLIP-Based Break Dance Move Classifier
-
- This model is a fine-tuned version of CLIP (ViT-Large/14) that classifies break dance power moves (windmills, halos, and swipes) from video frames.
-
- ## Model Description
-
- - **Model Type:** Custom CLIP-based architecture (VariableLengthCLIP)
- - **Base Model:** CLIP ViT-Large/14 (for feature extraction)
- - **Architecture** (a minimal sketch follows this list):
-   - Uses CLIP's vision encoder for frame-level feature extraction
-   - Processes a variable number of frames from a video
-   - Averages the frame features
-   - Projects the averaged feature to 3 classes via a learned linear layer
- - **Task:** Video Classification
- - **Training Data:** Custom break dance video dataset
- - **Output:** 3 classes of break dance moves (windmill, halo, swipe)
-
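- The architecture bullets above map to a compact forward pass. The following is a minimal sketch of what `VariableLengthCLIP` could look like, written against the `transformers` vision encoder; the class body here is an assumption for illustration, not the actual implementation in `src/models/model.py`:
-
- ```python
- import torch
- import torch.nn as nn
- from transformers import CLIPVisionModelWithProjection
-
- class VariableLengthCLIP(nn.Module):
-     """Sketch of the card's architecture; implementation details are assumptions."""
-
-     def __init__(self, num_classes=3, pretrained_model_name="openai/clip-vit-large-patch14"):
-         super().__init__()
-         # CLIP's vision encoder yields one embedding per frame
-         self.vision_model = CLIPVisionModelWithProjection.from_pretrained(pretrained_model_name)
-         self.classifier = nn.Linear(self.vision_model.config.projection_dim, num_classes)
-
-     def forward(self, pixel_values):
-         # pixel_values: (batch, num_frames, 3, 224, 224)
-         b, f = pixel_values.shape[:2]
-         feats = self.vision_model(pixel_values=pixel_values.flatten(0, 1)).image_embeds
-         feats = feats.view(b, f, -1).mean(dim=1)  # average frame features
-         return self.classifier(feats)  # (batch, num_classes) logits
- ```
-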
- ## Usage
-
- ```python
- import torch
- from transformers import CLIPProcessor
- from PIL import Image
- import cv2
- from src.models.model import create_model
-
- # Load model and processor
- model = create_model(num_classes=3, pretrained_model_name="openai/clip-vit-large-patch14")
- state_dict = torch.load("model.pth", map_location="cpu")
- model.load_state_dict(state_dict)
- model.eval()  # inference mode
- processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")
-
- # Process video
- def process_video(video_path, model, processor):
-     video = cv2.VideoCapture(video_path)
-     frames = []
-
-     while video.isOpened():
-         ret, frame = video.read()
-         if not ret:
-             break
-
-         # OpenCV decodes to BGR; the CLIP processor expects RGB
-         frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
-         frame_pil = Image.fromarray(frame_rgb)
-         processed = processor(images=frame_pil, return_tensors="pt")
-         frames.append(processed.pixel_values)
-
-     video.release()
-
-     # Stack frames into (1, num_frames, 3, H, W) and classify
-     frames_tensor = torch.cat(frames, dim=0)
-     with torch.no_grad():
-         predictions = model(frames_tensor.unsqueeze(0))
-
-     return predictions
- ```
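-
- The model returns raw logits over the three classes. A short follow-up showing how to turn them into a prediction; the label order below is an assumption (the real index-to-move mapping is fixed at training time), so treat `CLASS_NAMES` as a hypothetical placeholder:
-
- ```python
- CLASS_NAMES = ["windmill", "halo", "swipe"]  # assumed order; verify against the training config
-
- logits = process_video("example.mp4", model, processor)
- probs = torch.softmax(logits, dim=-1)[0]
- print({name: round(p.item(), 3) for name, p in zip(CLASS_NAMES, probs)})
- print("predicted move:", CLASS_NAMES[probs.argmax().item()])
- ```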
-
- ## Limitations
-
- - Model performance may vary with video quality and lighting conditions
- - Best results are achieved with clear, centered shots of the dance moves
- - May have difficulty distinguishing between similar power moves
- - Performance may be affected by unusual camera angles or partial views
- - Currently only supports three specific power moves (windmills, halos, and swipes)
-
- ## Training Procedure
-
- - Fine-tuned from the CLIP ViT-Large/14 base model (a hedged sketch of the loop follows this list)
- - Training dataset: Custom dataset of break dance videos
- - Dataset size: [specify number] frames from [specify number] different videos
- - Training epochs: [specify number]
- - Learning rate: [specify rate]
- - Batch size: [specify size]
- - Hardware used: [specify GPU/CPU details]
-
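- Since the card leaves the hyperparameters unspecified, the following is only a hedged sketch of a standard cross-entropy fine-tuning loop; the optimizer, learning rate, batch size, and dummy data are placeholders, not the settings used for the released checkpoint:
-
- ```python
- import torch
- import torch.nn as nn
- from torch.utils.data import DataLoader, TensorDataset
- from src.models.model import create_model
-
- model = create_model(num_classes=3, pretrained_model_name="openai/clip-vit-large-patch14")
- optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # assumed optimizer and LR
- criterion = nn.CrossEntropyLoss()
-
- # Dummy stand-in for the real dataset: 8 clips of 16 frames each
- dataset = TensorDataset(torch.randn(8, 16, 3, 224, 224), torch.randint(0, 3, (8,)))
- loader = DataLoader(dataset, batch_size=2)  # assumed batch size
-
- model.train()
- for clips, labels in loader:
-     optimizer.zero_grad()
-     loss = criterion(model(clips), labels)
-     loss.backward()
-     optimizer.step()
- ```
-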
- ## Evaluation Results
-
- - Overall accuracy: [specify %]
- - Per-class performance:
-   - Windmills: [specify precision/recall]
-   - Halos: [specify precision/recall]
-   - Swipes: [specify precision/recall]
-
- ## Citation
-
- If you use this model in your research or project, please cite:
-
- ```bibtex
- @misc{clip-breakdance-classifier,
-   author = {Bryant Wolf},
-   title = {CLIP-Based Break Dance Move Classifier},
-   year = {2024},
-   publisher = {Hugging Face},
-   journal = {Hugging Face Model Hub},
-   howpublished = {\url{https://huggingface.co/bawolf/clip-breakdance-classifier}}
- }
- ```