---
license: apache-2.0
datasets:
- AlexFierro9/Kinetics400
- imagenet-1k
- HuggingFaceM4/something_something_v2
language:
- en
pipeline_tag: video-classification
extra_gated_fields:
  Name: text
  Company/Organization: text
  Country: text
  E-Mail: text
---

# VideoMamba

## Model Details

VideoMamba is a video understanding model built purely on state space models (SSMs).

- **Developed by:** [OpenGVLab](https://github.com/OpenGVLab)
- **Model type:** An efficient backbone based on the bidirectional state space model.
- **License:** Non-commercial license

### Model Sources

- **Repository:** https://github.com/OpenGVLab/VideoMamba
- **Paper:** https://arxiv.org/abs/2403.06977

## Uses

VideoMamba is primarily intended for research on image and video tasks with an SSM-based backbone, e.g., image classification, action recognition, long-term video understanding, and video-text retrieval.

The primary intended users are researchers and hobbyists in computer vision, machine learning, and artificial intelligence.

## How to Get Started with the Model

- Replace the backbone of your video model with VideoMamba, using the model definition in the repository: https://github.com/OpenGVLab/VideoMamba/blob/main/videomamba/video_sm/models/videomamba.py
- Load this checkpoint into the backbone and start training.
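
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's own loading code: the helper names (`unwrap_state_dict`, `strip_prefix`, `load_backbone_weights`), the wrapper keys (`"model"`, `"state_dict"`, `"module"`), and the `"module."` prefix are assumptions about how a checkpoint may have been saved, and `model` stands for whatever backbone instance you build from `videomamba.py`.

```python
def unwrap_state_dict(checkpoint, wrapper_keys=("model", "state_dict", "module")):
    """Return the weight mapping, unwrapping one level of nesting if present.

    The wrapper keys are assumptions; inspect your checkpoint to confirm.
    """
    if isinstance(checkpoint, dict):
        for key in wrapper_keys:
            nested = checkpoint.get(key)
            if isinstance(nested, dict):
                return nested
    return checkpoint


def strip_prefix(state_dict, prefix="module."):
    """Drop a leading prefix (e.g. left by DistributedDataParallel) from parameter names."""
    return {
        (name[len(prefix):] if name.startswith(prefix) else name): value
        for name, value in state_dict.items()
    }


def load_backbone_weights(model, checkpoint_path):
    """Load checkpoint weights into an already constructed backbone (hypothetical helper)."""
    import torch  # imported lazily so the helpers above work without PyTorch

    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    state_dict = strip_prefix(unwrap_state_dict(checkpoint))
    # strict=False tolerates missing or unexpected parameter names,
    # e.g. a new classification head; shape mismatches still raise an error.
    result = model.load_state_dict(state_dict, strict=False)
    return result.missing_keys, result.unexpected_keys
```

Checking the returned missing and unexpected keys before training is a quick way to confirm that the backbone weights were actually matched rather than silently skipped.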

### Citation Information

```
@misc{li2024videomamba,
      title={VideoMamba: State Space Model for Efficient Video Understanding},
      author={Kunchang Li and Xinhao Li and Yi Wang and Yinan He and Yali Wang and Limin Wang and Yu Qiao},
      year={2024},
      eprint={2403.06977},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```