---
tags:
- computer_vision
- pose_estimation
- animal_pose_estimation
- deeplabcut
---

# MODEL CARD: SuperAnimal-Quadruped

## Model Details

• SuperAnimal-Quadruped is a model developed by the [M.W.Mathis Lab](http://www.mackenziemathislab.org/) in 2023 and trained to predict quadruped pose from images. Please see [Shaokai Ye et al. 2023](https://arxiv.org/abs/2203.07436) for details.

• The model is an HRNet-w32 trained on our Quadruped-80K dataset.

• It was trained within the DeepLabCut framework; full training details can be found in Ye et al. 2023. You can load this model with our lightweight loading package, [DLCLibrary](https://github.com/DeepLabCut/DLClibrary). Here is an example usage:

```python
from pathlib import Path

from dlclibrary import download_huggingface_model

# Create the target folder (no error if it already exists) and download the model into it
model_dir = Path("./superanimal_quadruped_model")
model_dir.mkdir(parents=True, exist_ok=True)
download_huggingface_model("superanimal_quadruped", model_dir)
```

## Intended Use

• Intended to be used for pose estimation of quadruped images taken from a side view. The model serves as a better starting point than ImageNet weights for downstream datasets such as AP-10K.

• Intended for academic and research professionals working in fields related to animal behavior, such as neuroscience and ecology.

• Not suitable as a zero-shot model for applications that require high keypoint precision, but it can be fine-tuned with minimal data to reach human-level accuracy. Also not suitable for videos that look dramatically different from those we show in the paper.

## Factors

• Based on the known robustness issues of neural networks, the relevant factors include the lighting, contrast, and resolution of the video frames. The presence of other objects might also cause false detections and erroneous keypoints. When two or more animals are extremely close, the top-down detector may detect only one of them if the model is used without further fine-tuning or without a method such as BUCTD (Zhou et al. 2023 ICCV).

## Metrics

• Mean Average Precision (mAP)

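For keypoint tasks, mAP is commonly computed COCO-style from Object Keypoint Similarity (OKS). Each prediction is matched to a ground-truth animal and counts as a true positive when its OKS reaches a threshold. AP is then averaged over the thresholds 0.50, 0.55, ..., 0.95. The sketch below assumes that convention. The per-keypoint falloff constants `k`, the input layout, and the function names are illustrative assumptions. It also omits parts of the official COCO evaluation, such as area ranges and detection limits. For reported numbers, prefer an established implementation such as `COCOeval` in pycocotools.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (x, y, v): v > 0 marks a labeled ground-truth keypoint; v is ignored for predictions
Keypoints = Sequence[Tuple[float, float, float]]


def oks(pred: Keypoints, gt: Keypoints, area: float, k: Sequence[float]) -> float:
    """COCO-style Object Keypoint Similarity for one prediction vs. one ground truth."""
    area = max(area, 1e-9)  # guard against zero-area annotations
    total, labeled = 0.0, 0
    for (px, py, _), (gx, gy, gv), ki in zip(pred, gt, k):
        if gv > 0:  # only labeled keypoints contribute
            d2 = (px - gx) ** 2 + (py - gy) ** 2
            total += math.exp(-d2 / (2 * area * ki ** 2))
            labeled += 1
    return total / labeled if labeled else 0.0


def average_precision(images: List[Dict], k: Sequence[float], threshold: float) -> float:
    """AP at one OKS threshold with greedy, score-ordered matching and 101-point interpolation.

    Each image is {"gts": [(keypoints, area), ...], "preds": [(score, keypoints), ...]}.
    """
    results: List[Tuple[float, bool]] = []  # (score, is_true_positive)
    num_gt = 0
    for image in images:
        gts = image["gts"]
        num_gt += len(gts)
        matched = set()
        for score, kpts in sorted(image["preds"], key=lambda p: -p[0]):
            best_j, best_oks = -1, threshold
            for j, (gt_kpts, area) in enumerate(gts):
                if j in matched:
                    continue  # each ground truth can be matched at most once
                sim = oks(kpts, gt_kpts, area, k)
                if sim >= best_oks:
                    best_j, best_oks = j, sim
            if best_j >= 0:
                matched.add(best_j)
            results.append((score, best_j >= 0))
    if num_gt == 0:
        return 0.0

    results.sort(key=lambda r: -r[0])
    tp = fp = 0
    precisions, recalls = [], []
    for _, is_tp in results:
        tp += is_tp
        fp += not is_tp
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Precision envelope: make precision non-increasing as recall grows
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sample precision at 101 recall levels 0.00, 0.01, ..., 1.00
    ap = 0.0
    for level in range(101):
        r = level / 100
        ap += next((p for p, rec in zip(precisions, recalls) if rec >= r), 0.0)
    return ap / 101


def mean_average_precision(images: List[Dict], k: Sequence[float]) -> float:
    """Mean of AP over the OKS thresholds 0.50:0.05:0.95."""
    thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]
    return sum(average_precision(images, k, t) for t in thresholds) / len(thresholds)
```

A prediction that exactly matches its only ground truth yields an mAP of 1.0. A prediction far from every labeled keypoint counts as a false positive at every threshold.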
## Evaluation Data

• In the paper, we benchmark on AP-10K, AnimalPose, Horse-10, and iRodent using a leave-one-out strategy. Here, we provide the model that has been trained on all datasets (see below); it should therefore be considered "fine-tuned" on all of the animal training data listed below. This model is meant for production use and for evaluation in downstream scientific applications.

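The leave-one-out protocol can be sketched as follows. It generates one training split per benchmark, and each split omits the held-out dataset. The released model instead uses all of the datasets. The dataset names come from the Training Data list in this card. The benchmark-to-dataset mapping and the helper names are assumptions, in particular holding out Horse-10 by dropping Horse-30. The exact splits used in Ye et al. 2023 may differ.

```python
from typing import Dict, List

# Training datasets as listed in this card
TRAINING_DATASETS: List[str] = [
    "AwA-Pose", "AnimalPose", "AcinoSet", "Horse-30",
    "StanfordDogs", "AP-10K", "iRodent", "APT-36K",
]

# Benchmark -> dataset removed from training when that benchmark is held out.
# Assumption: Horse-10 is held out by removing Horse-30, its source dataset.
HELD_OUT_SOURCE: Dict[str, str] = {
    "AP-10K": "AP-10K",
    "AnimalPose": "AnimalPose",
    "Horse-10": "Horse-30",
    "iRodent": "iRodent",
}


def leave_one_out_splits(datasets: List[str], held_out: Dict[str, str]) -> Dict[str, List[str]]:
    """For each benchmark, list the datasets used for training when it is held out."""
    splits: Dict[str, List[str]] = {}
    for benchmark, source in held_out.items():
        if source not in datasets:
            raise ValueError(f"{source!r} is not one of the training datasets")
        splits[benchmark] = [d for d in datasets if d != source]
    return splits


if __name__ == "__main__":
    for benchmark, train_sets in leave_one_out_splits(TRAINING_DATASETS, HELD_OUT_SOURCE).items():
        print(f"{benchmark}: train on {', '.join(train_sets)}")
    # The released model instead corresponds to training on all of TRAINING_DATASETS.
```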
## Training Data

The model was trained jointly on the following datasets:

- **AwA-Pose** Quadruped dataset; see full details at (1).
- **AnimalPose** See full details at (2).
- **AcinoSet** See full details at (3).
- **Horse-30** Horse-30 dataset (the benchmark task is called Horse-10); see full details at (4).
- **StanfordDogs** See full details at (5, 6).
- **AP-10K** See full details at (7).
- **iRodent** We used the iNaturalist API functions to scrape observations with the taxon ID of Suborder Myomorpha (8). These functions allowed us to filter the large number of observations down to those with photos under the Creative Commons CC BY-NC license. The most common types of rodents in the collected observations are Muskrat (Ondatra zibethicus), Brown Rat (Rattus norvegicus), House Mouse (Mus musculus), Black Rat (Rattus rattus), Hispid Cotton Rat (Sigmodon hispidus), Meadow Vole (Microtus pennsylvanicus), Bank Vole (Clethrionomys glareolus), Deer Mouse (Peromyscus maniculatus), White-footed Mouse (Peromyscus leucopus), and Striped Field Mouse (Apodemus agrarius). We then generated segmentation masks over the target animals by processing the media through an algorithm we designed that uses a Mask Region-based Convolutional Neural Network (Mask R-CNN) (9) with a ResNet-50-FPN backbone (10), pretrained on the COCO dataset (11). The resulting 443 images were then manually labeled with both pose annotations and segmentation masks. iRodent data is banked at https://zenodo.org/record/8250392.
- **APT-36K** See full details at (12).

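The license-based filtering step used for iRodent can be illustrated with the public iNaturalist API (v1). This sketch only builds a query URL and double-checks photo licenses on the returned records; it is not the authors' scraping code. Several details are assumptions to verify against the iNaturalist API documentation: the endpoint, its parameters (`taxon_id`, `photo_license`, `photos`, `per_page`), the `license_code` field on photo records, and the helper names. The Myomorpha taxon ID is left for you to supply.

```python
from typing import Dict, Iterable, List
from urllib.parse import urlencode

# Assumption: the public iNaturalist API v1 observations endpoint
INAT_OBSERVATIONS_URL = "https://api.inaturalist.org/v1/observations"


def build_query_url(taxon_id: int, page: int = 1, per_page: int = 200) -> str:
    """Build a query URL for observations of one taxon that have CC BY-NC photos."""
    params = {
        "taxon_id": taxon_id,
        "photo_license": "cc-by-nc",
        "photos": "true",
        "page": page,
        "per_page": per_page,
    }
    return f"{INAT_OBSERVATIONS_URL}?{urlencode(params)}"


def cc_by_nc_photo_urls(observations: Iterable[Dict]) -> List[str]:
    """Client-side double check: keep only photo URLs whose license code is CC BY-NC."""
    urls: List[str] = []
    for obs in observations:
        for photo in obs.get("photos", []):
            if (photo.get("license_code") or "").lower() == "cc-by-nc":
                urls.append(photo["url"])
    return urls
```

Fetching each page (for example with `urllib.request`) and passing its `results` list to `cc_by_nc_photo_urls` would complete the loop. The mask generation and manual labeling described above are not covered here.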
Here is an image with the keypoint guide:

<p align="center">
<img src="https://images.squarespace-cdn.com/content/v1/57f6d51c9f74566f55ecf271/1702502929727-OS5FPNIBNTVIR4LL1Q0B/kypts_SAQ.png?format=1500w" width="95%">
</p>

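The Caveats and Recommendations section notes that each dataset's keypoint names are mapped onto a unified pose vocabulary. In practice, this means reordering each annotation into a fixed keypoint order and marking keypoints that a dataset does not label as missing. The vocabulary excerpt, the source keypoint names, and the `to_unified` helper below are hypothetical. The actual SuperAnimal-Quadruped keypoints are those shown in the guide above.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y); NaN marks a missing keypoint

# Hypothetical excerpt of a unified vocabulary (not the real SuperAnimal keypoint list)
UNIFIED_KEYPOINTS: List[str] = ["nose", "left_eye", "right_eye", "neck_base", "tail_base"]

# Hypothetical mapping for one source dataset: source name -> unified name
EXAMPLE_MAPPING: Dict[str, str] = {
    "Nose": "nose",
    "L_Eye": "left_eye",
    "R_Eye": "right_eye",
    "TailBase": "tail_base",
}


def to_unified(
    source_names: Sequence[str],
    source_points: Sequence[Point],
    mapping: Dict[str, str],
    unified: Sequence[str] = UNIFIED_KEYPOINTS,
) -> List[Point]:
    """Reorder one animal's keypoints into the unified order; unfilled slots stay NaN."""
    index = {name: i for i, name in enumerate(unified)}
    out: List[Point] = [(math.nan, math.nan)] * len(unified)
    for name, xy in zip(source_names, source_points):
        target = mapping.get(name)
        if target is not None:  # source keypoints without a unified counterpart are dropped
            out[index[target]] = xy
    return out
```

Marking unlabeled slots as missing (rather than dropping them) keeps every annotation the same length. As the caveat notes, however, a shared name does not remove differences in where each lab placed that keypoint.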
## Ethical Considerations

• No experimental data was collected for this model; all datasets used are cited.

## Caveats and Recommendations

• The model may have reduced accuracy in scenarios with extremely varied lighting conditions or atypical animal characteristics not well represented in the training data.

• Please note that each dataset was labeled by separate labs and separate individuals. Therefore, while we map names to a unified pose vocabulary, there will be annotator bias in keypoint placement (see the Supplementary Note on annotator bias in Ye et al. 2023).

• Note that the dataset is highly diverse across species, but collectively has more representation of domesticated animals such as dogs, cats, horses, and cattle.

• If performance is not as good as you need it to be, we recommend first trying video adaptation (see Ye et al. 2023) or fine-tuning these weights with your own labeled data.

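Video adaptation, as described in Ye et al. 2023, adapts the model to a target video using the model's own predictions on that video. This sketch illustrates one ingredient only: selecting confident predictions to serve as pseudo-labels. It masks keypoints below a confidence threshold and skips frames with too few confident keypoints. The threshold defaults, the data layout, and the function name are assumptions. This is not DeepLabCut's implementation; use the adaptation provided in DeepLabCut for real analyses.

```python
from typing import List, Optional, Sequence, Tuple

Prediction = Tuple[float, float, float]  # (x, y, confidence) for one keypoint
PseudoLabel = Optional[Tuple[float, float]]  # None means "treat as unlabeled"


def select_pseudo_labels(
    frames: Sequence[Sequence[Prediction]],
    min_confidence: float = 0.5,
    min_keypoints: int = 3,
) -> List[Tuple[int, List[PseudoLabel]]]:
    """Turn per-frame predictions into (frame_index, keypoints) pseudo-labels.

    Keypoints below min_confidence become None; frames with fewer than
    min_keypoints confident keypoints are skipped entirely.
    """
    selected: List[Tuple[int, List[PseudoLabel]]] = []
    for frame_idx, preds in enumerate(frames):
        kept: List[PseudoLabel] = [
            (x, y) if conf >= min_confidence else None for x, y, conf in preds
        ]
        if sum(k is not None for k in kept) >= min_keypoints:
            selected.append((frame_idx, kept))
    return selected
```

Raising `min_confidence` trades fewer pseudo-labels for cleaner ones. The fine-tuning step that consumes these labels is not shown.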
## License

Modified MIT.

Copyright 2023 by Mackenzie Mathis, Shaokai Ye, and contributors.

Permission is hereby granted to you (hereafter "LICENSEE") a fully-paid, non-exclusive, and non-transferable license for academic, non-commercial purposes only (hereafter "LICENSE") to use the "MODEL" weights (hereafter "MODEL"), subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software:

This software may not be used to harm any animal deliberately.

LICENSEE acknowledges that the MODEL is a research tool.

THE MODEL IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE MODEL OR THE USE OR OTHER DEALINGS IN THE MODEL.

If this license is not appropriate for your application, please contact Prof. Mackenzie W. Mathis ([email protected]) and/or the TTO office at EPFL ([email protected]) for a commercial use license.

Please cite **Ye et al.** if you use this model in your work: https://arxiv.org/abs/2203.07436v2.

## References

1. Prianka Banik, Lin Li, and Xishuang Dong. A novel dataset for keypoint detection of quadruped animals from images. ArXiv, abs/2108.13958, 2021.
2. Jinkun Cao, Hongyang Tang, Haoshu Fang, Xiaoyong Shen, Cewu Lu, and Yu-Wing Tai. Cross-domain adaptation for animal pose estimation. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 9497–9506, 2019.
3. Daniel Joska, Liam Clark, Naoya Muramatsu, Ricardo Jericevich, Fred Nicolls, Alexander Mathis, Mackenzie W. Mathis, and Amir Patel. AcinoSet: A 3D pose estimation dataset and baseline models for cheetahs in the wild. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 13901–13908, 2021.
4. Alexander Mathis, Thomas Biasi, Steffen Schneider, Mert Yuksekgonul, Byron Rogers, Matthias Bethge, and Mackenzie W. Mathis. Pretraining boosts out-of-domain robustness for pose estimation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1859–1868, 2021.
5. Aditya Khosla, Nityananda Jayadevaprakash, Bangpeng Yao, and Li Fei-Fei. Novel dataset for fine-grained image categorization. In First Workshop on Fine-Grained Visual Categorization, IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, June 2011.
6. Benjamin Biggs, Thomas Roddick, Andrew Fitzgibbon, and Roberto Cipolla. Creatures great and SMAL: Recovering the shape and motion of animals from video. In Asian Conference on Computer Vision, pages 3–19. Springer, 2018.
7. Hang Yu, Yufei Xu, Jing Zhang, Wei Zhao, Ziyu Guan, and Dacheng Tao. AP-10K: A benchmark for animal pose estimation in the wild. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021.
8. iNaturalist. GBIF Occurrence Download. https://doi.org/10.15468/dl.p7nbxt. iNaturalist, July 2020.
9. Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pages 2961–2969, 2017.
10. Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection, 2016.
11. Tsung-Yi Lin, Michael Maire, Serge J. Belongie, Lubomir D. Bourdev, Ross B. Girshick, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. CoRR, abs/1405.0312, 2014.
12. Yuxiang Yang, Junjie Yang, Yufei Xu, Jing Zhang, Long Lan, and Dacheng Tao. APT-36K: A large-scale benchmark for animal pose estimation and tracking. Advances in Neural Information Processing Systems, 35:17301–17313, 2022.