---
tags:
- computer_vision
- pose_estimation
- animal_pose_estimation
- deeplabcut
---

# MODEL CARD:

## Model Details

• The SuperAnimal-Quadruped model was developed by the [M.W.Mathis Lab](http://www.mackenziemathislab.org/) in 2023 and trained to predict quadruped pose from images.
Please see [Shaokai Ye et al. 2023](https://arxiv.org/abs/2203.07436) for details.

• The model is an HRNet-w32 trained on our Quadruped-80K dataset.

• It was trained within the DeepLabCut framework. Full training details can be found in Ye et al. 2023.
You can use this model with our lightweight loading package, [DLCLibrary](https://github.com/DeepLabCut/DLClibrary).
Here is an example usage:

```python
from pathlib import Path
from dlclibrary import download_huggingface_model

# Create the target folder (no error if it already exists) and download the model into it
model_dir = Path("./superanimal_quadruped_model")
model_dir.mkdir(parents=True, exist_ok=True)
download_huggingface_model("superanimal_quadruped", model_dir)
```

## Intended Use
• Intended to be used for pose estimation of quadruped images taken from a side view. The model serves as a better starting
point than ImageNet weights for downstream datasets such as AP-10K.

• Intended for academic and research professionals working in fields related to animal behavior, such as neuroscience
and ecology.

• Not suitable as a zero-shot model for applications that require high keypoint precision, but it can be fine-tuned with
minimal data to reach human-level accuracy. Also not suitable for videos that look dramatically different from those
we show in the paper.

## Factors

• Based on the known robustness issues of neural networks, the relevant factors include the lighting, contrast, and
resolution of the video frames. The presence of other objects might also cause false detections and erroneous keypoints.
When two or more animals are extremely close, the top-down detector may detect only one animal if the model is used
without further fine-tuning or without a method such as BUCTD (Zhou et al. 2023 ICCV).
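
To find the frames in your own annotated data where this close-contact failure mode is likely, one option is to flag pairs of ground-truth bounding boxes that overlap heavily. The sketch below is not part of DeepLabCut: the `(x1, y1, x2, y2)` box format, the helper names, and the 0.5 IoU threshold are all assumptions chosen for illustration.

```python
from itertools import combinations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2) in pixels."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def crowded_pairs(boxes, threshold=0.5):
    """Return index pairs of boxes whose IoU meets a hypothetical crowding threshold."""
    return [
        (i, j)
        for i, j in combinations(range(len(boxes)), 2)
        if iou(boxes[i], boxes[j]) >= threshold
    ]


# Two heavily overlapping animals and one isolated animal (made-up boxes).
print(crowded_pairs([(0, 0, 10, 10), (1, 0, 11, 10), (50, 50, 60, 60)]))
```

Frames with flagged pairs are natural candidates for extra labeling or for a crowd-aware method such as BUCTD.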

## Metrics
• Mean Average Precision (mAP)
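
The card does not spell out how mAP is computed. Animal pose benchmarks such as AP-10K commonly use the COCO-style definition, which scores each prediction by object keypoint similarity (OKS) and averages AP over OKS thresholds from 0.50 to 0.95. The sketch below assumes that definition; it simplifies matching by assuming exactly one prediction per ground-truth animal, and the per-keypoint `sigmas` are placeholders, not the values used by the authors.

```python
import numpy as np


def oks(pred, gt, visible, area, sigmas):
    """COCO-style object keypoint similarity between one predicted and one ground-truth pose.

    pred, gt: (K, 2) arrays of x, y pixel coordinates.
    visible:  (K,) boolean mask of labeled ground-truth keypoints.
    area:     ground-truth object area in pixels^2 (the s^2 scale term).
    sigmas:   (K,) per-keypoint falloff constants (placeholder values).
    """
    visible = np.asarray(visible, dtype=bool)
    if not visible.any():
        return 0.0
    d2 = np.sum((np.asarray(pred, float) - np.asarray(gt, float)) ** 2, axis=1)
    k2 = (2 * np.asarray(sigmas, float)) ** 2  # COCO uses k_i = 2 * sigma_i
    e = d2 / (2 * area * k2 + np.finfo(float).eps)
    return float(np.mean(np.exp(-e[visible])))


def average_precision(scores, oks_values, threshold):
    """AP at one OKS threshold, assuming one prediction per ground-truth instance."""
    order = np.argsort(-np.asarray(scores, float))  # rank predictions by confidence
    tp = (np.asarray(oks_values, float)[order] >= threshold).astype(float)
    cum_tp = np.cumsum(tp)
    precision = cum_tp / np.arange(1, len(tp) + 1)
    recall = cum_tp / len(tp)
    # All-point interpolation: make precision non-increasing as recall grows.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    recall_steps = np.diff(np.concatenate(([0.0], recall)))
    return float(np.sum(precision * recall_steps))


def mean_ap(scores, oks_values):
    """Average AP over the OKS thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = np.linspace(0.50, 0.95, 10)
    return float(np.mean([average_precision(scores, oks_values, t) for t in thresholds]))


pose = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
sigmas = np.full(3, 0.1)  # hypothetical falloff constants
shifted = oks(pose + 1.0, pose, [True] * 3, 100.0, sigmas)
print(shifted, mean_ap([0.9, 0.8], [1.0, shifted]))
```

The official COCO evaluation additionally handles multiple detections per image, unmatched predictions, and crowd annotations, so use a reference implementation for reported numbers.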

## Evaluation Data
• In the paper, we benchmark on AP-10K, AnimalPose, Horse-10, and iRodent using a leave-one-out strategy. Here,
we provide the model that has been trained on all datasets (see below); it should therefore be considered "fine-tuned"
on all of the animal training data listed below. This model is meant for production and evaluation in downstream scientific
applications.
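
As an illustration of the leave-one-out protocol: for each benchmark, train on every other dataset and evaluate on the held-out one. The dataset names come from this card; the split helper is a hypothetical sketch, not the authors' training code, and it assumes that Horse-10 is evaluated by holding out the Horse-30 data it is defined on.

```python
# Datasets listed under "Training Data" in this card.
TRAINING_POOL = [
    "AwA-Pose", "AnimalPose", "AcinoSet", "Horse-30",
    "StanfordDogs", "AP-10K", "iRodent", "APT-36K",
]
# Benchmarks from the paper; Horse-10 is the benchmark task on the Horse-30 data.
BENCHMARKS = ["AP-10K", "AnimalPose", "Horse-30", "iRodent"]


def leave_one_out_splits(pool, held_out):
    """Yield (held-out dataset, training datasets) for each benchmark."""
    for name in held_out:
        yield name, [d for d in pool if d != name]


for test_set, train_sets in leave_one_out_splits(TRAINING_POOL, BENCHMARKS):
    print(f"evaluate on {test_set}: train on {len(train_sets)} datasets")
```

The released weights correspond to the case where nothing is held out, i.e. training on the full `TRAINING_POOL`.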

## Training Data

The model was trained jointly on the following datasets:

- **AwA-Pose** Quadruped dataset, see full details at (1).
- **AnimalPose** See full details at (2).
- **AcinoSet** See full details at (3).
- **Horse-30** Horse-30 dataset; the benchmark task is called Horse-10. See full details at (4).
- **StanfordDogs** See full details at (5, 6).
- **AP-10K** See full details at (7).
- **iRodent** We used the iNaturalist API to scrape observations
with the taxon ID of Suborder Myomorpha (8), and filtered the large number of observations down to those with photos
under the CC BY-NC Creative Commons license. The most common rodents in the collected observations are
Muskrat (Ondatra zibethicus), Brown Rat (Rattus norvegicus), House Mouse (Mus musculus), Black Rat (Rattus rattus), Hispid
Cotton Rat (Sigmodon hispidus), Meadow Vole (Microtus pennsylvanicus), Bank Vole (Clethrionomys glareolus), Deer Mouse
(Peromyscus maniculatus), White-footed Mouse (Peromyscus leucopus), and Striped Field Mouse (Apodemus agrarius). We then
generated segmentation masks over the target animals by processing the media through an algorithm we designed that
uses a Mask Region-based Convolutional Neural Network (Mask R-CNN) (9) model with a ResNet-50-FPN backbone (10),
pretrained on the COCO dataset (11). The 443 processed images were then manually labeled with both pose annotations and
segmentation masks. iRodent data is banked at https://zenodo.org/record/8250392.
- **APT-36K** See full details at (12).
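
The license-filtering step used to build iRodent can be sketched as follows. The record layout (a `photos` list whose entries carry a `url` and a `license_code` such as `"cc-by-nc"`) is modeled on iNaturalist API responses but should be treated as an assumption; the records below are made up, and no API call is made.

```python
def photos_with_license(observations, license_code="cc-by-nc"):
    """Collect photo URLs whose license matches license_code (case-insensitive).

    observations: list of dicts shaped like (assumed) iNaturalist observation records.
    """
    wanted = license_code.lower()
    urls = []
    for obs in observations:
        for photo in obs.get("photos", []):
            # Some photos carry no license (None); treat those as non-matching.
            if (photo.get("license_code") or "").lower() == wanted:
                urls.append(photo["url"])
    return urls


# Hypothetical records for illustration only.
observations = [
    {"id": 1, "photos": [{"url": "https://example.org/a.jpg", "license_code": "cc-by-nc"}]},
    {"id": 2, "photos": [{"url": "https://example.org/b.jpg", "license_code": None}]},
    {"id": 3, "photos": [{"url": "https://example.org/c.jpg", "license_code": "CC-BY-NC"}]},
]
print(photos_with_license(observations))
```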

Here is an image with the keypoint guide:
<p align="center">
<img src="https://images.squarespace-cdn.com/content/v1/57f6d51c9f74566f55ecf271/1702502929727-OS5FPNIBNTVIR4LL1Q0B/kypts_SAQ.png?format=1500w" width="95%">
</p>


## Ethical Considerations

• No experimental data was collected for this model; all datasets used are cited.

## Caveats and Recommendations

• The model may have reduced accuracy in scenarios with extremely varied lighting conditions or atypical animal
characteristics not well-represented in the training data.

• Please note that each dataset was labeled by separate labs and separate individuals. Therefore, while we map names to a
unified pose vocabulary, there will be annotator bias in keypoint placement (see the Supplementary Note on annotator bias
in Ye et al. 2023).

• Note that the dataset is highly diverse across species, but collectively it has more
representation of domesticated animals like dogs, cats, horses, and cattle.

• If performance is not as good as you need it to be, we recommend first trying video adaptation (see Ye et al. 2023),
or fine-tuning these weights with your own labeled data.
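
Mapping each dataset's keypoint names onto a shared vocabulary, as described in the annotator-bias caveat, can be sketched as a per-dataset lookup table, with keypoints a dataset does not annotate left as missing (NaN). The dataset names, keypoint names, and mappings below are hypothetical and do not reproduce the actual SuperAnimal-Quadruped vocabulary shown in the keypoint guide.

```python
import numpy as np

# Hypothetical unified vocabulary (not the real SuperAnimal-Quadruped keypoint list).
UNIFIED = ["nose", "left_eye", "right_eye", "tail_base"]

# Hypothetical per-dataset renaming tables: source keypoint name -> unified name.
NAME_MAPS = {
    "dataset_a": {"Nose": "nose", "L_Eye": "left_eye", "R_Eye": "right_eye"},
    "dataset_b": {"snout": "nose", "tailbase": "tail_base"},
}


def to_unified(dataset, keypoints):
    """Convert {source_name: (x, y)} into a (len(UNIFIED), 2) array.

    Unified keypoints the dataset does not annotate stay NaN; source keypoints
    with no unified counterpart are dropped.
    """
    out = np.full((len(UNIFIED), 2), np.nan)
    mapping = NAME_MAPS[dataset]
    for name, xy in keypoints.items():
        unified_name = mapping.get(name)
        if unified_name is not None:
            out[UNIFIED.index(unified_name)] = xy
    return out


print(to_unified("dataset_b", {"snout": (10, 20), "tailbase": (90, 25), "ear": (15, 5)}))
```

Renaming aligns the labels by name only; it cannot remove differences in where each lab placed a given keypoint, which is the source of the annotator bias noted above.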

## License

Modified MIT.

Copyright 2023 by Mackenzie Mathis, Shaokai Ye, and contributors. 

Permission is hereby granted to you (hereafter "LICENSEE") a fully-paid, non-exclusive,
and non-transferable license for academic, non-commercial purposes only (hereafter “LICENSE”)
to use the "MODEL" weights (hereafter "MODEL"), subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial
portions of the Software:

This software may not be used to harm any animal deliberately.

LICENSEE acknowledges that the MODEL is a research tool. 
THE MODEL IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING 
BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE MODEL
OR THE USE OR OTHER DEALINGS IN THE MODEL.

If this license is not appropriate for your application, please contact Prof. Mackenzie W. Mathis 
([email protected]) and/or the TTO office at EPFL ([email protected]) for a commercial use license.

Please cite **Ye et al** if you use this model in your work https://arxiv.org/abs/2203.07436v2.


## References

1. Prianka Banik, Lin Li, and Xishuang Dong. A novel dataset for keypoint detection of quadruped animals from images. ArXiv, abs/2108.13958, 2021.
2. Jinkun Cao, Hongyang Tang, Haoshu Fang, Xiaoyong Shen, Cewu Lu, and Yu-Wing Tai. Cross-domain adaptation for animal pose estimation.
2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 9497–9506, 2019.
3. Daniel Joska, Liam Clark, Naoya Muramatsu, Ricardo Jericevich, Fred Nicolls, Alexander Mathis, Mackenzie W. Mathis, and Amir Patel. AcinoSet:
A 3D pose estimation dataset and baseline models for cheetahs in the wild. 2021 IEEE International Conference on Robotics and Automation
(ICRA), pages 13901–13908, 2021.
4. Alexander Mathis, Thomas Biasi, Steffen Schneider, Mert Yuksekgonul, Byron Rogers, Matthias Bethge, and Mackenzie W Mathis. Pretraining
boosts out-of-domain robustness for pose estimation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision,
pages 1859–1868, 2021.
5. Aditya Khosla, Nityananda Jayadevaprakash, Bangpeng Yao, and Li Fei-Fei. Novel dataset for fine-grained image categorization. In First Workshop
on Fine-Grained Visual Categorization, IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, June 2011.
6. Benjamin Biggs, Thomas Roddick, Andrew Fitzgibbon, and Roberto Cipolla. Creatures great and SMAL: Recovering the shape and motion of
animals from video. In Asian Conference on Computer Vision, pages 3–19. Springer, 2018.
7. Hang Yu, Yufei Xu, Jing Zhang, Wei Zhao, Ziyu Guan, and Dacheng Tao. AP-10K: A benchmark for animal pose estimation in the wild. In Thirty-fifth
Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021.
8. iNaturalist. OGBIF Occurrence Download. https://doi.org/10.15468/dl.p7nbxt. iNaturalist, July 2020.
9. Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer
Vision, pages 2961–2969, 2017.
10. Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection, 2016.
11. Tsung-Yi Lin, Michael Maire, Serge J. Belongie, Lubomir D. Bourdev, Ross B. Girshick, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár,
and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. CoRR, abs/1405.0312, 2014.
12. Yuxiang Yang, Junjie Yang, Yufei Xu, Jing Zhang, Long Lan, and Dacheng Tao. APT-36K: A large-scale benchmark for animal pose estimation and
tracking. Advances in Neural Information Processing Systems, 35:17301–17313, 2022.