Update README.md
---
tags:
- computer_vision
- pose_estimation
- animal_pose_estimation
- deeplabcut
---

# MODEL CARD: SuperAnimal-Quadruped

## Model Details

• SuperAnimal-Quadruped model developed by the [M.W. Mathis Lab](http://www.mackenziemathislab.org/) in 2023, trained to predict quadruped pose from images. Please see [Shaokai Ye et al. 2023](https://arxiv.org/abs/2203.07436) for details.

• The model is an HRNet-w32 trained on our Quadruped-80K dataset.

• It was trained within the DeepLabCut framework. Full training details can be found in Ye et al. 2023.

You can use this model simply with our lightweight loading package, [DLCLibrary](https://github.com/DeepLabCut/DLClibrary). Here is an example usage:

```python
from pathlib import Path
from dlclibrary import download_huggingface_model

# Creates a folder and downloads the model to it
model_dir = Path("./superanimal_quadruped_model")
model_dir.mkdir()
download_huggingface_model("superanimal_quadruped", model_dir)
```
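
Once downloaded, the weights can also be run on video from within DeepLabCut itself. Below is a minimal sketch, assuming DeepLabCut ≥ 2.3 and its `video_inference_superanimal` entry point; exact argument names can differ between versions, so please check the DeepLabCut documentation. The video path and scale list are illustrative placeholders.

```python
import deeplabcut

# A sketch: run SuperAnimal-Quadruped on one video. "demo-video.mp4" is a
# placeholder path; scale_list lets the model try several image heights,
# which helps when the animal's size in pixels is unknown.
deeplabcut.video_inference_superanimal(
    ["demo-video.mp4"],
    "superanimal_quadruped",
    scale_list=[200, 300, 400, 500],
    videotype=".mp4",
)
```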

## Intended Use

• Intended to be used for pose estimation of quadruped images taken from a side view. The model serves as a better starting point than ImageNet weights on downstream datasets such as AP-10K.

• Intended for academic and research professionals working in fields related to animal behavior, such as neuroscience and ecology.

• Not suitable as a zero-shot model for applications that require high keypoint precision, but it can be fine-tuned with minimal data to reach human-level accuracy. Also not suitable for videos that look dramatically different from those we show in the paper.

## Factors

• Based on the known robustness issues of neural networks, the relevant factors include the lighting, contrast, and resolution of the video frames. The presence of objects might also cause false detections and erroneous keypoints. When two or more animals are extremely close, top-down detectors may detect only one animal unless the model is further fine-tuned or paired with a method such as BUCTD (Zhou et al. 2023).

## Metrics

• Mean Average Precision (mAP)
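
For context, this follows the COCO keypoint protocol: each prediction is scored against the ground truth with Object Keypoint Similarity (OKS), and mAP averages precision over OKS thresholds from 0.50 to 0.95. A minimal sketch of the OKS computation is below; the per-keypoint constants `k` are dataset-specific, and the values used here are placeholders, not the real ones.

```python
import numpy as np

def oks(pred, gt, visible, area, k):
    """Object Keypoint Similarity between one prediction and one ground truth.

    pred, gt: (N, 2) arrays of keypoint coordinates; visible: (N,) bool mask
    of labeled keypoints; area: object scale (segment area); k: (N,) falloff
    constants controlling how forgiving each keypoint is.
    """
    d2 = np.sum((pred - gt) ** 2, axis=1)  # squared pixel distances
    e = d2 / (2.0 * area * k**2)           # scale- and keypoint-normalized error
    return float(np.exp(-e)[visible].mean())

# Toy example with 3 keypoints (placeholder k values)
pred = np.array([[10.0, 10.0], [20.0, 22.0], [31.0, 30.0]])
gt = np.array([[10.0, 11.0], [20.0, 20.0], [30.0, 30.0]])
print(oks(pred, gt, np.array([True, True, True]), area=900.0, k=np.full(3, 0.1)))
```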

## Evaluation Data

• In the paper we benchmark on AP-10K, AnimalPose, Horse-10, and iRodent using a leave-one-out strategy. Here, we provide the model that has been trained on all datasets (see below); therefore, it should be considered "fine-tuned" on all of the animal training data listed below. This model is meant for production and evaluation in downstream scientific applications.

## Training Data

The model was trained jointly on the following datasets:

- **AwA-Pose** Quadruped dataset; see full details at (1).
- **AnimalPose** See full details at (2).
- **Horse-30** Benchmark task is called Horse-10; see full details at (4).
- **StanfordDogs** See full details at (5, 6).
- **AP-10K** See full details at (7).
- **iRodent** We utilized the iNaturalist API functions to scrape observations with the taxon ID of Suborder Myomorpha (8); a sketch of such a query appears after this list. The functions allowed us to filter the large number of observations down to the ones with photos under the CC BY-NC license. The most common types of rodents among the collected observations are Muskrat (Ondatra zibethicus), Brown Rat (Rattus norvegicus), House Mouse (Mus musculus), Black Rat (Rattus rattus), Hispid Cotton Rat (Sigmodon hispidus), Meadow Vole (Microtus pennsylvanicus), Bank Vole (Clethrionomys glareolus), Deer Mouse (Peromyscus maniculatus), White-footed Mouse (Peromyscus leucopus), and Striped Field Mouse (Apodemus agrarius). We then generated segmentation masks over the target animals by processing the media through an algorithm we designed that uses a Mask Region-Based Convolutional Neural Network (Mask R-CNN) (9) model with a ResNet-50-FPN backbone (10), pretrained on the COCO datasets (11). The 443 processed images were then manually labeled with both pose annotations and segmentation masks. iRodent data is banked at https://zenodo.org/record/8250392.
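
As an illustration of that scraping step, here is a hedged sketch of an iNaturalist API query, not the exact pipeline from the paper; the taxon ID below is a placeholder, and parameter names should be checked against the API docs at https://api.inaturalist.org/v1/docs.

```python
import requests

# Placeholder: look up the real Suborder Myomorpha taxon ID on inaturalist.org
MYOMORPHA_TAXON_ID = 0

# Fetch observations that have photos released under CC BY-NC
resp = requests.get(
    "https://api.inaturalist.org/v1/observations",
    params={
        "taxon_id": MYOMORPHA_TAXON_ID,
        "photos": "true",
        "photo_license": "cc-by-nc",
        "per_page": 200,
    },
    timeout=30,
)
resp.raise_for_status()
for obs in resp.json()["results"]:
    for photo in obs.get("photos", []):
        print(photo.get("url"))
```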

Here is an image with the keypoint guide:

<p align="center">
<img src="https://images.squarespace-cdn.com/content/v1/57f6d51c9f74566f55ecf271/1690988780004-AG00N6OU1R21MZ0AU9RE/modelcard-SAQ.png?format=1500w" width="95%">
</p>
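
If you want the keypoint names programmatically, they ship with the model configuration. A sketch, assuming the downloaded folder contains a `pose_cfg.yaml` with an `all_joints_names` field; the file and field names may differ between releases.

```python
import yaml  # PyYAML
from pathlib import Path

# Assumed layout: dlclibrary placed a pose_cfg.yaml next to the weights
cfg_file = Path("./superanimal_quadruped_model") / "pose_cfg.yaml"
cfg = yaml.safe_load(cfg_file.read_text())
print(cfg.get("all_joints_names"))  # unified keypoint vocabulary, if present
```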

Please note that each dataset was labeled by separate labs & separate individuals; therefore, while we map names to a unified pose vocabulary (found here: https://github.com/AdaptiveMotorControlLab/modelzoo-figures), there will be annotator bias in keypoint placement (see the Supplementary Note on annotator bias in Ye et al. 2023). You will also note the dataset is highly diverse across species, but collectively it has more representation of domesticated animals like dogs, cats, horses, and cattle. If performance is not as good as you need it to be, we recommend first trying video adaptation (see Ye et al. 2023) or fine-tuning these weights with your own labeling.
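
For reference, video adaptation is exposed through the same DeepLabCut entry point shown above; a minimal sketch, assuming a DeepLabCut version whose `video_inference_superanimal` supports the `video_adapt` flag:

```python
import deeplabcut

# Sketch of self-supervised video adaptation (Ye et al. 2023): the model's
# high-confidence predictions on this video are used to briefly fine-tune it,
# which reduces jitter on out-of-domain footage. The path is a placeholder.
deeplabcut.video_inference_superanimal(
    ["my-quadruped-video.mp4"],
    "superanimal_quadruped",
    video_adapt=True,
)
```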

## Ethical Considerations

• No experimental data was collected for this model; all datasets used are cited.

## Caveats and Recommendations

• The model may have reduced accuracy in scenarios with extremely varied lighting conditions or atypical animal characteristics not well-represented in the training data.

• As noted under Training Data, each dataset was labeled by separate labs & separate individuals, so there will be annotator bias in keypoint placement (see Ye et al. 2023 for our Supplementary Note on annotator bias), and domesticated animals like dogs, cats, horses, and cattle are better represented than other species. If performance is not as good as you need it to be, first try video adaptation (see Ye et al. 2023), or fine-tune these weights with your own labeling.

## License

Modified MIT.

Copyright 2023 by Mackenzie Mathis, Shaokai Ye, and contributors.

Permission is hereby granted to you (hereafter "LICENSEE") a fully-paid, non-exclusive, and non-transferable license for academic, non-commercial purposes only (hereafter "LICENSE") to use the "MODEL" weights (hereafter "MODEL"), subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

This software may not be used to harm any animal deliberately.

LICENSEE acknowledges that the MODEL is a research tool.
THE MODEL IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE MODEL OR THE USE OR OTHER DEALINGS IN THE MODEL.

If this license is not appropriate for your application, please contact Prof. Mackenzie W. Mathis ([email protected]) and/or the TTO office at EPFL ([email protected]) for a commercial use license.

Please cite **Ye et al. 2023** if you use this model in your work: https://arxiv.org/abs/2203.07436v2.

## References

1. Prianka Banik, Lin Li, and Xishuang Dong. A novel dataset for keypoint detection of quadruped animals from images. ArXiv, abs/2108.13958, 2021.
2. Jinkun Cao, Hongyang Tang, Haoshu Fang, Xiaoyong Shen, Cewu Lu, and Yu-Wing Tai. Cross-domain adaptation for animal pose estimation.
9. Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. In Proceedings of the IEEE international conference on computer vision, pages 2961–2969, 2017.
10. Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection, 2016.
11. Tsung-Yi Lin, Michael Maire, Serge J. Belongie, Lubomir D. Bourdev, Ross B. Girshick, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: common objects in context. CoRR, abs/1405.0312, 2014.
|