First half
- README.md +159 -12
- __init__.py +0 -0
- requirements.txt +14 -0
- test_body.sh +8 -0
- test_face.sh +8 -0
- train_body_pixel.sh +5 -0
- train_body_vq.sh +5 -0
- train_face.sh +5 -0
- visualise.sh +8 -0
README.md
CHANGED
@@ -1,12 +1,159 @@
# TalkSHOW: Generating Holistic 3D Human Motion from Speech [CVPR2023]

The official PyTorch implementation of the **CVPR2023** paper [**"Generating Holistic 3D Human Motion from Speech"**](https://arxiv.org/abs/2212.04420).

Please visit our [**webpage**](https://talkshow.is.tue.mpg.de/) for more details.

![teaser](visualise/teaser_01.png)

## Highlights

We provide the inputs and our outputs for the demo data in `/demo/` and `/demo_audio/`. So far, TalkSHOW generalizes well to English speech, French speech, and songs.

You can use the generated motion directly to animate a 3D character or your own digital avatar. We will provide more demos, so please stay tuned. Pull requests are very welcome.

## Notes

We use 100-dimensional parameters for the SMPL-X facial expression. If you need a different number of dimensions, you can convert the parameters with this script:

```
https://github.com/yhw-yhw/SHOW/blob/main/cvt_exp_dim_tool.py
```

## TODO

- [x] [🤗Hugging Face Demo](https://huggingface.co/spaces/feifeifeiliu/TalkSHOW)
- [ ] Animate 2D videos with the motion generated by TalkSHOW.


## Getting started

The training code was tested on `Ubuntu 18.04.5 LTS` and the visualization code was tested on `Windows 10`. It requires:

* Python 3.7
* conda3 or miniconda3
* CUDA capable GPU (one is enough)


### 1. Setup environment

Clone the repo:
```bash
git clone https://github.com/yhw-yhw/TalkSHOW
cd TalkSHOW
```
Create conda environment:
```bash
conda create --name talkshow python=3.7
conda activate talkshow
```
Install PyTorch (v1.10.1), then install the remaining dependencies:

    pip install -r requirements.txt

Please install [**MPI-Mesh**](https://github.com/MPI-IS/mesh).

### 2. Get data

If you only want to generate demo videos, you can skip this step and download the pretrained models directly.

Download [**SHOW_dataset_v1.0.zip**](https://download.is.tue.mpg.de/download.php?domain=talkshow&resume=1&sfile=SHOW_dataset_v1.0.zip) from the [**TalkSHOW download webpage**](https://talkshow.is.tue.mpg.de/download.php),
then unpack the ``.tar.gz`` archives using ``for i in $(ls *.tar.gz);do tar xvf $i;done``.
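
On machines without a POSIX shell (the visualization code was tested on Windows 10), the extraction loop above can be approximated in Python. This is a minimal sketch using only the standard library; the helper name `extract_archives` and the assumption that all archives sit directly in one directory are ours and not part of the TalkSHOW code.

```python
import tarfile
from pathlib import Path


def extract_archives(archive_dir, dest_dir=None):
    """Extract every *.tar.gz file found directly in archive_dir.

    Mirrors `for i in $(ls *.tar.gz); do tar xvf $i; done`.
    Hypothetical helper, not part of the TalkSHOW repository.
    Returns the list of archive paths that were extracted.
    """
    archive_dir = Path(archive_dir)
    dest_dir = Path(dest_dir) if dest_dir is not None else archive_dir
    dest_dir.mkdir(parents=True, exist_ok=True)

    extracted = []
    for archive in sorted(archive_dir.glob("*.tar.gz")):
        # Mode "r:*" lets tarfile detect the compression automatically.
        with tarfile.open(archive, mode="r:*") as tar:
            tar.extractall(path=dest_dir)
        extracted.append(archive)
    return extracted
```

Call it with the folder that holds the archives, e.g. `extract_archives("path-to-dataset")` (placeholder path). As with `tar`, only extract archives from a source you trust.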

~~Run ``python data_utils/dataset_preprocess.py`` to check and split dataset.
Modify ``data_root`` in ``config/*.json`` to the dataset-path.~~

Modify ``data_root`` in ``data_utils/apply_split.py`` to the dataset path and run it to apply ``data_utils/split_more_than_2s.pkl`` to the dataset.

We will update the benchmark soon.

### 3. Download the pretrained models (Optional)

Download the [**pretrained models**](https://drive.google.com/file/d/1bC0ZTza8HOhLB46WOJ05sBywFvcotDZG/view?usp=sharing),
unzip them, and place the result in the TalkSHOW folder, i.e. ``path-to-TalkSHOW/experiments``.

### 4. Training
Loading the data for the first time can be quite slow. Once the data has been loaded, setting ``dataset_load_mode`` to ``pickle`` in ``config/[config_name].json`` makes subsequent loading much faster.

    # 1. Train VQ-VAEs.
    bash train_body_vq.sh
    # 2. Train PixelCNN. Please modify "Model:vq_path" in config/body_pixel.json to the path of VQ-VAEs.
    bash train_body_pixel.sh
    # 3. Train face generator.
    bash train_face.sh
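
The two config edits mentioned in this section (``dataset_load_mode`` and ``Model:vq_path``) can also be scripted. The sketch below is a hypothetical helper, not part of the repository. It assumes the config files are plain JSON and that a colon-separated key such as ``Model:vq_path`` denotes nesting; where ``dataset_load_mode`` sits inside the file is not stated here, so check your config before relying on a key path.

```python
import json
from pathlib import Path


def set_config_value(config_path, key_path, value):
    """Set a (possibly nested) key in a JSON config file and save it.

    key_path uses ':' as separator, e.g. "Model:vq_path".
    Hypothetical helper; the nesting convention is an assumption.
    Returns the updated config as a dict.
    """
    config_path = Path(config_path)
    config = json.loads(config_path.read_text(encoding="utf-8"))

    *parents, leaf = key_path.split(":")
    node = config
    for name in parents:
        # Create intermediate objects if they are missing.
        node = node.setdefault(name, {})
    node[leaf] = value

    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

For example, `set_config_value("config/body_pixel.json", "Model:vq_path", "<path-to-VQ-VAE-checkpoint>")` before running `train_body_pixel.sh`. Note that the file is rewritten with two-space indentation, so its formatting may change.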

### 5. Testing

Modify the arguments in ``test_face.sh`` and ``test_body.sh``, then run:

    bash test_face.sh
    bash test_body.sh
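
Both test scripts load a checkpoint from ``./experiments`` (via ``--face_model_path`` and ``--body_model_path``). A small pre-flight check can catch a wrong path before data loading starts. This is a sketch with a hypothetical helper name; the two paths are the ones shipped in ``test_face.sh`` and ``test_body.sh`` and will differ for models you trained yourself.

```python
from pathlib import Path

# Checkpoint paths as they appear in test_face.sh and test_body.sh;
# adjust them to match your own experiments folder.
DEFAULT_CHECKPOINTS = [
    "./experiments/2022-10-15-smplx_S2G-face-3d/ckpt-99.pth",
    "./experiments/2022-11-02-smplx_S2G-body-pixel-3d/ckpt-99.pth",
]


def missing_checkpoints(paths, root="."):
    """Return the subset of paths that do not exist as files under root."""
    root = Path(root)
    return [p for p in paths if not (root / p).is_file()]


if __name__ == "__main__":
    # Run from the TalkSHOW folder so relative paths resolve correctly.
    for path in missing_checkpoints(DEFAULT_CHECKPOINTS):
        print(f"missing checkpoint: {path}")
```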

### 6. Visualization

If you connect to the Linux machine over SSH, a NotImplementedError might occur. In this case, please refer to this [**issue**](https://github.com/MPI-IS/mesh/issues/66) to resolve the error.

Download the [**smplx model**](https://drive.google.com/file/d/1Ly_hQNLQcZ89KG0Nj4jYZwccQiimSUVn/view?usp=share_link) (please register on the official [**SMPLX webpage**](https://smpl-x.is.tue.mpg.de) before you use it)
and place it in ``path-to-TalkSHOW/visualise/smplx_model``.
To visualise the test set and the generated results, run the script below (in each video, left: generated result | right: ground truth).
The videos and generated motion data are saved in ``./visualise/video/body-pixel``:

    bash visualise.sh

If you connect to the Linux machine over SSH, an error about OffscreenRenderer might also occur. In this case, please refer to this [**issue**](https://github.com/MPI-IS/mesh/issues/66) to resolve the error.

To reproduce the demo videos, run
```bash
# the whole body demo
python scripts/demo.py --config_file ./config/body_pixel.json --infer --audio_file ./demo_audio/1st-page.wav --id 0 --whole_body
# the face demo
python scripts/demo.py --config_file ./config/body_pixel.json --infer --audio_file ./demo_audio/style.wav --id 0 --only_face
# the identity-specific demo
python scripts/demo.py --config_file ./config/body_pixel.json --infer --audio_file ./demo_audio/style.wav --id 0
python scripts/demo.py --config_file ./config/body_pixel.json --infer --audio_file ./demo_audio/style.wav --id 1
python scripts/demo.py --config_file ./config/body_pixel.json --infer --audio_file ./demo_audio/style.wav --id 2
python scripts/demo.py --config_file ./config/body_pixel.json --infer --audio_file ./demo_audio/style.wav --id 3 --stand
# the diversity demo
python scripts/demo.py --config_file ./config/body_pixel.json --infer --audio_file ./demo_audio/style.wav --id 0 --num_samples 12
# the french demo
python scripts/demo.py --config_file ./config/body_pixel.json --infer --audio_file ./demo_audio/french.wav --id 0
# the synthetic speech demo
python scripts/demo.py --config_file ./config/body_pixel.json --infer --audio_file ./demo_audio/rich.wav --id 0
# the song demo
python scripts/demo.py --config_file ./config/body_pixel.json --infer --audio_file ./demo_audio/song.wav --id 0
```
### 7. Baseline

To train our reproduction of "Learning Speech-driven 3D Conversational Gestures from Video" (Habibie et al.), run
```bash
python -W ignore scripts/train.py --speakers oliver seth conan chemistry --config_file ./config/LS3DCG.json
```

For visualization with the pretrained model, download the [pretrained models](#3-download-the-pretrained-models--optional-) above and run
```bash
python scripts/demo.py --config_file ./config/LS3DCG.json --infer --audio_file ./demo_audio/style.wav --body_model_name s2g_LS3DCG --body_model_path experiments/2022-10-19-smplx_S2G-LS3DCG/ckpt-99.pth --id 0
```
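
The ``scripts/demo.py`` calls in the Visualization and Baseline sections differ only in a few flags, so batches of them can be generated programmatically. The sketch below builds the argument lists and, optionally, runs them. The helper names ``build_demo_cmd`` and ``run_identity_demos`` are hypothetical, and only flags that appear in the commands above are used.

```python
import subprocess
import sys


def build_demo_cmd(audio_file, speaker_id=0,
                   config_file="./config/body_pixel.json",
                   extra_flags=()):
    """Assemble a scripts/demo.py command line as a list of arguments."""
    cmd = [
        sys.executable, "scripts/demo.py",
        "--config_file", config_file,
        "--infer",
        "--audio_file", audio_file,
        "--id", str(speaker_id),
    ]
    # Optional switches such as "--whole_body", "--only_face" or "--stand".
    cmd.extend(extra_flags)
    return cmd


def run_identity_demos(audio_file="./demo_audio/style.wav", ids=(0, 1, 2)):
    """Run the identity-specific demo once per speaker id."""
    for speaker_id in ids:
        # check=True raises CalledProcessError if demo.py exits non-zero.
        subprocess.run(build_demo_cmd(audio_file, speaker_id), check=True)
```

The identity-specific demo for id 3 is listed with ``--stand``, so it would be built as `build_demo_cmd("./demo_audio/style.wav", 3, extra_flags=("--stand",))`. Run the script from the TalkSHOW folder so the relative paths resolve.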

## Citation
If you find our work useful to your research, please consider citing:
```
@inproceedings{yi2022generating,
    title={Generating Holistic 3D Human Motion from Speech},
    author={Yi, Hongwei and Liang, Hualin and Liu, Yifei and Cao, Qiong and Wen, Yandong and Bolkart, Timo and Tao, Dacheng and Black, Michael J},
    booktitle={CVPR},
    year={2023}
}
```

## Acknowledgements
For functions or scripts that are based on external sources, we acknowledge the origin individually in each file.
Here are some great resources we benefit from:
- [Freeform](https://github.com/TheTempAccount/Co-Speech-Motion-Generation) for training pipeline
- [MPI-Mesh](https://github.com/MPI-IS/mesh), [Pyrender](https://github.com/mmatl/pyrender), [Smplx](https://github.com/vchoutas/smplx), [VOCA](https://github.com/TimoBolkart/voca) for rendering
- [Wav2Vec2](https://huggingface.co/facebook/wav2vec2-base-960h) and [Faceformer](https://github.com/EvelynFan/FaceFormer) for audio encoder

## Contact
For questions, please contact [email protected] or [email protected] or [email protected] or [email protected]

For commercial licensing, please contact [email protected]
__init__.py
ADDED
File without changes
requirements.txt
ADDED
@@ -0,0 +1,14 @@
numpy~=1.21.5
transformers~=4.22.1
matplotlib~=3.2.2
textgrid~=1.5
smplx~=0.1.28
scikit-learn~=1.0.2
pyrender~=0.1.45
trimesh~=3.14.1
tqdm~=4.64.1
librosa~=0.9.2
scipy~=1.7.3
python_speech_features~=0.6
opencv-python~=4.7.0.68
pyglet~=1.5
test_body.sh
ADDED
@@ -0,0 +1,8 @@
python -W ignore scripts/test_body.py \
    --save_dir experiments \
    --exp_name smplx_S2G \
    --speakers oliver seth conan chemistry \
    --config_file ./config/body_pixel.json \
    --body_model_name s2g_body_pixel \
    --body_model_path ./experiments/2022-11-02-smplx_S2G-body-pixel-3d/ckpt-99.pth \
    --infer
test_face.sh
ADDED
@@ -0,0 +1,8 @@
python -W ignore scripts/test_face.py \
    --save_dir experiments \
    --exp_name smplx_S2G \
    --speakers oliver seth conan chemistry \
    --config_file ./config/face.json \
    --face_model_name s2g_face \
    --face_model_path ./experiments/2022-10-15-smplx_S2G-face-3d/ckpt-99.pth \
    --infer
train_body_pixel.sh
ADDED
@@ -0,0 +1,5 @@
python -W ignore scripts/train.py \
    --save_dir experiments \
    --exp_name smplx_S2G \
    --speakers oliver seth conan chemistry \
    --config_file ./config/body_pixel.json
train_body_vq.sh
ADDED
@@ -0,0 +1,5 @@
python -W ignore scripts/train.py \
    --save_dir experiments \
    --exp_name smplx_S2G \
    --speakers oliver seth conan chemistry \
    --config_file ./config/body_vq.json
train_face.sh
ADDED
@@ -0,0 +1,5 @@
python -W ignore scripts/train.py \
    --save_dir experiments \
    --exp_name smplx_S2G \
    --speakers oliver seth conan chemistry \
    --config_file ./config/face.json
visualise.sh
ADDED
@@ -0,0 +1,8 @@
python -W ignore scripts/diversity.py \
    --save_dir experiments \
    --exp_name smplx_S2G \
    --speakers oliver seth conan chemistry \
    --config_file ./config/body_pixel.json \
    --face_model_path ./experiments/2022-10-15-smplx_S2G-face-3d/ckpt-99.pth \
    --body_model_path ./experiments/2022-11-02-smplx_S2G-body-pixel-3d/ckpt-99.pth \
    --infer