# MotionDirector

This is the official repository of [MotionDirector](https://showlab.github.io/MotionDirector).

**MotionDirector: Motion Customization of Text-to-Video Diffusion Models.**
<br/>
[Rui Zhao](https://ruizhaocv.github.io/),
[Yuchao Gu](https://ycgu.site/), 
[Jay Zhangjie Wu](https://zhangjiewu.github.io/), 
[David Junhao Zhang](https://junhaozhang98.github.io/),
[Jiawei Liu](https://jia-wei-liu.github.io/),
[Weijia Wu](https://weijiawu.github.io/),
[Jussi Keppo](https://www.jussikeppo.com/),
[Mike Zheng Shou](https://sites.google.com/view/showlab)
<br/>

[![Project Page](https://img.shields.io/badge/Project-Website-orange)](https://showlab.github.io/MotionDirector)
[![arXiv](https://img.shields.io/badge/arXiv-MotionDirector-b31b1b.svg)](https://arxiv.org/abs/2310.08465)

<p align="center">
<img src="https://github.com/showlab/MotionDirector/blob/page/assets/teaser.gif" width="1080px"/>  
<br>
<em>MotionDirector can customize text-to-video diffusion models to generate videos with desired motions.</em>
</p>

<table class="center">
<tr>
  <td style="text-align:center;" colspan="4"><b>Astronaut's daily life on Mars (Motion concepts learned by MotionDirector)</b></td>
</tr>
<tr>
<td style="text-align:center;"><b>Lifting Weights</b></td>
<td style="text-align:center;"><b>Playing Golf</b></td>
<td style="text-align:center;"><b>Riding Horse</b></td>
<td style="text-align:center;"><b>Riding Bicycle</b></td>
</tr>
<tr>
  <td><img src=assets/astronaut_mars/An_astronaut_is_lifting_weights_on_Mars_4K_high_quailty_highly_detailed_4008521.gif></td>
  <td><img src=assets/astronaut_mars/Astronaut_playing_golf_on_Mars_659514.gif></td>
  <td><img src=assets/astronaut_mars/An_astronaut_is_riding_a_horse_on_Mars_4K_high_quailty_highly_detailed_1913261.gif></td>              
  <td><img src=assets/astronaut_mars/An_astronaut_is_riding_a_bicycle_past_the_pyramids_Mars_4K_high_quailty_highly_detailed_5532778.gif></td>
</tr>
<tr>
  <td width=25% style="text-align:center;">"An astronaut is lifting weights on Mars, 4K, high quailty, highly detailed." </br> seed: 4008521</td>
  <td width=25% style="text-align:center;">"Astronaut playing golf on Mars" </br> seed: 659514</td>
  <td width=25% style="text-align:center;">"An astronaut is riding a horse on Mars, 4K, high quailty, highly detailed." </br> seed: 1913261</td>
  <td width=25% style="text-align:center;">"An astronaut is riding a bicycle past the pyramids Mars, 4K, high quailty, highly detailed." </br> seed: 5532778</td>
</tr>
</table>

## News
- [2023.12.06] [MotionDirector for Sports](#MotionDirector_for_Sports) released! Lifting weights, riding horses, playing golf, etc.
- [2023.12.05] [Colab demo](https://github.com/camenduru/MotionDirector-colab) is available. Thanks to [Camenduru](https://twitter.com/camenduru).
- [2023.12.04] [MotionDirector for Cinematic Shots](#MotionDirector_for_Cinematic_Shots) released. Now, you can make AI films with professional cinematic shots!
- [2023.12.02] Code and model weights released!

## ToDo
- [ ] Gradio Demo
- [ ] More trained weights of MotionDirector

## Setup
### Requirements

```shell
# create virtual environment
conda create -n motiondirector python=3.8
conda activate motiondirector
# install packages
pip install -r requirements.txt
```
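After installing, you can optionally confirm that PyTorch can see your GPU (a quick sanity check, assuming `requirements.txt` installs a CUDA build of PyTorch):

```shell
# Should print the torch version and True on a correctly configured GPU machine
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```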

### Weights of Foundation Models
```shell
git lfs install
## You can choose the ModelScopeT2V or ZeroScope, etc., as the foundation model.
## ZeroScope
git clone https://huggingface.co/cerspense/zeroscope_v2_576w ./models/zeroscope_v2_576w/
## ModelScopeT2V
git clone https://huggingface.co/damo-vilab/text-to-video-ms-1.7b ./models/model_scope/
```
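The cloned repositories should be in the diffusers format expected by the training and inference scripts. A quick way to check the download (the subfolders listed are the usual diffusers layout; exact contents may vary):

```shell
ls ./models/zeroscope_v2_576w/
# typically: model_index.json  scheduler/  text_encoder/  tokenizer/  unet/  vae/
```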
### Weights of trained MotionDirector <a name="download_weights"></a>
```shell
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/ruizhaocv/MotionDirector_weights ./outputs
```
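After cloning, the trained motions used in the commands below live in subfolders of `./outputs/train/`; each subfolder is what you pass to `--checkpoint_folder`:

```shell
ls ./outputs/train/
# e.g.: car_16/  car_24/  dolly_zoom/  lifting_weights/  riding_bicycle/  zoom_in/  zoom_out/
```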

## Usage
### Training

#### Train MotionDirector on multiple videos:
```bash
python MotionDirector_train.py --config ./configs/config_multi_videos.yaml
```
#### Train MotionDirector on a single video:
```bash
python MotionDirector_train.py --config ./configs/config_single_video.yaml
```

Note:  
- Before running the above commands, make sure you replace the paths to the foundation model weights and the training data with your own in `config_multi_videos.yaml` or `config_single_video.yaml`.
- Training on multiple 16-frame videos usually takes `300~500` steps (about `9~16` minutes on one A5000 GPU); training on a single video takes `50~150` steps (about `1.5~4.5` minutes on one A5000 GPU). Training requires around `14GB` of VRAM.
- Reduce `n_sample_frames` if your GPU memory is limited (see the sketch below).
- For better results, reduce the learning rate and increase the number of training steps.
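For example, if you hit out-of-memory errors, you could lower `n_sample_frames` in a copy of the config before launching training (a minimal sketch; `8` is an arbitrary illustrative value and the rest of the config is assumed unchanged):

```bash
# Copy the multi-video config and lower the number of sampled frames per video
cp ./configs/config_multi_videos.yaml ./configs/my_config.yaml
sed -i 's/n_sample_frames:.*/n_sample_frames: 8/' ./configs/my_config.yaml
python MotionDirector_train.py --config ./configs/my_config.yaml
```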


### Inference
```bash
python MotionDirector_inference.py --model /path/to/the/foundation/model  --prompt "Your prompt" --checkpoint_folder /path/to/the/trained/MotionDirector --checkpoint_index 300 --noise_prior 0.
```
Note: 
- Replace `/path/to/the/foundation/model` with your own path to the foundation model, e.g., ZeroScope.
- `checkpoint_index` selects the checkpoint saved at the given training step.
- `noise_prior` controls how strongly the inversion noise of the reference video influences generation. 
We recommend setting it to `0` for MotionDirector trained on multiple videos, to maximize generation diversity, and to `0.1~0.5` for MotionDirector trained on a single video, for faster convergence and better alignment with the reference video; a sweep sketch follows below.
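Because the best `noise_prior` depends on the reference video, it can help to sweep a few values and compare the outputs (a sketch using only the flags documented above; the paths and prompt are placeholders):

```bash
# Sweep noise_prior for a MotionDirector trained on a single video
for np in 0.1 0.3 0.5; do
  python MotionDirector_inference.py \
    --model /path/to/the/foundation/model \
    --prompt "Your prompt" \
    --checkpoint_folder /path/to/the/trained/MotionDirector \
    --checkpoint_index 150 \
    --noise_prior "$np"
done
```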


## Inference with pre-trained MotionDirector
All available weights are in the official [Hugging Face repo](https://huggingface.co/ruizhaocv/MotionDirector_weights).
Run the [download command](#download_weights) to fetch the weights into the `outputs` folder, then run the following inference commands to generate videos.

### MotionDirector trained on multiple videos:
```bash
python MotionDirector_inference.py --model /path/to/the/ZeroScope  --prompt "A person is riding a bicycle past the Eiffel Tower." --checkpoint_folder ./outputs/train/riding_bicycle/ --checkpoint_index 300 --noise_prior 0. --seed 7192280
```
Note:  
- Replace `/path/to/the/ZeroScope` with your own path to the foundation model, e.g., ZeroScope.
- Change the `prompt` to generate different videos. 
- The `seed` is set to a random value by default. Setting it to a specific value yields deterministic results, such as those in the table below; a seed-sweep sketch follows these notes.
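To explore variations of one motion, you can loop over seeds with the same checkpoint (a sketch; the seed values here are arbitrary):

```bash
# Generate several variants of the same prompt with different seeds
for seed in 1000 2000 3000; do
  python MotionDirector_inference.py --model /path/to/the/ZeroScope \
    --prompt "A person is riding a bicycle past the Eiffel Tower." \
    --checkpoint_folder ./outputs/train/riding_bicycle/ \
    --checkpoint_index 300 --noise_prior 0. --seed "$seed"
done
```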

Results:

<table class="center">
<tr>
  <td style="text-align:center;"><b>Reference Videos</b></td>
  <td style="text-align:center;" colspan="3"><b>Videos Generated by MotionDirector</b></td>
</tr>
<tr>
  <td><img src=assets/multi_videos_results/reference_videos.gif></td>
  <td><img src=assets/multi_videos_results/A_person_is_riding_a_bicycle_past_the_Eiffel_Tower_7192280.gif></td>
  <td><img src=assets/multi_videos_results/A_panda_is_riding_a_bicycle_in_a_garden_2178639.gif></td>              
  <td><img src=assets/multi_videos_results/An_alien_is_riding_a_bicycle_on_Mars_2390886.gif></td>
</tr>
<tr>
  <td width=25% style="text-align:center;color:gray;">"A person is riding a bicycle."</td>
  <td width=25% style="text-align:center;">"A person is riding a bicycle past the Eiffel Tower." </br> seed: 7192280</td>
  <td width=25% style="text-align:center;">"A panda is riding a bicycle in a garden."  </br> seed: 2178639</td>
  <td width=25% style="text-align:center;">"An alien is riding a bicycle on Mars."  </br> seed: 2390886</td>
</tr>
</table>

### MotionDirector trained on a single video:
16 frames:
```bash
python MotionDirector_inference.py --model /path/to/the/ZeroScope  --prompt "A tank is running on the moon." --checkpoint_folder ./outputs/train/car_16/ --checkpoint_index 150 --noise_prior 0.5 --seed 8551187
```
<table class="center">
<tr>
  <td style="text-align:center;"><b>Reference Video</b></td>
  <td style="text-align:center;" colspan="3"><b>Videos Generated by MotionDirector</b></td>
</tr>
<tr>
  <td><img src=assets/single_video_results/reference_video.gif></td>
  <td><img src=assets/single_video_results/A_tank_is_running_on_the_moon_8551187.gif></td>
  <td><img src=assets/single_video_results/A_lion_is_running_past_the_pyramids_431554.gif></td>              
  <td><img src=assets/single_video_results/A_spaceship_is_flying_past_Mars_8808231.gif></td>
</tr>
<tr>
  <td width=25% style="text-align:center;color:gray;">"A car is running on the road."</td>
  <td width=25% style="text-align:center;">"A tank is running on the moon." </br> seed: 8551187</td>
  <td width=25% style="text-align:center;">"A lion is running past the pyramids." </br> seed: 431554</td>
  <td width=25% style="text-align:center;">"A spaceship is flying past Mars."  </br> seed: 8808231</td>
</tr>
</table>

24 frames:
```bash
python MotionDirector_inference.py --model /path/to/the/ZeroScope  --prompt "A truck is running past the Arc de Triomphe." --checkpoint_folder ./outputs/train/car_24/ --checkpoint_index 150 --noise_prior 0.5 --width 576 --height 320 --num-frames 24 --seed 34543
```
<table class="center">
<tr>
  <td style="text-align:center;"><b>Reference Video</b></td>
  <td style="text-align:center;" colspan="3"><b>Videos Generated by MotionDirector</b></td>
</tr>
<tr>
  <td><img src=assets/single_video_results/24_frames/reference_video.gif></td>
  <td><img src=assets/single_video_results/24_frames/A_truck_is_running_past_the_Arc_de_Triomphe_34543.gif></td>
  <td><img src=assets/single_video_results/24_frames/An_elephant_is_running_in_a_forest_2171736.gif></td>              
</tr>
<tr>
  <td width=25% style="text-align:center;color:gray;">"A car is running on the road."</td>
  <td width=25% style="text-align:center;">"A truck is running past the Arc de Triomphe." </br> seed: 34543</td>
  <td width=25% style="text-align:center;">"An elephant is running in a forest." </br> seed: 2171736</td>
 </tr>
<tr>
  <td><img src=assets/single_video_results/24_frames/reference_video.gif></td>
  <td><img src=assets/single_video_results/24_frames/A_person_on_a_camel_is_running_past_the_pyramids_4904126.gif></td>              
  <td><img src=assets/single_video_results/24_frames/A_spacecraft_is_flying_past_the_Milky_Way_galaxy_3235677.gif></td>
</tr>
<tr>
  <td width=25% style="text-align:center;color:gray;">"A car is running on the road."</td>
  <td width=25% style="text-align:center;">"A person on a camel is running past the pyramids." </br> seed: 4904126</td>
  <td width=25% style="text-align:center;">"A spacecraft is flying past the Milky Way galaxy."  </br> seed: 3235677</td>
</tr>
</table>

## MotionDirector for Sports <a name="MotionDirector_for_Sports"></a>

```bash
python MotionDirector_inference.py --model /path/to/the/ZeroScope  --prompt "A panda is lifting weights in a garden." --checkpoint_folder ./outputs/train/lifting_weights/ --checkpoint_index 300 --noise_prior 0. --seed 9365597
```
<table class="center">
<tr>
  <td style="text-align:center;" colspan="4"><b>Videos Generated by MotionDirector</b></td>
</tr>
<tr>
<td style="text-align:center;" colspan="2"><b>Lifting Weights</b></td>
<td style="text-align:center;" colspan="2"><b>Riding Bicycle</b></td>
</tr>
<tr>
  <td><img src=assets/sports_results/lifting_weights/A_panda_is_lifting_weights_in_a_garden_1699276.gif></td>
  <td><img src=assets/sports_results/lifting_weights/A_police_officer_is_lifting_weights_in_front_of_the_police_station_6804745.gif></td>
  <td><img src=assets/multi_videos_results/A_panda_is_riding_a_bicycle_in_a_garden_2178639.gif></td>              
  <td><img src=assets/multi_videos_results/An_alien_is_riding_a_bicycle_on_Mars_2390886.gif></td>
</tr>
<tr>
  <td width=25% style="text-align:center;">"A panda is lifting weights in a garden." </br> seed: 1699276</td>
  <td width=25% style="text-align:center;">"A police officer is lifting weights in front of the police station." </br> seed: 6804745</td>
  <td width=25% style="text-align:center;">"A panda is riding a bicycle in a garden." </br> seed: 2178639</td>
  <td width=25% style="text-align:center;">"An alien is riding a bicycle on Mars." </br> seed: 2390886</td>
</tr>
<tr>
<td style="text-align:center;" colspan="2"><b>Riding Horse</b></td>
<td style="text-align:center;" colspan="2"><b>Playing Golf</b></td>
</tr>
<tr>
  <td><img src=assets/sports_results/riding_horse/A_Royal_Guard_riding_a_horse_in_front_of_Buckingham_Palace_4490970.gif></td>
  <td><img src=assets/sports_results/riding_horse/A_man_riding_an_elephant_through_the_jungle_6230765.gif></td>
  <td><img src=assets/sports_results/playing_golf/A_man_is_playing_golf_in_front_of_the_White_House_8870450.gif></td>              
  <td><img src=assets/sports_results/playing_golf/A_monkey_is_playing_golf_on_a_field_full_of_flowers_2989633.gif></td>
</tr>
<tr>
  <td width=25% style="text-align:center;">"A Royal Guard riding a horse in front of Buckingham Palace." </br> seed: 4490970</td>
  <td width=25% style="text-align:center;">"A man riding an elephant through the jungle." </br> seed: 6230765</td>
  <td width=25% style="text-align:center;">"A man is playing golf in front of the White House." </br> seed: 8870450</td>
  <td width=25% style="text-align:center;">"A monkey is playing golf on a field full of flowers." </br> seed: 2989633</td>
</tr>
</table>

More sports, to be continued ...

## MotionDirector for Cinematic Shots <a name="MotionDirector_for_Cinematic_Shots"></a>

### 1. Zoom
#### 1.1 Dolly Zoom (Hitchcockian Zoom)
```bash
python MotionDirector_inference.py --model /path/to/the/ZeroScope  --prompt "A firefighter standing in front of a burning forest captured with a dolly zoom." --checkpoint_folder ./outputs/train/dolly_zoom/ --checkpoint_index 150 --noise_prior 0.5 --seed 9365597
```
<table class="center">
<tr>
  <td style="text-align:center;"><b>Reference Video</b></td>
  <td style="text-align:center;" colspan="3"><b>Videos Generated by MotionDirector</b></td>
</tr>
<tr>
  <td><img src=assets/cinematic_shots_results/dolly_zoom_16.gif></td>
  <td><img src=assets/cinematic_shots_results/A_firefighter_standing_in_front_of_a_burning_forest_captured_with_a_dolly_zoom_9365597.gif></td>
  <td><img src=assets/cinematic_shots_results/A_lion_sitting_on_top_of_a_cliff_captured_with_a_dolly_zoom_1675932.gif></td>              
  <td><img src=assets/cinematic_shots_results/A_Roman_soldier_standing_in_front_of_the_Colosseum_captured_with_a_dolly_zoom_2310805.gif></td>
</tr>
<tr>
  <td width=25% style="text-align:center;color:gray;">"A man standing in room captured with a dolly zoom."</td>
  <td width=25% style="text-align:center;">"A firefighter standing in front of a burning forest captured with a dolly zoom." </br> seed: 9365597 </br> noise_prior: 0.5</td>
  <td width=25% style="text-align:center;">"A lion sitting on top of a cliff captured with a dolly zoom." </br> seed: 1675932 </br> noise_prior: 0.5</td>
  <td width=25% style="text-align:center;">"A Roman soldier standing in front of the Colosseum captured with a dolly zoom."  </br> seed: 2310805 </br> noise_prior: 0.5 </td>
</tr>
<tr>
  <td><img src=assets/cinematic_shots_results/dolly_zoom_16.gif></td>
  <td><img src=assets/cinematic_shots_results/A_firefighter_standing_in_front_of_a_burning_forest_captured_with_a_dolly_zoom_4615820.gif></td>
  <td><img src=assets/cinematic_shots_results/A_lion_sitting_on_top_of_a_cliff_captured_with_a_dolly_zoom_4114896.gif></td>              
  <td><img src=assets/cinematic_shots_results/A_Roman_soldier_standing_in_front_of_the_Colosseum_captured_with_a_dolly_zoom_7492004.gif></td>
</tr>
<tr>
  <td width=25% style="text-align:center;color:gray;">"A man standing in room captured with a dolly zoom."</td>
  <td width=25% style="text-align:center;">"A firefighter standing in front of a burning forest captured with a dolly zoom." </br> seed: 4615820 </br> noise_prior: 0.3</td>
  <td width=25% style="text-align:center;">"A lion sitting on top of a cliff captured with a dolly zoom." </br> seed: 4114896 </br> noise_prior: 0.3</td>
  <td width=25% style="text-align:center;">"A Roman soldier standing in front of the Colosseum captured with a dolly zoom."  </br> seed: 7492004</td>
</tr>
</table>

#### 1.2 Zoom In
The reference video was shot with my own water cup. You can pick up your own cup, or any other object, to practice camera movements and turn the footage into imaginative videos. Create your AI films with customized camera movements!

```bash
python MotionDirector_inference.py --model /path/to/the/ZeroScope  --prompt "A firefighter standing in front of a burning forest captured with a zoom in." --checkpoint_folder ./outputs/train/zoom_in/ --checkpoint_index 150 --noise_prior 0.3 --seed 1429227
```
<table class="center">
<tr>
  <td style="text-align:center;"><b>Reference Video</b></td>
  <td style="text-align:center;" colspan="3"><b>Videos Generated by MotionDirector</b></td>
</tr>
<tr>
  <td><img src=assets/cinematic_shots_results/zoom_in_16.gif></td>
  <td><img src=assets/cinematic_shots_results/A_firefighter_standing_in_front_of_a_burning_forest_captured_with_a_zoom_in_1429227.gif></td>
  <td><img src=assets/cinematic_shots_results/A_lion_sitting_on_top_of_a_cliff_captured_with_a_zoom_in_487239.gif></td>              
  <td><img src=assets/cinematic_shots_results/A_Roman_soldier_standing_in_front_of_the_Colosseum_captured_with_a_zoom_in_1393184.gif></td>
</tr>
<tr>
  <td width=25% style="text-align:center;color:gray;">"A cup in a lab captured with a zoom in."</td>
  <td width=25% style="text-align:center;">"A firefighter standing in front of a burning forest captured with a zoom in." </br> seed: 1429227</td>
  <td width=25% style="text-align:center;">"A lion sitting on top of a cliff captured with a zoom in." </br> seed: 487239 </td>
  <td width=25% style="text-align:center;">"A Roman soldier standing in front of the Colosseum captured with a zoom in."  </br> seed: 1393184</td>
</tr>
</table>

#### 1.3 Zoom Out
```bash
python MotionDirector_inference.py --model /path/to/the/ZeroScope  --prompt "A firefighter standing in front of a burning forest captured with a zoom out." --checkpoint_folder ./outputs/train/zoom_out/ --checkpoint_index 150 --noise_prior 0.3 --seed 4971910
```
<table class="center">
<tr>
  <td style="text-align:center;"><b>Reference Video</b></td>
  <td style="text-align:center;" colspan="3"><b>Videos Generated by MotionDirector</b></td>
</tr>
<tr>
  <td><img src=assets/cinematic_shots_results/zoom_out_16.gif></td>
  <td><img src=assets/cinematic_shots_results/A_firefighter_standing_in_front_of_a_burning_forest_captured_with_a_zoom_out_4971910.gif></td>
  <td><img src=assets/cinematic_shots_results/A_lion_sitting_on_top_of_a_cliff_captured_with_a_zoom_out_1767994.gif></td>              
  <td><img src=assets/cinematic_shots_results/A_Roman_soldier_standing_in_front_of_the_Colosseum_captured_with_a_zoom_out_8203639.gif></td>
</tr>
<tr>
  <td width=25% style="text-align:center;color:gray;">"A cup in a lab captured with a zoom out."</td>
  <td width=25% style="text-align:center;">"A firefighter standing in front of a burning forest captured with a zoom out." </br> seed: 4971910</td>
  <td width=25% style="text-align:center;">"A lion sitting on top of a cliff captured with a zoom out." </br> seed: 1767994 </td>
  <td width=25% style="text-align:center;">"A Roman soldier standing in front of the Colosseum captured with a zoom out."  </br> seed: 8203639</td>
</tr>
</table>

More cinematic shots, to be continued ...

## More results

If you have trained a more impressive MotionDirector or generated impressive videos with it, please feel free to open an issue and share them with us; we would greatly appreciate it.
Improvements to the code are also highly welcome.

Please refer to [Project Page](https://showlab.github.io/MotionDirector) for more results.


## Citation


```bibtex
@article{zhao2023motiondirector,
  title={MotionDirector: Motion Customization of Text-to-Video Diffusion Models},
  author={Zhao, Rui and Gu, Yuchao and Wu, Jay Zhangjie and Zhang, David Junhao and Liu, Jiawei and Wu, Weijia and Keppo, Jussi and Shou, Mike Zheng},
  journal={arXiv preprint arXiv:2310.08465},
  year={2023}
}
```

## Shoutouts

- This code builds on [diffusers](https://github.com/huggingface/diffusers) and [Text-To-Video-Finetuning](https://github.com/ExponentialML/Text-To-Video-Finetuning). Thanks for open-sourcing!
- Thanks to [camenduru](https://twitter.com/camenduru) for the [colab demo](https://github.com/camenduru/MotionDirector-colab).
- Thanks to [yhyu13](https://github.com/yhyu13) for the [Huggingface Repo](https://huggingface.co/Yhyu13/MotionDirector_LoRA).