# FILM: Frame Interpolation for Large Motion

### [Website](https://film-net.github.io/) | [Paper](https://arxiv.org/pdf/2202.04901.pdf) | [Google AI Blog](https://ai.googleblog.com/2022/10/large-motion-frame-interpolation.html) | [Tensorflow Hub Colab](https://www.tensorflow.org/hub/tutorials/tf_hub_film_example) | [YouTube](https://www.youtube.com/watch?v=OAD-BieIjH4) <br>

The official TensorFlow 2 implementation of our high-quality frame interpolation neural network. We present a unified single-network approach that doesn't use additional pre-trained networks, like optical flow or depth, and yet achieves state-of-the-art results. We use a multi-scale feature extractor that shares the same convolution weights across the scales. Our model is trainable from frame triplets alone. <br>

[FILM: Frame Interpolation for Large Motion](https://arxiv.org/abs/2202.04901) <br />
[Fitsum Reda](https://fitsumreda.github.io/)<sup>1</sup>, [Janne Kontkanen](https://scholar.google.com/citations?user=MnXc4JQAAAAJ&hl=en)<sup>1</sup>, [Eric Tabellion](http://www.tabellion.org/et/)<sup>1</sup>, [Deqing Sun](https://deqings.github.io/)<sup>1</sup>, [Caroline Pantofaru](https://scholar.google.com/citations?user=vKAKE1gAAAAJ&hl=en)<sup>1</sup>, [Brian Curless](https://homes.cs.washington.edu/~curless/)<sup>1,2</sup><br />
<sup>1</sup>Google Research, <sup>2</sup>University of Washington<br />
In ECCV 2022.

![A sample 2-second moment.](https://github.com/googlestaging/frame-interpolation/blob/main/moment.gif)
FILM transforms near-duplicate photos into slow-motion footage that looks as if it were shot with a video camera.

## Web Demo

Integrated into [Hugging Face Spaces 🤗](https://huggingface.co/spaces) using [Gradio](https://github.com/gradio-app/gradio). Try out the Web Demo: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/johngoad/frame-interpolation)

Try the interpolation model with the Replicate web demo:
[![Replicate](https://replicate.com/google-research/frame-interpolation/badge)](https://replicate.com/google-research/frame-interpolation)

Try FILM to interpolate between two or more images with PyTTI-Tools: [![PyTTI-Tools:FILM](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.sandbox.google.com/github/pytti-tools/frame-interpolation/blob/main/PyTTI_Tools_FiLM-colab.ipynb#scrollTo=-7TD7YZJbsy_)

An alternative Colab runs FILM on an arbitrary number of input images, not just two: [![FILM-Gdrive](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1NuaPPSvUhYafymUf2mEkvhnEtpD5oihs)

## Change Log
* **Nov 28, 2022**: Upgrade `eval.interpolator_cli` for **high-resolution frame interpolation**. `--block_height` and `--block_width` control how the input images are subdivided: each image is split into `block_height*block_width` patches. By default, both arguments are set to 1, so no subdivision is done.
* **Mar 12, 2022**: Support for Windows, see [WINDOWS_INSTALLATION.md](https://github.com/google-research/frame-interpolation/blob/main/WINDOWS_INSTALLATION.md).
* **Mar 09, 2022**: Support for **high-resolution frame interpolation**. Set `--block_height` and `--block_width` in `eval.interpolator_test` to extract patches from the inputs and reconstruct the interpolated frame from the iteratively interpolated patches.
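
The patch arithmetic behind `--block_height` and `--block_width` can be sketched as follows. `patch_grid` is a hypothetical, illustrative helper and is not part of the repository. It assumes non-overlapping tiles, with any remainder pixels spread over the first rows and columns. The actual CLI may instead pad, overlap patches, or require evenly divisible image sizes.

```python
from typing import List, Tuple


def patch_grid(height: int, width: int, block_height: int,
               block_width: int) -> List[Tuple[int, int, int, int]]:
  """Returns (top, left, bottom, right) tiles; bottom/right are exclusive.

  Hypothetical helper: splits a height x width image into
  block_height * block_width non-overlapping tiles. Remainder pixels go to
  the first rows/columns, so tile sizes differ by at most one pixel.
  """
  if block_height < 1 or block_width < 1:
    raise ValueError('block_height and block_width must be >= 1')

  def cuts(size: int, parts: int) -> List[int]:
    # Edge positions 0 = e0 < e1 < ... < e_parts = size.
    base, extra = divmod(size, parts)
    edges = [0]
    for i in range(parts):
      edges.append(edges[-1] + base + (1 if i < extra else 0))
    return edges

  rows = cuts(height, block_height)
  cols = cuts(width, block_width)
  return [(rows[r], cols[c], rows[r + 1], cols[c + 1])
          for r in range(block_height) for c in range(block_width)]
```

For example, `patch_grid(1080, 1920, 2, 2)` yields four 540x960 tiles. The default `1, 1` yields a single tile covering the whole image.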

## Installation

*   Get the Frame Interpolation source code

```
git clone https://github.com/google-research/frame-interpolation
cd frame-interpolation
```

*   Optionally, pull the recommended Docker base image

```
docker pull gcr.io/deeplearning-platform-release/tf2-gpu.2-6:latest
```

* If you do not use Docker, set up your NVIDIA GPU environment with:
    * [Anaconda Python 3.9](https://www.anaconda.com/products/individual)
    * [CUDA Toolkit 11.2.1](https://developer.nvidia.com/cuda-11.2.1-download-archive)
    * [cuDNN 8.1.0](https://developer.nvidia.com/rdp/cudnn-download)

*   Install frame interpolation dependencies

```
pip3 install -r requirements.txt
sudo apt-get install -y ffmpeg
```
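
Missing prerequisites often surface only at run time, so a quick sanity check can save a failed first run. The script below is a minimal sketch and is not part of the repository (`check_environment` is a hypothetical name). It only reports the Python version, whether `ffmpeg` is on `PATH`, and, if TensorFlow is importable, which GPUs TensorFlow can see. It does not verify the CUDA or cuDNN versions listed above.

```python
import shutil
import sys


def check_environment() -> dict:
  """Reports on a few prerequisites from the installation steps.

  A minimal sketch: checks the Python version, whether `ffmpeg` is on PATH,
  and (only if TensorFlow is importable) which GPUs TensorFlow sees.
  CUDA and cuDNN versions are not checked.
  """
  report = {
      'python': '.'.join(str(v) for v in sys.version_info[:3]),
      'ffmpeg': shutil.which('ffmpeg'),  # None when ffmpeg is not on PATH.
      'gpus': None,  # Stays None when TensorFlow is not installed.
  }
  try:
    import tensorflow as tf  # Imported lazily; may be slow to load.
    report['gpus'] = [d.name for d in tf.config.list_physical_devices('GPU')]
  except ImportError:
    pass
  return report


if __name__ == '__main__':
  for key, value in check_environment().items():
    print(f'{key}: {value}')
```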

### See [WINDOWS_INSTALLATION](https://github.com/google-research/frame-interpolation/blob/main/WINDOWS_INSTALLATION.md) for Windows Support

## Pre-trained Models

*   Create a directory for large files, ideally outside this repository.

```
mkdir -p <pretrained_models>
```

*   Download pre-trained TF2 Saved Models from
    [google drive](https://drive.google.com/drive/folders/1q8110-qp225asX3DQvZnfLfJPkCHmDpy?usp=sharing)
    and put into `<pretrained_models>`.

The downloaded folder should have the following structure:

```
<pretrained_models>/
├── film_net/
│   ├── L1/
│   ├── Style/
│   ├── VGG/
├── vgg/
│   ├── imagenet-vgg-verydeep-19.mat
```
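
Before running the commands, you can confirm that the download landed in the expected place. The sketch below uses only the standard library. The entry list is taken from the tree above and from the `--model_path` values used in this README. `missing_pretrained_entries` and the `~/pretrained_models` path are hypothetical examples.

```python
from pathlib import Path
from typing import List

# Paths relative to <pretrained_models>, taken from the tree above and from
# the --model_path values used in this README's commands.
EXPECTED_ENTRIES = [
    'film_net/L1/saved_model',
    'film_net/Style/saved_model',
    'film_net/VGG',
    'vgg/imagenet-vgg-verydeep-19.mat',
]


def missing_pretrained_entries(root: str) -> List[str]:
  """Returns the expected entries that do not exist under `root`."""
  base = Path(root).expanduser()
  return [rel for rel in EXPECTED_ENTRIES if not (base / rel).exists()]


if __name__ == '__main__':
  missing = missing_pretrained_entries('~/pretrained_models')  # Example path.
  print('All models found.' if not missing else f'Missing: {missing}')
```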

## Running the Code

The following instructions run the interpolator on the photos provided in
`frame-interpolation/photos`.

### One mid-frame interpolation

To generate an intermediate photo from the input near-duplicate photos, simply run:

```
python3 -m eval.interpolator_test \
   --frame1 photos/one.png \
   --frame2 photos/two.png \
   --model_path <pretrained_models>/film_net/Style/saved_model \
   --output_frame photos/output_middle.png
```

This produces the sub-frame at `t=0.5` and saves it as `photos/output_middle.png`.

### Many in-between frames interpolation

`eval.interpolator_cli` takes a set of directories identified by a glob (`--pattern`). Each directory
is expected to contain at least two input frames, and each contiguous frame
pair is treated as an input for generating in-between frames. Name the frames so that sorting them naturally with `natsort` preserves their intended order.

```
python3 -m eval.interpolator_cli \
   --pattern "photos" \
   --model_path <pretrained_models>/film_net/Style/saved_model \
   --times_to_interpolate 6 \
   --output_video
```

You will find the interpolated frames (including the input frames) in
`photos/interpolated_frames/`, and the interpolated video at
`photos/interpolated.mp4`.
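
The naming requirement matters because a plain string sort places `frame_10.png` before `frame_2.png`. The CLI relies on `natsort` for ordering. The standard-library sketch below only approximates natural sorting for simple names and is not `natsort`'s algorithm. It shows how the order and the contiguous input pairs come out. `natural_key` and `contiguous_pairs` are hypothetical names. With `natsort` installed, `natsort.natsorted(names)` plays the role of the sort step.

```python
import re
from typing import List, Tuple


def natural_key(name: str) -> list:
  """Splits digit runs out of a name so 'frame_10' sorts after 'frame_2'."""
  # re.split with a capture group alternates text (even) and digits (odd).
  return [int(tok) if i % 2 else tok.lower()
          for i, tok in enumerate(re.split(r'(\d+)', name))]


def contiguous_pairs(names: List[str]) -> List[Tuple[str, str]]:
  """Sorts frame names naturally and returns each contiguous input pair."""
  ordered = sorted(names, key=natural_key)
  return list(zip(ordered, ordered[1:]))


frames = ['frame_10.png', 'frame_2.png', 'frame_1.png']
print(sorted(frames))            # Plain string sort puts frame_10 second.
print(contiguous_pairs(frames))  # (frame_1, frame_2), (frame_2, frame_10)
```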

The number of output frames is determined by `--times_to_interpolate`, which sets
how many levels of recursive midpoint interpolation are applied between each
pair of input frames. When a directory contains `num_frames` frames, the number
of output frames is `(num_frames-1)*2^times_to_interpolate+1`, which for a single
pair reduces to `2^times_to_interpolate+1`.
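
A minimal sketch of this counting follows. It assumes two things: each recursion level inserts the temporal midpoint between every pair of adjacent frames, and the frame shared by two consecutive input pairs is written only once. `midpoint_times` and `output_frame_count` are illustrative names, not functions from the repository. With `--times_to_interpolate 1`, the only new timestamp is `t=0.5`, matching the single mid-frame produced by `eval.interpolator_test`.

```python
from fractions import Fraction
from typing import List


def midpoint_times(times_to_interpolate: int) -> List[Fraction]:
  """Timestamps in [0, 1] reached by recursive midpoint interpolation.

  Each level inserts the midpoint of every adjacent pair, so after n levels
  one input pair is sampled at k / 2**n for k = 0, ..., 2**n.
  """
  times = [Fraction(0), Fraction(1)]
  for _ in range(times_to_interpolate):
    refined = [times[0]]
    for a, b in zip(times, times[1:]):
      refined += [(a + b) / 2, b]
    times = refined
  return times


def output_frame_count(num_frames: int, times_to_interpolate: int) -> int:
  """Frames written when adjacent input pairs share their boundary frame."""
  if num_frames < 2:
    raise ValueError('need at least two input frames')
  return (num_frames - 1) * 2 ** times_to_interpolate + 1


print([str(t) for t in midpoint_times(2)])  # ['0', '1/4', '1/2', '3/4', '1']
print(output_frame_count(2, 6))             # 65
```

With the `--times_to_interpolate 6` example above and two input photos, this gives 65 frames.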

## Datasets

We use [Vimeo-90K](http://data.csail.mit.edu/tofu/dataset/vimeo_triplet.zip) as
our main training dataset. For quantitative evaluations, we rely on commonly
used benchmark datasets, specifically:

*   [Vimeo-90K](http://data.csail.mit.edu/tofu/testset/vimeo_interp_test.zip)
*   [Middlebury-Other](https://vision.middlebury.edu/flow/data)
*   [UCF101](https://people.cs.umass.edu/~hzjiang/projects/superslomo/UCF101_results.zip)
*   [Xiph](https://github.com/sniklaus/softmax-splatting/blob/master/benchmark.py)

### Creating a TFRecord

The training and benchmark evaluation scripts expect the frame triplets in the
[TFRecord](https://www.tensorflow.org/tutorials/load_data/tfrecord) storage format. <br />

We have included scripts that encode the relevant frame triplets into the
[tf.train.Example](https://www.tensorflow.org/api_docs/python/tf/train/Example)
data format and export them to a TFRecord file. <br />

Run `python3 -m datasets.create_<dataset_name>_tfrecord --help` for more information.

For example, run the command below to create a TFRecord for the Middlebury-Other
dataset. Download the [images](https://vision.middlebury.edu/flow/data) and point `--input_dir` to the unzipped folder.

```
python3 -m datasets.create_middlebury_tfrecord \
  --input_dir=<root folder of middlebury-other> \
  --output_tfrecord_filepath=<output tfrecord filepath> \
  --num_shards=3
```

The above command will output a TFRecord file with 3 shards as `<output tfrecord filepath>@3`.
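
The `@3` suffix is shorthand for a set of shard files. The sketch below expands such a spec into individual filenames, assuming the common `<base>-XXXXX-of-NNNNN` naming convention. `expand_sharded_path` is a hypothetical helper, so check the files the dataset script actually writes before relying on this pattern. The resulting list can be passed to `tf.data.TFRecordDataset`, which accepts a list of filenames.

```python
from typing import List


def expand_sharded_path(spec: str) -> List[str]:
  """Expands 'path@N' into N shard filenames (assumed naming convention)."""
  base, sep, count = spec.rpartition('@')
  if not sep or not count.isdecimal() or int(count) < 1:
    return [spec]  # Not a sharded spec; treat it as a single file.
  num_shards = int(count)
  return [f'{base}-{i:05d}-of-{num_shards:05d}' for i in range(num_shards)]


print(expand_sharded_path('/data/middlebury.tfrecord@3'))
```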

## Training

Below are our training gin configuration files for the different loss functions:

```
training/
├── config/
│   ├── film_net-L1.gin
│   ├── film_net-VGG.gin
│   ├── film_net-Style.gin
```

To launch training, pass the path of the desired experiment's configuration
file. <br />
By default, training uses all visible GPUs. To debug or train
on a CPU, append `--mode cpu`.

```
python3 -m training.train \
   --gin_config training/config/<config filename>.gin \
   --base_folder <base folder for all training runs> \
   --label <descriptive label for the run>
```
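
If you launch several runs, for example one per loss, it can help to build the command programmatically. `train_command` and `saved_model_dir` are hypothetical helpers that use only the flags documented here. The base folder and label values are placeholders. To actually start a run, pass the returned list to `subprocess.run(argv, check=True)`.

```python
import shlex
from pathlib import Path
from typing import List

# Config names from the training/config/ listing above.
CONFIGS = ('film_net-L1', 'film_net-VGG', 'film_net-Style')


def train_command(config: str, base_folder: str, label: str,
                  cpu: bool = False) -> List[str]:
  """Builds argv for `python3 -m training.train` from the documented flags."""
  if config not in CONFIGS:
    raise ValueError(f'unknown config {config!r}; expected one of {CONFIGS}')
  argv = ['python3', '-m', 'training.train',
          '--gin_config', f'training/config/{config}.gin',
          '--base_folder', base_folder,
          '--label', label]
  if cpu:
    argv += ['--mode', 'cpu']  # Debug or train on a CPU.
  return argv


def saved_model_dir(base_folder: str, label: str) -> Path:
  """Default SavedModel location once the training loop ends."""
  return Path(base_folder) / label / 'saved_model'


print(shlex.join(train_command('film_net-Style', '/tmp/film_runs', 'style')))
```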

*   When training finishes, the folder structure will look like this:

```
<base_folder>/
├── <label>/
│   ├── config.gin
│   ├── eval/
│   ├── train/
│   ├── saved_model/
```

### Build a SavedModel

Optionally, to build a
[SavedModel](https://www.tensorflow.org/guide/saved_model) from a folder of
trained checkpoints, use this command:

```
python3 -m training.build_saved_model_cli \
   --base_folder <base folder of training sessions> \
   --label <the name of the run>
```

*   By default, a SavedModel is created when the training loop ends, and it will be saved at
    `<base_folder>/<label>/saved_model`.

## Evaluation on Benchmarks

Below are the evaluation gin configuration files for the benchmarks we
have considered:

```
eval/
├── config/
│   ├── middlebury.gin
│   ├── ucf101.gin
│   ├── vimeo_90K.gin
│   ├── xiph_2K.gin
│   ├── xiph_4K.gin
```

To run an evaluation, pass the configuration file of the desired evaluation dataset. <br />
If a GPU is visible, the evaluation runs on it.

```
python3 -m eval.eval_cli \
   --gin_config eval/config/<eval_dataset>.gin \
   --model_path <pretrained_models>/film_net/L1/saved_model
```

The above command will produce the PSNR and SSIM scores presented in the paper.
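
PSNR compares an interpolated frame with its ground truth through the mean squared error (MSE). The pure-Python sketch below implements the standard definition on flattened pixel values, assuming they are scaled to `[0, max_val]`. It is not the repository's evaluation code, which may differ in color space, cropping, or how scores are averaged over a dataset. TensorFlow also provides `tf.image.psnr` and `tf.image.ssim` for batched image tensors. SSIM is more involved and is not sketched here.

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float],
         max_val: float = 1.0) -> float:
  """Peak signal-to-noise ratio in dB: 10 * log10(max_val**2 / MSE)."""
  if not pred or len(pred) != len(target):
    raise ValueError('inputs must be non-empty and of equal length')
  mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
  if mse == 0:
    return math.inf  # Identical images.
  return 10.0 * math.log10(max_val ** 2 / mse)


print(psnr([0.0, 0.0, 0.0, 0.0], [0.1, 0.1, 0.1, 0.1]))  # About 20 dB.
```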

## Citation

If you find this implementation useful in your work, please acknowledge it
appropriately by citing:

```
@inproceedings{reda2022film,
 title = {FILM: Frame Interpolation for Large Motion},
 author = {Fitsum Reda and Janne Kontkanen and Eric Tabellion and Deqing Sun and Caroline Pantofaru and Brian Curless},
 booktitle = {European Conference on Computer Vision (ECCV)},
 year = {2022}
}
```

```
@misc{film-tf,
  title = {Tensorflow 2 Implementation of "FILM: Frame Interpolation for Large Motion"},
  author = {Fitsum Reda and Janne Kontkanen and Eric Tabellion and Deqing Sun and Caroline Pantofaru and Brian Curless},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/google-research/frame-interpolation}}
}
```

## Acknowledgments

We would like to thank Richard Tucker, Jason Lai and David Minnen. We would also
like to thank Jamie Aspinall for the imagery included in this repository.

## Coding style

*   2 spaces for indentation
*   80 character line length
*   PEP8 formatting

## Disclaimer

This is not an officially supported Google product.