---
license: cc-by-nc-sa-4.0
pipeline_tag: image-to-video
tags:
- turing
- autonomous driving
- video generation
- world model
---

# Terra

<div id="scroll-container" style="display: flex; overflow-x: auto; gap: 10px; scroll-behavior: smooth; padding: 10px; border: 1px solid #ddd;">
  <video width="512" controls>
    <source src="https://huggingface.co/turing-motors/Terra/resolve/main/assets/videos/row_1.mp4" type="video/mp4">
  </video>
  <video width="512" controls>
    <source src="https://huggingface.co/turing-motors/Terra/resolve/main/assets/videos/row_2.mp4" type="video/mp4">
  </video>
  <video width="512" controls>
    <source src="https://huggingface.co/turing-motors/Terra/resolve/main/assets/videos/row_3.mp4" type="video/mp4">
  </video>
  <video width="512" controls>
    <source src="https://huggingface.co/turing-motors/Terra/resolve/main/assets/videos/row_4.mp4" type="video/mp4">
  </video>
  <video width="512" controls>
    <source src="https://huggingface.co/turing-motors/Terra/resolve/main/assets/videos/row_5.mp4" type="video/mp4">
  </video>
  <video width="512" controls>
    <source src="https://huggingface.co/turing-motors/Terra/resolve/main/assets/videos/row_6.mp4" type="video/mp4">
  </video>
  <video width="512" controls>
    <source src="https://huggingface.co/turing-motors/Terra/resolve/main/assets/videos/row_7.mp4" type="video/mp4">
  </video>
</div>

**Terra** is a world model designed for autonomous driving and serves as a baseline model in the [ACT-Bench](https://github.com/turingmotors/ACT-Bench) framework.
Terra generates video continuations conditioned on a short clip of approximately three frames and a trajectory instruction.
A key feature of Terra is its **high adherence to trajectory instructions**, enabling accurate and reliable action-conditioned video generation.

## Related Links

For more technical details and discussions, please refer to:
- **Paper:** https://arxiv.org/abs/2412.05337
- **Code:** https://github.com/turingmotors/ACT-Bench

## How to use

We have verified execution on a machine equipped with a single NVIDIA H100 80GB GPU. That said, it should also be possible to run the model on any machine with an NVIDIA GPU that has 16 GB or more of VRAM.

Terra consists of an Image Tokenizer, an Autoregressive Transformer, and a Video Refiner. Because setting up the Video Refiner is complex, we have not included its implementation in this Hugging Face repository. Instead, **the implementation and setup instructions for the Video Refiner are provided in the [ACT-Bench repository](https://github.com/turingmotors/ACT-Bench)**. Here, we provide an example of generating video continuations using the Image Tokenizer and the Autoregressive Transformer, conditioned on image frames and a template trajectory. The resulting video quality may look suboptimal because each frame is decoded individually; to improve visual quality, use the Video Refiner.
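Conceptually, generation proceeds in three stages: tokenize the conditioning frames, autoregressively extend the token sequence under the trajectory condition, and decode each frame's tokens back to pixels. The sketch below illustrates this control flow only; every class and function in it is a toy stand-in, not the actual Terra API (see `inference.py` for the real entry point).

```python
# Illustrative control flow of Terra-style action-conditioned generation.
# All components below are toy stand-ins, NOT the actual Terra API.
from typing import List


class ToyImageTokenizer:
    """Stand-in: maps each frame to a fixed-length token sequence."""
    tokens_per_frame = 4

    def encode(self, frame: str) -> List[int]:
        return [hash((frame, i)) % 1000 for i in range(self.tokens_per_frame)]

    def decode(self, tokens: List[int]) -> str:
        return f"frame<{len(tokens)} tokens>"


class ToyTransformer:
    """Stand-in: autoregressively emits the next frame's tokens,
    conditioned on the context tokens and a trajectory instruction."""

    def next_frame_tokens(self, context: List[int], trajectory: str,
                          n: int) -> List[int]:
        return [(sum(context) + hash(trajectory) + i) % 1000 for i in range(n)]


def generate(frames: List[str], trajectory: str, horizon: int) -> List[str]:
    tok = ToyImageTokenizer()
    model = ToyTransformer()
    # 1) Tokenize the ~3 conditioning frames into one context sequence.
    context = [t for f in frames for t in tok.encode(f)]
    out = []
    for _ in range(horizon):
        # 2) Autoregressively generate tokens for the next frame.
        new = model.next_frame_tokens(context, trajectory, tok.tokens_per_frame)
        context += new
        # 3) Decode each frame's tokens individually (the per-frame decoding
        #    is why the Video Refiner is needed for best visual quality).
        out.append(tok.decode(new))
    return out


print(len(generate(["f0", "f1", "f2"], "curving_to_left_moderate", 5)))  # → 5
```

The real model operates on image tensors and learned discrete codes rather than strings, but the loop structure (encode once, then alternate generate/append per future frame) is the shape of the pipeline described above.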

### Install Packages

We use [uv](https://docs.astral.sh/uv/) to manage Python packages. If uv is not installed in your environment, please refer to its documentation for installation instructions.

```shell
$ git clone https://huggingface.co/turing-motors/Terra
$ uv sync
```

### Action-Conditioned Video Generation without Video Refiner

```shell
$ python inference.py
```

This command generates a video using three image frames located in [`assets/conditioning_frames`](./assets/conditioning_frames/) and the `curving_to_left/curving_to_left_moderate` trajectory defined in the trajectory template file [`assets/template_trajectory.json`](./assets/template_trajectory.json).
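The trajectory identifier above is a slash-separated path into the template file. A minimal sketch of resolving such a key, assuming the JSON maps category names to nested trajectory entries (the exact schema, including the `waypoints` field used here purely for illustration, is defined by `assets/template_trajectory.json`, not by this sketch):

```python
def resolve_trajectory(templates: dict, key: str):
    """Walk a 'category/name' key through a nested template dict."""
    node = templates
    for part in key.split("/"):
        node = node[part]
    return node


# Toy stand-in for json.load(open("assets/template_trajectory.json"));
# the 'waypoints' field is hypothetical.
templates = {
    "curving_to_left": {
        "curving_to_left_moderate": {"waypoints": [[0.0, 0.0], [1.0, 0.2]]}
    }
}

traj = resolve_trajectory(templates, "curving_to_left/curving_to_left_moderate")
print(traj)
```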

You can find more details by referring to the [`inference.py`](./inference.py) script.

## Citation

```bibtex
@misc{arai2024actbench,
      title={ACT-Bench: Towards Action Controllable World Models for Autonomous Driving}, 
      author={Hidehisa Arai and Keishi Ishihara and Tsubasa Takahashi and Yu Yamaguchi},
      year={2024},
      eprint={2412.05337},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.05337}, 
}
```