# Vlogger

This repository is the official implementation of [Vlogger](https://arxiv.org/abs/2401.09414):

**[Vlogger: Make Your Dream A Vlog](https://arxiv.org/abs/2401.09414)**

Demo generated by our Vlogger: [Teddy Travel](https://youtu.be/ZRD1-jHbEGk)

## Setup

### Prepare Environment

```
conda create -n vlogger python==3.10.11
conda activate vlogger
pip install -r requirements.txt
```

### Download our model and T2I base model

Our model is based on Stable Diffusion v1.4. Download [Stable Diffusion v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4) and [OpenCLIP-ViT-H-14](https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K) to the ```pretrained``` directory.

Download our model (ShowMaker) checkpoint (from [Google Drive](https://drive.google.com/file/d/1pAH73kz2QRfD2Dxk4lL3SrHvLAlWcPI3/view?usp=drive_link) or [Hugging Face](https://huggingface.co/GrayShine/Vlogger/tree/main)) and save it to the ```pretrained``` directory as well.

Now under `./pretrained`, you should be able to see the following:

```
├── pretrained
│   ├── ShowMaker.pt
│   ├── stable-diffusion-v1-4
│   ├── OpenCLIP-ViT-H-14
│   │   ├── ...
└── └── ├── ...
    ├── ...
```

## Usage

### Inference for (T+I)2V

Run the following command to get the (T+I)2V results:

```bash
python sample_scripts/with_mask_sample.py
```

The generated video will be saved in ```results/mask_no_ref```.

### Inference for (T+I+ref)2V

Run the following command to get the (T+I+ref)2V results:

```bash
python sample_scripts/with_mask_ref_sample.py
```

The generated video will be saved in ```results/mask_ref```.

### Inference for LLM planning and make reference image

Run the following command to get the script, actors and protagonist:

```bash
python sample_scripts/vlog_write_script.py
```

The generated scripts will be saved in ```results/vlog/$your_story_dir/script```.

The generated reference images will be saved in ```results/vlog/$your_story_dir/img```.
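To confirm that the planning step wrote something to the two output folders named above, a small helper like the following can list their contents. This is only a sketch: `list_story_outputs` is a hypothetical name that is not part of this repository, `results/vlog/my_story` is a placeholder for `results/vlog/$your_story_dir`, and no assumption is made about the file names or formats the script produces.

```python
from pathlib import Path


def list_story_outputs(story_dir):
    """Return the sorted file names found in the script/ and img/ subfolders.

    Hypothetical helper, not part of the Vlogger code base.
    """
    story_dir = Path(story_dir)
    outputs = {}
    for sub in ("script", "img"):
        folder = story_dir / sub
        # A missing folder most likely means the planning step has not run yet.
        if folder.is_dir():
            outputs[sub] = sorted(p.name for p in folder.iterdir() if p.is_file())
        else:
            outputs[sub] = []
    return outputs


if __name__ == "__main__":
    # Placeholder path standing in for results/vlog/$your_story_dir.
    for sub, names in list_story_outputs("results/vlog/my_story").items():
        print(f"{sub}: {len(names)} file(s)")
        for name in names:
            print(f"  {name}")
```

An empty list for either folder suggests that `vlog_write_script.py` did not finish, which is worth checking before moving on to vlog generation.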

**Important:** Enter your OpenAI key in the 7th line of the file ```vlogger/planning_utils/gpt4_utils.py```.

### Inference for vlog generation

Run the following command to get the vlog:

```bash
python sample_scripts/vlog_read_script_sample.py
```

The generated video will be saved in ```results/vlog/$your_story_dir/video```.

#### More Details

You may modify ```configs/with_mask_sample.yaml``` to change the (T+I)2V conditions.

You may modify ```configs/with_mask_ref_sample.yaml``` to change the (T+I+ref)2V conditions.

For example:

- ```ckpt``` specifies the model checkpoint.
- ```text_prompt``` describes the content of the video.
- ```input_path``` specifies the path to the input image.
- ```ref_path``` specifies the path to the reference image.
- ```save_path``` specifies the path where the generated video is saved.

## Results

### (T+Ref)2V Results

| Reference Image | Output Video |
| --- | --- |
| Scene Reference | Fireworks explode over the pyramids. |
| Scene Reference | The Great Wall burning with raging fire. |
| Object Reference | A cat is running on the beach. |

Input Image and Output Video pairs, listed by text prompt:

- Underwater environment cosmetic bottles.
- A big drop of water falls on a rose petal.
- A fish swims past an oriental woman.
- Cinematic photograph. View of piloting aaero.
- Planet hits earth.

Output Video only, listed by text prompt:

- A deer looks at the sunset behind him.
- A duck is teaching math to another duck.
- Bezos explores tropical rainforest.
- Light blue water lapping on the beach.