FastHunyuan Model Card
Model Details
FastHunyuan is an accelerated HunyuanVideo model. It can sample high-quality videos in 6 diffusion steps, roughly an 8x speedup over the original HunyuanVideo, which uses 50 steps.
- Developed by: Hao AI Lab
- License: tencent-hunyuan-community
- Distilled from: HunyuanVideo
- GitHub repository: https://github.com/hao-ai-lab/FastVideo
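The roughly 8x figure follows from the ratio of sampling steps. A minimal sketch of that arithmetic, assuming each diffusion step costs about the same in both models and ignoring fixed overhead such as text encoding and VAE decoding (the function name is illustrative, not part of FastVideo):

```python
def nominal_speedup(baseline_steps: int, distilled_steps: int) -> float:
    """Ratio of denoising steps; a proxy for speedup when per-step cost is equal."""
    if baseline_steps <= 0 or distilled_steps <= 0:
        raise ValueError("step counts must be positive")
    return baseline_steps / distilled_steps


# 50 steps for the original HunyuanVideo vs. 6 for FastHunyuan (from the text above)
speedup = nominal_speedup(50, 6)
print(f"nominal speedup: {speedup:.1f}x")
```

Measured wall-clock speedup can differ from this ratio, since the fixed costs do not shrink with the step count.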
Usage
- Clone the FastVideo repository and follow the inference instructions in the README.
- Alternatively, you can run FastHunyuan inference using the official HunyuanVideo repository by setting the shift to 17, the number of steps to 6, the resolution to 720x1280 with 125 frames, and the CFG scale to a value greater than 6. We find that a large CFG scale generally leads to faster videos.
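The settings in the second option can be gathered in one place before being handed to whichever inference script you use. The sketch below is only a container with a sanity check. The class and field names are hypothetical and are not the argument names of the FastVideo or HunyuanVideo scripts, and reading 720x1280 as height x width is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FastHunyuanSettings:
    """Hypothetical container for the sampling settings listed above."""

    cfg_scale: float          # should be greater than 6; no exact value is given
    flow_shift: float = 17.0  # the "shift" setting
    num_steps: int = 6        # diffusion steps
    height: int = 720         # assumed to be the first number in 720x1280
    width: int = 1280
    num_frames: int = 125

    def __post_init__(self) -> None:
        # Reject values that contradict the recommendation in the text.
        if self.cfg_scale <= 6:
            raise ValueError("CFG scale should be greater than 6")
        if min(self.num_steps, self.height, self.width, self.num_frames) <= 0:
            raise ValueError("steps, resolution, and frame count must be positive")


# 7.0 is an arbitrary example of a CFG scale above 6, not a recommended value.
settings = FastHunyuanSettings(cfg_scale=7.0)
print(settings)
```

Check the README of the repository you use for the actual flag names that correspond to these fields.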
Training details
FastHunyuan is consistency-distilled on the MixKit dataset with the following hyperparameters:
- Batch size: 16
- Resolution: 720x1280
- Number of frames: 125
- Train steps: 320
- GPUs: 32
- LR: 1e-6
- Loss: Huber
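The Huber loss listed above is quadratic for small errors and linear for large ones, which makes it less sensitive to outliers than a pure squared error. Below is a minimal pure-Python sketch of the standard definition. The threshold `delta` used in training is not stated here, so the default of 1.0 is an assumption, and this is not the FastVideo training code.

```python
from typing import Sequence


def huber_loss(prediction: float, target: float, delta: float = 1.0) -> float:
    """Standard Huber loss for a single value; delta=1.0 is an assumed default."""
    error = abs(prediction - target)
    if error <= delta:
        # Quadratic region: behaves like half the squared error.
        return 0.5 * error ** 2
    # Linear region: grows proportionally to the absolute error.
    return delta * (error - 0.5 * delta)


def mean_huber_loss(
    predictions: Sequence[float], targets: Sequence[float], delta: float = 1.0
) -> float:
    """Average the per-element Huber loss over two equal-length sequences."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(huber_loss(p, t, delta) for p, t in zip(predictions, targets))
    return total / len(predictions)


print(mean_huber_loss([0.5, 2.0], [0.0, 0.0]))
```

In an actual training loop the same quantity would be computed on tensors, comparing the student's prediction with the distillation target.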
Evaluation
We provide a qualitative comparison between FastHunyuan with 6-step inference and the original HunyuanVideo with 6-step inference:
Memory requirements
Please check our GitHub repo for details: https://github.com/hao-ai-lab/FastVideo
For inference, FastHunyuan supports NF4 and LLM-INT8 quantization using BitsAndBytes. With NF4 quantization, inference can be performed on a single RTX 4090 GPU and requires just 20 GB of VRAM.
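To see why lower-precision weights help on a single consumer GPU, here is a back-of-the-envelope estimate of weight storage at different precisions. It assumes roughly 13 billion transformer parameters (the size reported for HunyuanVideo; treat it as an assumption here) and counts weights only, ignoring quantization constants, activations, the text encoder, and the VAE. It is therefore not expected to reproduce the 20 GB figure above.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return num_params * bits_per_param / 8 / 1e9


ASSUMED_PARAMS = 13e9  # assumption: approximate HunyuanVideo transformer size

# Nominal bits per parameter; LLM-INT8 keeps some values in higher precision,
# so treating it as a flat 8 bits is an approximation.
for name, bits in [("bf16", 16), ("LLM-INT8", 8), ("NF4", 4)]:
    print(f"{name:>8}: {weight_memory_gb(ASSUMED_PARAMS, bits):.1f} GB")
```

The gap between the weight-only estimate and the total VRAM requirement comes from the components and buffers this sketch leaves out.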
For LoRA fine-tuning, the minimum hardware requirements are:
- 40 GB of GPU memory on each of 2 GPUs with LoRA.
- 30 GB of GPU memory on each of 2 GPUs with CPU offload and LoRA.