a-r-r-o-w (HF staff) committed
Commit af6c70a · verified · 1 Parent(s): f854f4b

Add quantization examples using torchao and quanto

Hey, I'm Aryan from the Diffusers team 👋

Congratulations on the release of CogVideoX-5B!

It would be great to showcase some examples of how quantized inference (`int8` and other datatypes) can be run with TorchAO and Quanto to lower memory requirements, especially since we mention it in the model card table. Feel free to modify the code, wording, or URLs however you see fit. Could we do the same for the Chinese README, CogVideoX-2B, and the CogVideo GitHub repo as well? Thanks!
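
To make the memory argument concrete: int8 weight-only quantization stores each weight in one byte plus a small number of scales, versus two bytes per weight in bfloat16. The sketch below is a plain-Python illustration of symmetric per-row quantization and a back-of-the-envelope size estimate. It is not how TorchAO or Quanto implement their kernels, the helper names are hypothetical, and the 5-billion parameter count is a round-number assumption taken from the model name rather than a measured figure.

```python
# Illustrative only: symmetric per-row int8 weight-only quantization in plain Python.
# All helper names here are hypothetical and exist only for this sketch.


def quantize_row_int8(row):
    """Map a row of float weights to int8 codes plus one float scale."""
    max_abs = max(abs(w) for w in row)
    # One scale per row; fall back to 1.0 for an all-zero row to avoid dividing by zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in row]
    return codes, scale


def dequantize_row(codes, scale):
    """Recover approximate float weights from int8 codes and the row scale."""
    return [c * scale for c in codes]


def weight_gib(num_params, bytes_per_param):
    """Size of the weights alone, in GiB (ignores scales, activations, overhead)."""
    return num_params * bytes_per_param / 1024**3


row = [0.5, -1.27, 0.0, 0.9]
codes, scale = quantize_row_int8(row)
restored = dequantize_row(codes, scale)

# Assumption: roughly 5e9 parameters, read off the "5B" in the model name.
ASSUMED_PARAMS = 5_000_000_000
bf16_gib = weight_gib(ASSUMED_PARAMS, 2)  # bfloat16: 2 bytes per weight
int8_gib = weight_gib(ASSUMED_PARAMS, 1)  # int8: 1 byte per weight
print(f"bf16 weights: {bf16_gib:.2f} GiB, int8 weights: {int8_gib:.2f} GiB")
```

The estimate ignores activations, the stored scales, and framework overhead, so real savings will differ; the benchmarks linked from the README change are the better reference.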

Files changed (1)
  1. README.md +55 -0
README.md CHANGED
@@ -242,6 +242,61 @@ video = pipe(
  export_to_video(video, "output.mp4", fps=8)
  ```
 
+ ## Quantized Inference
+
+ [PytorchAO](https://github.com/pytorch/ao) and [Optimum-quanto](https://github.com/huggingface/optimum-quanto/) can be used to quantize the Text Encoder, Transformer and VAE modules to lower the memory requirement of CogVideoX. This makes it possible to run the model on free-tier T4 Colab or smaller VRAM GPUs as well! It is also worth noting that TorchAO quantization is fully compatible with `torch.compile`, which allows for much faster inference speed.
+
+ ```diff
+   # To get started, PytorchAO needs to be installed from the GitHub source and PyTorch Nightly.
+   # Source and nightly installation is only required until next release.
+
+   import torch
+   from diffusers import AutoencoderKLCogVideoX, CogVideoXTransformer3DModel, CogVideoXPipeline
+   from diffusers.utils import export_to_video
+ + from transformers import T5EncoderModel
+ + from torchao.quantization import quantize_, int8_weight_only, int8_dynamic_activation_int8_weight
+
+ + quantization = int8_weight_only
+
+ + text_encoder = T5EncoderModel.from_pretrained("THUDM/CogVideoX-5b", subfolder="text_encoder", torch_dtype=torch.bfloat16)
+ + quantize_(text_encoder, quantization())
+
+ + transformer = CogVideoXTransformer3DModel.from_pretrained("THUDM/CogVideoX-5b", subfolder="transformer", torch_dtype=torch.bfloat16)
+ + quantize_(transformer, quantization())
+
+ + vae = AutoencoderKLCogVideoX.from_pretrained("THUDM/CogVideoX-5b", subfolder="vae", torch_dtype=torch.bfloat16)
+ + quantize_(vae, quantization())
+
+   # Create pipeline and run inference
+   pipe = CogVideoXPipeline.from_pretrained(
+       "THUDM/CogVideoX-5b",
+ +     text_encoder=text_encoder,
+ +     transformer=transformer,
+ +     vae=vae,
+       torch_dtype=torch.bfloat16,
+   )
+   pipe.enable_model_cpu_offload()
+   pipe.vae.enable_tiling()
+
+   prompt = "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical atmosphere of this unique musical performance."
+
+   video = pipe(
+       prompt=prompt,
+       num_videos_per_prompt=1,
+       num_inference_steps=50,
+       num_frames=49,
+       guidance_scale=6,
+       generator=torch.Generator(device="cuda").manual_seed(42),
+   ).frames[0]
+
+   export_to_video(video, "output.mp4", fps=8)
+ ```
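
The paragraph above also names Optimum-quanto, while the example itself only exercises TorchAO. A minimal sketch of the Quanto route might look like the following, assuming the `quantize` and `freeze` functions and the `qint8` type exported by `optimum.quanto`. The helper name `load_quanto_int8_modules` is hypothetical, and this has not been benchmarked here.

```python
def load_quanto_int8_modules(model_id="THUDM/CogVideoX-5b"):
    """Load the three CogVideoX modules and quantize their weights to int8 with Quanto.

    Hypothetical helper for illustration. Imports are local so the heavy
    dependencies are only needed when the function is actually called.
    """
    import torch
    from diffusers import AutoencoderKLCogVideoX, CogVideoXTransformer3DModel
    from optimum.quanto import freeze, qint8, quantize
    from transformers import T5EncoderModel

    text_encoder = T5EncoderModel.from_pretrained(
        model_id, subfolder="text_encoder", torch_dtype=torch.bfloat16
    )
    transformer = CogVideoXTransformer3DModel.from_pretrained(
        model_id, subfolder="transformer", torch_dtype=torch.bfloat16
    )
    vae = AutoencoderKLCogVideoX.from_pretrained(
        model_id, subfolder="vae", torch_dtype=torch.bfloat16
    )

    for module in (text_encoder, transformer, vae):
        quantize(module, weights=qint8)  # mark the module's weights for int8 quantization
        freeze(module)  # replace the float weights with their quantized versions

    return text_encoder, transformer, vae
```

The returned modules can then be passed to `CogVideoXPipeline.from_pretrained` through the `text_encoder`, `transformer`, and `vae` arguments, the same way the TorchAO example does.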
+
+ Additionally, the models can be serialized and stored in a quantized datatype to save disk space when using PytorchAO. Find examples and benchmarks at these links:
+ - [torchao](https://gist.github.com/a-r-r-o-w/4d9732d17412888c885480c6521a9897)
+ - [quanto](https://gist.github.com/a-r-r-o-w/31be62828b00a9292821b85c1017effa)
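
As a rough idea of what serializing a TorchAO-quantized module can involve, the sketch below saves the quantized state dict with `torch.save` and loads it back with `load_state_dict(..., assign=True)`. This follows the general pattern described in TorchAO's serialization notes rather than the exact code in the linked gists; the helper names `save_quantized` and `load_quantized` are hypothetical.

```python
def save_quantized(module, path):
    """Write the state dict of an already quantized module to disk."""
    import torch  # local import: only needed when the helper is called

    torch.save(module.state_dict(), path)


def load_quantized(empty_module, path):
    """Load a quantized state dict into a module that holds no real weights yet.

    Assumption: `empty_module` has the same architecture as the saved module and
    was created without materialized weights (for example under
    `torch.device("meta")`).
    """
    import torch

    # weights_only=False unpickles arbitrary objects, so only use it on files you trust.
    state_dict = torch.load(path, weights_only=False)
    # assign=True swaps in the loaded (quantized) tensors instead of copying
    # their values into the existing parameters.
    empty_module.load_state_dict(state_dict, assign=True)
    return empty_module
```

Because the quantized tensors are stored as saved, the file on disk is smaller than a bfloat16 checkpoint of the same module, which is the disk-space benefit the paragraph above refers to.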
+
+
  ## Explore the Model
 
  Welcome to our [github](https://github.com/THUDM/CogVideo), where you will find: