zR committed on
Commit
d16a569
·
1 Parent(s): 35871ed

change qingying page

Files changed (2)
  1. README.md +1 -3
  2. README_zh.md +58 -2
README.md CHANGED
@@ -109,7 +109,7 @@ inference: false
109
 
110
  ## Model Introduction
111
 
112
- CogVideoX is an open-source version of the video generation model originating from [QingYing](https://chatglm.cn/video?fr=osm_cogvideo). The table below displays the list of video generation models we currently offer, along with their foundational information.
113
 
114
  <table style="border-collapse: collapse; width: 100%;">
115
  <tr>
@@ -194,8 +194,6 @@ CogVideoX is an open-source version of the video generation model originating fr
194
  + Using [SAT](https://github.com/THUDM/SwissArmyTransformer) for inference and fine-tuning of SAT version
195
  models. Feel free to visit our GitHub for more information.
196
 
197
-
198
-
199
  ## Quick Start 🤗
200
 
201
  This model supports deployment using the huggingface diffusers library. You can deploy it by following these steps.
 
109
 
110
  ## Model Introduction
111
 
112
+ CogVideoX is an open-source version of the video generation model originating from [QingYing](https://chatglm.cn/video?lang=en?fr=osm_cogvideo). The table below displays the list of video generation models we currently offer, along with their foundational information.
113
 
114
  <table style="border-collapse: collapse; width: 100%;">
115
  <tr>
 
194
  + Using [SAT](https://github.com/THUDM/SwissArmyTransformer) for inference and fine-tuning of SAT version
195
  models. Feel free to visit our GitHub for more information.
196
 
 
 
197
  ## Quick Start 🤗
198
 
199
  This model supports deployment using the huggingface diffusers library. You can deploy it by following these steps.
README_zh.md CHANGED
@@ -116,8 +116,8 @@ CogVideoX is open source and shares its origin with [QingYing](https://chatglm.cn/video?fr=osm_cogvideo)
116
  </tr>
117
  <tr>
118
  <td style="text-align: center;">Single GPU VRAM Consumption<br></td>
119
- <td style="text-align: center;">FP16: 18GB using <a href="https://github.com/THUDM/SwissArmyTransformer">SAT</a> / <b>12.5GB* using diffusers</b><br><b>INT8: 7.8GB* using diffusers</b></td>
120
- <td style="text-align: center;">BF16: 26GB using <a href="https://github.com/THUDM/SwissArmyTransformer">SAT</a> / <b>20.7GB* using diffusers</b><br><b>INT8: 11.4GB* using diffusers</b></td>
121
  </tr>
122
  <tr>
123
  <td style="text-align: center;">Multi-GPU Inference VRAM Consumption</td>
@@ -226,6 +226,62 @@ video = pipe(
226
 
227
  export_to_video(video, "output.mp4", fps=8)
228
  ```
229
 
230
  ## In-Depth Exploration
231
 
 
116
  </tr>
117
  <tr>
118
  <td style="text-align: center;">Single GPU VRAM Consumption<br></td>
119
+ <td style="text-align: center;">FP16: 18GB using <a href="https://github.com/THUDM/SwissArmyTransformer">SAT</a> / <b>12.5GB* using diffusers</b><br><b>INT8: 7.8GB* using diffusers with torchao</b></td>
120
+ <td style="text-align: center;">BF16: 26GB using <a href="https://github.com/THUDM/SwissArmyTransformer">SAT</a> / <b>20.7GB* using diffusers</b><br><b>INT8: 11.4GB* using diffusers with torchao</b></td>
121
  </tr>
122
  <tr>
123
  <td style="text-align: center;">Multi-GPU Inference VRAM Consumption</td>
 
226
 
227
  export_to_video(video, "output.mp4", fps=8)
228
  ```
229
+ ## Quantized Inference
230
+
231
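+ [PytorchAO](https://github.com/pytorch/ao) and [Optimum-quanto](https://github.com/huggingface/optimum-quanto/)
232
+ can be used to quantize the text encoder, Transformer, and VAE modules, which lowers CogVideoX's memory requirements. This makes it possible to run the model on a free T4 Colab or on GPUs
233
+ with less VRAM. Notably, TorchAO quantization is fully compatible with `torch.compile`, which can significantly speed up inference.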
234
+
235
+ ```diff
236
+ # To get started, PytorchAO needs to be installed from the GitHub source and PyTorch Nightly.
237
+ # Source and nightly installation is only required until next release.
238
+
239
+ import torch
240
+ from diffusers import AutoencoderKLCogVideoX, CogVideoXTransformer3DModel, CogVideoXPipeline
241
+ from diffusers.utils import export_to_video
242
+ + from transformers import T5EncoderModel
243
+ + from torchao.quantization import quantize_, int8_weight_only, int8_dynamic_activation_int8_weight
244
+
245
+ + quantization = int8_weight_only
246
+
247
+ + text_encoder = T5EncoderModel.from_pretrained("THUDM/CogVideoX-5b", subfolder="text_encoder", torch_dtype=torch.bfloat16)
248
+ + quantize_(text_encoder, quantization())
249
+
250
+ + transformer = CogVideoXTransformer3DModel.from_pretrained("THUDM/CogVideoX-5b", subfolder="transformer", torch_dtype=torch.bfloat16)
251
+ + quantize_(transformer, quantization())
252
+
253
+ + vae = AutoencoderKLCogVideoX.from_pretrained("THUDM/CogVideoX-5b", subfolder="vae", torch_dtype=torch.bfloat16)
254
+ + quantize_(vae, quantization())
255
+
256
+ # Create pipeline and run inference
257
+ pipe = CogVideoXPipeline.from_pretrained(
258
+     "THUDM/CogVideoX-5b",
259
+ +   text_encoder=text_encoder,
260
+ +   transformer=transformer,
261
+ +   vae=vae,
262
+     torch_dtype=torch.bfloat16,
263
+ )
264
+ pipe.enable_model_cpu_offload()
265
+ pipe.vae.enable_tiling()
266
+
267
+ prompt = "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical atmosphere of this unique musical performance."
268
+
269
+ video = pipe(
270
+     prompt=prompt,
271
+     num_videos_per_prompt=1,
272
+     num_inference_steps=50,
273
+     num_frames=49,
274
+     guidance_scale=6,
275
+     generator=torch.Generator(device="cuda").manual_seed(42),
276
+ ).frames[0]
277
+
278
+ export_to_video(video, "output.mp4", fps=8)
279
+ ```
280
+
281
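+ Additionally, the models can be serialized and stored in a quantized data type with PytorchAO, which saves disk space. Examples and benchmarks are available at the links below.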
282
+
283
+ - [torchao](https://gist.github.com/a-r-r-o-w/4d9732d17412888c885480c6521a9897)
284
+ - [quanto](https://gist.github.com/a-r-r-o-w/31be62828b00a9292821b85c1017effa)
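
To make the idea behind int8 weight-only quantization more concrete, here is a toy, pure-Python sketch of symmetric per-row quantization. It is only an illustration and not torchao's implementation: the helper names (`quantize_row_int8`, `dequantize_row`, `weight_bytes`) and the sample weights are hypothetical, and the byte count covers weights only, ignoring activations and the stored scales.

```python
def quantize_row_int8(row):
    """Symmetric int8 quantization of one row of weights (toy example).

    Returns the integer codes and the single scale used for the row.
    """
    max_abs = max(abs(w) for w in row)
    # Map the largest magnitude in the row to 127; avoid dividing by zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in row]
    return codes, scale


def dequantize_row(codes, scale):
    """Recover approximate float weights from int8 codes and their scale."""
    return [c * scale for c in codes]


def weight_bytes(num_params, bytes_per_param):
    """Weight-only storage estimate; ignores scales and activations."""
    return num_params * bytes_per_param


# Hypothetical 2x3 weight matrix, quantized row by row.
weights = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.5]]
quantized = [quantize_row_int8(row) for row in weights]
restored = [dequantize_row(codes, scale) for codes, scale in quantized]

for original, approx in zip(weights, restored):
    print(original, "->", [round(v, 4) for v in approx])

# Two bytes per weight (BF16/FP16) versus one byte per weight (INT8).
print(weight_bytes(1_000_000, 2), "bytes vs", weight_bytes(1_000_000, 1), "bytes")
```

Storing one byte per weight instead of two roughly halves weight storage, at the cost of a rounding error of at most half a scale step per weight. Actual memory savings in a real pipeline depend on which modules are quantized and on activation memory.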
285
 
286
  ## 深入研究
287