2 changes: 2 additions & 0 deletions README.md
@@ -48,6 +48,8 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates
 
 ## Latest News
 
+- **May 12, 2025:** 🔥 We now fully support quantization for the **`Wan2.1`** series of video generation models and provide export of truly quantized **INT8/FP8** weights, compatible with the [lightx2v](https://github.com/ModelTC/lightx2v) inference framework. For details, please refer to the [lightx2v documentation](https://llmc-en.readthedocs.io/en/latest/backend/lightx2v.html).
+
 - **Feb 7, 2025:** 🔥 We now fully support quantization of large-scale **`MOE`** models like **`DeepSeekv3`**, **`DeepSeek-R1`**, and **`DeepSeek-R1-zero`** with **`671B`** parameters. You can now directly load FP8 weights without any extra conversion. AWQ and RTN quantization can run on a single 80GB GPU, and we also support the export of true quantized **INT4/INT8** weights.
 
 - **Nov 20, 2024:** 🔥 We now fully support the quantization of ✨`DeepSeekv2(2.5)` and other `MOE` models, as well as ✨`Qwen2VL`, `Llama3.2`, and other `VLM` models. Supported quantization methods include ✅integer quantization, ✅floating-point quantization, and advanced algorithms like ✅AWQ, ✅GPTQ, ✅SmoothQuant, and ✅Quarot.
4 changes: 3 additions & 1 deletion README_ja.md
@@ -48,7 +48,9 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates
 
 ## Latest News
 
-- V 🔥 We now fully support quantization of large-scale **`MOE`** models with **`671B`** parameters, such as **`DeepSeekv3`**, **`DeepSeek-R1`**, and **`DeepSeek-R1-zero`**. FP8 weights can now be loaded directly without any extra conversion. AWQ and RTN quantization can run on a single 80GB GPU, and export of truly quantized **INT4/INT8** weights is also supported.
+- **May 12, 2025:** 🔥 We now fully support quantization of the **`Wan2.1`** series of video generation models and also support export of truly quantized **INT8/FP8** weights, compatible with the [lightx2v](https://github.com/ModelTC/lightx2v) inference framework. For details, please refer to the [lightx2v documentation](https://llmc-en.readthedocs.io/en/latest/backend/lightx2v.html).
+
+- **Feb 7, 2025:** 🔥 We now fully support quantization of large-scale **`MOE`** models with **`671B`** parameters, such as **`DeepSeekv3`**, **`DeepSeek-R1`**, and **`DeepSeek-R1-zero`**. FP8 weights can now be loaded directly without any extra conversion. AWQ and RTN quantization can run on a single 80GB GPU, and export of truly quantized **INT4/INT8** weights is also supported.
 
 - **Nov 20, 2024:** 🔥 We now fully support the quantization of ✨`DeepSeekv2(2.5)` and other `MOE` models, as well as ✨`Qwen2VL`, `Llama3.2`, and other `VLM` models. Supported quantization methods include ✅integer quantization, ✅floating-point quantization, and advanced algorithms like ✅AWQ, ✅GPTQ, ✅SmoothQuant, and ✅Quarot.
 
2 changes: 2 additions & 0 deletions README_zh.md
@@ -48,6 +48,8 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates
 
 ## Latest News
 
+- **May 12, 2025:** 🔥 We now fully support quantization of the **`Wan2.1`** series of video generation models and support export of truly quantized **INT8/FP8** weights, compatible with the [lightx2v](https://github.com/ModelTC/lightx2v) inference framework. For details, please refer to the [lightx2v documentation](https://llmc-zhcn.readthedocs.io/en/latest/backend/lightx2v.html).
+
 - **Feb 7, 2025:** 🔥 We now fully support quantization of 671B large-scale **`MOE`** models such as **`DeepSeekv3`**, **`DeepSeek-R1`**, and **`DeepSeek-R1-zero`**. You can load `FP8` weights directly without extra conversion, run `AWQ` and `RTN` quantization on a single 80GB GPU, and export truly quantized **INT4/INT8** weights.
 
 - **Nov 20, 2024:** 🔥 We now fully support the quantization of ✨`DeepSeekv2(2.5)` and other `MOE` models, as well as ✨`Qwen2VL`, `Llama3.2`, and other `VLM` models. Supported quantization schemes include ✅integer quantization, ✅floating-point quantization, and advanced algorithms such as ✅AWQ, ✅GPTQ, ✅SmoothQuant, and ✅Quarot.
Binary file added assets/wan_i2v/calib/astronaut.jpg
7 changes: 7 additions & 0 deletions assets/wan_i2v/calib/samples.json
@@ -0,0 +1,7 @@
[
    {
        "image": "astronaut.jpg",
        "prompt": "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot.",
        "negative_prompt": "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards"
    }
]
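Each calibration entry pairs a conditioning image with a prompt and a negative prompt, and the image path is relative to the manifest's directory. Below is a minimal sketch of how such a manifest could be consumed; `load_i2v_samples` is a hypothetical helper written for illustration, not part of llmc.

```python
import json
from pathlib import Path

def load_i2v_samples(manifest_path: str):
    """Yield (image_path, prompt, negative_prompt) tuples from a samples.json manifest.

    Hypothetical helper; image paths are resolved relative to the manifest's
    own directory, matching the asset layout added in this PR.
    """
    manifest = Path(manifest_path)
    with manifest.open("r", encoding="utf-8") as f:
        samples = json.load(f)
    for sample in samples:
        image_path = manifest.parent / sample["image"]
        if not image_path.exists():
            raise FileNotFoundError(f"missing calibration image: {image_path}")
        yield image_path, sample["prompt"], sample["negative_prompt"]

if __name__ == "__main__":
    for image, prompt, negative in load_i2v_samples("assets/wan_i2v/calib/samples.json"):
        print(image, prompt[:60], negative[:60])
```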
Binary file added assets/wan_i2v/eval/astronaut.jpg
7 changes: 7 additions & 0 deletions assets/wan_i2v/eval/samples.json
@@ -0,0 +1,7 @@
[
    {
        "image": "astronaut.jpg",
        "prompt": "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot.",
        "negative_prompt": "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards"
    }
]
Empty file modified assets/wan_t2v/calib/samples.json
100644 → 100755
Empty file.
Empty file modified assets/wan_t2v/eval/samples.json
100644 → 100755
Empty file.
49 changes: 49 additions & 0 deletions configs/quantization/video_gen/wan_i2v/awq_w_a.yaml
@@ -0,0 +1,49 @@
base:
    seed: &seed 42
model:
    type: WanI2V
    path: /path/to/model
    torch_dtype: auto
calib:
    name: i2v
    download: False
    path: ../assets/wan_i2v/calib/
    sample_steps: 40
    bs: 1
    target_height: 480
    target_width: 832
    num_frames: 81
    guidance_scale: 5.0
    seed: *seed
eval:
    eval_pos: [fake_quant]
    type: video_gen
    name: i2v
    download: False
    path: ../assets/wan_i2v/eval/
    bs: 1
    target_height: 480
    target_width: 832
    num_frames: 81
    guidance_scale: 5.0
    output_video_path: ./output_videos_awq/
quant:
    video_gen:
        method: Awq
        weight:
            bit: 8
            symmetric: True
            granularity: per_channel
            group_size: -1
        act:
            bit: 8
            symmetric: True
            granularity: per_token
        special:
            trans: True
            trans_version: v2
            weight_clip: False
            clip_sym: True
save:
    save_lightx2v: True
    save_path: /path/to/x2v/
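The `trans: True` entry under `special` enables an AWQ-style activation-aware transformation before the per-channel INT8 weight quantization declared above. The sketch below is a deliberately simplified illustration of that idea (grid-searching a per-input-channel scale on calibration activations); the helper names are hypothetical and this is not llmc's actual implementation.

```python
import torch

def int8_per_channel_fake_quant(w: torch.Tensor) -> torch.Tensor:
    # Symmetric per-output-channel INT8 fake quantization, as in the `weight` block above.
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    return (w / scale).round().clamp(-127, 127) * scale

def awq_style_scale_search(weight: torch.Tensor, calib_acts: torch.Tensor, n_grid: int = 20):
    """Grid-search a per-input-channel scale that minimizes output error after quantization.

    weight: [out_features, in_features]; calib_acts: [tokens, in_features].
    Simplified sketch of the activation-aware scaling idea only.
    """
    x_absmax = calib_acts.abs().amax(dim=0)              # activation range per input channel
    w_absmax = weight.abs().amax(dim=0).clamp(min=1e-8)  # weight range per input channel
    ref_out = calib_acts @ weight.t()
    best_err, best_scale = float("inf"), None
    for i in range(n_grid):
        alpha = i / n_grid
        scale = (x_absmax.pow(alpha) / w_absmax.pow(1 - alpha)).clamp(min=1e-4)
        w_q = int8_per_channel_fake_quant(weight * scale)    # fold scale into weights, quantize
        out = (calib_acts / scale) @ w_q.t()                 # divide the scale back out of activations
        err = (out - ref_out).pow(2).mean().item()
        if err < best_err:
            best_err, best_scale = err, scale
    return best_scale
```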
32 changes: 32 additions & 0 deletions configs/quantization/video_gen/wan_i2v/rtn_w_a.yaml
@@ -0,0 +1,32 @@
base:
    seed: &seed 42
model:
    type: WanI2V
    path: /path/to/model
    torch_dtype: auto
eval:
    eval_pos: [fake_quant]
    type: video_gen
    name: i2v
    download: False
    path: ../assets/wan_i2v/eval/
    bs: 1
    target_height: 480
    target_width: 832
    num_frames: 81
    guidance_scale: 5.0
    output_video_path: ./output_videos_rtn/
quant:
    video_gen:
        method: RTN
        weight:
            bit: 8
            symmetric: True
            granularity: per_channel
        act:
            bit: 8
            symmetric: True
            granularity: per_token
save:
    save_lightx2v: True
    save_path: /path/to/x2v/
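RTN is plain round-to-nearest quantization with the layout declared above: symmetric per-channel INT8 for weights and symmetric per-token INT8 for activations. A minimal fake-quant sketch of that W8A8 scheme (illustration only, not llmc's code):

```python
import torch

def rtn_fake_quant_weight(w: torch.Tensor) -> torch.Tensor:
    # w: [out_features, in_features]; one scale per output channel
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    return (w / scale).round().clamp(-127, 127) * scale

def rtn_fake_quant_act(x: torch.Tensor) -> torch.Tensor:
    # x: [tokens, features]; one scale per token (row), computed dynamically
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    return (x / scale).round().clamp(-127, 127) * scale

w = torch.randn(128, 64)
x = torch.randn(16, 64)
y_fq = rtn_fake_quant_act(x) @ rtn_fake_quant_weight(w).t()
print((y_fq - x @ w.t()).abs().mean())   # quantization error of the fake-quant path
```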
33 changes: 33 additions & 0 deletions configs/quantization/video_gen/wan_i2v/rtn_w_a_lora.yaml
@@ -0,0 +1,33 @@
base:
    seed: &seed 42
model:
    type: WanI2V
    path: /path/to/model
    lora_path: /path/to/lora_weights
    torch_dtype: auto
eval:
    eval_pos: [fake_quant]
    type: video_gen
    name: i2v
    download: False
    path: ../assets/wan_i2v/eval/
    bs: 1
    target_height: 480
    target_width: 832
    num_frames: 81
    guidance_scale: 5.0
    output_video_path: ./output_videos_rtn_lora/
quant:
    video_gen:
        method: RTN
        weight:
            bit: 8
            symmetric: True
            granularity: per_channel
        act:
            bit: 8
            symmetric: True
            granularity: per_token
save:
    save_lightx2v: True
    save_path: /path/to/x2v/
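This variant additionally points at LoRA weights via `lora_path`. A common approach, and the assumption behind this sketch, is to merge the adapter into the base weight first and quantize the merged matrix; the names `lora_A`, `lora_B`, and `scaling` follow the usual LoRA convention rather than a specific llmc API.

```python
import torch

def merge_lora(base_weight: torch.Tensor,
               lora_A: torch.Tensor,     # [rank, in_features]
               lora_B: torch.Tensor,     # [out_features, rank]
               scaling: float) -> torch.Tensor:
    # W_merged = W + scaling * (B @ A), the standard LoRA merge
    return base_weight + scaling * (lora_B @ lora_A)

base = torch.randn(256, 128)
A = torch.randn(8, 128) * 0.01
B = torch.randn(256, 8) * 0.01
merged = merge_lora(base, A, B, scaling=1.0)
# `merged` would then go through the same per-channel INT8 round-to-nearest
# path as an ordinary weight tensor.
```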
45 changes: 45 additions & 0 deletions configs/quantization/video_gen/wan_i2v/smoothquant_w_a.yaml
@@ -0,0 +1,45 @@
base:
    seed: &seed 42
model:
    type: WanI2V
    path: /path/to/model
    torch_dtype: auto
calib:
    name: i2v
    download: False
    path: ../assets/wan_i2v/calib/
    sample_steps: 40
    bs: 1
    target_height: 480
    target_width: 832
    num_frames: 81
    guidance_scale: 5.0
    seed: *seed
eval:
    eval_pos: [fake_quant]
    type: video_gen
    name: i2v
    download: False
    path: ../assets/wan_i2v/eval/
    bs: 1
    target_height: 480
    target_width: 832
    num_frames: 81
    guidance_scale: 5.0
    output_video_path: ./output_videos_sq/
quant:
    video_gen:
        method: SmoothQuant
        weight:
            bit: 8
            symmetric: True
            granularity: per_channel
        act:
            bit: 8
            symmetric: True
            granularity: per_token
        special:
            alpha: 0.75
save:
    save_lightx2v: True
    save_path: /path/to/x2v/
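`alpha: 0.75` controls how much quantization difficulty is migrated from activations to weights. The sketch below illustrates the published SmoothQuant transform under that setting, where the per-input-channel scale is s_j = max|X_j|^alpha / max|W_j|^(1-alpha); it is an illustration of the technique, not llmc's implementation, which may handle the scales differently.

```python
import torch

def smooth_scales(act_absmax: torch.Tensor, weight: torch.Tensor, alpha: float = 0.75):
    # act_absmax: per-input-channel activation range collected on calibration data
    w_absmax = weight.abs().amax(dim=0)                      # per-input-channel weight range
    s = act_absmax.pow(alpha) / w_absmax.pow(1.0 - alpha)
    return s.clamp(min=1e-5)

def apply_smoothing(x: torch.Tensor, weight: torch.Tensor, s: torch.Tensor):
    # X' = X / s and W' = W * s, so X' @ W'.T == X @ W.T up to floating-point error
    return x / s, weight * s

x = torch.randn(32, 64) * torch.linspace(0.1, 10.0, 64)     # a few outlier channels
w = torch.randn(128, 64)
s = smooth_scales(x.abs().amax(dim=0), w, alpha=0.75)
x_s, w_s = apply_smoothing(x, w, s)
print(torch.allclose(x_s @ w_s.t(), x @ w.t(), atol=1e-3))  # math is preserved pre-quantization
```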
49 changes: 49 additions & 0 deletions configs/quantization/video_gen/wan_i2v/smoothquant_w_a_fp8.yaml
@@ -0,0 +1,49 @@
base:
    seed: &seed 42
model:
    type: WanI2V
    path: /path/to/model
    torch_dtype: auto
calib:
    name: i2v
    download: False
    path: ../assets/wan_i2v/calib/
    sample_steps: 40
    bs: 1
    target_height: 480
    target_width: 832
    num_frames: 81
    guidance_scale: 5.0
    seed: *seed
eval:
    eval_pos: [fake_quant]
    type: video_gen
    name: i2v
    download: False
    path: ../assets/wan_i2v/eval/
    bs: 1
    target_height: 480
    target_width: 832
    num_frames: 81
    guidance_scale: 5.0
    output_video_path: ./output_videos_sq/
quant:
    video_gen:
        method: SmoothQuant
        weight:
            quant_type: float-quant
            bit: e4m3
            symmetric: True
            granularity: per_channel
            use_qtorch: True
        act:
            quant_type: float-quant
            bit: e4m3
            symmetric: True
            granularity: per_token
            use_qtorch: True
        special:
            alpha: 0.75
save:
    save_lightx2v: True
    save_path: /path/to/x2v/
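With `quant_type: float-quant` and `bit: e4m3`, weights and activations are quantized to 8-bit floating point rather than INT8 (`use_qtorch: True` presumably routes the simulation through the qtorch package, an inference from the key name). The sketch below only illustrates per-channel e4m3 weight quantization using PyTorch's native float8 dtype; it assumes PyTorch ≥ 2.1 and is not llmc's code path.

```python
import torch

E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

def quant_weight_e4m3_per_channel(w: torch.Tensor):
    # Scale each output channel into the e4m3 range, then cast to float8.
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / E4M3_MAX
    w_fp8 = (w / scale).to(torch.float8_e4m3fn)
    return w_fp8, scale

def dequant(w_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return w_fp8.to(torch.float32) * scale

w = torch.randn(128, 64)
w_fp8, scale = quant_weight_e4m3_per_channel(w)
print((dequant(w_fp8, scale) - w).abs().max())   # worst-case round-trip error
```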
@@ -0,0 +1,46 @@
base:
    seed: &seed 42
model:
    type: WanI2V
    path: /path/to/model
    lora_path: /path/to/lora_weights
    torch_dtype: auto
calib:
    name: i2v
    download: False
    path: ../assets/wan_i2v/calib/
    sample_steps: 40
    bs: 1
    target_height: 480
    target_width: 832
    num_frames: 81
    guidance_scale: 5.0
    seed: *seed
eval:
    eval_pos: [fake_quant]
    type: video_gen
    name: i2v
    download: False
    path: ../assets/wan_i2v/eval/
    bs: 1
    target_height: 480
    target_width: 832
    num_frames: 81
    guidance_scale: 5.0
    output_video_path: ./output_videos_sq/
quant:
    video_gen:
        method: SmoothQuant
        weight:
            bit: 8
            symmetric: True
            granularity: per_channel
        act:
            bit: 8
            symmetric: True
            granularity: per_token
        special:
            alpha: 0.75
save:
    save_lightx2v: True
    save_path: /path/to/x2v/
9 changes: 4 additions & 5 deletions configs/quantization/video_gen/wan_t2v/awq_w_a.yaml
100644 → 100755
@@ -5,7 +5,7 @@ model:
     path: /path/to/wan_t2v
     torch_dtype: auto
 calib:
-    name: custom_t2v
+    name: t2v
     download: False
     path: ../assets/wan_t2v/calib/
     sample_steps: 20
@@ -18,7 +18,7 @@ calib:
 eval:
     eval_pos: [transformed, fake_quant]
     type: video_gen
-    name: custom_t2v
+    name: t2v
     download: False
     path: ../assets/wan_t2v/calib/
     bs: 1
@@ -45,6 +45,5 @@ quant:
             weight_clip: True
             clip_sym: True
 save:
-    save_trans: False
-    save_fake: False
-    save_path: /path/to/save/
+    save_lightx2v: True
+    save_path: /path/to/x2v/
11 changes: 5 additions & 6 deletions configs/quantization/video_gen/wan_t2v/rtn_w_a.yaml
100644 → 100755
@@ -7,15 +7,15 @@ model:
 eval:
     eval_pos: [transformed, fake_quant]
     type: video_gen
-    name: custom_t2v
+    name: t2v
     download: False
-    path: /mtc/gushiqiao/llmc_video_new/llmc/assets/wan_t2v/
+    path: ../assets/wan_t2v/eval/
     bs: 1
     target_height: 480
     target_width: 832
     num_frames: 81
     guidance_scale: 5.0
-    output_video_path: ./output_videos_sq/
+    output_video_path: ./output_videos_rtn/
 quant:
     video_gen:
         method: RTN
@@ -28,6 +28,5 @@ quant:
             symmetric: True
             granularity: per_token
 save:
-    save_trans: False
-    save_fake: False
-    save_path: /path/to/save/
+    save_lightx2v: True
+    save_path: /path/to/x2v/
9 changes: 4 additions & 5 deletions configs/quantization/video_gen/wan_t2v/smoothquant_w_a.yaml
100644 → 100755
@@ -5,7 +5,7 @@ model:
     path: /path/to/wan_t2v
     torch_dtype: auto
 calib:
-    name: custom_t2v
+    name: t2v
     download: False
     path: ../assets/wan_t2v/calib/
     sample_steps: 20
@@ -18,7 +18,7 @@ calib:
 eval:
     eval_pos: [transformed, fake_quant]
     type: video_gen
-    name: custom_t2v
+    name: t2v
     download: False
     path: ../assets/wan_t2v/calib/
     bs: 1
@@ -41,6 +41,5 @@ quant:
         special:
             alpha: 0.7
 save:
-    save_trans: False
-    save_fake: False
-    save_path: /path/to/save/
+    save_lightx2v: True
+    save_path: /path/to/x2v/
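Across these wan_t2v configs the save section switches from `save_trans`/`save_fake` to `save_lightx2v: True`, i.e. exporting really-quantized weights for the lightx2v runtime instead of fake-quantized ones. The exact checkpoint layout lightx2v expects is not shown in this diff; the sketch below only illustrates the general idea of storing INT8 tensors alongside their per-channel scales, with hypothetical key names.

```python
import torch
from safetensors.torch import save_file

def quantize_int8_per_channel(w: torch.Tensor):
    # Real (not fake) symmetric per-output-channel INT8 quantization.
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    q = (w / scale).round().clamp(-127, 127).to(torch.int8)
    return q, scale.squeeze(1)

# Hypothetical export: one int8 tensor plus one fp32 scale vector per linear layer.
weights = {"blocks.0.attn.q_proj.weight": torch.randn(128, 64)}
state = {}
for name, w in weights.items():
    q, s = quantize_int8_per_channel(w)
    state[name] = q
    state[name.replace(".weight", ".weight_scale")] = s

save_file(state, "quantized_weights.safetensors")
```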