
Commit d3ffb37

gushiqiao and co-authors authored

Dev fix (#376)

* Support wan i2v and lora.
* Update docs and readme.

Co-authored-by: gushiqiao <[email protected]>
Co-authored-by: root <gushiqiao>
1 parent 12fa9f8 commit d3ffb37

File tree

7 files changed, +456 −1 lines changed


README.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -48,6 +48,8 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates
 ## Latest News
 
+- **May 12, 2025:** 🔥 We now fully support quantization for the **`Wan2.1`** series of video generation models and provide export of truly quantized **INT8/FP8** weights, compatible with the [lightx2v](https://github.com/ModelTC/lightx2v) inference framework. For details, please refer to the [lightx2v documentation](https://llmc-en.readthedocs.io/en/latest/backend/lightx2v.html).
+
 - **Feb 7, 2025:** 🔥 We now fully support quantization of large-scale **`MOE`** models like **`DeepSeekv3`**, **`DeepSeek-R1`**, and **`DeepSeek-R1-zero`** with **`671B`** parameters. You can now directly load FP8 weights without any extra conversion. AWQ and RTN quantization can run on a single 80GB GPU, and we also support the export of true quantized **INT4/INT8** weights.
 
 - **Nov 20, 2024:** 🔥 We now fully support the quantization of ✨`DeepSeekv2(2.5)` and other `MOE` models, as well as ✨`Qwen2VL`, `Llama3.2`, and other `VLM` models. Supported quantization methods include ✅integer quantization, ✅floating-point quantization, and advanced algorithms like ✅AWQ, ✅GPTQ, ✅SmoothQuant, and ✅Quarot.
```

README_ja.md

Lines changed: 3 additions & 1 deletion
```diff
@@ -48,7 +48,9 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates
 ## Latest News
 
-- V 🔥 We now fully support quantization of large-scale **`MOE`** models with **`671B`** parameters, such as **`DeepSeekv3`**, **`DeepSeek-R1`**, and **`DeepSeek-R1-zero`**. FP8 weights can now be loaded directly without any extra conversion. AWQ and RTN quantization run on a single 80GB GPU, and export of truly quantized **INT4/INT8** weights is also supported.
+- **May 12, 2025:** 🔥 We now fully support quantization of the **`Wan2.1`** series of video generation models and export of truly quantized **INT8/FP8** weights, compatible with the [lightx2v](https://github.com/ModelTC/lightx2v) inference framework. For details, see the [lightx2v documentation](https://llmc-en.readthedocs.io/en/latest/backend/lightx2v.html).
+
+- **Feb 7, 2025:** 🔥 We now fully support quantization of large-scale **`MOE`** models with **`671B`** parameters, such as **`DeepSeekv3`**, **`DeepSeek-R1`**, and **`DeepSeek-R1-zero`**. FP8 weights can now be loaded directly without any extra conversion. AWQ and RTN quantization run on a single 80GB GPU, and export of truly quantized **INT4/INT8** weights is also supported.
 
 - **Nov 20, 2024:** 🔥 We now fully support quantization of `MOE` models such as ✨`DeepSeekv2(2.5)`, as well as `VLM` models such as ✨`Qwen2VL` and `Llama3.2`. Supported quantization methods include ✅integer quantization, ✅floating-point quantization, and advanced algorithms such as ✅AWQ, ✅GPTQ, ✅SmoothQuant, and ✅Quarot.
```

README_zh.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -48,6 +48,8 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates
 ## Latest News
 
+- **May 12, 2025:** 🔥 We now fully support quantization of the **`Wan2.1`** series of video generation models and export of truly quantized **INT8/FP8** weights, compatible with the [lightx2v](https://github.com/ModelTC/lightx2v) inference framework. For details, see the [lightx2v documentation](https://llmc-zhcn.readthedocs.io/en/latest/backend/lightx2v.html).
+
 - **Feb 7, 2025:** 🔥 We now fully support quantization of 671B-scale **`MOE`** models such as **`DeepSeekv3`**, **`DeepSeek-R1`**, and **`DeepSeek-R1-zero`**. You can load `FP8` weights directly without extra conversion, run `AWQ` and `RTN` quantization on a single 80GB GPU, and export truly quantized **INT4/INT8** weights.
 
 - **Nov 20, 2024:** 🔥 We now fully support quantization of `MOE` models such as ✨`DeepSeekv2(2.5)`, as well as `VLM` models such as ✨`Qwen2VL` and `Llama3.2`. Supported quantization schemes include ✅integer quantization, ✅floating-point quantization, and advanced algorithms such as ✅AWQ, ✅GPTQ, ✅SmoothQuant, and ✅Quarot.
```
Lines changed: 49 additions & 0 deletions
```yaml
base:
    seed: &seed 42
model:
    type: WanI2V
    path: /path/to/model
    torch_dtype: auto
calib:
    name: i2v
    download: False
    path: ../assets/wan_i2v/calib/
    sample_steps: 40
    bs: 1
    target_height: 480
    target_width: 832
    num_frames: 81
    guidance_scale: 5.0
    seed: *seed
eval:
    eval_pos: [fake_quant]
    type: video_gen
    name: i2v
    download: False
    path: ../assets/wan_i2v/eval/
    bs: 1
    target_height: 480
    target_width: 832
    num_frames: 81
    guidance_scale: 5.0
    output_video_path: ./output_videos_sq/
quant:
    video_gen:
        method: SmoothQuant
        weight:
            # FP8 (E4M3) float quantization, one scale per output channel
            quant_type: float-quant
            bit: e4m3
            symmetric: True
            granularity: per_channel
            use_qtorch: True
        act:
            # FP8 (E4M3) float quantization, dynamic per-token scales
            quant_type: float-quant
            bit: e4m3
            symmetric: True
            granularity: per_token
            use_qtorch: True
        special:
            alpha: 0.75
save:
    # export truly quantized weights in the lightx2v format
    save_lightx2v: True
    save_path: /path/to/x2v/
```

Lines changed: 46 additions & 0 deletions
```yaml
base:
    seed: &seed 42
model:
    type: WanI2V
    path: /path/to/model
    lora_path: /path/to/lora_weights   # LoRA weights loaded together with the base model
    torch_dtype: auto
calib:
    name: i2v
    download: False
    path: ../assets/wan_i2v/calib/
    sample_steps: 40
    bs: 1
    target_height: 480
    target_width: 832
    num_frames: 81
    guidance_scale: 5.0
    seed: *seed
eval:
    eval_pos: [fake_quant]
    type: video_gen
    name: i2v
    download: False
    path: ../assets/wan_i2v/eval/
    bs: 1
    target_height: 480
    target_width: 832
    num_frames: 81
    guidance_scale: 5.0
    output_video_path: ./output_videos_sq/
quant:
    video_gen:
        method: SmoothQuant
        weight:
            # INT8 weights, one scale per output channel
            bit: 8
            symmetric: True
            granularity: per_channel
        act:
            # INT8 activations, dynamic per-token scales
            bit: 8
            symmetric: True
            granularity: per_token
        special:
            alpha: 0.75
save:
    save_lightx2v: True
    save_path: /path/to/x2v/
```

docs/en/source/backend/lightx2v.md

Lines changed: 177 additions & 0 deletions
# lightx2v Quantized Inference

[lightx2v](https://github.com/ModelTC/lightx2v) is an efficient backend designed specifically to meet the inference demands of video generation models. By optimizing memory management and computational efficiency, it significantly accelerates the inference process.

**LLMC** supports exporting the quantized model formats required by **lightx2v** and offers strong support for multiple quantization algorithms (such as AWQ, GPTQ, and SmoothQuant), maintaining high quantization accuracy while improving inference speed. Combining **LLMC** with **lightx2v** enables accelerated inference and memory optimization without compromising accuracy, making it ideal for scenarios that require efficient video model processing.

---

## 1.1 Environment Setup

To use **lightx2v** for quantized inference, first install and configure the environment:

```bash
# Clone the repository and its submodules
git clone https://github.com/ModelTC/lightx2v.git lightx2v && cd lightx2v
git submodule update --init --recursive

# Create and activate the conda environment
conda create -n lightx2v python=3.11 && conda activate lightx2v
pip install -r requirements.txt

# Reinstall transformers separately to bypass version conflicts
pip install transformers==4.45.2

# Install flash-attention 2
cd lightx2v/3rd/flash-attention && pip install --no-cache-dir -v -e .

# Install flash-attention 3 (only if using Hopper architecture)
cd lightx2v/3rd/flash-attention/hopper && pip install --no-cache-dir -v -e .
```

---

## 1.2 Quantization Formats

**lightx2v** supports several fixed-point quantization formats:

- **W8A8**: int8 weights and activations.
- **FP8 (E4M3)**: float8 weights and activations.
- **Per-channel weight quantization.**
- **Per-token dynamic activation quantization** for improved precision.
- **Symmetric quantization** for both weights and activations (scale only, no zero-point).

When using **LLMC** to quantize models, ensure the bit-widths of weights and activations match a format supported by **lightx2v**.

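To make these granularities concrete, here is a minimal, illustrative PyTorch sketch of symmetric per-channel (weight) and per-token (activation) int8 quantization; it is a simplified illustration, not LLMC's or lightx2v's actual kernels:

```python
import torch

def quant_weight_per_channel_int8(w: torch.Tensor):
    # One scale per output channel (row of a Linear weight); symmetric, no zero-point.
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.round(w / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

def quant_act_per_token_int8(x: torch.Tensor):
    # One scale per token (row), computed dynamically at runtime from the activation itself.
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.round(x / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

w = torch.randn(4096, 4096)   # Linear weight: (out_features, in_features)
x = torch.randn(8, 4096)      # activations: (tokens, in_features)
wq, w_scale = quant_weight_per_channel_int8(w)
xq, x_scale = quant_act_per_token_int8(x)
# Dequantization is simply q.float() * scale, so only int8 tensors and scales need to be stored.
```
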
---

## 1.3 Quantizing Models with LLMC

### 1.3.1 Calibration Data

For example, for the Wan2.1 model on the I2V task, a calibration dataset is provided in this [directory](https://github.com/ModelTC/llmc/tree/main/assets/wan_i2v/calib). Users can add more samples as needed.
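
The Wan I2V configs added in this commit point their `calib` section at that dataset; an excerpt is shown below (the `seed` key reuses the `&seed` anchor defined under `base`):

```yaml
base:
    seed: &seed 42
calib:
    name: i2v
    download: False
    path: ../assets/wan_i2v/calib/
    sample_steps: 40
    bs: 1
    target_height: 480
    target_width: 832
    num_frames: 81
    guidance_scale: 5.0
    seed: *seed
```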

### 1.3.2 Choosing a Quantization Algorithm

#### **W8A8**

We recommend **SmoothQuant** for W8A8 settings. Refer to the SmoothQuant W8A8 [configuration file](https://github.com/ModelTC/llmc/tree/main/configs/quantization/video_gen/wan_i2v/smoothquant_w_a.yaml):

```yaml
quant:
    video_gen:
        method: SmoothQuant
        weight:
            bit: 8
            symmetric: True
            granularity: per_channel
        act:
            bit: 8
            symmetric: True
            granularity: per_token
        special:
            alpha: 0.75
```
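
The `alpha` value under `special` is the SmoothQuant migration strength: following the SmoothQuant paper, each input channel j gets a smoothing factor s_j = max|X_j|^α / max|W_j|^(1−α); activations are divided by s and weights multiplied by s, so the layer output is unchanged while activation outliers become easier to quantize. A rough, illustrative sketch (not LLMC's internal implementation):

```python
import torch

def smoothquant_factors(act_absmax, w_absmax, alpha=0.75):
    # s_j = max|X_j|^alpha / max|W_j|^(1 - alpha), per input channel j
    return act_absmax.clamp(min=1e-5).pow(alpha) / w_absmax.clamp(min=1e-5).pow(1.0 - alpha)

X = torch.randn(64, 4096)     # calibration activations: (tokens, in_features)
W = torch.randn(4096, 4096)   # Linear weight: (out_features, in_features)
s = smoothquant_factors(X.abs().amax(dim=0), W.abs().amax(dim=0), alpha=0.75)

X_smooth = X / s              # activations become easier to quantize per token
W_smooth = W * s              # the difficulty is folded into the weights
# (X / s) @ (W * s).T == X @ W.T, so the smoothed layer is numerically equivalent.
```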

If SmoothQuant does not meet the precision requirement, use **AWQ** for better accuracy; see the corresponding [configuration](https://github.com/ModelTC/llmc/tree/main/configs/quantization/video_gen/wan_i2v/awq_w_a.yaml).

#### **FP8-Dynamic**

LLMC supports FP8 quantization with per-channel weights and per-token dynamic activations. SmoothQuant is again recommended. See the SmoothQuant FP8 [configuration](https://github.com/ModelTC/llmc/tree/main/configs/quantization/backend/lightx2v/fp8/awq_fp8.yml):

```yaml
quant:
    video_gen:
        method: SmoothQuant
        weight:
            quant_type: float-quant
            bit: e4m3
            symmetric: True
            granularity: per_channel
            use_qtorch: True
        act:
            quant_type: float-quant
            bit: e4m3
            symmetric: True
            granularity: per_token
            use_qtorch: True
        special:
            alpha: 0.75
```

Ensure `quant_type` is set to `float-quant` and `use_qtorch` to `True`, as **LLMC** uses [QPyTorch](https://github.com/Tiiiger/QPyTorch) for float quantization.

Install QPyTorch with:

```bash
pip install qtorch
```
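
For intuition only, E4M3 weight casting can also be sketched with PyTorch's native float8 dtype (available in PyTorch 2.1+); this illustrates the format, not LLMC's QPyTorch-based code path:

```python
import torch

def quant_weight_per_channel_e4m3(w: torch.Tensor):
    # float8_e4m3fn has a maximum finite value of 448, so scale each
    # output channel into that range before casting.
    fp8_max = 448.0
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / fp8_max
    q = (w / scale).clamp(-fp8_max, fp8_max).to(torch.float8_e4m3fn)
    return q, scale

w = torch.randn(4096, 4096)
wq, w_scale = quant_weight_per_channel_e4m3(w)
max_err = (wq.to(torch.float32) * w_scale - w).abs().max()  # quick round-trip error check
```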

### 1.3.3 Exporting the Quantized Model

```yaml
save:
    save_lightx2v: True
    save_path: /path/to/save_for_lightx2v/
```

Set `save_lightx2v` to `True`. LLMC will export the weights as `torch.int8` or `torch.float8_e4m3fn`, together with the quantization parameters, for direct loading in **lightx2v**.

### 1.3.4 Running LLMC

Edit the config path in the run script and execute it:

```bash
# scripts/run_llmc.sh
llmc=llmc_path
export PYTHONPATH=$llmc:$PYTHONPATH

task_name=sq_for_lightx2v
config=${llmc}/configs/quantization/video_gen/wan_i2v/smoothquant_w_a.yaml
```
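
With the variables set, launch the script; a typical invocation (assuming you run it from the LLMC repository root) is:

```bash
bash scripts/run_llmc.sh
```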

After LLMC completes, the quantized model is saved to `save.save_path`.

### 1.3.5 Evaluation

For the I2V task with the Wan2.1 model, an evaluation dataset is provided [here](https://github.com/ModelTC/llmc/tree/main/assets/wan_i2v/eval). Set the following in the config file:

```yaml
eval:
    eval_pos: [fake_quant]
    type: video_gen
    name: i2v
    download: False
    path: ../assets/wan_i2v/eval/
    bs: 1
    target_height: 480
    target_width: 832
    num_frames: 81
    guidance_scale: 5.0
    output_video_path: ./output_videos_sq/
```

LLMC will then generate evaluation videos with the fake-quantized (simulated quantization) model.

---

## 1.4 Inference with lightx2v

### 1.4.1 Weight Structure Conversion

After LLMC exports the model, convert its structure to match the **lightx2v** layout using the [conversion script](https://github.com/ModelTC/lightx2v/blob/main/examples/diffusers/converter.py):

```bash
python converter.py -s /path/to/save_for_lightx2v/ -o /path/to/output/ -d backward
```

The converted model is saved under `/path/to/output/`.

### 1.4.2 Offline Inference

Edit the [inference script](https://github.com/ModelTC/lightx2v/blob/main/scripts/run_wan_i2v_advanced_ptq.sh): set `model_path` to `/path/to/output/` and `lightx2v_path` to your local lightx2v path, then run:

```bash
bash run_wan_i2v_advanced_ptq.sh
```
