2 changes: 2 additions & 0 deletions README.md
@@ -48,6 +48,8 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates
 
 ## Latest News
 
+- **May 12, 2025:** 🔥 We now fully support quantization for the **`Wan2.1`** series of video generation models and provide export of truly quantized **INT8/FP8** weights, compatible with the [lightx2v](https://github.com/ModelTC/lightx2v) inference framework. For details, please refer to the [lightx2v documentation](https://llmc-en.readthedocs.io/en/latest/backend/lightx2v.html).
+
 - **Feb 7, 2025:** 🔥 We now fully support quantization of large-scale **`MOE`** models like **`DeepSeekv3`**, **`DeepSeek-R1`**, and **`DeepSeek-R1-zero`** with **`671B`** parameters. You can now directly load FP8 weights without any extra conversion. AWQ and RTN quantization can run on a single 80GB GPU, and we also support the export of true quantized **INT4/INT8** weights.
 
 - **Nov 20, 2024:** 🔥 We now fully support the quantization of ✨`DeepSeekv2(2.5)` and other `MOE` models, as well as ✨`Qwen2VL`, `Llama3.2`, and other `VLM` models. Supported quantization methods include ✅integer quantization, ✅floating-point quantization, and advanced algorithms like ✅AWQ, ✅GPTQ, ✅SmoothQuant, and ✅Quarot.
4 changes: 3 additions & 1 deletion README_ja.md
@@ -48,7 +48,9 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates
 
 ## Latest News
 
-- V 🔥 We now fully support quantization of large-scale **`MOE`** models with **`671B`** parameters, such as **`DeepSeekv3`**, **`DeepSeek-R1`**, and **`DeepSeek-R1-zero`**. FP8 weights can now be loaded directly without any extra conversion. AWQ and RTN quantization can run on a single 80GB GPU, and export of truly quantized **INT4/INT8** weights is also supported.
+- **May 12, 2025:** 🔥 We now fully support quantization of the **`Wan2.1`** series of video generation models and also support export of truly quantized **INT8/FP8** weights, compatible with the [lightx2v](https://github.com/ModelTC/lightx2v) inference framework. For details, please refer to the [lightx2v documentation](https://llmc-en.readthedocs.io/en/latest/backend/lightx2v.html).
+
+- **Feb 7, 2025:** 🔥 We now fully support quantization of large-scale **`MOE`** models with **`671B`** parameters, such as **`DeepSeekv3`**, **`DeepSeek-R1`**, and **`DeepSeek-R1-zero`**. FP8 weights can now be loaded directly without any extra conversion. AWQ and RTN quantization can run on a single 80GB GPU, and export of truly quantized **INT4/INT8** weights is also supported.
 
 - **Nov 20, 2024:** 🔥 We now fully support the quantization of ✨`DeepSeekv2(2.5)` and other `MOE` models, as well as ✨`Qwen2VL`, `Llama3.2`, and other `VLM` models. Supported quantization methods include ✅integer quantization, ✅floating-point quantization, and advanced algorithms like ✅AWQ, ✅GPTQ, ✅SmoothQuant, and ✅Quarot.
 
2 changes: 2 additions & 0 deletions README_zh.md
@@ -48,6 +48,8 @@ docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/llmcompression:pure-lates
 
 ## Latest News
 
+- **May 12, 2025:** 🔥 We now fully support quantization of the **`Wan2.1`** series of video generation models and support export of truly quantized **INT8/FP8** weights, compatible with the [lightx2v](https://github.com/ModelTC/lightx2v) inference framework. For details, please refer to the [lightx2v documentation](https://llmc-zhcn.readthedocs.io/en/latest/backend/lightx2v.html).
+
 - **Feb 7, 2025:** 🔥 We now fully support quantization of 671B large-scale **`MOE`** models such as **`DeepSeekv3`**, **`DeepSeek-R1`**, and **`DeepSeek-R1-zero`**. You can load `FP8` weights directly without extra conversion, run `AWQ` and `RTN` quantization on a single 80GB GPU, and export truly quantized **INT4/INT8** weights.
 
 - **Nov 20, 2024:** 🔥 We now fully support the quantization of ✨`DeepSeekv2(2.5)` and other `MOE` models, as well as ✨`Qwen2VL`, `Llama3.2`, and other `VLM` models. Supported quantization schemes include ✅integer quantization, ✅floating-point quantization, and advanced algorithms such as ✅AWQ, ✅GPTQ, ✅SmoothQuant, and ✅Quarot.
Binary file added assets/wan_i2v/calib/astronaut.jpg
7 changes: 7 additions & 0 deletions assets/wan_i2v/calib/samples.json
@@ -0,0 +1,7 @@
[
    {
        "image": "astronaut.jpg",
        "prompt": "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot.",
        "negative_prompt": "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards"
    }
]
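Each calibration entry pairs a conditioning image with a prompt and a negative prompt, and the image path is relative to the manifest's directory. Below is a minimal sketch of how such a manifest could be consumed; `load_i2v_samples` is a hypothetical helper written for illustration, not part of llmc.

```python
import json
from pathlib import Path

def load_i2v_samples(manifest_path: str):
    """Yield (image_path, prompt, negative_prompt) tuples from a samples.json manifest.

    Hypothetical helper; image paths are resolved relative to the manifest's
    own directory, matching the asset layout added in this PR.
    """
    manifest = Path(manifest_path)
    with manifest.open("r", encoding="utf-8") as f:
        samples = json.load(f)
    for sample in samples:
        image_path = manifest.parent / sample["image"]
        if not image_path.exists():
            raise FileNotFoundError(f"missing calibration image: {image_path}")
        yield image_path, sample["prompt"], sample["negative_prompt"]

if __name__ == "__main__":
    for image, prompt, negative in load_i2v_samples("assets/wan_i2v/calib/samples.json"):
        print(image, prompt[:60], negative[:60])
```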
Binary file added assets/wan_i2v/eval/astronaut.jpg
7 changes: 7 additions & 0 deletions assets/wan_i2v/eval/samples.json
@@ -0,0 +1,7 @@
[
    {
        "image": "astronaut.jpg",
        "prompt": "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot.",
        "negative_prompt": "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards"
    }
]
Empty file modified assets/wan_t2v/calib/samples.json
100644 → 100755
Empty file.
Empty file modified assets/wan_t2v/eval/samples.json
100644 → 100755
Empty file.
49 changes: 49 additions & 0 deletions configs/quantization/video_gen/wan_i2v/awq_w_a.yaml
@@ -0,0 +1,49 @@
base:
    seed: &seed 42
model:
    type: WanI2V
    path: /path/to/model
    torch_dtype: auto
calib:
    name: i2v
    download: False
    path: ../assets/wan_i2v/calib/
    sample_steps: 40
    bs: 1
    target_height: 480
    target_width: 832
    num_frames: 81
    guidance_scale: 5.0
    seed: *seed
eval:
    eval_pos: [fake_quant]
    type: video_gen
    name: i2v
    download: False
    path: ../assets/wan_i2v/eval/
    bs: 1
    target_height: 480
    target_width: 832
    num_frames: 81
    guidance_scale: 5.0
    output_video_path: ./output_videos_awq/
quant:
    video_gen:
        method: Awq
        weight:
            bit: 8
            symmetric: True
            granularity: per_channel
            group_size: -1
        act:
            bit: 8
            symmetric: True
            granularity: per_token
        special:
            trans: True
            trans_version: v2
            weight_clip: False
            clip_sym: True
save:
    save_lightx2v: True
    save_path: /path/to/x2v/
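The `trans: True` entry under `special` enables an AWQ-style activation-aware transformation before the per-channel INT8 weight quantization declared above. The sketch below is a deliberately simplified illustration of that idea (grid-searching a per-input-channel scale on calibration activations); the helper names are hypothetical and this is not llmc's actual implementation.

```python
import torch

def int8_per_channel_fake_quant(w: torch.Tensor) -> torch.Tensor:
    # Symmetric per-output-channel INT8 fake quantization, as in the `weight` block above.
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    return (w / scale).round().clamp(-127, 127) * scale

def awq_style_scale_search(weight: torch.Tensor, calib_acts: torch.Tensor, n_grid: int = 20):
    """Grid-search a per-input-channel scale that minimizes output error after quantization.

    weight: [out_features, in_features]; calib_acts: [tokens, in_features].
    Simplified sketch of the activation-aware scaling idea only.
    """
    x_absmax = calib_acts.abs().amax(dim=0)              # activation range per input channel
    w_absmax = weight.abs().amax(dim=0).clamp(min=1e-8)  # weight range per input channel
    ref_out = calib_acts @ weight.t()
    best_err, best_scale = float("inf"), None
    for i in range(n_grid):
        alpha = i / n_grid
        scale = (x_absmax.pow(alpha) / w_absmax.pow(1 - alpha)).clamp(min=1e-4)
        w_q = int8_per_channel_fake_quant(weight * scale)    # fold scale into weights, quantize
        out = (calib_acts / scale) @ w_q.t()                 # divide the scale back out of activations
        err = (out - ref_out).pow(2).mean().item()
        if err < best_err:
            best_err, best_scale = err, scale
    return best_scale
```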
32 changes: 32 additions & 0 deletions configs/quantization/video_gen/wan_i2v/rtn_w_a.yaml
@@ -0,0 +1,32 @@
base:
    seed: &seed 42
model:
    type: WanI2V
    path: /path/to/model
    torch_dtype: auto
eval:
    eval_pos: [fake_quant]
    type: video_gen
    name: i2v
    download: False
    path: ../assets/wan_i2v/eval/
    bs: 1
    target_height: 480
    target_width: 832
    num_frames: 81
    guidance_scale: 5.0
    output_video_path: ./output_videos_rtn/
quant:
    video_gen:
        method: RTN
        weight:
            bit: 8
            symmetric: True
            granularity: per_channel
        act:
            bit: 8
            symmetric: True
            granularity: per_token
save:
    save_lightx2v: True
    save_path: /path/to/x2v/
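RTN is plain round-to-nearest quantization with the layout declared above: symmetric per-channel INT8 for weights and symmetric per-token INT8 for activations. A minimal fake-quant sketch of that W8A8 scheme (illustration only, not llmc's code):

```python
import torch

def rtn_fake_quant_weight(w: torch.Tensor) -> torch.Tensor:
    # w: [out_features, in_features]; one scale per output channel
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    return (w / scale).round().clamp(-127, 127) * scale

def rtn_fake_quant_act(x: torch.Tensor) -> torch.Tensor:
    # x: [tokens, features]; one scale per token (row), computed dynamically
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    return (x / scale).round().clamp(-127, 127) * scale

w = torch.randn(128, 64)
x = torch.randn(16, 64)
y_fq = rtn_fake_quant_act(x) @ rtn_fake_quant_weight(w).t()
print((y_fq - x @ w.t()).abs().mean())   # quantization error of the fake-quant path
```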
33 changes: 33 additions & 0 deletions configs/quantization/video_gen/wan_i2v/rtn_w_a_lora.yaml
@@ -0,0 +1,33 @@
base:
    seed: &seed 42
model:
    type: WanI2V
    path: /path/to/model
    lora_path: /path/to/lora_weights
    torch_dtype: auto
eval:
    eval_pos: [fake_quant]
    type: video_gen
    name: i2v
    download: False
    path: ../assets/wan_i2v/eval/
    bs: 1
    target_height: 480
    target_width: 832
    num_frames: 81
    guidance_scale: 5.0
    output_video_path: ./output_videos_rtn_lora/
quant:
    video_gen:
        method: RTN
        weight:
            bit: 8
            symmetric: True
            granularity: per_channel
        act:
            bit: 8
            symmetric: True
            granularity: per_token
save:
    save_lightx2v: True
    save_path: /path/to/x2v/
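This variant additionally points at LoRA weights via `lora_path`. A common approach, and the assumption behind this sketch, is to merge the adapter into the base weight first and quantize the merged matrix; the names `lora_A`, `lora_B`, and `scaling` follow the usual LoRA convention rather than a specific llmc API.

```python
import torch

def merge_lora(base_weight: torch.Tensor,
               lora_A: torch.Tensor,     # [rank, in_features]
               lora_B: torch.Tensor,     # [out_features, rank]
               scaling: float) -> torch.Tensor:
    # W_merged = W + scaling * (B @ A), the standard LoRA merge
    return base_weight + scaling * (lora_B @ lora_A)

base = torch.randn(256, 128)
A = torch.randn(8, 128) * 0.01
B = torch.randn(256, 8) * 0.01
merged = merge_lora(base, A, B, scaling=1.0)
# `merged` would then go through the same per-channel INT8 round-to-nearest
# path as an ordinary weight tensor.
```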
45 changes: 45 additions & 0 deletions configs/quantization/video_gen/wan_i2v/smoothquant_w_a.yaml
@@ -0,0 +1,45 @@
base:
    seed: &seed 42
model:
    type: WanI2V
    path: /path/to/model
    torch_dtype: auto
calib:
    name: i2v
    download: False
    path: ../assets/wan_i2v/calib/
    sample_steps: 40
    bs: 1
    target_height: 480
    target_width: 832
    num_frames: 81
    guidance_scale: 5.0
    seed: *seed
eval:
    eval_pos: [fake_quant]
    type: video_gen
    name: i2v
    download: False
    path: ../assets/wan_i2v/eval/
    bs: 1
    target_height: 480
    target_width: 832
    num_frames: 81
    guidance_scale: 5.0
    output_video_path: ./output_videos_sq/
quant:
    video_gen:
        method: SmoothQuant
        weight:
            bit: 8
            symmetric: True
            granularity: per_channel
        act:
            bit: 8
            symmetric: True
            granularity: per_token
        special:
            alpha: 0.75
save:
    save_lightx2v: True
    save_path: /path/to/x2v/
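`alpha: 0.75` controls how much quantization difficulty is migrated from activations to weights. The sketch below illustrates the published SmoothQuant transform under that setting, where the per-input-channel scale is s_j = max|X_j|^alpha / max|W_j|^(1-alpha); it is an illustration of the technique, not llmc's implementation, which may handle the scales differently.

```python
import torch

def smooth_scales(act_absmax: torch.Tensor, weight: torch.Tensor, alpha: float = 0.75):
    # act_absmax: per-input-channel activation range collected on calibration data
    w_absmax = weight.abs().amax(dim=0)                      # per-input-channel weight range
    s = act_absmax.pow(alpha) / w_absmax.pow(1.0 - alpha)
    return s.clamp(min=1e-5)

def apply_smoothing(x: torch.Tensor, weight: torch.Tensor, s: torch.Tensor):
    # X' = X / s and W' = W * s, so X' @ W'.T == X @ W.T up to floating-point error
    return x / s, weight * s

x = torch.randn(32, 64) * torch.linspace(0.1, 10.0, 64)     # a few outlier channels
w = torch.randn(128, 64)
s = smooth_scales(x.abs().amax(dim=0), w, alpha=0.75)
x_s, w_s = apply_smoothing(x, w, s)
print(torch.allclose(x_s @ w_s.t(), x @ w.t(), atol=1e-3))  # math is preserved pre-quantization
```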
49 changes: 49 additions & 0 deletions configs/quantization/video_gen/wan_i2v/smoothquant_w_a_fp8.yaml
@@ -0,0 +1,49 @@
base:
    seed: &seed 42
model:
    type: WanI2V
    path: /path/to/model
    torch_dtype: auto
calib:
    name: i2v
    download: False
    path: ../assets/wan_i2v/calib/
    sample_steps: 40
    bs: 1
    target_height: 480
    target_width: 832
    num_frames: 81
    guidance_scale: 5.0
    seed: *seed
eval:
    eval_pos: [fake_quant]
    type: video_gen
    name: i2v
    download: False
    path: ../assets/wan_i2v/eval/
    bs: 1
    target_height: 480
    target_width: 832
    num_frames: 81
    guidance_scale: 5.0
    output_video_path: ./output_videos_sq/
quant:
    video_gen:
        method: SmoothQuant
        weight:
            quant_type: float-quant
            bit: e4m3
            symmetric: True
            granularity: per_channel
            use_qtorch: True
        act:
            quant_type: float-quant
            bit: e4m3
            symmetric: True
            granularity: per_token
            use_qtorch: True
        special:
            alpha: 0.75
save:
    save_lightx2v: True
    save_path: /path/to/x2v/
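With `quant_type: float-quant` and `bit: e4m3`, weights and activations are quantized to 8-bit floating point rather than INT8 (`use_qtorch: True` presumably routes the simulation through the qtorch package, an inference from the key name). The sketch below only illustrates per-channel e4m3 weight quantization using PyTorch's native float8 dtype; it assumes PyTorch ≥ 2.1 and is not llmc's code path.

```python
import torch

E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

def quant_weight_e4m3_per_channel(w: torch.Tensor):
    # Scale each output channel into the e4m3 range, then cast to float8.
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / E4M3_MAX
    w_fp8 = (w / scale).to(torch.float8_e4m3fn)
    return w_fp8, scale

def dequant(w_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return w_fp8.to(torch.float32) * scale

w = torch.randn(128, 64)
w_fp8, scale = quant_weight_e4m3_per_channel(w)
print((dequant(w_fp8, scale) - w).abs().max())   # worst-case round-trip error
```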
@@ -0,0 +1,46 @@
base:
    seed: &seed 42
model:
    type: WanI2V
    path: /path/to/model
    lora_path: /path/to/lora_weights
    torch_dtype: auto
calib:
    name: i2v
    download: False
    path: ../assets/wan_i2v/calib/
    sample_steps: 40
    bs: 1
    target_height: 480
    target_width: 832
    num_frames: 81
    guidance_scale: 5.0
    seed: *seed
eval:
    eval_pos: [fake_quant]
    type: video_gen
    name: i2v
    download: False
    path: ../assets/wan_i2v/eval/
    bs: 1
    target_height: 480
    target_width: 832
    num_frames: 81
    guidance_scale: 5.0
    output_video_path: ./output_videos_sq/
quant:
    video_gen:
        method: SmoothQuant
        weight:
            bit: 8
            symmetric: True
            granularity: per_channel
        act:
            bit: 8
            symmetric: True
            granularity: per_token
        special:
            alpha: 0.75
save:
    save_lightx2v: True
    save_path: /path/to/x2v/
9 changes: 4 additions & 5 deletions configs/quantization/video_gen/wan_t2v/awq_w_a.yaml
100644 → 100755
@@ -5,7 +5,7 @@ model:
     path: /path/to/wan_t2v
     torch_dtype: auto
 calib:
-    name: custom_t2v
+    name: t2v
     download: False
     path: ../assets/wan_t2v/calib/
     sample_steps: 20
@@ -18,7 +18,7 @@ calib:
 eval:
     eval_pos: [transformed, fake_quant]
     type: video_gen
-    name: custom_t2v
+    name: t2v
     download: False
     path: ../assets/wan_t2v/calib/
     bs: 1
@@ -45,6 +45,5 @@ quant:
             weight_clip: True
             clip_sym: True
 save:
-    save_trans: False
-    save_fake: False
-    save_path: /path/to/save/
+    save_lightx2v: True
+    save_path: /path/to/x2v/
11 changes: 5 additions & 6 deletions configs/quantization/video_gen/wan_t2v/rtn_w_a.yaml
100644 → 100755
@@ -7,15 +7,15 @@ model:
 eval:
     eval_pos: [transformed, fake_quant]
     type: video_gen
-    name: custom_t2v
+    name: t2v
     download: False
-    path: /mtc/gushiqiao/llmc_video_new/llmc/assets/wan_t2v/
+    path: ../assets/wan_t2v/eval/
     bs: 1
     target_height: 480
     target_width: 832
     num_frames: 81
     guidance_scale: 5.0
-    output_video_path: ./output_videos_sq/
+    output_video_path: ./output_videos_rtn/
 quant:
     video_gen:
         method: RTN
@@ -28,6 +28,5 @@ quant:
             symmetric: True
             granularity: per_token
 save:
-    save_trans: False
-    save_fake: False
-    save_path: /path/to/save/
+    save_lightx2v: True
+    save_path: /path/to/x2v/
9 changes: 4 additions & 5 deletions configs/quantization/video_gen/wan_t2v/smoothquant_w_a.yaml
100644 → 100755
@@ -5,7 +5,7 @@ model:
     path: /path/to/wan_t2v
     torch_dtype: auto
 calib:
-    name: custom_t2v
+    name: t2v
     download: False
     path: ../assets/wan_t2v/calib/
     sample_steps: 20
@@ -18,7 +18,7 @@ calib:
 eval:
     eval_pos: [transformed, fake_quant]
     type: video_gen
-    name: custom_t2v
+    name: t2v
     download: False
     path: ../assets/wan_t2v/calib/
     bs: 1
@@ -41,6 +41,5 @@ quant:
         special:
             alpha: 0.7
 save:
-    save_trans: False
-    save_fake: False
-    save_path: /path/to/save/
+    save_lightx2v: True
+    save_path: /path/to/x2v/
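Across these wan_t2v configs the save section switches from `save_trans`/`save_fake` to `save_lightx2v: True`, i.e. exporting really-quantized weights for the lightx2v runtime instead of fake-quantized ones. The exact checkpoint layout lightx2v expects is not shown in this diff; the sketch below only illustrates the general idea of storing INT8 tensors alongside their per-channel scales, with hypothetical key names.

```python
import torch
from safetensors.torch import save_file

def quantize_int8_per_channel(w: torch.Tensor):
    # Real (not fake) symmetric per-output-channel INT8 quantization.
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    q = (w / scale).round().clamp(-127, 127).to(torch.int8)
    return q, scale.squeeze(1)

# Hypothetical export: one int8 tensor plus one fp32 scale vector per linear layer.
weights = {"blocks.0.attn.q_proj.weight": torch.randn(128, 64)}
state = {}
for name, w in weights.items():
    q, s = quantize_int8_per_channel(w)
    state[name] = q
    state[name.replace(".weight", ".weight_scale")] = s

save_file(state, "quantized_weights.safetensors")
```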