feat: support DeepSeek V3.2 W4A8 MoE for mlu and add smoke test. #969

Open
phantomlei3 wants to merge 3 commits into jd-opensource:main from phantomlei3:feat/ds-v32-w4a8

Conversation

@phantomlei3
Collaborator

No description provided.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds support for DeepSeek V3.2 W4A8 MoE quantization for MLU, including necessary changes in model loading, quantization argument handling, and the FusedMoE layer implementation. It also includes new smoke tests to verify the functionality. The changes are well-structured and the new feature is accompanied by tests. I have one suggestion to improve the robustness of quantization argument loading to prevent a subtle bug.

Comment on lines +262 to +265
if (auto v = reader.value<int64_t>("quantization_config.bits")) {
  quant_args_.bits() = v.value();
}
quant_args_.moe_weight_bits() = quant_args_.bits();

high

The current logic for setting moe_weight_bits can lead to unintended behavior. If quantization_config exists but does not contain bits, quant_args_.bits() will be its default value of 0, and quant_args_.moe_weight_bits() will be incorrectly set to 0, overriding its own default of 8. This can cause failures later on in a non-obvious way.

To make the logic more robust and explicit, moe_weight_bits should only be updated from bits when bits is explicitly specified in the configuration. This ensures moe_weight_bits retains its sensible default value if bits is not specified.

Suggested change
- if (auto v = reader.value<int64_t>("quantization_config.bits")) {
-   quant_args_.bits() = v.value();
- }
- quant_args_.moe_weight_bits() = quant_args_.bits();
+ if (auto v = reader.value<int64_t>("quantization_config.bits")) {
+   quant_args_.bits() = v.value();
+   quant_args_.moe_weight_bits() = v.value();
+ }
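
The difference between the two variants can be illustrated with a small standalone sketch. It is not the project's code: `QuantArgsSketch`, `load_unconditional`, and `load_guarded` are hypothetical names, the reader lookup is modeled as a plain `std::optional<int64_t>`, and the defaults (`bits = 0`, `moe_weight_bits = 8`) are taken from the review comment rather than verified against the repository.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the real quantization arguments. The defaults
// mirror the values stated in the review comment (bits = 0,
// moe_weight_bits = 8); they are assumptions, not read from the codebase.
struct QuantArgsSketch {
  int64_t bits = 0;
  int64_t moe_weight_bits = 8;
};

// Original logic: moe_weight_bits is always overwritten with bits, even
// when the config did not provide a value for bits.
QuantArgsSketch load_unconditional(std::optional<int64_t> config_bits) {
  QuantArgsSketch args;
  if (config_bits) {
    args.bits = config_bits.value();
  }
  args.moe_weight_bits = args.bits;  // becomes 0 when bits is absent
  return args;
}

// Suggested logic: moe_weight_bits is only updated when bits is present,
// so its default survives a config without "bits".
QuantArgsSketch load_guarded(std::optional<int64_t> config_bits) {
  QuantArgsSketch args;
  if (config_bits) {
    args.bits = config_bits.value();
    args.moe_weight_bits = config_bits.value();
  }
  return args;
}
```

Under these assumptions, passing `std::nullopt` to `load_unconditional` leaves `moe_weight_bits` at 0, while `load_guarded` keeps the default of 8; both agree whenever a value such as 4 is supplied.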

Collaborator Author

only set moe_weight_bits when quantization_config.bits is present

yq33victor previously approved these changes Mar 4, 2026
Collaborator

@yq33victor yq33victor left a comment

LGTM

2 participants