feat: support DeepSeek V3.2 W4A8 MoE for mlu and add smoke test. #969
phantomlei3 wants to merge 3 commits into jd-opensource:main
Conversation
Code Review
This pull request adds support for DeepSeek V3.2 W4A8 MoE quantization for MLU, including necessary changes in model loading, quantization argument handling, and the FusedMoE layer implementation. It also includes new smoke tests to verify the functionality. The changes are well-structured and the new feature is accompanied by tests. I have one suggestion to improve the robustness of quantization argument loading to prevent a subtle bug.
```cpp
if (auto v = reader.value<int64_t>("quantization_config.bits")) {
  quant_args_.bits() = v.value();
}
quant_args_.moe_weight_bits() = quant_args_.bits();
```
The current logic for setting moe_weight_bits can lead to unintended behavior. If quantization_config exists but does not contain bits, quant_args_.bits() keeps its default value of 0, so quant_args_.moe_weight_bits() is incorrectly set to 0, overriding its own default of 8. This can cause failures later in a non-obvious way.
To make the logic more robust and explicit, moe_weight_bits should only be updated from bits when bits is explicitly specified in the configuration. This ensures moe_weight_bits retains its sensible default when bits is absent.
```diff
 if (auto v = reader.value<int64_t>("quantization_config.bits")) {
   quant_args_.bits() = v.value();
+  quant_args_.moe_weight_bits() = v.value();
 }
-quant_args_.moe_weight_bits() = quant_args_.bits();
```
Only set moe_weight_bits when quantization_config.bits is present.
6e55719 to c175a2d
No description provided.