[ROCm][quantization] improve OCP weight quant parser robustness #34431
base: main
Conversation
Signed-off-by: xuebwang-amd <[email protected]>
Code Review
This PR fixes a crash in the OCP weight quantization parser by adding a type check. The change is correct but only partially addresses the underlying issue, since the same class of error can occur in other functions. I've added a review comment recommending a more robust fix.
```python
if isinstance(weight_quant, list):
    logger.debug(
        "Quark model's weight quantization is incompatible with OCP_MX format: "
        "weight_quant is a list (e.g. fp8_w4a8), OCP_MX requires a single dict."
    )
    return False
```
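For context, a minimal sketch of the failure this guard prevents. The exact config shape shown here is an assumption: mixed-precision schemes such as fp8_w4a8 represent `weight_quant` as a list of per-scheme dicts, so dict-style access on it raises `AttributeError` rather than returning a config value.

```python
# Hypothetical shape of a mixed-precision (fp8_w4a8) weight_quant value:
# a list of dicts rather than a single dict.
weight_quant = [{"dtype": "fp8_e4m3"}, {"dtype": "int4"}]

try:
    # This is the kind of access the parser performs on a single-dict
    # scheme; on a list it crashes instead of returning a value.
    weight_quant.get("dtype")
except AttributeError:
    print("list has no .get; the isinstance(weight_quant, list) guard avoids this")
```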
This check correctly prevents the reported crash when `weight_quant` is a list. However, the underlying issue can also cause crashes in `_is_fp8_w8a8` and `_is_static_tensor_w8a8`, which are called earlier in `_get_scheme_from_config`.
The crash likely occurred in this function for the specific model because `input_quant` was `None`, allowing the previous checks to pass without error.
To create a more robust fix, I suggest adding similar `isinstance(weight_quant, list)` checks to `_is_fp8_w8a8` and `_is_static_tensor_w8a8` as well.
Purpose
This PR aims to fix an error:
Model: amd/Kimi-K2-Thinking-W4A8
Test Plan
Test Result