-
Notifications
You must be signed in to change notification settings - Fork 600
Unset NZ in BF16/FP16 #4415
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: v0.11.0-dev
Are you sure you want to change the base?
Unset NZ in BF16/FP16 #4415
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
此 PR 旨在默认情况下对浮点场景禁用 NZ 转换。代码修改了 is_enable_nz 函数,使其行为依赖于数据类型。
我发现了两个关键问题:
- 在
vllm_ascend/utils.py中,is_enable_nz函数的实现有副作用,它会修改一个全局状态。一旦使用浮点类型调用,它将永久禁用 NZ 转换,这可能会影响后续使用整数类型的操作。 - 在
vllm_ascend/ops/common_fused_moe.py中,is_enable_nz函数被错误地以tensor.type(一个方法)而不是tensor.dtype(类型属性)调用。
建议修复这两个问题以确保代码的正确性和稳定性。
vllm_ascend/ops/common_fused_moe.py
Outdated
| layer.w2_weight = torch.nn.Parameter(w2_data, requires_grad=False) | ||
|
|
||
| if not is_310p() and is_enable_nz(): | ||
| if not is_310p() and is_enable_nz(layer.w13_weight.data.type): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
此处调用 is_enable_nz 时使用了 layer.w13_weight.data.type,这似乎是一个错误。torch.Tensor.type 是一个返回类型字符串(例如 'torch.npu.FloatTensor')的方法,而不是 torch.dtype 对象。is_enable_nz 函数期望接收一个 torch.dtype 对象。
你应该使用 .dtype 属性来获取张量的数据类型。
| if not is_310p() and is_enable_nz(layer.w13_weight.data.type): | |
| if not is_310p() and is_enable_nz(layer.w13_weight.data.dtype): |
| if dtype in [torch.float16, torch.bfloat16]: | ||
| _ENABLE_NZ = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_enable_nz 函数中对全局变量 _ENABLE_NZ 的修改会产生意外的副作用。当使用浮点类型(torch.float16 或 torch.bfloat16)调用此函数时,_ENABLE_NZ 会被永久设置为 0。这将导致后续所有对 is_enable_nz 的调用(无论数据类型如何)都返回 0(或 False),从而禁用了 NZ 格式转换。
为了避免这种副作用,建议不要修改全局变量,而是根据 dtype 直接返回适当的值。
| if dtype in [torch.float16, torch.bfloat16]: | |
| _ENABLE_NZ = 0 | |
| if dtype in [torch.float16, torch.bfloat16]: | |
| return 0 |
…ctions (vllm-project#4370) ### What this PR does / why we need it? add single node PD disaggregation instructions for Qwen 2.5VL model. ### Does this PR introduce _any_ user-facing change? no --------- Signed-off-by: mazhixin <[email protected]> Signed-off-by: mazhixin000 <[email protected]> Co-authored-by: mazhixin <[email protected]> Signed-off-by: 刘哲续 <[email protected]>
Signed-off-by: 刘哲续 <[email protected]>
Signed-off-by: 刘哲续 <[email protected]>
|
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according Contributing and Testing. |
What this PR does / why we need it?
浮点场景默认不做NZ转换
-->
Does this PR introduce any user-facing change?
How was this patch tested?