
Conversation

@henryxuxu0716
Collaborator

What this PR does / why we need it?

Do not perform NZ format conversion by default in floating-point scenarios.

Does this PR introduce any user-facing change?

How was this patch tested?

@henryxuxu0716 changed the title from "V0.11.0 dev" to "Unset NZ in BF16/FP" on Nov 25, 2025
@henryxuxu0716 changed the title from "Unset NZ in BF16/FP" to "Unset NZ in BF16/FP16" on Nov 25, 2025
Contributor

@gemini-code-assist bot left a comment


Code Review

This PR disables NZ conversion by default in floating-point scenarios. The change makes the behavior of the is_enable_nz function depend on the data type.

I found two critical issues:

  1. In vllm_ascend/utils.py, the is_enable_nz function has a side effect: it mutates global state. Once it is called with a floating-point dtype, NZ conversion is permanently disabled, which can affect subsequent operations that use integer dtypes.
  2. In vllm_ascend/ops/common_fused_moe.py, is_enable_nz is incorrectly called with tensor.type (a method) instead of tensor.dtype (the dtype attribute).

I recommend fixing both issues to ensure correctness and stability.

layer.w2_weight = torch.nn.Parameter(w2_data, requires_grad=False)

-if not is_310p() and is_enable_nz():
+if not is_310p() and is_enable_nz(layer.w13_weight.data.type):
Contributor


critical

The call to is_enable_nz here passes layer.w13_weight.data.type, which appears to be a bug. torch.Tensor.type is a method that returns a type string (e.g. 'torch.npu.FloatTensor'), not a torch.dtype object, and is_enable_nz expects a torch.dtype.

Use the .dtype attribute to get the tensor's data type.

Suggested change
-if not is_310p() and is_enable_nz(layer.w13_weight.data.type):
+if not is_310p() and is_enable_nz(layer.w13_weight.data.dtype):
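To make this concrete, here is a small standalone check (with a hypothetical tensor, not code from this PR) showing that .type is a method returning a string, so passing it into a dtype membership test can never match:

import torch

w = torch.zeros(4, 4, dtype=torch.bfloat16)
print(w.dtype)    # torch.bfloat16 -- a torch.dtype object
print(w.type())   # 'torch.BFloat16Tensor' -- a type string
print(w.dtype in [torch.float16, torch.bfloat16])  # True
print(w.type in [torch.float16, torch.bfloat16])   # False: a bound method never equals a dtype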

Comment on lines +81 to +82
if dtype in [torch.float16, torch.bfloat16]:
    _ENABLE_NZ = 0
Contributor


critical

Mutating the global variable _ENABLE_NZ inside is_enable_nz produces an unintended side effect. Once the function is called with a floating-point dtype (torch.float16 or torch.bfloat16), _ENABLE_NZ is permanently set to 0, so every subsequent call to is_enable_nz, regardless of dtype, returns 0 (i.e. False), disabling NZ format conversion entirely.

To avoid this side effect, do not mutate the global variable; instead, return the appropriate value directly based on dtype.

Suggested change
-if dtype in [torch.float16, torch.bfloat16]:
-    _ENABLE_NZ = 0
+if dtype in [torch.float16, torch.bfloat16]:
+    return 0
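
A minimal sketch of the side-effect-free shape recommended above, assuming _ENABLE_NZ is a module-level int flag as in vllm_ascend/utils.py (the surrounding names and default value are illustrative, not the PR's exact code):

import torch

_ENABLE_NZ = 1  # assumed module-level default; actual initialization may differ

def is_enable_nz(dtype=None):
    # Floating-point dtypes opt out of NZ conversion locally, without
    # mutating the global flag, so later calls with other dtypes still
    # observe the original setting.
    if dtype in [torch.float16, torch.bfloat16]:
        return 0
    return _ENABLE_NZ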

mazhixin000 and others added 4 commits November 25, 2025 10:36
…ctions (vllm-project#4370)

### What this PR does / why we need it?

Add single-node PD disaggregation instructions for the Qwen 2.5VL model.

### Does this PR introduce _any_ user-facing change?
no

---------

Signed-off-by: mazhixin <[email protected]>
Signed-off-by: mazhixin000 <[email protected]>
Co-authored-by: mazhixin <[email protected]>
Signed-off-by: 刘哲续 <[email protected]>
@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message and fill in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to the Contributing and Testing guides.
