[main][bugfix] Fix MatmulNZ format bug on some machines (#2549)

rjg-lyh · web-flow · commit 358ba6899401 · 2025-08-27T09:08:17.000+08:00
### What this PR does / why we need it?
This PR fixes a bug where, on some machines, quantmatmul failed to run with the NZ format. The change sets `torch.npu.config.allow_internal_format = True` when `model_runner_v1.py` is imported, so that tensors can be initialized and cast in internal formats and quantmatmul executes with the expected data layout.

### How was this patch tested?
CI passed with the existing tests.

- vLLM version: v0.10.1.1
- vLLM main: vllm-project/vllm@b5d34af

Signed-off-by: rjg-lyh <1318825571@qq.com>
diff --git a/vllm_ascend/worker/model_runner_v1.py b/vllm_ascend/worker/model_runner_v1.py
@@ -112,6 +112,9 @@
 
 import vllm_ascend.envs as envs_ascend
 
+# When True, allow tensors to be initialized and cast in internal formats (e.g., NZ).
+torch.npu.config.allow_internal_format = True
+
 if is_310p():
     torch_npu.npu.set_compile_mode(jit_compile=False)
     ACL_FORMAT = ACL_FORMAT_FRACTAL_NZ
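
For illustration only, here is a minimal sketch of how the same flag could be set defensively. The helper `enable_internal_format` and the stand-in `fake_config` object are hypothetical and not part of this PR; the patch itself assigns `torch.npu.config.allow_internal_format = True` unconditionally at module import. The stand-in lets the sketch run on a machine without Ascend hardware.

```python
from types import SimpleNamespace


def enable_internal_format(npu_config) -> bool:
    """Set allow_internal_format on a config object if it exposes that attribute.

    Hypothetical helper: returns True when the flag was set, False when the
    config object has no such attribute.
    """
    if not hasattr(npu_config, "allow_internal_format"):
        return False
    npu_config.allow_internal_format = True
    return True


# Stand-in for torch.npu.config (assumption: used only so the sketch is runnable here).
fake_config = SimpleNamespace(allow_internal_format=False)
applied = enable_internal_format(fake_config)
print(applied, fake_config.allow_internal_format)
```

On an Ascend machine the object from the diff, `torch.npu.config`, would presumably be passed in place of `fake_config`. The attribute check is a design choice of this sketch so that a build lacking the flag is skipped rather than silently gaining a new attribute.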