
[Bug]: [NPU] Qwen-Image-2512 with USP fails at 1328Γ—1328 #845

@zyqzhang1996

Description


Your current environment

The output of python collect_env.py
800T A2

Your code version

The commit id or version of vllm-omni
commit id: 3fc4f988eabf562572821896847dc96b69a4cf74

πŸ› Describe the bug

The same failure also occurs at 1584Γ—1056 and 1056Γ—1584.

export ASCEND_RT_VISIBLE_DEVICES=4,5
export HCCL_OP_EXPANSION_MODE="AIV"

python /vllm-workspace/vllm-omni/examples/offline_inference/text_to_image/text_to_image.py \
  --model /home/weights/Qwen-Image-2512/ \
  --prompt "a cup of coffee on the table" \
  --seed 42 \
  --cfg_scale 4.0 \
  --num_images_per_prompt 1 \
  --num_inference_steps 50 \
  --height 1328 \
  --width 1328 \
  --output outputs/coffee.png \
  --cache_backend cache_dit \
  --ulysses_degree 2

Log:

[rank0]:[E119 16:32:47.013946230 compiler_depend.ts:444] operator():build/CMakeFiles/torch_npu.dir/compiler_depend.ts:1042 NPU function error: call aclnnFlashAttentionScore failed, error code is 561103
[ERROR] 2026-01-19-16:32:47 (PID:10443, Device:0, RankID:0) ERR00100 PTA call acl api failed.
EZ9999: Inner Error!
EZ9999[PID: 10443] 2026-01-19-16:32:47.486.947 (EZ9999):  get unsupported atten_mask shape, the shape is [1, 6902]. B=[1], N=[12], Sq=[6902], Skv=[6902], supported atten_mask shape can be [B, N, Sq, Skv], [B, 1, Sq, Skv], [1, 1, Sq, Skv] and [Sq, Skv].[FUNC:AnalyzeOptionalInput][FILE:flash_attention_score_tiling_general.cpp][LINE:1621]
        TraceBack (most recent call last):
       fail to analyze context info.[FUNC:GetShapeAttrsInfo][FILE:flash_attention_score_tiling_general.cpp][LINE:866]
       Tiling failed
       Tiling Failed.
       Kernel Run failed. opType: 38, FlashAttentionScore
       launch failed for FlashAttentionScore, errno:561103.

Exception raised from operator() at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:1042 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xb0 (0xffffa92848c0 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x68 (0xffffa922c140 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x111d6b4 (0xfffdf972d6b4 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #3: <unknown function> + 0x29f0894 (0xfffdfb000894 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #4: <unknown function> + 0x9cc700 (0xfffdf8fdc700 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #5: <unknown function> + 0x9cd2dc (0xfffdf8fdd2dc in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #6: <unknown function> + 0x9cb1f8 (0xfffdf8fdb1f8 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #7: <unknown function> + 0xd29cc (0xffffb6e629cc in /lib/aarch64-linux-gnu/libstdc++.so.6)
frame #8: <unknown function> + 0x80398 (0xffffb7040398 in /lib/aarch64-linux-gnu/libc.so.6)
frame #9: <unknown function> + 0xe9e9c (0xffffb70a9e9c in /lib/aarch64-linux-gnu/libc.so.6)

[rank1]:[E119 16:32:47.014632430 compiler_depend.ts:444] operator():build/CMakeFiles/torch_npu.dir/compiler_depend.ts:1042 NPU function error: call aclnnFlashAttentionScore failed, error code is 561103
[ERROR] 2026-01-19-16:32:47 (PID:10444, Device:1, RankID:1) ERR00100 PTA call acl api failed.
EZ9999: Inner Error!
EZ9999[PID: 10444] 2026-01-19-16:32:47.487.817 (EZ9999):  get unsupported atten_mask shape, the shape is [1, 6902]. B=[1], N=[12], Sq=[6902], Skv=[6902], supported atten_mask shape can be [B, N, Sq, Skv], [B, 1, Sq, Skv], [1, 1, Sq, Skv] and [Sq, Skv].[FUNC:AnalyzeOptionalInput][FILE:flash_attention_score_tiling_general.cpp][LINE:1621]
        TraceBack (most recent call last):
       fail to analyze context info.[FUNC:GetShapeAttrsInfo][FILE:flash_attention_score_tiling_general.cpp][LINE:866]
       Tiling failed
       Tiling Failed.
       Kernel Run failed. opType: 38, FlashAttentionScore
       launch failed for FlashAttentionScore, errno:561103.

Exception raised from operator() at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:1042 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xb0 (0xffffaf0b48c0 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x68 (0xffffaf05c140 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x111d6b4 (0xfffdff98d6b4 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #3: <unknown function> + 0x29f0894 (0xfffe01260894 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #4: <unknown function> + 0x9cc700 (0xfffdff23c700 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #5: <unknown function> + 0x9cd2dc (0xfffdff23d2dc in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #6: <unknown function> + 0x9cb1f8 (0xfffdff23b1f8 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #7: <unknown function> + 0xd29cc (0xffffbcca29cc in /lib/aarch64-linux-gnu/libstdc++.so.6)
frame #8: <unknown function> + 0x80398 (0xffffbce80398 in /lib/aarch64-linux-gnu/libc.so.6)
frame #9: <unknown function> + 0xe9e9c (0xffffbcee9e9c in /lib/aarch64-linux-gnu/libc.so.6)

[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270] Error executing RPC: The Inner error is reported as above. The process exits for this inner error, and the current working operator name is aclnnFlashAttentionScore.
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270] Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, please set the environment variable ASCEND_LAUNCH_BLOCKING=1.
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270] Note: ASCEND_LAUNCH_BLOCKING=1 will force ops to run in synchronous mode, resulting in performance degradation. Please unset ASCEND_LAUNCH_BLOCKING in time after debugging.
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270] [ERROR] 2026-01-19-16:32:47 (PID:10443, Device:0, RankID:0) ERR00100 PTA call acl api failed.
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270] Traceback (most recent call last):
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/workspace/vllm-omni/vllm_omni/diffusion/worker/gpu_diffusion_worker.py", line 265, in execute_rpc
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     result = func(*args, **kwargs)
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]              ^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/workspace/vllm-omni/vllm_omni/diffusion/worker/npu/npu_worker.py", line 118, in generate
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     return self.execute_model(requests, self.od_config)
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     return func(*args, **kwargs)
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]            ^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/workspace/vllm-omni/vllm_omni/diffusion/worker/npu/npu_worker.py", line 138, in execute_model
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     output = self.pipeline.forward(req)
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]              ^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/workspace/vllm-omni/vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image.py", line 771, in forward
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     latents = self.diffuse(
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]               ^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/workspace/vllm-omni/vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image.py", line 610, in diffuse
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     noise_pred = self.transformer(
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]                  ^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     return self._call_impl(*args, **kwargs)
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     return forward_call(*args, **kwargs)
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/cache_dit/caching/cache_adapters/cache_adapter.py", line 439, in new_forward_with_hf_hook
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     outputs = new_forward(self, *args, **kwargs)
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/cache_dit/caching/cache_adapters/cache_adapter.py", line 427, in new_forward
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     outputs = original_forward(*args, **kwargs)
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/workspace/vllm-omni/vllm_omni/diffusion/models/qwen_image/qwen_image_transformer.py", line 906, in forward
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     encoder_hidden_states, hidden_states = block(
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]                                            ^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     return self._call_impl(*args, **kwargs)
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     return forward_call(*args, **kwargs)
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/cache_dit/caching/cache_blocks/pattern_base.py", line 321, in forward
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     ) = self.call_Mn_blocks(  # middle
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/cache_dit/caching/cache_blocks/pattern_base.py", line 448, in call_Mn_blocks
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     hidden_states = block(
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]                     ^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     return self._call_impl(*args, **kwargs)
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     return forward_call(*args, **kwargs)
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/workspace/vllm-omni/vllm_omni/diffusion/models/qwen_image/qwen_image_transformer.py", line 653, in forward
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     attn_output = self.attn(
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]                   ^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     return self._call_impl(*args, **kwargs)
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     return forward_call(*args, **kwargs)
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/workspace/vllm-omni/vllm_omni/diffusion/models/qwen_image/qwen_image_transformer.py", line 439, in forward
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     if hidden_states_mask is not None and hidden_states_mask.all():
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270] RuntimeError: The Inner error is reported as above. The process exits for this inner error, and the current working operator name is aclnnFlashAttentionScore.
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270] Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, please set the environment variable ASCEND_LAUNCH_BLOCKING=1.
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270] Note: ASCEND_LAUNCH_BLOCKING=1 will force ops to run in synchronous mode, resulting in performance degradation. Please unset ASCEND_LAUNCH_BLOCKING in time after debugging.
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270] [ERROR] 2026-01-19-16:32:47 (PID:10443, Device:0, RankID:0) ERR00100 PTA call acl api failed.
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270] Error executing RPC: The Inner error is reported as above. The process exits for this inner error, and the current working operator name is aclnnFlashAttentionScore.
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270] Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, please set the environment variable ASCEND_LAUNCH_BLOCKING=1.
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270] Note: ASCEND_LAUNCH_BLOCKING=1 will force ops to run in synchronous mode, resulting in performance degradation. Please unset ASCEND_LAUNCH_BLOCKING in time after debugging.
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270] [ERROR] 2026-01-19-16:32:47 (PID:10444, Device:1, RankID:1) ERR00100 PTA call acl api failed.
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270] Traceback (most recent call last):
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/workspace/vllm-omni/vllm_omni/diffusion/worker/gpu_diffusion_worker.py", line 265, in execute_rpc
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     result = func(*args, **kwargs)
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]              ^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/workspace/vllm-omni/vllm_omni/diffusion/worker/npu/npu_worker.py", line 118, in generate
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     return self.execute_model(requests, self.od_config)
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     return func(*args, **kwargs)
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]            ^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/workspace/vllm-omni/vllm_omni/diffusion/worker/npu/npu_worker.py", line 138, in execute_model
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     output = self.pipeline.forward(req)
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]              ^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/workspace/vllm-omni/vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image.py", line 771, in forward
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     latents = self.diffuse(
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]               ^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/workspace/vllm-omni/vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image.py", line 610, in diffuse
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     noise_pred = self.transformer(
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]                  ^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     return self._call_impl(*args, **kwargs)
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     return forward_call(*args, **kwargs)
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/cache_dit/caching/cache_adapters/cache_adapter.py", line 439, in new_forward_with_hf_hook
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     outputs = new_forward(self, *args, **kwargs)
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/cache_dit/caching/cache_adapters/cache_adapter.py", line 427, in new_forward
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     outputs = original_forward(*args, **kwargs)
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/workspace/vllm-omni/vllm_omni/diffusion/models/qwen_image/qwen_image_transformer.py", line 906, in forward
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     encoder_hidden_states, hidden_states = block(
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]                                            ^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     return self._call_impl(*args, **kwargs)
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     return forward_call(*args, **kwargs)
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/cache_dit/caching/cache_blocks/pattern_base.py", line 321, in forward
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     ) = self.call_Mn_blocks(  # middle
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/cache_dit/caching/cache_blocks/pattern_base.py", line 448, in call_Mn_blocks
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     hidden_states = block(
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]                     ^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     return self._call_impl(*args, **kwargs)
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     return forward_call(*args, **kwargs)
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/workspace/vllm-omni/vllm_omni/diffusion/models/qwen_image/qwen_image_transformer.py", line 653, in forward
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     attn_output = self.attn(
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]                   ^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     return self._call_impl(*args, **kwargs)
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     return forward_call(*args, **kwargs)
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]   File "/workspace/vllm-omni/vllm_omni/diffusion/models/qwen_image/qwen_image_transformer.py", line 439, in forward
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]     if hidden_states_mask is not None and hidden_states_mask.all():
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270] RuntimeError: The Inner error is reported as above. The process exits for this inner error, and the current working operator name is aclnnFlashAttentionScore.
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270] Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, please set the environment variable ASCEND_LAUNCH_BLOCKING=1.
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270] Note: ASCEND_LAUNCH_BLOCKING=1 will force ops to run in synchronous mode, resulting in performance degradation. Please unset ASCEND_LAUNCH_BLOCKING in time after debugging.
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270] [ERROR] 2026-01-19-16:32:47 (PID:10444, Device:1, RankID:1) ERR00100 PTA call acl api failed.
[Stage-0] ERROR 01-19 16:32:47 [gpu_diffusion_worker.py:270]
[Stage-0] ERROR 01-19 16:32:47 [diffusion_engine.py:187] Generation failed: 'dict' object has no attribute 'error'
INFO 01-19 16:32:47 [log_utils.py:550] {'type': 'request_level_metrics',
INFO 01-19 16:32:47 [log_utils.py:550]  'request_id': '0_4c569671-ae5b-4ee4-a2c3-6a661d0a4f58',
INFO 01-19 16:32:47 [log_utils.py:550]  'e2e_time_ms': 189.84675407409668,
INFO 01-19 16:32:47 [log_utils.py:550]  'e2e_tpt': 0.0,
INFO 01-19 16:32:47 [log_utils.py:550]  'e2e_total_tokens': 0,
INFO 01-19 16:32:47 [log_utils.py:550]  'transfers_total_time_ms': 0.0,
INFO 01-19 16:32:47 [log_utils.py:550]  'transfers_total_bytes': 0,
INFO 01-19 16:32:47 [log_utils.py:550]  'stages': {0: {'stage_gen_time_ms': 183.68911743164062,
INFO 01-19 16:32:47 [log_utils.py:550]                 'num_tokens_out': 0,
INFO 01-19 16:32:47 [log_utils.py:550]                 'num_tokens_in': 0}}}
Processed prompts: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00,  5.28img/s, est. speed stage-0 img/s: 0.00, avg e2e_lat: 0.0ms]
INFO 01-19 16:32:47 [omni.py:782] [Summary] {'e2e_requests': 1,
INFO 01-19 16:32:47 [omni.py:782]  'e2e_total_time_ms': 191.65873527526855,
INFO 01-19 16:32:47 [omni.py:782]  'e2e_sum_time_ms': 189.84675407409668,
INFO 01-19 16:32:47 [omni.py:782]  'e2e_total_tokens': 0,
INFO 01-19 16:32:47 [omni.py:782]  'e2e_avg_time_per_request_ms': 189.84675407409668,
INFO 01-19 16:32:47 [omni.py:782]  'e2e_avg_tokens_per_s': 0.0,
INFO 01-19 16:32:47 [omni.py:782]  'wall_time_ms': 191.65873527526855,
INFO 01-19 16:32:47 [omni.py:782]  'final_stage_id': {'0_4c569671-ae5b-4ee4-a2c3-6a661d0a4f58': 0},
INFO 01-19 16:32:47 [omni.py:782]  'stages': [{'stage_id': 0,
INFO 01-19 16:32:47 [omni.py:782]              'requests': 1,
INFO 01-19 16:32:47 [omni.py:782]              'tokens': 0,
INFO 01-19 16:32:47 [omni.py:782]              'total_time_ms': 190.59062004089355,
INFO 01-19 16:32:47 [omni.py:782]              'avg_time_per_request_ms': 190.59062004089355,
INFO 01-19 16:32:47 [omni.py:782]              'avg_tokens_per_s': 0.0}],
INFO 01-19 16:32:47 [omni.py:782]  'transfers': []}
Adding requests:   0%|                                                                                                      | 0/1 [00:00<?, ?it/s]
[Stage-0] INFO 01-19 16:32:47 [omni_stage.py:677] Received shutdown signal
[Stage-0] INFO 01-19 16:32:47 [gpu_diffusion_worker.py:304] Worker 1: Received shutdown message
[Stage-0] INFO 01-19 16:32:47 [gpu_diffusion_worker.py:304] Worker 0: Received shutdown message
[Stage-0] INFO 01-19 16:32:47 [gpu_diffusion_worker.py:325] event loop terminated.
[Stage-0] INFO 01-19 16:32:47 [gpu_diffusion_worker.py:325] event loop terminated.
[Stage-0] INFO 01-19 16:32:47 [npu_worker.py:251] Worker 0: Shutdown complete.
[Stage-0] INFO 01-19 16:32:47 [npu_worker.py:251] Worker 1: Shutdown complete.
Total generation time: 5.1984 seconds (5198.37 ms)
INFO 01-19 16:32:52 [text_to_image.py:168] Outputs: [OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='image', request_output=[namespace(request_id='0_4c569671-ae5b-4ee4-a2c3-6a661d0a4f58', output=None)], images=[], prompt=None, latents=None, metrics={})]
Traceback (most recent call last):
  File "/vllm-workspace/vllm-omni/examples/offline_inference/text_to_image/text_to_image.py", line 198, in <module>
    main()
  File "/vllm-workspace/vllm-omni/examples/offline_inference/text_to_image/text_to_image.py", line 177, in main
    raise ValueError("Invalid request_output structure or missing 'images' key")
ValueError: Invalid request_output structure or missing 'images' key
[ERROR] 2026-01-19-16:32:52 (PID:10169, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception
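For context on the root cause: the EZ9999 message above states that aclnnFlashAttentionScore rejects a 2-D atten_mask of shape [1, 6902]; the supported layouts are [B, N, Sq, Skv], [B, 1, Sq, Skv], [1, 1, Sq, Skv], and [Sq, Skv]. A minimal sketch in plain PyTorch of how a per-token [B, S] padding mask could be broadcast to the supported [B, 1, Sq, Skv] layout (the function name, the True-means-masked-out convention, and the self-attention Sq == Skv assumption are all hypothetical, not code from vllm-omni):

```python
import torch

def expand_attn_mask(pad_mask: torch.Tensor) -> torch.Tensor:
    """Expand a [B, S] padding mask to a [B, 1, Sq, Skv] attention mask.

    pad_mask: bool tensor, True where a token is valid.
    Returns a bool mask where True means "mask this position out"
    (an assumed convention for this sketch; the kernel's actual
    convention should be checked against the CANN documentation).
    """
    b, s = pad_mask.shape
    # For self-attention (Sq == Skv), a query may attend to key j
    # only when token j is a valid (non-padding) token.
    keep = pad_mask[:, None, None, :].expand(b, 1, s, s)
    return ~keep  # True = masked out

pad = torch.tensor([[True, True, False]])  # [1, 3]; last token is padding
mask4d = expand_attn_mask(pad)
print(mask4d.shape)  # torch.Size([1, 1, 3, 3])
```

This only illustrates the shape contract from the error message; the actual fix likely belongs where the USP attention path builds hidden_states_mask before calling the NPU flash-attention kernel.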


Labels

NPU (PR related to Ascend NPU), bug (Something isn't working)