split_qkv_rmsnorm_rope_kernel issue #212

When I launch Qwen3 with SGLang on a 910A, I get the following error. What does it mean?
Capturing batches (bs=24 avail_mem=3.45 GB):   0%|          | 0/7 [00:06<?, ?it/s]
[2025-11-28 15:05:04 TP1] Scheduler hit an exception: Traceback (most recent call last):
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 344, in __init__
    self.capture()
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 502, in capture
    _capture_one_stream()
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 486, in _capture_one_stream
    ) = self.capture_one_batch_size(bs, forward, stream_idx)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 693, in capture_one_batch_size
    run_once()
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 680, in run_once
    logits_output_or_pp_proxy_tensors = forward(
                                        ^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/models/qwen3.py", line 427, in forward
    hidden_states = self.model(
                    ^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/models/qwen2.py", line 362, in forward
    hidden_states, residual = layer(
                              ^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/models/qwen3.py", line 309, in forward
    hidden_states, residual = self.layer_communicator.prepare_mlp(
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/communicator.py", line 497, in prepare_mlp
    return self._communicate_with_all_reduce_and_layer_norm_fn(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/communicator.py", line 780, in _gather_hidden_states_and_residual
    _ = prepare_weight_cache(hidden_states, context.cache)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/utils/common.py", line 664, in prepare_weight_cache
    torch_npu.npu_prefetch(
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/_ops.py", line 1243, in __call__
    return self._op(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: The Inner error is reported as above. The process exits for this inner error, and the current working operator name is split_qkv_rmsnorm_rope_kernel.
Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, please set the environment variable ASCEND_LAUNCH_BLOCKING=1.
Note: ASCEND_LAUNCH_BLOCKING=1 will force ops to run in synchronous mode, resulting in performance degradation. Please unset ASCEND_LAUNCH_BLOCKING in time after debugging.
[ERROR] 2025-11-28-15:05:03 (PID:53712, Device:1, RankID:-1) ERR00100 PTA call acl api failed.
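
The error message itself suggests re-running with ASCEND_LAUNCH_BLOCKING=1 to get an accurate stack trace, since the frame shown (torch_npu.npu_prefetch) may not be the call that actually failed. A minimal sketch of doing that from Python, assuming the server is started as a subprocess; the helper names `blocking_env` and `rerun_blocking` are hypothetical and not part of SGLang or torch_npu:

```python
import os
import subprocess


def blocking_env(base=None):
    """Return a copy of the environment with synchronous op launch enabled."""
    env = dict(os.environ if base is None else base)
    # Variable name taken verbatim from the error message above.
    env["ASCEND_LAUNCH_BLOCKING"] = "1"
    return env


def rerun_blocking(cmd):
    """Re-run a launch command (list of args) with ASCEND_LAUNCH_BLOCKING=1.

    Debug only: synchronous mode degrades performance, so the variable is
    set just for the child process and never exported to the parent shell.
    """
    return subprocess.run(cmd, env=blocking_env(), check=False)
```

Pass the same launch command that produced the traceback to `rerun_blocking` as a list of arguments. Because the variable is scoped to the child process, there is nothing to unset afterwards.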

