Description
When I launch Qwen3 with SGLang on an Ascend 910A, I get the error below. What does it mean?
Capturing batches (bs=24 avail_mem=3.45 GB): 0%| | 0/7 [00:06<?, ?it/s]
[2025-11-28 15:05:04 TP1] Scheduler hit an exception: Traceback (most recent call last):
File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 344, in __init__
self.capture()
File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 502, in capture
_capture_one_stream()
File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 486, in _capture_one_stream
) = self.capture_one_batch_size(bs, forward, stream_idx)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 693, in capture_one_batch_size
run_once()
File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 680, in run_once
logits_output_or_pp_proxy_tensors = forward(
^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/models/qwen3.py", line 427, in forward
hidden_states = self.model(
^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/models/qwen2.py", line 362, in forward
hidden_states, residual = layer(
^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/models/qwen3.py", line 309, in forward
hidden_states, residual = self.layer_communicator.prepare_mlp(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/communicator.py", line 497, in prepare_mlp
return self._communicate_with_all_reduce_and_layer_norm_fn(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/layers/communicator.py", line 780, in _gather_hidden_states_and_residual
_ = prepare_weight_cache(hidden_states, context.cache)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/sglang/srt/utils/common.py", line 664, in prepare_weight_cache
torch_npu.npu_prefetch(
File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/_ops.py", line 1243, in __call__
return self._op(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: The Inner error is reported as above. The process exits for this inner error, and the current working operator name is split_qkv_rmsnorm_rope_kernel.
Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, please set the environment variable ASCEND_LAUNCH_BLOCKING=1.
Note: ASCEND_LAUNCH_BLOCKING=1 will force ops to run in synchronous mode, resulting in performance degradation. Please unset ASCEND_LAUNCH_BLOCKING in time after debugging.
[ERROR] 2025-11-28-15:05:03 (PID:53712, Device:1, RankID:-1) ERR00100 PTA call acl api failed.
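The error text says the stack trace may be inaccurate because the operator (`split_qkv_rmsnorm_rope_kernel`) runs asynchronously, and recommends `ASCEND_LAUNCH_BLOCKING=1` while debugging. The snippet below is a minimal sketch of one way to do that, assuming you re-run your original launch command as a child process so that the variable is set before `torch_npu` initializes (SGLang's scheduler processes, such as `TP1`, inherit it). The helper names `debug_env`/`release_env` and the script name `debug_launch.py` are hypothetical, not part of SGLang or torch_npu.

```python
import os
import subprocess
import sys


def debug_env(base=None):
    """Return a copy of the environment with synchronous Ascend op launch enabled.

    ASCEND_LAUNCH_BLOCKING=1 (taken from the error message) forces ops to run
    synchronously, so the traceback points at the operator that actually failed.
    """
    env = dict(os.environ if base is None else base)
    env["ASCEND_LAUNCH_BLOCKING"] = "1"
    return env


def release_env(env):
    """Return a copy with ASCEND_LAUNCH_BLOCKING removed, restoring async launch.

    The error message warns that blocking mode degrades performance, so drop it
    once debugging is done.
    """
    env = dict(env)
    env.pop("ASCEND_LAUNCH_BLOCKING", None)
    return env


if __name__ == "__main__":
    # Hypothetical usage: pass your original launch command as arguments, e.g.
    #   python debug_launch.py python -m sglang.launch_server <your original args>
    cmd = sys.argv[1:]
    if cmd:
        subprocess.run(cmd, env=debug_env(), check=False)
```

Setting the variable on the child's environment, rather than exporting it in your shell, keeps it scoped to the debugging run, so it doesn't stay enabled for later launches.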