[Bug]: failed to start qwen3-next-80b RuntimeError: Ascend config is not initialized. Please call init_ascend_config first. #3291

Description

@razIove

Your current environment

quay.io/ascend/vllm-ascend:v0.11.0rc0

The output of `python collect_env.py`
==============================
        System Info
==============================
OS                           : Ubuntu 22.04.5 LTS (aarch64)
GCC version                  : (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version                : Could not collect
CMake version                : version 4.1.0
Libc version                 : glibc-2.35

==============================
       PyTorch Info
==============================
PyTorch version              : 2.7.1+cpu
Is debug build               : False
CUDA used to build PyTorch   : None
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.11.13 (main, Jul 26 2025, 07:27:32) [GCC 11.4.0] (64-bit runtime)
Python platform              : Linux-5.10.0-60.18.0.50.oe2203.aarch64-aarch64-with-glibc2.35

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : False
CUDA runtime version         : No CUDA
CUDA_MODULE_LOADING set to   : N/A
GPU models and configuration : No CUDA
Nvidia driver version        : No CUDA
cuDNN version                : No CUDA
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:                    aarch64
CPU op-mode(s):                  64-bit
Byte Order:                      Little Endian
CPU(s):                          192
On-line CPU(s) list:             0-191
Vendor ID:                       HiSilicon
BIOS Vendor ID:                  HiSilicon
Model name:                      Kunpeng-920
BIOS Model name:                 HUAWEI Kunpeng 920 5250
Model:                           0
Thread(s) per core:              1
Core(s) per socket:              48
Socket(s):                       4
Stepping:                        0x1
BogoMIPS:                        200.00
Flags:                           fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm ssbs
L1d cache:                       12 MiB (192 instances)
L1i cache:                       12 MiB (192 instances)
L2 cache:                        96 MiB (192 instances)
L3 cache:                        192 MiB (8 instances)
NUMA node(s):                    4
NUMA node0 CPU(s):               0-47
NUMA node1 CPU(s):               48-95
NUMA node2 CPU(s):               96-143
NUMA node3 CPU(s):               144-191
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected

==============================
Versions of relevant libraries
==============================
[pip3] numpy==1.26.4
[pip3] pyzmq==27.1.0
[pip3] torch==2.7.1+cpu
[pip3] torch_npu==2.7.1.dev20250724
[pip3] torchvision==0.22.1
[pip3] transformers==4.56.2
[conda] Could not collect

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
vLLM Version                 : 0.11.0rc3
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled
GPU Topology:
  Could not collect

==============================
     Environment Variables
==============================
LD_LIBRARY_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/aarch64:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/aarch64:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/:
TORCH_DEVICE_BACKEND_AUTOLOAD=1
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1

🐛 Describe the bug

Unable to start the service.

docker run -it --rm --name vllm-ascend-80b --privileged --network host --shm-size 500g --device /dev/davinci0 --device /dev/davinci1 --device /dev/davinci2 --device /dev/davinci3 --device /dev/davinci4 --device /dev/davinci5 --device /dev/davinci6 --device /dev/davinci7 --device /dev/davinci_manager --device /dev/devmm_svm --device /dev/hisi_hdc -v /usr/local/dcmi:/usr/local/dcmi -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi -v /usr/local/Ascend/driver:/usr/local/Ascend/driver -v /etc/ascend_install.info:/etc/ascend_install.info -v /root/.cache:/root/.cache -v /data/ai/docker-image:/data/model -v $(pwd)/startup_script.sh:/startup_script.sh -w /vllm-workspace/vllm quay.io/ascend/vllm-ascend:v0.11.0rc0 bash

root@limxt:/vllm-workspace/vllm# cd /data/model/installer/
root@limxt:/data/model/installer# ./Ascend-BiSheng-toolkit_aarch64.run --install
Verifying archive integrity... 100% SHA256 checksums are OK. All good.
Uncompressing ASCEND_RUN_PACKAGE 100%
[BiSheng-toolkit] [2025-09-30 05:05:28] [INFO] install start
[BiSheng-toolkit] [2025-09-30 05:05:28] [INFO] mkdir install path /usr/local/Ascend/8.3.RC1/bisheng_toolkit successfully
[BiSheng-toolkit] [2025-09-30 05:05:28] [INFO] mkdir install path /usr/local/Ascend/8.3.RC1/aarch64-linux successfully
[BiSheng-toolkit] [2025-09-30 05:05:28] [INFO] mkdir install path /usr/local/Ascend/8.3.RC1/compiler successfully
[BiSheng-toolkit] [2025-09-30 05:05:28] [INFO] mkdir install path /usr/local/Ascend/latest successfully
[BiSheng-toolkit] [2025-09-30 05:05:28] [INFO] mkdir install path /usr/local/Ascend/latest/aarch64-linux successfully
Please make sure that
- PATH includes /usr/local/Ascend/8.3.RC1/compiler/bishengir/bin
[BiSheng-toolkit] [2025-09-30 05:05:28] [INFO] BiSheng-toolkit-8.3.RC1 install success
root@limxt:/data/model/installer# source /usr/local/Ascend/8.3.RC1/bisheng_toolkit/set_env.sh
root@limxt:/data/model/installer# pip install triton_ascend-3.2.0.dev20250914-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl
Processing ./triton_ascend-3.2.0.dev20250914-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl
Installing collected packages: triton-ascend
Successfully installed triton-ascend-3.2.0.dev20250914
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
root@limxt:/data/model/installer# vllm serve /data/model/Qwen3-Next-80B-A3B-Instruct --tensor-parallel-size 4 --gpu-memory-utilization 0.95 --enforce-eager
INFO 09-30 05:07:42 [__init__.py:36] Available plugins for group vllm.platform_plugins:
INFO 09-30 05:07:42 [__init__.py:38] - ascend -> vllm_ascend:register
INFO 09-30 05:07:42 [__init__.py:41] All plugins in this group will be loaded. Set VLLM_PLUGINS to control which plugins to load.
INFO 09-30 05:07:42 [__init__.py:207] Platform plugin ascend is activated
WARNING 09-30 05:07:47 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 09-30 05:07:50 [registry.py:581] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration.
WARNING 09-30 05:07:50 [registry.py:581] Model architecture Qwen3VLMoeForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLMoeForConditionalGeneration.
WARNING 09-30 05:07:50 [registry.py:581] Model architecture Qwen3VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLForConditionalGeneration.
WARNING 09-30 05:07:50 [registry.py:581] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration.
WARNING 09-30 05:07:50 [registry.py:581] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 09-30 05:07:50 [registry.py:581] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
WARNING 09-30 05:07:50 [registry.py:581] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 09-30 05:07:50 [registry.py:581] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM.
WARNING 09-30 05:07:50 [registry.py:581] Model architecture Qwen3NextForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_next:CustomQwen3NextForCausalLM.
(APIServer pid=565) INFO 09-30 05:07:51 [api_server.py:1839] vLLM API server version 0.11.0rc3
(APIServer pid=565) INFO 09-30 05:07:51 [utils.py:233] non-default args: {'model_tag': '/data/model/Qwen3-Next-80B-A3B-Instruct', 'model': '/data/model/Qwen3-Next-80B-A3B-Instruct', 'enforce_eager': True, 'tensor_parallel_size': 4, 'gpu_memory_utilization': 0.95}
(APIServer pid=565) INFO 09-30 05:07:51 [model.py:547] Resolved architecture: Qwen3NextForCausalLM
(APIServer pid=565) torch_dtype is deprecated! Use dtype instead!
(APIServer pid=565) INFO 09-30 05:07:51 [model.py:1510] Using max model len 262144
(APIServer pid=565) INFO 09-30 05:07:51 [scheduler.py:205] Chunked prefill is enabled with max_num_batched_tokens=2048.
(APIServer pid=565) INFO 09-30 05:07:51 [config.py:297] Hybrid or mamba-based model detected: disabling prefix caching since it is not yet supported.
(APIServer pid=565) INFO 09-30 05:07:51 [config.py:308] Hybrid or mamba-based model detected: setting cudagraph mode to FULL_AND_PIECEWISE in order to optimize performance.
(APIServer pid=565) Traceback (most recent call last):
(APIServer pid=565)   File "/usr/local/python3.11.13/bin/vllm", line 8, in <module>
(APIServer pid=565)     sys.exit(main())
(APIServer pid=565)              ^^^^^^
(APIServer pid=565)   File "/vllm-workspace/vllm/vllm/entrypoints/cli/main.py", line 54, in main
(APIServer pid=565)     args.dispatch_function(args)
(APIServer pid=565)   File "/vllm-workspace/vllm/vllm/entrypoints/cli/serve.py", line 57, in cmd
(APIServer pid=565)     uvloop.run(run_server(args))
(APIServer pid=565)   File "/usr/local/python3.11.13/lib/python3.11/site-packages/uvloop/__init__.py", line 105, in run
(APIServer pid=565)     return runner.run(wrapper())
(APIServer pid=565)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=565)   File "/usr/local/python3.11.13/lib/python3.11/asyncio/runners.py", line 118, in run
(APIServer pid=565)     return self._loop.run_until_complete(task)
(APIServer pid=565)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=565)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=565)   File "/usr/local/python3.11.13/lib/python3.11/site-packages/uvloop/__init__.py", line 61, in wrapper
(APIServer pid=565)     return await main
(APIServer pid=565)            ^^^^^^^^^^
(APIServer pid=565)   File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 1884, in run_server
(APIServer pid=565)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=565)   File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 1902, in run_server_worker
(APIServer pid=565)     async with build_async_engine_client(
(APIServer pid=565)   File "/usr/local/python3.11.13/lib/python3.11/contextlib.py", line 210, in __aenter__
(APIServer pid=565)     return await anext(self.gen)
(APIServer pid=565)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=565)   File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 180, in build_async_engine_client
(APIServer pid=565)     async with build_async_engine_client_from_engine_args(
(APIServer pid=565)   File "/usr/local/python3.11.13/lib/python3.11/contextlib.py", line 210, in __aenter__
(APIServer pid=565)     return await anext(self.gen)
(APIServer pid=565)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=565)   File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 206, in build_async_engine_client_from_engine_args
(APIServer pid=565)     vllm_config = engine_args.create_engine_config(usage_context=usage_context)
(APIServer pid=565)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=565)   File "/vllm-workspace/vllm/vllm/engine/arg_utils.py", line 1431, in create_engine_config
(APIServer pid=565)     config = VllmConfig(
(APIServer pid=565)              ^^^^^^^^^^^
(APIServer pid=565)   File "/usr/local/python3.11.13/lib/python3.11/site-packages/pydantic/_internal/_dataclasses.py", line 123, in __init__
(APIServer pid=565)     s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(APIServer pid=565)   File "/vllm-workspace/vllm/vllm/config/__init__.py", line 306, in __post_init__
(APIServer pid=565)     self.try_verify_and_update_config()
(APIServer pid=565)   File "/vllm-workspace/vllm/vllm/config/__init__.py", line 642, in try_verify_and_update_config
(APIServer pid=565)     HybridAttentionMambaModelConfig.verify_and_update_config(self)
(APIServer pid=565)   File "/vllm-workspace/vllm-ascend/vllm_ascend/patch/platform/patch_common/patch_mamba_config.py", line 27, in verify_and_update_config
(APIServer pid=565)     ascend_config = get_ascend_config()
(APIServer pid=565)                     ^^^^^^^^^^^^^^^^^^^
(APIServer pid=565)   File "/vllm-workspace/vllm-ascend/vllm_ascend/ascend_config.py", line 199, in get_ascend_config
(APIServer pid=565)     raise RuntimeError(
(APIServer pid=565) RuntimeError: Ascend config is not initialized. Please call init_ascend_config first.
(APIServer pid=565) [ERROR] 2025-09-30-05:07:51 (PID:565, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception
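
The traceback makes the ordering problem visible: `VllmConfig.__post_init__` calls `try_verify_and_update_config`, which for this hybrid-attention/mamba model dispatches into vllm-ascend's patched `HybridAttentionMambaModelConfig.verify_and_update_config`, and that patch calls `get_ascend_config()` before anything has called `init_ascend_config` (which presumably happens later, in the platform's config hook). A minimal sketch of the failure mode, assuming the config lives in a module-level global as the error message suggests (the function names follow the traceback; the bodies are illustrative, not the actual vllm_ascend source):

```python
# Sketch of the suspected init-order bug; assumed internals, not vllm_ascend's code.
_ASCEND_CONFIG = None  # populated only once init_ascend_config() runs


def init_ascend_config(vllm_config):
    # Presumably invoked from the Ascend platform's config hook,
    # i.e. *after* VllmConfig.__post_init__ has finished.
    global _ASCEND_CONFIG
    _ASCEND_CONFIG = {"vllm_config": vllm_config}  # placeholder contents
    return _ASCEND_CONFIG


def get_ascend_config():
    if _ASCEND_CONFIG is None:
        raise RuntimeError(
            "Ascend config is not initialized. Please call init_ascend_config first."
        )
    return _ASCEND_CONFIG


# The patched verify_and_update_config runs inside VllmConfig.__post_init__,
# so it hits the getter while the global is still None:
get_ascend_config()  # raises RuntimeError, matching the traceback above
```

If that reading is correct, the fix belongs in vllm-ascend: either initialize the Ascend config before `VllmConfig` is constructed, or make the patched `verify_and_update_config` tolerate an uninitialized config.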
