Description
Your current environment
The output of `python collect_env.py`
PyTorch version: 2.7.1+cpu
Is debug build: False
OS: openEuler 22.03 (LTS-SP4) (aarch64)
GCC version: (GCC) 10.3.1
Clang version: Could not collect
CMake version: version 4.1.2
Libc version: glibc-2.34
Python version: 3.10.17 (main, May 8 2025, 08:13:48) [GCC 10.3.1] (64-bit runtime)
Python platform: Linux-6.8.0-31-generic-aarch64-with-glibc2.34
CPU:
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Vendor ID: HiSilicon
Model name: Kunpeng-920
Model: 0
Thread(s) per core: 1
Core(s) per cluster: 48
Socket(s): -
Cluster(s): 4
Stepping: 0x1
BogoMIPS: 200.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm ssbs
L1d cache: 12 MiB (192 instances)
L1i cache: 12 MiB (192 instances)
L2 cache: 96 MiB (192 instances)
L3 cache: 192 MiB (8 instances)
NUMA node(s): 8
NUMA node0 CPU(s): 0-23
NUMA node1 CPU(s): 24-47
NUMA node2 CPU(s): 48-71
NUMA node3 CPU(s): 72-95
NUMA node4 CPU(s): 96-119
NUMA node5 CPU(s): 120-143
NUMA node6 CPU(s): 144-167
NUMA node7 CPU(s): 168-191
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pyzmq==27.1.0
[pip3] torch==2.7.1+cpu
[pip3] torch_npu==2.7.1.dev20250724
[pip3] torchvision==0.22.1
[pip3] transformers==4.57.1
[conda] Could not collect
vLLM Version: 0.11.0
vLLM Ascend Version: 0.11.0rc0
ENV Variables:
ATB_OPSRUNNER_KERNEL_CACHE_TILING_SIZE=10240
ATB_OPSRUNNER_KERNEL_CACHE_LOCAL_COUNT=1
ATB_STREAM_SYNC_EVERY_RUNNER_ENABLE=0
ATB_OPSRUNNER_SETUP_CACHE_ENABLE=1
ATB_WORKSPACE_MEM_ALLOC_GLOBAL=0
ATB_DEVICE_TILING_BUFFER_BLOCK_NUM=32
ATB_STREAM_SYNC_EVERY_KERNEL_ENABLE=0
ATB_OPSRUNNER_KERNEL_CACHE_GLOABL_COUNT=5
ATB_HOME_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1
ASCEND_TOOLKIT_HOME=/usr/local/Ascend/ascend-toolkit/latest
ATB_COMPARE_TILING_EVERY_KERNEL=0
ASCEND_OPP_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp
LD_LIBRARY_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/aarch64:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/:
ASCEND_AICPU_PATH=/usr/local/Ascend/ascend-toolkit/latest
ATB_OPSRUNNER_KERNEL_CACHE_TYPE=3
ATB_RUNNER_POOL_SIZE=64
ATB_STREAM_SYNC_EVERY_OPERATION_ENABLE=0
ASCEND_HOME_PATH=/usr/local/Ascend/ascend-toolkit/latest
ATB_MATMUL_SHUFFLE_K_ENABLE=1
ATB_LAUNCH_KERNEL_WITH_TILING=1
ATB_WORKSPACE_MEM_ALLOC_ALG_TYPE=1
ATB_HOST_TILING_BUFFER_BLOCK_NUM=128
ATB_SHARE_MEMORY_NAME_SUFFIX=
TORCH_DEVICE_BACKEND_AUTOLOAD=1
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
NPU:
+------------------------------------------------------------------------------------------------+
| npu-smi 25.0.rc1.1 Version: 25.0.rc1.1 |
+---------------------------+---------------+----------------------------------------------------+
| NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page)|
| Chip | Bus-Id | AICore(%) Memory-Usage(MB) HBM-Usage(MB) |
+===========================+===============+====================================================+
| 0 910B4-1 | OK | 103.1 40 0 / 0 |
| 0 | 0000:C1:00.0 | 0 0 / 0 3386 / 65536 |
+===========================+===============+====================================================+
| 1 910B4-1 | OK | 92.9 37 0 / 0 |
| 0 | 0000:C2:00.0 | 0 0 / 0 3379 / 65536 |
+===========================+===============+====================================================+
| 2 910B4-1 | OK | 90.4 37 0 / 0 |
| 0 | 0000:81:00.0 | 0 0 / 0 3380 / 65536 |
+===========================+===============+====================================================+
| 3 910B4-1 | OK | 90.5 37 0 / 0 |
| 0 | 0000:82:00.0 | 0 0 / 0 3379 / 65536 |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU Chip | Process id | Process name | Process memory(MB) |
+===========================+===============+====================================================+
| No running processes found in NPU 0 |
+===========================+===============+====================================================+
| No running processes found in NPU 1 |
+===========================+===============+====================================================+
| No running processes found in NPU 2 |
+===========================+===============+====================================================+
| No running processes found in NPU 3 |
+===========================+===============+====================================================+
CANN:
package_name=Ascend-cann-toolkit
version=8.1.RC1
innerversion=V100R001C21SPC001B238
compatible_version=[V100R001C15],[V100R001C18],[V100R001C19],[V100R001C20],[V100R001C21]
arch=aarch64
os=linux
path=/usr/local/Ascend/ascend-toolkit/8.1.RC1/aarch64-linux
🐛 Describe the bug
Summary
When serving Qwen3-32B-w8a8 with vLLM using Ascend quantization and enabling LoRA, vLLM crashes with an AscendRMSNorm attribute error.
The issue only appears when combining:
- a W8A8-quantized base model, and
- --enable-lora + --lora-modules ... and --quantization "ascend"
Both FP16 + LoRA and W8A8 without LoRA work correctly.
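For context, the traceback below suggests a general failure mode that can be sketched independently of vllm-ascend internals (the class names and setup pass here are hypothetical stand-ins, not the actual vllm-ascend code): forward() reads an attribute that is only attached by a separate quant-fusion setup pass, so any code path that skips that pass hits torch.nn.Module.__getattr__, which raises AttributeError for unknown names.

```python
import torch
import torch.nn as nn

class RMSNormLike(nn.Module):
    """Stand-in for a norm layer whose forward assumes a fusion attribute."""
    def forward(self, x):
        # nn.Module.__getattr__ raises AttributeError if this was never assigned.
        _ = self.next_need_quant_fusion_linear
        return x

def quant_fusion_setup(module):
    # Stand-in for the pass that would normally attach the attribute.
    module.next_need_quant_fusion_linear = None

norm = RMSNormLike()
quant_fusion_setup(norm)
norm(torch.ones(2))  # works: the setup pass attached the attribute

bare = RMSNormLike()  # setup pass skipped
try:
    bare(torch.ones(2))
except AttributeError as e:
    print(e)  # 'RMSNormLike' object has no attribute 'next_need_quant_fusion_linear'
```

This mirrors the error signature in the log; whether the LoRA path actually skips such a setup pass in vllm-ascend is the open question of this report.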
Commands and Behaviors
❌ Failing command
vllm serve /root/data/Qwen/Qwen3-32B-w8a8 \
--tensor_parallel_size=4 \
--enable-lora \
--lora-modules icd_model=./all_adaptor \
--quantization "ascend" \
--port 8000

Error:
INFO 11-20 08:15:07 [parallel_state.py:1208] rank 0 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
INFO 11-20 08:15:07 [parallel_state.py:1208] rank 1 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 1, EP rank 1
INFO 11-20 08:15:07 [parallel_state.py:1208] rank 2 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 2, EP rank 2
INFO 11-20 08:15:07 [parallel_state.py:1208] rank 3 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 3, EP rank 3
(Worker_TP0 pid=11657) INFO 11-20 08:15:29 [model_runner_v1.py:2627] Starting to load model /root/data/Qwen/Qwen3-32B-w8a8...
(Worker_TP0 pid=11657) INFO 11-20 08:15:29 [utils.py:60] Using the vLLM Ascend Quantization now!
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
(Worker_TP3 pid=11916) INFO 11-20 08:15:35 [model_runner_v1.py:2627] Starting to load model /root/data/Qwen/Qwen3-32B-w8a8...
(Worker_TP3 pid=11916) INFO 11-20 08:15:35 [utils.py:60] Using the vLLM Ascend Quantization now!
(Worker_TP2 pid=11680) INFO 11-20 08:15:36 [model_runner_v1.py:2627] Starting to load model /root/data/Qwen/Qwen3-32B-w8a8...
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:07<00:00, 7.02s/it]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:07<00:00, 7.02s/it]
(Worker_TP0 pid=11657)
(Worker_TP2 pid=11680) INFO 11-20 08:15:38 [utils.py:60] Using the vLLM Ascend Quantization now!
(Worker_TP0 pid=11657) INFO 11-20 08:15:38 [default_loader.py:267] Loading weights took 7.89 seconds
(Worker_TP0 pid=11657) INFO 11-20 08:15:38 [punica_selector.py:19] Using PunicaWrapperNPU.
(Worker_TP1 pid=11661) INFO 11-20 08:15:39 [model_runner_v1.py:2627] Starting to load model /root/data/Qwen/Qwen3-32B-w8a8...
(Worker_TP0 pid=11657) INFO 11-20 08:15:39 [model_runner_v1.py:2661] Loading model weights took 10.0903 GB
(Worker_TP1 pid=11661) INFO 11-20 08:15:40 [utils.py:60] Using the vLLM Ascend Quantization now!
(Worker_TP3 pid=11916) INFO 11-20 08:15:44 [default_loader.py:267] Loading weights took 8.23 seconds
(Worker_TP3 pid=11916) INFO 11-20 08:15:45 [punica_selector.py:19] Using PunicaWrapperNPU.
(Worker_TP3 pid=11916) INFO 11-20 08:15:46 [model_runner_v1.py:2661] Loading model weights took 10.0903 GB
(Worker_TP2 pid=11680) INFO 11-20 08:15:48 [default_loader.py:267] Loading weights took 10.01 seconds
(Worker_TP2 pid=11680) INFO 11-20 08:15:48 [punica_selector.py:19] Using PunicaWrapperNPU.
(Worker_TP2 pid=11680) INFO 11-20 08:15:50 [model_runner_v1.py:2661] Loading model weights took 10.0903 GB
(Worker_TP1 pid=11661) INFO 11-20 08:15:50 [default_loader.py:267] Loading weights took 10.28 seconds
(Worker_TP1 pid=11661) INFO 11-20 08:15:51 [punica_selector.py:19] Using PunicaWrapperNPU.
(Worker_TP1 pid=11661) INFO 11-20 08:15:52 [model_runner_v1.py:2661] Loading model weights took 10.0903 GB
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] WorkerProc hit an exception.
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] Traceback (most recent call last):
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 666, in worker_busy_loop
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] output = func(*args, **kwargs)
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 205, in determine_available_memory
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] self.model_runner.profile_run()
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 2509, in profile_run
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] hidden_states = self._dummy_run(self.max_num_tokens,
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] return func(*args, **kwargs)
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 2475, in _dummy_run
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] hidden_states = self._generate_dummy_run_hidden_states(
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 2320, in _generate_dummy_run_hidden_states
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] hidden_states = self.model(input_ids=input_ids,
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] return forward_call(*args, **kwargs)
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen3.py", line 323, in forward
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] hidden_states = self.model(input_ids, positions, intermediate_tensors,
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] File "/vllm-workspace/vllm/vllm/compilation/decorators.py", line 310, in __call__
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] output = self.compiled_callable(*args, **kwargs)
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 364, in forward
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] hidden_states, residual = layer(positions, hidden_states, residual)
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] return forward_call(*args, **kwargs)
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen3.py", line 235, in forward
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] hidden_states, residual = self.post_attention_layernorm(
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] return forward_call(*args, **kwargs)
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] File "/vllm-workspace/vllm/vllm/model_executor/custom_op.py", line 44, in forward
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] return self._forward_method(*args, **kwargs)
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] File "/vllm-workspace/vllm-ascend/vllm_ascend/ops/layernorm.py", line 70, in forward_oot
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] self, x, residual, self.next_need_quant_fusion_linear)
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1940, in __getattr__
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] raise AttributeError(
(Worker_TP0 pid=11657) ERROR 11-20 08:15:55 [multiproc_executor.py:671] AttributeError: 'AscendRMSNorm' object has no attribute 'next_need_quant_fusion_linear'
(EngineCore_DP0 pid=11521) ERROR 11-20 08:15:55 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=11521) ERROR 11-20 08:15:55 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=11521) ERROR 11-20 08:15:55 [core.py:708] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=11521) ERROR 11-20 08:15:55 [core.py:708] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=11521) ERROR 11-20 08:15:55 [core.py:708] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=11521) ERROR 11-20 08:15:55 [core.py:708] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=11521) ERROR 11-20 08:15:55 [core.py:708] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 92, in __init__
(EngineCore_DP0 pid=11521) ERROR 11-20 08:15:55 [core.py:708] self._initialize_kv_caches(vllm_config)
(EngineCore_DP0 pid=11521) ERROR 11-20 08:15:55 [core.py:708] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 190, in _initialize_kv_caches
(EngineCore_DP0 pid=11521) ERROR 11-20 08:15:55 [core.py:708] self.model_executor.determine_available_memory())
(EngineCore_DP0 pid=11521) ERROR 11-20 08:15:55 [core.py:708] File "/vllm-workspace/vllm/vllm/v1/executor/abstract.py", line 85, in determine_available_memory
(EngineCore_DP0 pid=11521) ERROR 11-20 08:15:55 [core.py:708] return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=11521) ERROR 11-20 08:15:55 [core.py:708] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 264, in collective_rpc
(EngineCore_DP0 pid=11521) ERROR 11-20 08:15:55 [core.py:708] result = get_response(w, dequeue_timeout,
(EngineCore_DP0 pid=11521) ERROR 11-20 08:15:55 [core.py:708] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 248, in get_response
(EngineCore_DP0 pid=11521) ERROR 11-20 08:15:55 [core.py:708] raise RuntimeError(
(EngineCore_DP0 pid=11521) ERROR 11-20 08:15:55 [core.py:708] RuntimeError: Worker failed with error ''AscendRMSNorm' object has no attribute 'next_need_quant_fusion_linear'', please check the stack trace above for the root cause
(EngineCore_DP0 pid=11521) ERROR 11-20 08:16:05 [multiproc_executor.py:154] Worker proc VllmWorker-3 died unexpectedly, shutting down executor.
(EngineCore_DP0 pid=11521) Process EngineCore_DP0:
(EngineCore_DP0 pid=11521) Traceback (most recent call last):
(EngineCore_DP0 pid=11521) File "/usr/local/python3.10.17/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=11521) self.run()
(EngineCore_DP0 pid=11521) File "/usr/local/python3.10.17/lib/python3.10/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=11521) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=11521) File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=11521) raise e
(EngineCore_DP0 pid=11521) File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=11521) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=11521) File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=11521) super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=11521) File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 92, in __init__
(EngineCore_DP0 pid=11521) self._initialize_kv_caches(vllm_config)
(EngineCore_DP0 pid=11521) File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 190, in _initialize_kv_caches
(EngineCore_DP0 pid=11521) self.model_executor.determine_available_memory())
(EngineCore_DP0 pid=11521) File "/vllm-workspace/vllm/vllm/v1/executor/abstract.py", line 85, in determine_available_memory
(EngineCore_DP0 pid=11521) return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=11521) File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 264, in collective_rpc
(EngineCore_DP0 pid=11521) result = get_response(w, dequeue_timeout,
(EngineCore_DP0 pid=11521) File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 248, in get_response
(EngineCore_DP0 pid=11521) raise RuntimeError(
(EngineCore_DP0 pid=11521) RuntimeError: Worker failed with error ''AscendRMSNorm' object has no attribute 'next_need_quant_fusion_linear'', please check the stack trace above for the root cause
(APIServer pid=11383) Traceback (most recent call last):
(APIServer pid=11383) File "/usr/local/python3.10.17/bin/vllm", line 8, in <module>
(APIServer pid=11383) sys.exit(main())
(APIServer pid=11383) File "/vllm-workspace/vllm/vllm/entrypoints/cli/main.py", line 54, in main
(APIServer pid=11383) args.dispatch_function(args)
(APIServer pid=11383) File "/vllm-workspace/vllm/vllm/entrypoints/cli/serve.py", line 57, in cmd
(APIServer pid=11383) uvloop.run(run_server(args))
(APIServer pid=11383) File "/usr/local/python3.10.17/lib/python3.10/site-packages/uvloop/__init__.py", line 69, in run
(APIServer pid=11383) return loop.run_until_complete(wrapper())
(APIServer pid=11383) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=11383) File "/usr/local/python3.10.17/lib/python3.10/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=11383) return await main
(APIServer pid=11383) File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 1884, in run_server
(APIServer pid=11383) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=11383) File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 1902, in run_server_worker
(APIServer pid=11383) async with build_async_engine_client(
(APIServer pid=11383) File "/usr/local/python3.10.17/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=11383) return await anext(self.gen)
(APIServer pid=11383) File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 180, in build_async_engine_client
(APIServer pid=11383) async with build_async_engine_client_from_engine_args(
(APIServer pid=11383) File "/usr/local/python3.10.17/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=11383) return await anext(self.gen)
(APIServer pid=11383) File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 225, in build_async_engine_client_from_engine_args
(APIServer pid=11383) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=11383) File "/vllm-workspace/vllm/vllm/utils/__init__.py", line 1572, in inner
(APIServer pid=11383) return fn(*args, **kwargs)
(APIServer pid=11383) File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 207, in from_vllm_config
(APIServer pid=11383) return cls(
(APIServer pid=11383) File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 134, in __init__
(APIServer pid=11383) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=11383) File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
(APIServer pid=11383) return AsyncMPClient(*client_args)
(APIServer pid=11383) File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 769, in __init__
(APIServer pid=11383) super().__init__(
(APIServer pid=11383) File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 448, in __init__
(APIServer pid=11383) with launch_core_engines(vllm_config, executor_class,
(APIServer pid=11383) File "/usr/local/python3.10.17/lib/python3.10/contextlib.py", line 142, in __exit__
(APIServer pid=11383) next(self.gen)
(APIServer pid=11383) File "/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 732, in launch_core_engines
(APIServer pid=11383) wait_for_engine_startup(
(APIServer pid=11383) File "/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup
(APIServer pid=11383) raise RuntimeError("Engine core initialization failed. "
(APIServer pid=11383) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
(APIServer pid=11383) [ERROR] 2025-11-20-08:16:12 (PID:11383, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception
So the engine fails at runtime with:
'AscendRMSNorm' object has no attribute 'next_need_quant_fusion_linear'
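A defensive pattern that would avoid this particular crash (purely illustrative, not a proposed patch to vllm_ascend/ops/layernorm.py): look the attribute up with getattr and a default, so a layer that never went through the fusion setup falls back to an unfused path instead of raising. The class and fallback below are hypothetical.

```python
import torch
import torch.nn as nn

class SafeNorm(nn.Module):
    """Illustrative only: getattr with a default sidesteps the AttributeError."""
    def forward(self, x):
        nxt = getattr(self, "next_need_quant_fusion_linear", None)
        if nxt is None:
            # Unfused fallback: plain RMSNorm-style normalization.
            return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + 1e-6)
        # A fused path would go here when the setup pass attached the attribute.
        return x

SafeNorm()(torch.ones(3))  # runs without AttributeError even with no setup pass
```

Whether silently falling back is correct for the quant-fusion path is for the maintainers to judge; the sketch only shows that the lookup itself need not crash.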
✅ Working command: FP16 + LoRA
vllm serve /root/data/Qwen/Qwen3-32B \
--tensor_parallel_size=4 \
--enable-lora \
--lora-modules icd_model=./all_adaptor \
--port 8000

- Base model in FP16
- LoRA adapter enabled
- No --quantization "ascend"
This configuration serves successfully.
✅ Working command: W8A8 without LoRA
vllm serve /root/data/Qwen/Qwen3-32B-w8a8 \
--tensor_parallel_size=4 \
--port 8000

- Base model W8A8 quantized
- No LoRA
- No --quantization "ascend" flag
This configuration also serves successfully.