[Bug]: The Qwen3-VL-30B-A3B-Thinking model deployed by vllm is not responding to requests.

### Your current environment

<details>
<summary>The output of <code>python collect_env.py</code></summary>

```text
Collecting environment information...
==============================
        System Info
==============================
OS                           : Ubuntu 24.04.2 LTS (x86_64)
GCC version                  : (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Clang version                : 13.0.1 (1.7.0-beba8b)
CMake version                : version 3.26.0
Libc version                 : glibc-2.39

==============================
       PyTorch Info
==============================
PyTorch version              : 2.8.0
Is debug build               : False
CUDA used to build PyTorch   : 12.9
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.12.3 (main, Oct 20 2025, 15:15:24) [GCC 13.3.0] (64-bit runtime)
Python platform              : Linux-5.10.134-008.18.kangaroo.al8.x86_64-x86_64-with-glibc2.39

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : 12.9.0
CUDA_MODULE_LOADING set to   : LAZY
GPU models and configuration : GPU 0: PPU-ZW810E
Nvidia driver version        : 1.4.1-816bc0
cuDNN version                : Probably one of the following:
/usr/local/PPU_SDK/CUDA_SDK/targets/x86_64-linux/lib/libcudnn.so.8.9.5
/usr/local/PPU_SDK/CUDA_SDK/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.9.5
/usr/local/PPU_SDK/CUDA_SDK/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.9.5
/usr/local/PPU_SDK/CUDA_SDK/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.9.5
/usr/local/PPU_SDK/CUDA_SDK/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.9.5
/usr/local/PPU_SDK/CUDA_SDK/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.9.5
/usr/local/PPU_SDK/CUDA_SDK/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.9.5
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Address sizes:                   52 bits physical, 57 bits virtual
Byte Order:                      Little Endian
CPU(s):                          10
On-line CPU(s) list:             0-9
Vendor ID:                       GenuineIntel
Model name:                      Intel(R) Xeon(R) Processor
CPU family:                      6
Model:                           207
Thread(s) per core:              1
Core(s) per socket:              10
Socket(s):                       1
Stepping:                        2
BogoMIPS:                        5600.00
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx_vnni avx512_bf16 wbnoinvd avx512vbmi umip pku waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid cldemote movdiri movdir64b fsrm md_clear serialize tsxldtrk amx_bf16 avx512_fp16 amx_tile amx_int8 arch_capabilities
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       240 KiB (5 instances)
L1i cache:                       160 KiB (5 instances)
L2 cache:                        10 MiB (5 instances)
L3 cache:                        320 MiB (1 instance)
NUMA node(s):                    1
NUMA node0 CPU(s):               0-9
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Not affected
Vulnerability Retbleed:          Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Vulnerability Spectre v2:        Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Vulnerable
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected

==============================
Versions of relevant libraries
==============================
[pip3] flashinfer-python==0.2.6.post1
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] nvidia-dali-cuda120==1.44.0
[pip3] nvidia-ml-py==12.560.30
[pip3] pyzmq==27.1.0
[pip3] torch==2.8.0
[pip3] torchao==0.11.0
[pip3] torchaudio==2.8.0
[pip3] torchdata==0.11.0
[pip3] torchtext==0.18.0
[pip3] torchtune==0.0.0
[pip3] torchvision==0.23.0
[pip3] transformers==4.57.0
[pip3] triton==3.4.0+git70b4432e
[pip3] triton_kernels==1.0.0
[conda] Could not collect

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
vLLM Version                 : 0.11.0
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled
GPU Topology:
  	GPU0	CPU Affinity	NUMA Affinity
GPU0	X	0-9		0

Legend:
Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

GPU Rear Group:

  Rear ID 0: GPU 0

==============================
     Environment Variables
==============================
CUDA_TOOLKIT_ROOT=/usr/local/PPU_SDK/CUDA_SDK
NCCL_SOCKET_IFNAME=eth0
CUDNN_HOME=/usr/local/PPU_SDK/CUDA_SDK
NCCL_DEBUG=INFO
NCCL_IB_HCA=
CUDA_SDK=/usr/local/PPU_SDK/CUDA_SDK
CUDACXX=/usr/local/PPU_SDK/CUDA_SDK/bin/nvcc
CUDA_SDK_VER=cuda-12.9
CUDA_PATH=/usr/local/PPU_SDK/CUDA_SDK
LD_LIBRARY_PATH=/usr/local/PPU_SDK/CUDA_SDK/lib64:/usr/local/PPU_SDK/lib:/usr/local/lib:/usr/local/PPU_SDK/CUDA_SDK/lib64:/usr/local/PPU_SDK/lib:/usr/local/PPU_SDK/sailSHMEM/lib:
NCCL_IB_DISABLE=1
CUDA_HOME=/usr/local/PPU_SDK/CUDA_SDK
CUDA_HOME=/usr/local/PPU_SDK/CUDA_SDK
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
CUDA_MODULE_LOADING=LAZY
```

</details>


### 🐛 Describe the bug

root@qwen3-vl-30b-a3b-7d895d6bb6-mfc2r:/workspace/pytorch# 
curl -m 30 http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3-VL-30B-A3B-Thinking",
    "prompt": "Hello, how are you?",
    "max_tokens": 10,
    "temperature": 0,
    "skip_special_tokens": true
  }'


curl: (28) Operation timed out after 30002 milliseconds with 0 bytes received
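
To narrow down where the request stalls, the same call can be issued from Python with the failure modes reported separately. This is a minimal sketch and not part of the original report. It assumes the server is listening on `localhost:8080` as in the curl command above, and the helper names `build_payload` and `probe` are hypothetical. Only the standard library is used.

```python
import json
import socket
import urllib.error
import urllib.request

# Assumed to match the endpoint used by the curl command in this report.
URL = "http://localhost:8080/v1/completions"


def build_payload() -> bytes:
    """Return the same JSON body that the curl command sends."""
    body = {
        "model": "Qwen3-VL-30B-A3B-Thinking",
        "prompt": "Hello, how are you?",
        "max_tokens": 10,
        "temperature": 0,
        "skip_special_tokens": True,
    }
    return json.dumps(body).encode("utf-8")


def probe(url: str = URL, timeout: float = 30.0) -> str:
    """Send one request and describe which stage failed (hypothetical helper)."""
    req = urllib.request.Request(
        url,
        data=build_payload(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return f"ok: HTTP {resp.status}, {len(resp.read())} bytes"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return f"http error: {exc.code}"
    except urllib.error.URLError as exc:
        # Raised when connecting or sending the request fails.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "timeout while connecting or sending"
        return f"connection failed: {exc.reason}"
    except (socket.timeout, TimeoutError):
        # The request was sent, but no response arrived in time.
        return "timeout waiting for response"
```

Calling `print(probe())` on the serving host should report "timeout waiting for response" if the HTTP server accepts the connection but never returns a completion, which would be consistent with the curl output above.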

# The gdb thread trace is available here
```
https://zcy-distribute.oss-cn-hangzhou.aliyuncs.com/zdebug/gdb_threads.log
```
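
The report does not say how the log above was captured. One common way to collect a similar trace from a hung process is to attach gdb in batch mode and print a backtrace for every thread. The sketch below assumes gdb is installed on the host and that the caller has permission to attach to the process. The function names and the default output path are hypothetical.

```python
import subprocess


def gdb_thread_dump_cmd(pid: int) -> list[str]:
    """Build a gdb invocation that prints a backtrace of every thread."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_threads(pid: int, out_path: str = "gdb_threads.log") -> None:
    """Attach to `pid`, write all thread backtraces to `out_path`, then detach."""
    with open(out_path, "w") as out:
        subprocess.run(
            gdb_thread_dump_cmd(pid),
            stdout=out,
            stderr=subprocess.STDOUT,
            check=True,  # raise if gdb exits with a non-zero status
        )
```

For example, `dump_threads(12345)` would write the trace for process 12345, where the PID is a placeholder for the stalled vLLM process.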

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
