Closed
Labels
bug: Something isn't working
Description
Your current environment
The output of python collect_env.py
Collecting environment information...
==============================
System Info
==============================
OS : Ubuntu 22.04.5 LTS (x86_64)
GCC version : (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0
Clang version : Could not collect
CMake version : Could not collect
Libc version : glibc-2.35
==============================
PyTorch Info
==============================
PyTorch version : 2.9.1+cu129
Is debug build : False
CUDA used to build PyTorch : 12.9
ROCM used to build PyTorch : N/A
==============================
Python Environment
==============================
Python version : 3.12.12 (main, Oct 10 2025, 08:52:57) [GCC 11.4.0] (64-bit runtime)
Python platform : Linux-6.5.0-44-generic-x86_64-with-glibc2.35
==============================
CUDA / GPU Info
==============================
Is CUDA available : True
CUDA runtime version : 12.9.86
CUDA_MODULE_LOADING set to :
GPU models and configuration : GPU 0: NVIDIA RTX A4500
Nvidia driver version : 550.127.05
cuDNN version : Could not collect
HIP runtime version : N/A
MIOpen runtime version : N/A
Is XNNPACK available : True
==============================
CPU Info
==============================
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 43 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Vendor ID: AuthenticAMD
Model name: AMD EPYC 7352 24-Core Processor
CPU family: 23
Model: 49
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 1
Stepping: 0
Frequency boost: enabled
CPU max MHz: 2300.0000
CPU min MHz: 1500.0000
BogoMIPS: 4599.92
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid
aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb
bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local
clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sev sev_es
Virtualization: AMD-V
L1d cache: 768 KiB (24 instances)
L1i cache: 768 KiB (24 instances)
L2 cache: 12 MiB (24 instances)
L3 cache: 128 MiB (8 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-47
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec rstack overflow: Mitigation; Safe RET
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
==============================
Versions of relevant libraries
==============================
[pip3] flashinfer-python==0.5.3
[pip3] mypy==1.11.1
[pip3] mypy_extensions==1.1.0
[pip3] numpy==2.2.6
[pip3] nvidia-cublas-cu12==12.9.1.4
[pip3] nvidia-cuda-cupti-cu12==12.9.79
[pip3] nvidia-cuda-nvrtc-cu12==12.9.86
[pip3] nvidia-cuda-runtime-cu12==12.9.79
[pip3] nvidia-cudnn-cu12==9.10.2.21
[pip3] nvidia-cudnn-frontend==1.17.0
[pip3] nvidia-cufft-cu12==11.4.1.4
[pip3] nvidia-cufile-cu12==1.14.1.1
[pip3] nvidia-curand-cu12==10.3.10.19
[pip3] nvidia-cusolver-cu12==11.7.5.82
[pip3] nvidia-cusparse-cu12==12.5.10.65
[pip3] nvidia-cusparselt-cu12==0.7.1
[pip3] nvidia-cutlass-dsl==4.3.5
[pip3] nvidia-ml-py==13.590.44
[pip3] nvidia-nccl-cu12==2.27.5
[pip3] nvidia-nvjitlink-cu12==12.9.86
[pip3] nvidia-nvshmem-cu12==3.3.20
[pip3] nvidia-nvtx-cu12==12.9.79
[pip3] onnxruntime==1.23.2
[pip3] pyzmq==27.1.0
[pip3] torch==2.9.1+cu129
[pip3] torchaudio==2.9.1+cu129
[pip3] torchsde==0.2.6
[pip3] torchvision==0.24.1+cu129
[pip3] transformers==4.57.6
[pip3] triton==3.5.1
[conda] Could not collect
==============================
vLLM Info
==============================
ROCM Version : Could not collect
vLLM Version : 0.14.0
vLLM-Omni Version : 0.14.0
vLLM Build Flags:
CUDA Archs: 7.0 7.5 8.0 8.9 9.0 10.0 12.0; ROCm: Disabled
GPU Topology:
GPU0 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X 0-47 0 N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
==============================
Environment Variables
==============================
NVIDIA_VISIBLE_DEVICES=all
NVIDIA_REQUIRE_CUDA=cuda>=12.9 brand=unknown,driver>=535,driver<536 brand=grid,driver>=535,driver<536 brand=tesla,driver>=535,driver<536 brand=nvidia,driver>=535,driver<536 brand=quadro,driver>=535,driver<536 brand=quadrortx,driver>=535,dr
iver<536 brand=nvidiartx,driver>=535,driver<536 brand=vapps,driver>=535,driver<536 brand=vpc,driver>=535,driver<536 brand=vcs,driver>=535,driver<536 brand=vws,driver>=535,driver<536 brand=cloudgaming,driver>=535,driver<536 brand=unknown,dr
iver>=550,driver<551 brand=grid,driver>=550,driver<551 brand=tesla,driver>=550,driver<551 brand=nvidia,driver>=550,driver<551 brand=quadro,driver>=550,driver<551 brand=quadrortx,driver>=550,driver<551 brand=nvidiartx,driver>=550,driver<551
brand=vapps,driver>=550,driver<551 brand=vpc,driver>=550,driver<551 brand=vcs,driver>=550,driver<551 brand=vws,driver>=550,driver<551 brand=cloudgaming,driver>=550,driver<551 brand=unknown,driver>=560,driver<561 brand=grid,driver>=560,dri
ver<561 brand=tesla,driver>=560,driver<561 brand=nvidia,driver>=560,driver<561 brand=quadro,driver>=560,driver<561 brand=quadrortx,driver>=560,driver<561 brand=nvidiartx,driver>=560,driver<561 brand=vapps,driver>=560,driver<561 brand=vpc,d
river>=560,driver<561 brand=vcs,driver>=560,driver<561 brand=vws,driver>=560,driver<561 brand=cloudgaming,driver>=560,driver<561 brand=unknown,driver>=565,driver<566 brand=grid,driver>=565,driver<566 brand=tesla,driver>=565,driver<566 bran
d=nvidia,driver>=565,driver<566 brand=quadro,driver>=565,driver<566 brand=quadrortx,driver>=565,driver<566 brand=nvidiartx,driver>=565,driver<566 brand=vapps,driver>=565,driver<566 brand=vpc,driver>=565,driver<566 brand=vcs,driver>=565,dri
ver<566 brand=vws,driver>=565,driver<566 brand=cloudgaming,driver>=565,driver<566 brand=unknown,driver>=570,driver<571 brand=grid,driver>=570,driver<571 brand=tesla,driver>=570,driver<571 brand=nvidia,driver>=570,driver<571 brand=quadro,dr
iver>=570,driver<571 brand=quadrortx,driver>=570,driver<571 brand=nvidiartx,driver>=570,driver<571 brand=vapps,driver>=570,driver<571 brand=vpc,driver>=570,driver<571 brand=vcs,driver>=570,driver<571 brand=vws,driver>=570,driver<571 brand=
cloudgaming,driver>=570,driver<571
TORCH_CUDA_ARCH_LIST=7.0 7.5 8.0 8.9 9.0 10.0 12.0
NVIDIA_DRIVER_CAPABILITIES=compute,utility
VLLM_USAGE_SOURCE=production-docker-image
CUDA_VERSION=12.9.1
LD_LIBRARY_PATH=/usr/local/lib/python3.12/dist-packages/cv2/../../lib64:/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
TORCHINDUCTOR_CACHE_DIR=/tmp/torchinductor_root
Your code version
The commit id or version of vllm
Whichever is on the docker image vllm/vllm-omni:v0.14.0
The commit id or version of vllm-omni
vllm/vllm-omni:v0.14.0
🐛 Describe the bug
I am running the vLLM container vllm/vllm-omni:v0.14.0 in a RunPod serverless environment with the following startup command:
vllm serve qwen/qwen3-tts-12hz-1.7b-voicedesign --stage-configs-path vllm_omni/model_executor/stage_configs/qwen3_tts.yaml --omni --port 8000 --trust-remote-code --enforce-eager
When making the following request:
curl --request POST \
--url https://<redacted>.api.runpod.ai/v1/audio/speech \
--header 'content-type: application/json' \
--data '{
"input": "Newton’s First Law tells us that nature is \"lazy\" – it won’t start moving something, stop it, or turn it unless something else steps in. That “something else” is an unbalanced force. Any time you see a change in motion, ask yourself: What force caused that change? That’s the heart of physics!",
"task_type": "VoiceDesign",
"instructions": "A professor’s voice is warm and resonant, usually a mellow mid‑range baritone with a faint, seasoned rasp. It speaks with clear enunciation and a measured cadence—steady pacing punctuated by purposeful pauses that let ideas settle. A subtle spark of enthusiasm surfaces in occasional quicker, brighter inflections, while a gentle authority steadies the tone, making the lecture both commanding and approachable."
}'
I get the following error in the vLLM logs:
2026-02-18T14:28:56.172572040Z /usr/local/lib/python3.12/dist-packages/pydub/utils.py:300: SyntaxWarning: invalid escape sequence '\('
2026-02-18T14:28:56.172631672Z m = re.match('([su]([0-9]{1,2})p?) \(([0-9]{1,2}) bit\)$', token)
2026-02-18T14:28:56.172637822Z /usr/local/lib/python3.12/dist-packages/pydub/utils.py:301: SyntaxWarning: invalid escape sequence '\('
2026-02-18T14:28:56.172642652Z m2 = re.match('([su]([0-9]{1,2})p?)( \(default\))?$', token)
2026-02-18T14:28:56.172647752Z /usr/local/lib/python3.12/dist-packages/pydub/utils.py:310: SyntaxWarning: invalid escape sequence '\('
2026-02-18T14:28:56.172652542Z elif re.match('(flt)p?( \(default\))?$', token):
2026-02-18T14:28:56.172657322Z /usr/local/lib/python3.12/dist-packages/pydub/utils.py:314: SyntaxWarning: invalid escape sequence '\('
2026-02-18T14:28:56.172662302Z elif re.match('(dbl)p?( \(default\))?$', token):
2026-02-18T14:28:59.732543093Z �[0;36m(APIServer pid=19)�[0;0m INFO 02-18 06:28:59 [api_server.py:1272] vLLM API server version 0.14.0
2026-02-18T14:28:59.736118247Z �[0;36m(APIServer pid=19)�[0;0m INFO 02-18 06:28:59 [utils.py:263] non-default args: {'model_tag': 'qwen/qwen3-tts-12hz-1.7b-voicedesign', 'model': 'qwen/qwen3-tts-12hz-1.7b-voicedesign', 'trust_remote_code': True, 'enforce_eager': True}
2026-02-18T14:28:59.736367153Z �[0;36m(APIServer pid=19)�[0;0m INFO 02-18 06:28:59 [omni.py:119] Initializing stages for model: qwen/qwen3-tts-12hz-1.7b-voicedesign
2026-02-18T14:28:59.743534950Z �[0;36m(APIServer pid=19)�[0;0m INFO 02-18 06:28:59 [initialization.py:234] Loaded OmniTransferConfig with 0 connector configurations
2026-02-18T14:28:59.744493553Z �[0;36m(APIServer pid=19)�[0;0m INFO 02-18 06:28:59 [omni_stage.py:100] [OmniStage] stage_config: {'stage_id': 0, 'stage_type': 'llm', 'runtime': {'devices': '0', 'max_batch_size': 1}, 'engine_args': {'model_stage': 'qwen3_tts', 'model_arch': 'Qwen3TTSForConditionalGeneration', 'worker_type': 'generation', 'scheduler_cls': 'vllm_omni.core.sched.omni_generation_scheduler.OmniGenerationScheduler', 'enforce_eager': True, 'trust_remote_code': True, 'async_scheduling': False, 'enable_prefix_caching': False, 'engine_output_type': 'audio', 'gpu_memory_utilization': 0.1, 'distributed_executor_backend': 'mp', 'max_num_batched_tokens': 1000000, 'max_num_seqs': 1, 'async_chunk': False}, 'final_output': True, 'final_output_type': 'audio'}
2026-02-18T14:28:59.748831335Z �[0;36m(APIServer pid=19)�[0;0m INFO 02-18 06:28:59 [omni.py:338] [AsyncOrchestrator] Waiting for 1 stages to initialize (timeout: 60000s)
2026-02-18T14:29:08.800698630Z The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
2026-02-18T14:29:09.498349622Z The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
2026-02-18T14:29:10.122278785Z [Stage-0] INFO 02-18 06:29:10 [configuration_qwen3_tts.py:492] speaker_encoder_config is None. Initializing talker model with default values
2026-02-18T14:29:10.124692471Z [Stage-0] INFO 02-18 06:29:10 [configuration_qwen3_tts.py:489] talker_config is None. Initializing talker model with default values
2026-02-18T14:29:10.124727842Z [Stage-0] INFO 02-18 06:29:10 [configuration_qwen3_tts.py:492] speaker_encoder_config is None. Initializing talker model with default values
2026-02-18T14:29:10.125071770Z [Stage-0] INFO 02-18 06:29:10 [configuration_qwen3_tts.py:441] code_predictor_config is None. Initializing code_predictor model with default values
2026-02-18T14:29:10.126835832Z [Stage-0] INFO 02-18 06:29:10 [configuration_qwen3_tts.py:441] code_predictor_config is None. Initializing code_predictor model with default values
2026-02-18T14:29:22.701502257Z [Stage-0] INFO 02-18 06:29:22 [model.py:530] Resolved architecture: Qwen3TTSForConditionalGeneration
2026-02-18T14:29:23.481158823Z [Stage-0] INFO 02-18 06:29:23 [model.py:1545] Using max model len 32768
2026-02-18T14:29:23.488697880Z The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
2026-02-18T14:29:24.098332118Z [Stage-0] INFO 02-18 06:29:24 [configuration_qwen3_tts.py:492] speaker_encoder_config is None. Initializing talker model with default values
2026-02-18T14:29:24.100565690Z [Stage-0] INFO 02-18 06:29:24 [configuration_qwen3_tts.py:489] talker_config is None. Initializing talker model with default values
2026-02-18T14:29:24.100596601Z [Stage-0] INFO 02-18 06:29:24 [configuration_qwen3_tts.py:492] speaker_encoder_config is None. Initializing talker model with default values
2026-02-18T14:29:24.100929609Z [Stage-0] INFO 02-18 06:29:24 [configuration_qwen3_tts.py:441] code_predictor_config is None. Initializing code_predictor model with default values
2026-02-18T14:29:24.102540256Z [Stage-0] INFO 02-18 06:29:24 [configuration_qwen3_tts.py:441] code_predictor_config is None. Initializing code_predictor model with default values
2026-02-18T14:29:24.423684218Z [Stage-0] INFO 02-18 06:29:24 [model.py:212] Resolved architecture: Qwen3TTSForConditionalGeneration
2026-02-18T14:29:25.038284822Z [Stage-0] INFO 02-18 06:29:25 [model.py:1545] Using max model len 32768
2026-02-18T14:29:25.039334357Z [Stage-0] INFO 02-18 06:29:25 [scheduler.py:229] Chunked prefill is enabled with max_num_batched_tokens=1000000.
2026-02-18T14:29:25.039382098Z [Stage-0] WARNING 02-18 06:29:25 [scheduler.py:271] max_num_batched_tokens (1000000) exceeds max_num_seqs * max_model_len (32768). This may lead to unexpected behavior.
2026-02-18T14:29:25.040208647Z [Stage-0] INFO 02-18 06:29:25 [vllm.py:630] Asynchronous scheduling is disabled.
2026-02-18T14:29:25.040227468Z [Stage-0] WARNING 02-18 06:29:25 [vllm.py:665] Enforce eager set, overriding optimization level to -O0
2026-02-18T14:29:25.040474044Z [Stage-0] INFO 02-18 06:29:25 [vllm.py:765] Cudagraph is disabled under eager mode
2026-02-18T14:29:37.876934769Z �[0;36m(EngineCore_DP0 pid=486)�[0;0m [Stage-0] INFO 02-18 06:29:37 [core.py:97] Initializing a V1 LLM engine (v0.14.0) with config: model='qwen/qwen3-tts-12hz-1.7b-voicedesign', speculative_config=None, tokenizer='qwen/qwen3-tts-12hz-1.7b-voicedesign', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=qwen/qwen3-tts-12hz-1.7b-voicedesign, enable_prefix_caching=False, enable_chunked_prefill=True, pooler_config=None, compilation_config={'level': None, 'mode': <CompilationMode.NONE: 0>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['all'], 'splitting_ops': [], 'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_split_points': [1000000], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.NONE: 0>, 'cudagraph_num_of_warmups': 0, 'cudagraph_capture_sizes': [], 
'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'eliminate_noops': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 0, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': True}, 'local_cache_dir': None}
2026-02-18T14:29:37.877005821Z �[0;36m(EngineCore_DP0 pid=486)�[0;0m [Stage-0] WARNING 02-18 06:29:37 [multiproc_executor.py:880] Reducing Torch parallelism from 24 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
2026-02-18T14:29:47.218232822Z [Stage-0] INFO 02-18 06:29:47 [parallel_state.py:1214] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://127.0.0.1:35731 backend=nccl
2026-02-18T14:29:47.295536146Z [Stage-0] INFO 02-18 06:29:47 [parallel_state.py:1425] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A
2026-02-18T14:29:51.091422431Z /bin/sh: 1: sox: not found
2026-02-18T14:29:51.091741669Z [2026-02-18 06:29:51] WARNING __init__.py:10: SoX could not be found!
2026-02-18T14:29:51.091770309Z If you do not have SoX, proceed here:
2026-02-18T14:29:51.091775789Z - - - http://sox.sourceforge.net/ - - -
2026-02-18T14:29:51.091786110Z If you do (or think that you should) have SoX, double-check your
2026-02-18T14:29:51.091791360Z path variables.
2026-02-18T14:29:51.142771355Z ********
2026-02-18T14:29:51.142780295Z Warning: flash-attn is not installed. Will only run the manual PyTorch version. Please install flash-attn for faster inference.
2026-02-18T14:29:51.142785705Z ********
2026-02-18T14:29:51.142795646Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] INFO 02-18 06:29:51 [gpu_model_runner.py:3808] Starting to load model qwen/qwen3-tts-12hz-1.7b-voicedesign...
2026-02-18T14:29:51.420139410Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] WARNING 02-18 06:29:51 [qwen3_tts.py:76] Flash-Attn is not installed. Using default PyTorch attention implementation.
2026-02-18T14:29:51.826539492Z �[0;36m(Worker pid=638)�[0;0m `torch_dtype` is deprecated! Use `dtype` instead!
2026-02-18T14:29:51.826773957Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] INFO 02-18 06:29:51 [configuration_qwen3_tts.py:492] speaker_encoder_config is None. Initializing talker model with default values
2026-02-18T14:29:51.829048781Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] INFO 02-18 06:29:51 [configuration_qwen3_tts.py:489] talker_config is None. Initializing talker model with default values
2026-02-18T14:29:51.829066331Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] INFO 02-18 06:29:51 [configuration_qwen3_tts.py:492] speaker_encoder_config is None. Initializing talker model with default values
2026-02-18T14:29:51.829423870Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] INFO 02-18 06:29:51 [configuration_qwen3_tts.py:441] code_predictor_config is None. Initializing code_predictor model with default values
2026-02-18T14:29:51.831046948Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] INFO 02-18 06:29:51 [configuration_qwen3_tts.py:441] code_predictor_config is None. Initializing code_predictor model with default values
2026-02-18T14:29:53.496125569Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] INFO 02-18 06:29:53 [weight_utils.py:46] Using model weights format ['speech_tokenizer/*']
2026-02-18T14:29:54.139530029Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] INFO 02-18 06:29:54 [configuration_qwen3_tts_tokenizer_v2.py:156] encoder_config is None. Initializing encoder with default values
2026-02-18T14:29:54.139555029Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] INFO 02-18 06:29:54 [configuration_qwen3_tts_tokenizer_v2.py:159] decoder_config is None. Initializing decoder with default values
2026-02-18T14:30:01.221393561Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] INFO 02-18 06:30:01 [weight_utils.py:550] No model.safetensors.index.json found in remote.
2026-02-18T14:30:01.222608949Z �[0;36m(Worker pid=638)�[0;0m
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
2026-02-18T14:30:01.243241753Z �[0;36m(Worker pid=638)�[0;0m
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 48.80it/s]
2026-02-18T14:30:01.243269964Z �[0;36m(Worker pid=638)�[0;0m
2026-02-18T14:30:01.246706995Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] INFO 02-18 06:30:01 [default_loader.py:291] Loading weights took 0.02 seconds
2026-02-18T14:30:01.809345730Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] INFO 02-18 06:30:01 [gpu_model_runner.py:3905] Model loading took 3.89 GiB memory and 9.827631 seconds
2026-02-18T14:30:01.814070601Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] INFO 02-18 06:30:01 [qwen3_tts.py:133] Profile run detected (empty text). Capping max_new_tokens to 2.
2026-02-18T14:30:01.816257282Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] WorkerProc hit an exception.
2026-02-18T14:30:01.816279763Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] Traceback (most recent call last):
2026-02-18T14:30:01.816285573Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 817, in worker_busy_loop
2026-02-18T14:30:01.816291533Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] output = func(*args, **kwargs)
2026-02-18T14:30:01.816296713Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] ^^^^^^^^^^^^^^^^^^^^^
2026-02-18T14:30:01.816302033Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 525, in compile_or_warm_up_model
2026-02-18T14:30:01.816307474Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] hidden_states, last_hidden_states = self.model_runner._dummy_run(
2026-02-18T14:30:01.816322744Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-02-18T14:30:01.816329074Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
2026-02-18T14:30:01.816334334Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] return func(*args, **kwargs)
2026-02-18T14:30:01.816370425Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] ^^^^^^^^^^^^^^^^^^^^^
2026-02-18T14:30:01.816383085Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] File "/usr/local/lib/python3.12/dist-packages/vllm_omni/worker/gpu_generation_model_runner.py", line 633, in _dummy_run
2026-02-18T14:30:01.816393146Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] outputs = self.model(
2026-02-18T14:30:01.816400706Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] ^^^^^^^^^^^
2026-02-18T14:30:01.816410336Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
2026-02-18T14:30:01.816416906Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] return self._call_impl(*args, **kwargs)
2026-02-18T14:30:01.816426986Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-02-18T14:30:01.816438267Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
2026-02-18T14:30:01.816447817Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] return forward_call(*args, **kwargs)
2026-02-18T14:30:01.816454977Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-02-18T14:30:01.816465897Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] File "/usr/local/lib/python3.12/dist-packages/vllm_omni/model_executor/models/qwen3_tts/qwen3_tts.py", line 148, in forward
2026-02-18T14:30:01.816494648Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] raise ValueError(f"Invalid task type: {task_type}")
2026-02-18T14:30:01.816510648Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] ValueError: Invalid task type: voicedesign
2026-02-18T14:30:01.816521509Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] Traceback (most recent call last):
2026-02-18T14:30:01.816531519Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 817, in worker_busy_loop
2026-02-18T14:30:01.816541989Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] output = func(*args, **kwargs)
2026-02-18T14:30:01.816552579Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] ^^^^^^^^^^^^^^^^^^^^^
2026-02-18T14:30:01.816563059Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 525, in compile_or_warm_up_model
2026-02-18T14:30:01.816572820Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] hidden_states, last_hidden_states = self.model_runner._dummy_run(
2026-02-18T14:30:01.816583150Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-02-18T14:30:01.816593130Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
2026-02-18T14:30:01.816603740Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] return func(*args, **kwargs)
2026-02-18T14:30:01.816613781Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] ^^^^^^^^^^^^^^^^^^^^^
2026-02-18T14:30:01.816624481Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] File "/usr/local/lib/python3.12/dist-packages/vllm_omni/worker/gpu_generation_model_runner.py", line 633, in _dummy_run
2026-02-18T14:30:01.816634421Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] outputs = self.model(
2026-02-18T14:30:01.816645211Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] ^^^^^^^^^^^
2026-02-18T14:30:01.816654862Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
2026-02-18T14:30:01.816663592Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] return self._call_impl(*args, **kwargs)
2026-02-18T14:30:01.816673872Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-02-18T14:30:01.816684362Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
2026-02-18T14:30:01.816692993Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] return forward_call(*args, **kwargs)
2026-02-18T14:30:01.816698043Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-02-18T14:30:01.816703113Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] File "/usr/local/lib/python3.12/dist-packages/vllm_omni/model_executor/models/qwen3_tts/qwen3_tts.py", line 148, in forward
2026-02-18T14:30:01.816717983Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] raise ValueError(f"Invalid task type: {task_type}")
2026-02-18T14:30:01.816733163Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] ValueError: Invalid task type: voicedesign
2026-02-18T14:30:01.816739334Z �[0;36m(Worker pid=638)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]
2026-02-18T14:30:01.818219598Z �[0;36m(EngineCore_DP0 pid=486)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936] EngineCore failed to start.
2026-02-18T14:30:01.818234088Z �[0;36m(EngineCore_DP0 pid=486)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936] Traceback (most recent call last):
2026-02-18T14:30:01.818240229Z �[0;36m(EngineCore_DP0 pid=486)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 927, in run_engine_core
2026-02-18T14:30:01.818245899Z �[0;36m(EngineCore_DP0 pid=486)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
2026-02-18T14:30:01.818251439Z �[0;36m(EngineCore_DP0 pid=486)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-02-18T14:30:01.818256569Z �[0;36m(EngineCore_DP0 pid=486)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 692, in __init__
2026-02-18T14:30:01.818261789Z �[0;36m(EngineCore_DP0 pid=486)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936] super().__init__(
2026-02-18T14:30:01.818266999Z �[0;36m(EngineCore_DP0 pid=486)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 113, in __init__
2026-02-18T14:30:01.818272119Z �[0;36m(EngineCore_DP0 pid=486)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936] num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
2026-02-18T14:30:01.818277349Z �[0;36m(EngineCore_DP0 pid=486)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-02-18T14:30:01.818282480Z �[0;36m(EngineCore_DP0 pid=486)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 270, in _initialize_kv_caches
2026-02-18T14:30:01.818287550Z �[0;36m(EngineCore_DP0 pid=486)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936] self.model_executor.initialize_from_config(kv_cache_configs)
2026-02-18T14:30:01.818293540Z �[0;36m(EngineCore_DP0 pid=486)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 116, in initialize_from_config
2026-02-18T14:30:01.818299040Z �[0;36m(EngineCore_DP0 pid=486)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936] self.collective_rpc("compile_or_warm_up_model")
2026-02-18T14:30:01.818304470Z �[0;36m(EngineCore_DP0 pid=486)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 359, in collective_rpc
2026-02-18T14:30:01.818309650Z �[0;36m(EngineCore_DP0 pid=486)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936] return aggregate(get_response())
2026-02-18T14:30:01.818314780Z �[0;36m(EngineCore_DP0 pid=486)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936] ^^^^^^^^^^^^^^
2026-02-18T14:30:01.818319880Z �[0;36m(EngineCore_DP0 pid=486)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 342, in get_response
2026-02-18T14:30:01.818325091Z �[0;36m(EngineCore_DP0 pid=486)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936] raise RuntimeError(
2026-02-18T14:30:01.818330171Z �[0;36m(EngineCore_DP0 pid=486)�[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936] RuntimeError: Worker failed with error 'Invalid task type: voicedesign', please check the stack trace above for the root cause
The error seems to come from the forward function in the Qwen3TTSModelForGeneration class. For some reason, task_type = runtime_additional_information.pop("task_type", [self.task_type])[0] resolves to the lowercase string voicedesign.
This causes the task_type == "VoiceDesign" comparison to fail, even though VoiceDesign is capitalized correctly in my request. I would submit a PR to fix the bug myself, but I can't find where runtime_additional_information comes from in the code base.
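To illustrate the mismatch described above, here is a minimal Python sketch. It is not taken from the vllm-omni code base, and every name in it is hypothetical. It rests on one unconfirmed guess: the log shows the failure during the profile run (empty text) with the model passed as qwen/qwen3-tts-12hz-1.7b-voicedesign, so the default self.task_type may be derived from the lowercase model name rather than from the request. Only the "VoiceDesign" task name comes from this report; the other task names are placeholders.

```python
# Hypothetical sketch; none of these names come from the vllm-omni code base.
# Only "VoiceDesign" appears in the report; the other task names are assumed.
CANONICAL_TASK_TYPES = ("VoiceDesign", "CustomVoice", "Base")
_BY_LOWERCASE = {name.lower(): name for name in CANONICAL_TASK_TYPES}


def guess_task_type_from_model(model: str) -> str:
    """Assumed derivation: take the last hyphen-separated part of the model name."""
    return model.rstrip("/").split("-")[-1]


def normalize_task_type(raw: str) -> str:
    """Map any casing of a known task type to its canonical spelling."""
    canonical = _BY_LOWERCASE.get(raw.strip().lower())
    if canonical is None:
        raise ValueError(f"Invalid task type: {raw}")
    return canonical


derived = guess_task_type_from_model("qwen/qwen3-tts-12hz-1.7b-voicedesign")
print(derived)                       # voicedesign
print(derived == "VoiceDesign")      # False: the exact-match comparison fails
print(normalize_task_type(derived))  # VoiceDesign
```

If the guess holds, a case-insensitive lookup like normalize_task_type would let the existing comparison succeed. An untested workaround to try in the meantime would be to pass the model name to vllm serve with the capitalization used by its publisher.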
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.