[Bug]: Qwen3TTSModelForGeneration RuntimeError: Worker failed with error 'Invalid task type: voicedesign'

### Your current environment

<details>
<summary>The output of <code>python collect_env.py</code></summary>

```text
Collecting environment information...
==============================
        System Info
==============================
OS                           : Ubuntu 22.04.5 LTS (x86_64)
GCC version                  : (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0
Clang version                : Could not collect
CMake version                : Could not collect
Libc version                 : glibc-2.35

==============================
       PyTorch Info
==============================
PyTorch version              : 2.9.1+cu129
Is debug build               : False
CUDA used to build PyTorch   : 12.9
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.12.12 (main, Oct 10 2025, 08:52:57) [GCC 11.4.0] (64-bit runtime)
Python platform              : Linux-6.5.0-44-generic-x86_64-with-glibc2.35

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : 12.9.86
CUDA_MODULE_LOADING set to   : 
GPU models and configuration : GPU 0: NVIDIA RTX A4500
Nvidia driver version        : 550.127.05
cuDNN version                : Could not collect
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Address sizes:                      43 bits physical, 48 bits virtual
Byte Order:                         Little Endian
CPU(s):                             48
On-line CPU(s) list:                0-47
Vendor ID:                          AuthenticAMD
Model name:                         AMD EPYC 7352 24-Core Processor
CPU family:                         23
Model:                              49
Thread(s) per core:                 2
Core(s) per socket:                 24
Socket(s):                          1
Stepping:                           0
Frequency boost:                    enabled
CPU max MHz:                        2300.0000
CPU min MHz:                        1500.0000
BogoMIPS:                           4599.92
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid 
aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb
 bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local 
clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sev sev_es
Virtualization:                     AMD-V
L1d cache:                          768 KiB (24 instances)
L1i cache:                          768 KiB (24 instances)
L2 cache:                           12 MiB (24 instances)
L3 cache:                           128 MiB (8 instances)
NUMA node(s):                       1
NUMA node0 CPU(s):                  0-47
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Not affected
Vulnerability Retbleed:             Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec rstack overflow: Mitigation; Safe RET
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; Retpolines; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected

==============================
Versions of relevant libraries
==============================
[pip3] flashinfer-python==0.5.3
[pip3] mypy==1.11.1
[pip3] mypy_extensions==1.1.0
[pip3] numpy==2.2.6
[pip3] nvidia-cublas-cu12==12.9.1.4
[pip3] nvidia-cuda-cupti-cu12==12.9.79
[pip3] nvidia-cuda-nvrtc-cu12==12.9.86
[pip3] nvidia-cuda-runtime-cu12==12.9.79
[pip3] nvidia-cudnn-cu12==9.10.2.21
[pip3] nvidia-cudnn-frontend==1.17.0
[pip3] nvidia-cufft-cu12==11.4.1.4
[pip3] nvidia-cufile-cu12==1.14.1.1
[pip3] nvidia-curand-cu12==10.3.10.19
[pip3] nvidia-cusolver-cu12==11.7.5.82
[pip3] nvidia-cusparse-cu12==12.5.10.65
[pip3] nvidia-cusparselt-cu12==0.7.1
[pip3] nvidia-cutlass-dsl==4.3.5
[pip3] nvidia-ml-py==13.590.44
[pip3] nvidia-nccl-cu12==2.27.5
[pip3] nvidia-nvjitlink-cu12==12.9.86
[pip3] nvidia-nvshmem-cu12==3.3.20
[pip3] nvidia-nvtx-cu12==12.9.79
[pip3] onnxruntime==1.23.2
[pip3] pyzmq==27.1.0
[pip3] torch==2.9.1+cu129
[pip3] torchaudio==2.9.1+cu129
[pip3] torchsde==0.2.6
[pip3] torchvision==0.24.1+cu129
[pip3] transformers==4.57.6
[pip3] triton==3.5.1
[conda] Could not collect

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
vLLM Version                 : 0.14.0
vLLM-Omni Version            : 0.14.0
vLLM Build Flags:
  CUDA Archs: 7.0 7.5 8.0 8.9 9.0 10.0 12.0; ROCm: Disabled
GPU Topology:
        GPU0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      0-47    0               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

==============================
     Environment Variables
==============================
NVIDIA_VISIBLE_DEVICES=all
NVIDIA_REQUIRE_CUDA=cuda>=12.9 brand=unknown,driver>=535,driver<536 brand=grid,driver>=535,driver<536 brand=tesla,driver>=535,driver<536 brand=nvidia,driver>=535,driver<536 brand=quadro,driver>=535,driver<536 brand=quadrortx,driver>=535,dr
iver<536 brand=nvidiartx,driver>=535,driver<536 brand=vapps,driver>=535,driver<536 brand=vpc,driver>=535,driver<536 brand=vcs,driver>=535,driver<536 brand=vws,driver>=535,driver<536 brand=cloudgaming,driver>=535,driver<536 brand=unknown,dr
iver>=550,driver<551 brand=grid,driver>=550,driver<551 brand=tesla,driver>=550,driver<551 brand=nvidia,driver>=550,driver<551 brand=quadro,driver>=550,driver<551 brand=quadrortx,driver>=550,driver<551 brand=nvidiartx,driver>=550,driver<551
 brand=vapps,driver>=550,driver<551 brand=vpc,driver>=550,driver<551 brand=vcs,driver>=550,driver<551 brand=vws,driver>=550,driver<551 brand=cloudgaming,driver>=550,driver<551 brand=unknown,driver>=560,driver<561 brand=grid,driver>=560,dri
ver<561 brand=tesla,driver>=560,driver<561 brand=nvidia,driver>=560,driver<561 brand=quadro,driver>=560,driver<561 brand=quadrortx,driver>=560,driver<561 brand=nvidiartx,driver>=560,driver<561 brand=vapps,driver>=560,driver<561 brand=vpc,d
river>=560,driver<561 brand=vcs,driver>=560,driver<561 brand=vws,driver>=560,driver<561 brand=cloudgaming,driver>=560,driver<561 brand=unknown,driver>=565,driver<566 brand=grid,driver>=565,driver<566 brand=tesla,driver>=565,driver<566 bran
d=nvidia,driver>=565,driver<566 brand=quadro,driver>=565,driver<566 brand=quadrortx,driver>=565,driver<566 brand=nvidiartx,driver>=565,driver<566 brand=vapps,driver>=565,driver<566 brand=vpc,driver>=565,driver<566 brand=vcs,driver>=565,dri
ver<566 brand=vws,driver>=565,driver<566 brand=cloudgaming,driver>=565,driver<566 brand=unknown,driver>=570,driver<571 brand=grid,driver>=570,driver<571 brand=tesla,driver>=570,driver<571 brand=nvidia,driver>=570,driver<571 brand=quadro,dr
iver>=570,driver<571 brand=quadrortx,driver>=570,driver<571 brand=nvidiartx,driver>=570,driver<571 brand=vapps,driver>=570,driver<571 brand=vpc,driver>=570,driver<571 brand=vcs,driver>=570,driver<571 brand=vws,driver>=570,driver<571 brand=
cloudgaming,driver>=570,driver<571
TORCH_CUDA_ARCH_LIST=7.0 7.5 8.0 8.9 9.0 10.0 12.0
NVIDIA_DRIVER_CAPABILITIES=compute,utility
VLLM_USAGE_SOURCE=production-docker-image
CUDA_VERSION=12.9.1
LD_LIBRARY_PATH=/usr/local/lib/python3.12/dist-packages/cv2/../../lib64:/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
TORCHINDUCTOR_CACHE_DIR=/tmp/torchinductor_root
```

</details>


### Your code version

<details>
<summary>The commit id or version of vllm</summary>

```text
Whichever is on the docker image vllm/vllm-omni:v0.14.0
```
</details>
<details>
<summary>The commit id or version of vllm-omni</summary>

```text
vllm/vllm-omni:v0.14.0
```
</details>


### 🐛 Describe the bug

I am running the vLLM container `vllm/vllm-omni:v0.14.0` in a RunPod serverless environment with the following startup command:

```bash
vllm serve qwen/qwen3-tts-12hz-1.7b-voicedesign --stage-configs-path vllm_omni/model_executor/stage_configs/qwen3_tts.yaml --omni --port 8000 --trust-remote-code --enforce-eager
```
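
A possible triage hint: the model id in the command above is all lowercase, and the error in the title reports `voicedesign` in the same casing. The traceback is raised during the startup profile run, before any request is handled, so the value may come from the model id rather than from the request body. If the task type is derived from the model id and compared case-sensitively (an assumption, not checked against the vllm-omni source), a case-insensitive lookup along these lines would accept either spelling. The names `KNOWN_TASK_TYPES`, `normalize_task_type` and `task_type_from_model_id` are hypothetical, and the set of task types is a guess based on the published Qwen3-TTS model names.

```python
# Hypothetical sketch only: none of these names come from vllm-omni.
# Assumed set of task types, guessed from the Qwen3-TTS model name suffixes.
KNOWN_TASK_TYPES = ("CustomVoice", "VoiceDesign", "Base")


def normalize_task_type(raw: str) -> str:
    """Return the canonical spelling of a task type, ignoring case."""
    lookup = {name.lower(): name for name in KNOWN_TASK_TYPES}
    try:
        return lookup[raw.strip().lower()]
    except KeyError:
        raise ValueError(f"Invalid task type: {raw}") from None


def task_type_from_model_id(model_id: str) -> str:
    """Treat the last '-'-separated piece of the model id as the task type."""
    suffix = model_id.rstrip("/").rsplit("-", 1)[-1]
    return normalize_task_type(suffix)
```

If this guess is right, passing the model id with its original casing to `vllm serve` might also avoid the error, but that is untested here.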

When making the following request:

```bash
curl --request POST \
  --url https://<redacted>.api.runpod.ai/v1/audio/speech \
  --header 'content-type: application/json' \
  --data '{
    "input": "Newton’s First Law tells us that nature is \"lazy\" – it won’t start moving something, stop it, or turn it unless something else steps in. That “something else” is an unbalanced force. Any time you see a change in motion, ask yourself: What force caused that change? That’s the heart of physics!",
    "task_type": "VoiceDesign",
    "instructions": "A professor’s voice is warm and resonant, usually a mellow mid‑range baritone with a faint, seasoned rasp. It speaks with clear enunciation and a measured cadence—steady pacing punctuated by purposeful pauses that let ideas settle. A subtle spark of enthusiasm surfaces in occasional quicker, brighter inflections, while a gentle authority steadies the tone, making the lecture both commanding and approachable."
}'
```
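
For reproducing without curl, the same request can be sketched in Python with only the standard library. This is a minimal sketch: the function name `build_speech_request` is made up for illustration, and the base URL is a placeholder that must be replaced with the real deployment (for a local container, port 8000 matches the startup command). The function only builds the request; it does not send it.

```python
import json
import urllib.request


def build_speech_request(base_url: str, text: str, instructions: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the /v1/audio/speech endpoint."""
    payload = {
        "input": text,
        "task_type": "VoiceDesign",  # same value as in the curl request
        "instructions": instructions,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


# Placeholder endpoint and shortened texts, for illustration only.
request = build_speech_request(
    "http://localhost:8000",
    "Any time you see a change in motion, ask yourself: what force caused that change?",
    "A warm, resonant mid-range baritone with clear enunciation.",
)
```

To actually send it, pass `request` to `urllib.request.urlopen` and read the response body.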

I get the following error in the vLLM logs:

```
2026-02-18T14:28:56.172572040Z /usr/local/lib/python3.12/dist-packages/pydub/utils.py:300: SyntaxWarning: invalid escape sequence '$'
2026-02-18T14:28:56.172631672Z   m = re.match('([su]([0-9]{1,2})p?) \(([0-9]{1,2}) bit$$', token)
2026-02-18T14:28:56.172637822Z /usr/local/lib/python3.12/dist-packages/pydub/utils.py:301: SyntaxWarning: invalid escape sequence '$'
2026-02-18T14:28:56.172642652Z   m2 = re.match('([su]([0-9]{1,2})p?)( \(default$)?$', token)
2026-02-18T14:28:56.172647752Z /usr/local/lib/python3.12/dist-packages/pydub/utils.py:310: SyntaxWarning: invalid escape sequence '$'
2026-02-18T14:28:56.172652542Z   elif re.match('(flt)p?( \(default$)?$', token):
2026-02-18T14:28:56.172657322Z /usr/local/lib/python3.12/dist-packages/pydub/utils.py:314: SyntaxWarning: invalid escape sequence '$'
2026-02-18T14:28:56.172662302Z   elif re.match('(dbl)p?( \(default$)?$', token):
2026-02-18T14:28:59.732543093Z [0;36m(APIServer pid=19)[0;0m INFO 02-18 06:28:59 [api_server.py:1272] vLLM API server version 0.14.0
2026-02-18T14:28:59.736118247Z [0;36m(APIServer pid=19)[0;0m INFO 02-18 06:28:59 [utils.py:263] non-default args: {'model_tag': 'qwen/qwen3-tts-12hz-1.7b-voicedesign', 'model': 'qwen/qwen3-tts-12hz-1.7b-voicedesign', 'trust_remote_code': True, 'enforce_eager': True}
2026-02-18T14:28:59.736367153Z [0;36m(APIServer pid=19)[0;0m INFO 02-18 06:28:59 [omni.py:119] Initializing stages for model: qwen/qwen3-tts-12hz-1.7b-voicedesign
2026-02-18T14:28:59.743534950Z [0;36m(APIServer pid=19)[0;0m INFO 02-18 06:28:59 [initialization.py:234] Loaded OmniTransferConfig with 0 connector configurations
2026-02-18T14:28:59.744493553Z [0;36m(APIServer pid=19)[0;0m INFO 02-18 06:28:59 [omni_stage.py:100] [OmniStage] stage_config: {'stage_id': 0, 'stage_type': 'llm', 'runtime': {'devices': '0', 'max_batch_size': 1}, 'engine_args': {'model_stage': 'qwen3_tts', 'model_arch': 'Qwen3TTSForConditionalGeneration', 'worker_type': 'generation', 'scheduler_cls': 'vllm_omni.core.sched.omni_generation_scheduler.OmniGenerationScheduler', 'enforce_eager': True, 'trust_remote_code': True, 'async_scheduling': False, 'enable_prefix_caching': False, 'engine_output_type': 'audio', 'gpu_memory_utilization': 0.1, 'distributed_executor_backend': 'mp', 'max_num_batched_tokens': 1000000, 'max_num_seqs': 1, 'async_chunk': False}, 'final_output': True, 'final_output_type': 'audio'}
2026-02-18T14:28:59.748831335Z [0;36m(APIServer pid=19)[0;0m INFO 02-18 06:28:59 [omni.py:338] [AsyncOrchestrator] Waiting for 1 stages to initialize (timeout: 60000s)
2026-02-18T14:29:08.800698630Z The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
2026-02-18T14:29:09.498349622Z The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
2026-02-18T14:29:10.122278785Z [Stage-0] INFO 02-18 06:29:10 [configuration_qwen3_tts.py:492] speaker_encoder_config is None. Initializing talker model with default values
2026-02-18T14:29:10.124692471Z [Stage-0] INFO 02-18 06:29:10 [configuration_qwen3_tts.py:489] talker_config is None. Initializing talker model with default values
2026-02-18T14:29:10.124727842Z [Stage-0] INFO 02-18 06:29:10 [configuration_qwen3_tts.py:492] speaker_encoder_config is None. Initializing talker model with default values
2026-02-18T14:29:10.125071770Z [Stage-0] INFO 02-18 06:29:10 [configuration_qwen3_tts.py:441] code_predictor_config is None. Initializing code_predictor model with default values
2026-02-18T14:29:10.126835832Z [Stage-0] INFO 02-18 06:29:10 [configuration_qwen3_tts.py:441] code_predictor_config is None. Initializing code_predictor model with default values
2026-02-18T14:29:22.701502257Z [Stage-0] INFO 02-18 06:29:22 [model.py:530] Resolved architecture: Qwen3TTSForConditionalGeneration
2026-02-18T14:29:23.481158823Z [Stage-0] INFO 02-18 06:29:23 [model.py:1545] Using max model len 32768
2026-02-18T14:29:23.488697880Z The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
2026-02-18T14:29:24.098332118Z [Stage-0] INFO 02-18 06:29:24 [configuration_qwen3_tts.py:492] speaker_encoder_config is None. Initializing talker model with default values
2026-02-18T14:29:24.100565690Z [Stage-0] INFO 02-18 06:29:24 [configuration_qwen3_tts.py:489] talker_config is None. Initializing talker model with default values
2026-02-18T14:29:24.100596601Z [Stage-0] INFO 02-18 06:29:24 [configuration_qwen3_tts.py:492] speaker_encoder_config is None. Initializing talker model with default values
2026-02-18T14:29:24.100929609Z [Stage-0] INFO 02-18 06:29:24 [configuration_qwen3_tts.py:441] code_predictor_config is None. Initializing code_predictor model with default values
2026-02-18T14:29:24.102540256Z [Stage-0] INFO 02-18 06:29:24 [configuration_qwen3_tts.py:441] code_predictor_config is None. Initializing code_predictor model with default values
2026-02-18T14:29:24.423684218Z [Stage-0] INFO 02-18 06:29:24 [model.py:212] Resolved architecture: Qwen3TTSForConditionalGeneration
2026-02-18T14:29:25.038284822Z [Stage-0] INFO 02-18 06:29:25 [model.py:1545] Using max model len 32768
2026-02-18T14:29:25.039334357Z [Stage-0] INFO 02-18 06:29:25 [scheduler.py:229] Chunked prefill is enabled with max_num_batched_tokens=1000000.
2026-02-18T14:29:25.039382098Z [Stage-0] WARNING 02-18 06:29:25 [scheduler.py:271] max_num_batched_tokens (1000000) exceeds max_num_seqs * max_model_len (32768). This may lead to unexpected behavior.
2026-02-18T14:29:25.040208647Z [Stage-0] INFO 02-18 06:29:25 [vllm.py:630] Asynchronous scheduling is disabled.
2026-02-18T14:29:25.040227468Z [Stage-0] WARNING 02-18 06:29:25 [vllm.py:665] Enforce eager set, overriding optimization level to -O0
2026-02-18T14:29:25.040474044Z [Stage-0] INFO 02-18 06:29:25 [vllm.py:765] Cudagraph is disabled under eager mode
2026-02-18T14:29:37.876934769Z [0;36m(EngineCore_DP0 pid=486)[0;0m [Stage-0] INFO 02-18 06:29:37 [core.py:97] Initializing a V1 LLM engine (v0.14.0) with config: model='qwen/qwen3-tts-12hz-1.7b-voicedesign', speculative_config=None, tokenizer='qwen/qwen3-tts-12hz-1.7b-voicedesign', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=qwen/qwen3-tts-12hz-1.7b-voicedesign, enable_prefix_caching=False, enable_chunked_prefill=True, pooler_config=None, compilation_config={'level': None, 'mode': <CompilationMode.NONE: 0>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['all'], 'splitting_ops': [], 'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_split_points': [1000000], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.NONE: 0>, 'cudagraph_num_of_warmups': 0, 'cudagraph_capture_sizes': [], 
'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'eliminate_noops': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 0, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': True}, 'local_cache_dir': None}
2026-02-18T14:29:37.877005821Z [0;36m(EngineCore_DP0 pid=486)[0;0m [Stage-0] WARNING 02-18 06:29:37 [multiproc_executor.py:880] Reducing Torch parallelism from 24 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
2026-02-18T14:29:47.218232822Z [Stage-0] INFO 02-18 06:29:47 [parallel_state.py:1214] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://127.0.0.1:35731 backend=nccl
2026-02-18T14:29:47.295536146Z [Stage-0] INFO 02-18 06:29:47 [parallel_state.py:1425] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A
2026-02-18T14:29:51.091422431Z /bin/sh: 1: sox: not found
2026-02-18T14:29:51.091741669Z [2026-02-18 06:29:51] WARNING __init__.py:10: SoX could not be found!
2026-02-18T14:29:51.091770309Z     If you do not have SoX, proceed here:
2026-02-18T14:29:51.091775789Z      - - - http://sox.sourceforge.net/ - - -
2026-02-18T14:29:51.091786110Z     If you do (or think that you should) have SoX, double-check your
2026-02-18T14:29:51.091791360Z     path variables.
2026-02-18T14:29:51.142771355Z ********
2026-02-18T14:29:51.142780295Z Warning: flash-attn is not installed. Will only run the manual PyTorch version. Please install flash-attn for faster inference.
2026-02-18T14:29:51.142785705Z ********
2026-02-18T14:29:51.142795646Z [0;36m(Worker pid=638)[0;0m [Stage-0] INFO 02-18 06:29:51 [gpu_model_runner.py:3808] Starting to load model qwen/qwen3-tts-12hz-1.7b-voicedesign...
2026-02-18T14:29:51.420139410Z [0;36m(Worker pid=638)[0;0m [Stage-0] WARNING 02-18 06:29:51 [qwen3_tts.py:76] Flash-Attn is not installed. Using default PyTorch attention implementation.
2026-02-18T14:29:51.826539492Z [0;36m(Worker pid=638)[0;0m `torch_dtype` is deprecated! Use `dtype` instead!
2026-02-18T14:29:51.826773957Z [0;36m(Worker pid=638)[0;0m [Stage-0] INFO 02-18 06:29:51 [configuration_qwen3_tts.py:492] speaker_encoder_config is None. Initializing talker model with default values
2026-02-18T14:29:51.829048781Z [0;36m(Worker pid=638)[0;0m [Stage-0] INFO 02-18 06:29:51 [configuration_qwen3_tts.py:489] talker_config is None. Initializing talker model with default values
2026-02-18T14:29:51.829066331Z [0;36m(Worker pid=638)[0;0m [Stage-0] INFO 02-18 06:29:51 [configuration_qwen3_tts.py:492] speaker_encoder_config is None. Initializing talker model with default values
2026-02-18T14:29:51.829423870Z [0;36m(Worker pid=638)[0;0m [Stage-0] INFO 02-18 06:29:51 [configuration_qwen3_tts.py:441] code_predictor_config is None. Initializing code_predictor model with default values
2026-02-18T14:29:51.831046948Z [0;36m(Worker pid=638)[0;0m [Stage-0] INFO 02-18 06:29:51 [configuration_qwen3_tts.py:441] code_predictor_config is None. Initializing code_predictor model with default values
2026-02-18T14:29:53.496125569Z [0;36m(Worker pid=638)[0;0m [Stage-0] INFO 02-18 06:29:53 [weight_utils.py:46] Using model weights format ['speech_tokenizer/*']
2026-02-18T14:29:54.139530029Z [0;36m(Worker pid=638)[0;0m [Stage-0] INFO 02-18 06:29:54 [configuration_qwen3_tts_tokenizer_v2.py:156] encoder_config is None. Initializing encoder with default values
2026-02-18T14:29:54.139555029Z [0;36m(Worker pid=638)[0;0m [Stage-0] INFO 02-18 06:29:54 [configuration_qwen3_tts_tokenizer_v2.py:159] decoder_config is None. Initializing decoder with default values
2026-02-18T14:30:01.221393561Z [0;36m(Worker pid=638)[0;0m [Stage-0] INFO 02-18 06:30:01 [weight_utils.py:550] No model.safetensors.index.json found in remote.
2026-02-18T14:30:01.222608949Z [0;36m(Worker pid=638)[0;0m 
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
2026-02-18T14:30:01.243241753Z [0;36m(Worker pid=638)[0;0m 
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 48.80it/s]
2026-02-18T14:30:01.243269964Z [0;36m(Worker pid=638)[0;0m
2026-02-18T14:30:01.246706995Z [0;36m(Worker pid=638)[0;0m [Stage-0] INFO 02-18 06:30:01 [default_loader.py:291] Loading weights took 0.02 seconds
2026-02-18T14:30:01.809345730Z [0;36m(Worker pid=638)[0;0m [Stage-0] INFO 02-18 06:30:01 [gpu_model_runner.py:3905] Model loading took 3.89 GiB memory and 9.827631 seconds
2026-02-18T14:30:01.814070601Z [0;36m(Worker pid=638)[0;0m [Stage-0] INFO 02-18 06:30:01 [qwen3_tts.py:133] Profile run detected (empty text). Capping max_new_tokens to 2.
2026-02-18T14:30:01.816257282Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] WorkerProc hit an exception.
2026-02-18T14:30:01.816279763Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] Traceback (most recent call last):
2026-02-18T14:30:01.816285573Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 817, in worker_busy_loop
2026-02-18T14:30:01.816291533Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]     output = func(*args, **kwargs)
2026-02-18T14:30:01.816296713Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]              ^^^^^^^^^^^^^^^^^^^^^
2026-02-18T14:30:01.816302033Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 525, in compile_or_warm_up_model
2026-02-18T14:30:01.816307474Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]     hidden_states, last_hidden_states = self.model_runner._dummy_run(
2026-02-18T14:30:01.816322744Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-02-18T14:30:01.816329074Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
2026-02-18T14:30:01.816334334Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]     return func(*args, **kwargs)
2026-02-18T14:30:01.816370425Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]            ^^^^^^^^^^^^^^^^^^^^^
2026-02-18T14:30:01.816383085Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]   File "/usr/local/lib/python3.12/dist-packages/vllm_omni/worker/gpu_generation_model_runner.py", line 633, in _dummy_run
2026-02-18T14:30:01.816393146Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]     outputs = self.model(
2026-02-18T14:30:01.816400706Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]               ^^^^^^^^^^^
2026-02-18T14:30:01.816410336Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
2026-02-18T14:30:01.816416906Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]     return self._call_impl(*args, **kwargs)
2026-02-18T14:30:01.816426986Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-02-18T14:30:01.816438267Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
2026-02-18T14:30:01.816447817Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]     return forward_call(*args, **kwargs)
2026-02-18T14:30:01.816454977Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-02-18T14:30:01.816465897Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]   File "/usr/local/lib/python3.12/dist-packages/vllm_omni/model_executor/models/qwen3_tts/qwen3_tts.py", line 148, in forward
2026-02-18T14:30:01.816494648Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]     raise ValueError(f"Invalid task type: {task_type}")
2026-02-18T14:30:01.816510648Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] ValueError: Invalid task type: voicedesign
2026-02-18T14:30:01.816521509Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] Traceback (most recent call last):
2026-02-18T14:30:01.816531519Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 817, in worker_busy_loop
2026-02-18T14:30:01.816541989Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]     output = func(*args, **kwargs)
2026-02-18T14:30:01.816552579Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]              ^^^^^^^^^^^^^^^^^^^^^
2026-02-18T14:30:01.816563059Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 525, in compile_or_warm_up_model
2026-02-18T14:30:01.816572820Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]     hidden_states, last_hidden_states = self.model_runner._dummy_run(
2026-02-18T14:30:01.816583150Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-02-18T14:30:01.816593130Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
2026-02-18T14:30:01.816603740Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]     return func(*args, **kwargs)
2026-02-18T14:30:01.816613781Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]            ^^^^^^^^^^^^^^^^^^^^^
2026-02-18T14:30:01.816624481Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]   File "/usr/local/lib/python3.12/dist-packages/vllm_omni/worker/gpu_generation_model_runner.py", line 633, in _dummy_run
2026-02-18T14:30:01.816634421Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]     outputs = self.model(
2026-02-18T14:30:01.816645211Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]               ^^^^^^^^^^^
2026-02-18T14:30:01.816654862Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
2026-02-18T14:30:01.816663592Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]     return self._call_impl(*args, **kwargs)
2026-02-18T14:30:01.816673872Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-02-18T14:30:01.816684362Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
2026-02-18T14:30:01.816692993Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]     return forward_call(*args, **kwargs)
2026-02-18T14:30:01.816698043Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-02-18T14:30:01.816703113Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]   File "/usr/local/lib/python3.12/dist-packages/vllm_omni/model_executor/models/qwen3_tts/qwen3_tts.py", line 148, in forward
2026-02-18T14:30:01.816717983Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]     raise ValueError(f"Invalid task type: {task_type}")
2026-02-18T14:30:01.816733163Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822] ValueError: Invalid task type: voicedesign
2026-02-18T14:30:01.816739334Z [0;36m(Worker pid=638)[0;0m [Stage-0] ERROR 02-18 06:30:01 [multiproc_executor.py:822]
2026-02-18T14:30:01.818219598Z [0;36m(EngineCore_DP0 pid=486)[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936] EngineCore failed to start.
2026-02-18T14:30:01.818234088Z [0;36m(EngineCore_DP0 pid=486)[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936] Traceback (most recent call last):
2026-02-18T14:30:01.818240229Z [0;36m(EngineCore_DP0 pid=486)[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 927, in run_engine_core
2026-02-18T14:30:01.818245899Z [0;36m(EngineCore_DP0 pid=486)[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
2026-02-18T14:30:01.818251439Z [0;36m(EngineCore_DP0 pid=486)[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-02-18T14:30:01.818256569Z [0;36m(EngineCore_DP0 pid=486)[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 692, in __init__
2026-02-18T14:30:01.818261789Z [0;36m(EngineCore_DP0 pid=486)[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936]     super().__init__(
2026-02-18T14:30:01.818266999Z [0;36m(EngineCore_DP0 pid=486)[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 113, in __init__
2026-02-18T14:30:01.818272119Z [0;36m(EngineCore_DP0 pid=486)[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936]     num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
2026-02-18T14:30:01.818277349Z [0;36m(EngineCore_DP0 pid=486)[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936]                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-02-18T14:30:01.818282480Z [0;36m(EngineCore_DP0 pid=486)[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 270, in _initialize_kv_caches
2026-02-18T14:30:01.818287550Z [0;36m(EngineCore_DP0 pid=486)[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936]     self.model_executor.initialize_from_config(kv_cache_configs)
2026-02-18T14:30:01.818293540Z [0;36m(EngineCore_DP0 pid=486)[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 116, in initialize_from_config
2026-02-18T14:30:01.818299040Z [0;36m(EngineCore_DP0 pid=486)[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936]     self.collective_rpc("compile_or_warm_up_model")
2026-02-18T14:30:01.818304470Z [0;36m(EngineCore_DP0 pid=486)[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 359, in collective_rpc
2026-02-18T14:30:01.818309650Z [0;36m(EngineCore_DP0 pid=486)[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936]     return aggregate(get_response())
2026-02-18T14:30:01.818314780Z [0;36m(EngineCore_DP0 pid=486)[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936]                      ^^^^^^^^^^^^^^
2026-02-18T14:30:01.818319880Z [0;36m(EngineCore_DP0 pid=486)[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 342, in get_response
2026-02-18T14:30:01.818325091Z [0;36m(EngineCore_DP0 pid=486)[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936]     raise RuntimeError(
2026-02-18T14:30:01.818330171Z [0;36m(EngineCore_DP0 pid=486)[0;0m [Stage-0] ERROR 02-18 06:30:01 [core.py:936] RuntimeError: Worker failed with error 'Invalid task type: voicedesign', please check the stack trace above for the root cause
```

The error seems to be coming from the `forward` function in the `Qwen3TTSModelForGeneration` class. For some reason `task_type = runtime_additional_information.pop("task_type", [self.task_type])[0]` is resolving to a lowercase `voicedesign`.

This causes the `task_type == "VoiceDesign"` check to fail, so `forward` raises the `ValueError`. In my request, `VoiceDesign` is capitalized correctly. I would submit a PR to fix the bug myself, but I can't see where `runtime_additional_information` comes from in the code base.
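
For illustration, here is a minimal sketch of the kind of case-insensitive lookup that would tolerate a lowercased value. It is not the actual vllm-omni code: `resolve_task_type` and `KNOWN_TASK_TYPES` are hypothetical names, and only `"VoiceDesign"` is confirmed by this issue (the other entries are assumptions). Note also that the traceback goes through `_dummy_run` during `compile_or_warm_up_model`, so the lowercase value may be coming from startup configuration rather than from the request itself.

```python
# Hypothetical helper, not part of vllm_omni.
# Only "VoiceDesign" is confirmed by this issue; the other names are assumed.
KNOWN_TASK_TYPES = ("VoiceDesign", "CustomVoice", "Base")


def resolve_task_type(runtime_additional_information: dict, default: str) -> str:
    """Pop task_type the same way forward() does, then map it back to canonical casing."""
    raw = runtime_additional_information.pop("task_type", [default])[0]
    # Build a lowercase -> canonical mapping so "voicedesign" resolves to "VoiceDesign".
    canonical = {name.lower(): name for name in KNOWN_TASK_TYPES}
    try:
        return canonical[str(raw).lower()]
    except KeyError:
        raise ValueError(f"Invalid task type: {raw}") from None


if __name__ == "__main__":
    info = {"task_type": ["voicedesign"]}
    print(resolve_task_type(info, default="VoiceDesign"))
```

This only hides the symptom. The real fix is probably to find where the value gets lowercased before it reaches `forward`.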


### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://vllm-omni.readthedocs.io), which can answer lots of frequently asked questions.
