[Feat][Qwen3-tts]: Add Gradio demo for online serving #1231

Open

lishunyang12 wants to merge 4 commits into vllm-project:main from lishunyang12:tts_gradio

Conversation

@lishunyang12 (Contributor) commented Feb 5, 2026

Closes part of #938 (Gradio Demo)

Summary

  • Add an interactive Gradio web UI for Qwen3-TTS at examples/online_serving/qwen3_tts/
  • Support all three task types: CustomVoice, VoiceDesign, Base (voice cloning)
  • Show/hide UI fields dynamically based on the selected task type
  • Fetch available speakers from the /v1/audio/voices endpoint (see the sketch after this list)
  • Add run_gradio_demo.sh to launch the server and demo together
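
As a rough illustration of the speaker fetch (the endpoint's response shape is assumed here, not verified against the vllm-omni implementation):

import requests


def fetch_voices(api_base: str = "http://localhost:8000") -> list[str]:
    """Return speaker names from the server's /v1/audio/voices endpoint."""
    resp = requests.get(f"{api_base}/v1/audio/voices", timeout=5)
    resp.raise_for_status()
    # Assumed payload shape: {"voices": ["Vivian", "Ryan", ...]}
    return resp.json().get("voices", [])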

Files Changed

  • examples/online_serving/qwen3_tts/gradio_demo.py (new)
  • examples/online_serving/qwen3_tts/run_gradio_demo.sh (new)
  • examples/online_serving/qwen3_tts/README.md (updated)

Test plan

  • Start the server with ./run_server.sh CustomVoice, run python gradio_demo.py, and generate speech with the Vivian/Ryan speakers
  • Start the server with the VoiceDesign model and verify the instructions field is required
  • Start the server with the Base model, upload reference audio, and verify voice cloning
  • Test that run_gradio_demo.sh launches both the server and Gradio
  • Verify error messages when the server is down or inputs are invalid

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b729f7602b


Comment on lines +136 to +137
--omni 2>&1 | tee "$LOG_FILE" &
SERVER_PID=$!


P1: Capture the vLLM PID directly instead of the pipeline's tail process

The launcher backgrounds vllm-omni ... | tee ... and then stores SERVER_PID=$!, but in bash $! for a background pipeline is the last command (tee), not the server process. This means the readiness loop and cleanup (kill "$SERVER_PID") are monitoring/killing tee while the real vLLM server can keep running after Ctrl+C or error paths, leaving orphaned servers bound to the port.


@lishunyang12 (Contributor, Author) replied:

Fixed in 2f70230: vllm-omni is now backgrounded directly (no pipe through tee), so $! captures the actual server PID. Added a separate tail -f for log output, with its own PID tracked for cleanup.
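
A minimal sketch of the fixed launch pattern (the full vllm-omni flags are elided, and variable names are illustrative rather than necessarily those used in run_gradio_demo.sh):

#!/usr/bin/env bash
LOG_FILE="server.log"

# Background the server itself with output redirected to the log file,
# so $! is the server's PID rather than a downstream tee process.
vllm-omni --omni >"$LOG_FILE" 2>&1 &
SERVER_PID=$!

# Stream the log in a separate process; track its PID for cleanup too.
tail -f "$LOG_FILE" &
TAIL_PID=$!

cleanup() {
    kill "$SERVER_PID" "$TAIL_PID" 2>/dev/null || true
}
trap cleanup EXIT INT TERM

# Readiness polling and the Gradio launch would follow here.
wait "$SERVER_PID"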


# Decode audio response
try:
    audio_np, sample_rate = sf.read(io.BytesIO(resp.content))


P2: Handle raw PCM responses before decoding audio bytes

The UI allows response_format="pcm", but the response is always decoded via sf.read(io.BytesIO(resp.content)). For PCM, the server emits RAW bytes (no container/header), so this decode path cannot infer format/sample rate and fails with "Failed to decode audio response" whenever users pick PCM. This makes one of the advertised output formats unusable in the demo.


@lishunyang12 (Contributor, Author) replied:

Fixed in 2f70230: added a special path for response_format="pcm" that decodes the raw bytes as int16 samples at 24 kHz (the Qwen3-TTS default sample rate), bypassing sf.read(), which needs a container header.
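
A minimal sketch of that decode path (the helper name and structure are illustrative, not the exact code in gradio_demo.py):

import io

import numpy as np
import soundfile as sf

QWEN3_TTS_SAMPLE_RATE = 24000  # Qwen3-TTS default output rate


def decode_audio(content: bytes, response_format: str):
    """Return (samples, sample_rate) for the server's audio response bytes."""
    if response_format == "pcm":
        # Raw PCM carries no container header, so sf.read() cannot infer
        # the format; interpret the bytes as 16-bit little-endian samples.
        samples = np.frombuffer(content, dtype="<i2")
        return samples, QWEN3_TTS_SAMPLE_RATE
    # Containerized formats (e.g. wav, flac) carry their own headers.
    return sf.read(io.BytesIO(content))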

@congw729 (Contributor) commented Feb 6, 2026

Hi, if you add a markdown document under ./examples/*, please also run mkdocs serve to sync those edits to ./docs/ before merging this PR.

@linyueqian (Contributor) commented:

The task type selector (CustomVoice / VoiceDesign / Base) in the Gradio UI feels a bit off. Since each checkpoint is already a specific task type, having users pick it again client-side can lead to mismatches, e.g. selecting VoiceDesign when the server loaded CustomVoice. Could we auto-detect which model the server is running and just show the right fields? That way we avoid confusion and the UI stays in sync with whatever checkpoint is actually loaded.

Also, this PR depends on #1203 (the multimodal_output fix) which hasn't been merged yet, so the server currently returns "TTS model did not produce audio output" instead of actual audio. I wasn't able to test this end to end. Was this tested before submitting?

@hsliuustc0106 (Collaborator) commented:

Shall we move the Gradio demo to the app folder after the ComfyUI PR is merged?

@linyueqian (Contributor) replied:

> Shall we move the Gradio demo to the app folder after the ComfyUI PR is merged?

makes sense.

@Gaohan123 (Collaborator) left a comment:

Could you post some visual demo results in the PR description?

@Gaohan123 (Collaborator) commented:

@linyueqian @SamitHuang PTAL

@Gaohan123 added this to the v0.16.0 milestone Feb 10, 2026
@hsliuustc0106 (Collaborator) commented:

@vllm-omni-reviewer

@lishunyang12 (Contributor, Author) commented:

Addressed review feedback in 2f70230:

> @congw729: if you add a markdown document under ./examples/*, please also run mkdocs serve to sync those edits to ./docs/ before merging this PR.

Will sync docs before merge. Noted.

> @linyueqian: Could we auto-detect which task type the server loaded?

Done: added detect_task_type(), which queries /v1/models at startup and infers the task type from the model name. The UI now auto-selects the correct radio button.
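
Roughly, the detection looks like this (a sketch; the name-matching heuristics are illustrative, and the exact logic in gradio_demo.py may differ):

import requests


def detect_task_type(api_base: str = "http://localhost:8000") -> str:
    """Infer the TTS task type from the model name served at /v1/models."""
    resp = requests.get(f"{api_base}/v1/models", timeout=5)
    resp.raise_for_status()
    # OpenAI-compatible /v1/models schema: {"data": [{"id": "<model name>"}]}
    model_id = resp.json()["data"][0]["id"].lower()
    if "voicedesign" in model_id or "voice-design" in model_id:
        return "VoiceDesign"
    if "base" in model_id:
        return "Base"
    return "CustomVoice"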

> @hsliuustc0106: Shall we move the Gradio demo to the app folder after the ComfyUI PR is merged?

Agreed; will move after the ComfyUI PR lands.

> @Gaohan123: demo results

Will add screenshots to the PR description once I have access to a GPU for a full run.

@Gaohan123 added the ready label (triggers Buildkite CI) Feb 25, 2026
@hsliuustc0106 removed this from the v0.16.0 milestone Feb 26, 2026
Signed-off-by: lishunyang <lishunyang12@163.com>

Labels: ready (label to trigger Buildkite CI)

5 participants