[Feat][Qwen3-tts]: Add Gradio demo for online serving #1231
lishunyang12 wants to merge 4 commits into vllm-project:main
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b729f7602b
```shell
--omni 2>&1 | tee "$LOG_FILE" &
SERVER_PID=$!
```
Capture vLLM PID directly instead of pipeline tail process
The launcher backgrounds vllm-omni ... | tee ... and then stores SERVER_PID=$!, but in bash $! for a background pipeline is the last command (tee), not the server process. This means the readiness loop and cleanup (kill "$SERVER_PID") are monitoring/killing tee while the real vLLM server can keep running after Ctrl+C or error paths, leaving orphaned servers bound to the port.
Fixed in 2f70230: backgrounding vllm-omni directly now (no pipe through tee), so $! captures the actual server PID. Added a separate tail -f for log output, with its own PID tracked for cleanup.
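The fixed pattern described above can be sketched as below. `serve` is a stand-in function for the real `vllm-omni serve ... --omni` invocation so the sketch runs anywhere; the variable names mirror the PR's script, but this is an illustration, not the PR's exact code.

```shell
LOG_FILE="$(mktemp)"

# Stand-in for the real `vllm-omni serve ... --omni` command.
serve() { echo "server up"; sleep 2; }

# Background the server directly (no `| tee`), so $! is the server's PID,
# not the PID of the last command in a pipeline.
serve > "$LOG_FILE" 2>&1 &
SERVER_PID=$!

# Stream logs via a separate tail process, tracking its PID for cleanup too.
tail -f "$LOG_FILE" &
TAIL_PID=$!

cleanup() { kill "$SERVER_PID" "$TAIL_PID" 2>/dev/null; }
trap cleanup EXIT INT TERM

# The readiness loop can now monitor the real server PID.
kill -0 "$SERVER_PID" && echo "monitoring server PID $SERVER_PID"
```

With the original `serve | tee "$LOG_FILE" &` form, `$!` would hold tee's PID, so `kill "$SERVER_PID"` would only stop the log writer and leave the server bound to the port.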
```python
# Decode audio response
try:
    audio_np, sample_rate = sf.read(io.BytesIO(resp.content))
```
Handle raw PCM responses before decoding audio bytes
The UI allows response_format="pcm", but the response is always decoded via sf.read(io.BytesIO(resp.content)). For PCM, the server emits RAW bytes (no container/header), so this decode path cannot infer format/sample rate and fails with "Failed to decode audio response" whenever users pick PCM. This makes one of the advertised output formats unusable in the demo.
Fixed in 2f70230 — added a special path for response_format="pcm": decode raw bytes as int16 samples at 24 kHz (Qwen3-TTS default sample rate), bypassing sf.read() which needs a container header.
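A minimal sketch of the decode path described in the fix above. The 24 kHz rate and int16 PCM framing come from this thread; the function name `decode_audio` and the deferred soundfile import are illustrative choices, not the PR's exact code.

```python
import io

import numpy as np

QWEN3_TTS_SAMPLE_RATE = 24_000  # default output rate per the fix above


def decode_audio(content: bytes, response_format: str):
    """Return (float32 samples in [-1, 1], sample_rate) for a TTS response.

    Raw PCM carries no container header, so sf.read() cannot infer its
    format or rate; decode it by hand as little-endian int16 instead.
    """
    if response_format == "pcm":
        samples = np.frombuffer(content, dtype="<i2").astype(np.float32) / 32768.0
        return samples, QWEN3_TTS_SAMPLE_RATE
    # Container formats (wav, flac, ...) have headers soundfile can parse.
    import soundfile as sf  # deferred: the PCM path needs no soundfile

    return sf.read(io.BytesIO(content))
```

Dividing by 32768 maps the int16 range onto [-1, 1), the float convention Gradio's audio component and soundfile both use.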
Hi, if you add a markdown document under
The task type selector (CustomVoice / VoiceDesign / Base) in the Gradio UI feels a bit off. Since each checkpoint already corresponds to a specific task type, having users pick it again client-side can lead to mismatches, e.g. selecting VoiceDesign when the server loaded a CustomVoice checkpoint. Could we auto-detect which model the server is running and show only the relevant fields? That would avoid confusion and keep the UI in sync with whatever checkpoint is actually loaded.

Also, this PR depends on #1203 (the multimodal_output fix), which hasn't been merged yet, so the server currently returns "TTS model did not produce audio output" instead of actual audio. I wasn't able to test this end to end. Was this tested before submitting?
Shall we move the Gradio demo to the app folder after the ComfyUI PR is merged?
makes sense. |
Gaohan123 left a comment:
Could you post some visual demo results on the PR description?
@linyueqian @SamitHuang PTAL
@vllm-omni-reviewer |
Addressed review feedback in 2f70230:
- Will sync docs before merge. Noted.
- Done; added
- Agreed; will move after the ComfyUI PR lands.
- Will add screenshots to the PR description once I have access to a GPU for a full run.
Force-pushed: 2f70230 → 6ea38ad, 9c56fc6 → 9af4eb0, 9af4eb0 → 86b5291.
Signed-off-by: lishunyang <lishunyang12@163.com>
Closes part of #938 (Gradio Demo)
Summary
- examples/online_serving/qwen3_tts/
- /v1/audio/voices endpoint
- run_gradio_demo.sh to launch server + demo together

Files Changed
- examples/online_serving/qwen3_tts/gradio_demo.py (new)
- examples/online_serving/qwen3_tts/run_gradio_demo.sh (new)
- examples/online_serving/qwen3_tts/README.md (updated)

Test plan
- ./run_server.sh CustomVoice, run python gradio_demo.py, generate speech with Vivian/Ryan speakers
- run_gradio_demo.sh launches both server and Gradio