
[Feature] Support Stage Based Deployment CLI#939

Merged
hsliuustc0106 merged 31 commits into vllm-project:main from wuhang2014:stagecli
Feb 24, 2026

Conversation


@wuhang2014 wuhang2014 commented Jan 25, 2026


Purpose

Background is described in #870.

For now, only single-node deployment with the multiprocessing backend is supported:

  • Multi-node deployment is not supported;
  • The Ray backend is not supported;
  • Data parallelism (DP) for diffusion models is not supported.

Test Plan

model: Qwen3-Omni

deployment CLI:

  • stage-0
CUDA_VISIBLE_DEVICES=4,5,6,7 vllm serve /data/models/Qwen3-Omni-30B-A3B-Instruct/ --omni --stage-id 0 --data-parallel-size 2 --omni-master-address 127.0.0.1 --omni-master-port 33567
  • stage-1
CUDA_VISIBLE_DEVICES=2 vllm serve /data/models/Qwen3-Omni-30B-A3B-Instruct/ --omni --stage-id 1 --headless --omni-master-address 127.0.0.1 --omni-master-port 33567
  • stage-2
CUDA_VISIBLE_DEVICES=3 vllm serve /data/models/Qwen3-Omni-30B-A3B-Instruct/ --omni --stage-id 2 --headless --omni-master-address 127.0.0.1 --omni-master-port 33567

test script:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
          {
            "role": "user",
            "content": [
              { "type": "text", "text": "What’s in this image?" },
              {
                "type": "image_url",
                "image_url": {
                  "url": "file:///data/wuhang/dog-4988985_960_720.jpg"
                }
              }
            ]
          }
    ],
    "audio": { "voice": "alloy", "format": "wav" }
  }'
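
For reference, the same request can be built and sent from Python with only the standard library. The endpoint, image path, and audio options below are copied from the test plan above; nothing here is part of the PR itself.

```python
import json
import urllib.request

# Same payload as the curl command above: one user message carrying text
# plus an image_url part, and an "audio" block requesting a WAV response.
payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "file:///data/wuhang/dog-4988985_960_720.jpg"},
                },
            ],
        }
    ],
    "audio": {"voice": "alloy", "format": "wav"},
}

request = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",  # stage-0 API server from the test plan
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Sending it requires all three stages to be up:
# response = urllib.request.urlopen(request)
```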

Test Result

(wuhang) (base) root@huawei:/data/wuhang/vllm-omni/examples/online_serving/qwen3_omni# python openai_chat_completion_client_for_multimodal_generation.py --query-type use_image --model /data/models/Qwen3-Omni-30B-A3B-Instruct/ --image-path /data/wuhang/dog-4988985_960_720.jpg 
Chat completion output from text: Based on the image provided, here is a detailed description of its content:

This is a professionally taken, close-up photograph of a happy dog lying in a field of green grass.

*   **Main Subject:** The central focus is a Pembroke Welsh Corgi. It has a classic tan and white coat, with tan fur covering its head, ears, and back, and white fur on its chest, neck, and muzzle.
*   **Expression and Pose:** The corgi is lying down but looking directly at the camera with an alert and joyful expression. Its mouth is open in what appears to be a smile, with its pink tongue slightly visible. Its large, erect ears are pointed forward, indicating it is attentive.
*   **Setting and Lighting:** The dog is in a lush, sunlit grassy area. The lighting suggests it's either early morning or late afternoon (golden hour), casting a warm, soft glow over the scene. The background is softly blurred (a shallow depth of field), showing out-of-focus trees and foliage, which helps to emphasize the dog as the main subject.
*   **Details:** The corgi is wearing a dark green collar around its neck.
Audio saved to audio_0.wav
(wuhang) (base) root@huawei:/data/wuhang/vllm-omni/examples/online_serving/qwen3_omni# ls -l
total 2920
-rw-r--r-- 1 root root 2918954 Jan 26 08:57 audio_0.wav
-rw-r--r-- 1 root root   19876 Jan 22 12:00 gradio_demo.py
-rw-r--r-- 1 root root   16995 Jan 25 11:14 openai_chat_completion_client_for_multimodal_generation.py
-rw-r--r-- 1 root root    1177 Jan 22 12:00 qwen3_omni_moe_thinking.yaml
-rw-r--r-- 1 root root    7166 Jan 22 12:00 README.md
-rw-r--r-- 1 root root    4359 Jan 22 12:00 run_curl_multimodal_generation.sh
-rwxr-xr-x 1 root root    6123 Jan 22 12:00 run_gradio_demo.sh
(wuhang) (base) root@huawei:/data/wuhang/vllm-omni/examples/online_serving/qwen3_omni# 


@hsliuustc0106 (Collaborator) left a comment:

  1. Silent error handling - multiple `except Exception: pass` blocks

    • Fix: add logging: `except Exception as e: logger.debug(f"Error: {e}")`
  2. Log spam - `logger.info()` in hot paths (line 1466)

    • Fix: change to `logger.debug()`
  3. PR description incomplete - "Test Result" section is empty

    • Fix: add actual test output, performance metrics
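
The first suggested fix can be sketched as follows. This is illustrative only: `close_queue_quietly` and the `queue.close()` call are hypothetical stand-ins, not code from this PR.

```python
import logging

logger = logging.getLogger(__name__)

def close_queue_quietly(queue) -> None:
    """Best-effort cleanup: never raise, but leave a trace for debugging."""
    try:
        queue.close()
    except Exception as e:
        # Instead of a silent `pass`, record what went wrong at debug level.
        logger.debug("Error while closing queue: %s", e)
```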

Copilot AI (Contributor) left a comment:

Pull request overview

This PR implements stage-based deployment CLI support for vLLM-Omni, enabling independent deployment of pipeline stages across processes using ZMQ-based IPC. This is part of the larger effort described in issue #870 to support data parallelism for pipeline stages.

Changes:

  • Added ZMQ-based queue utilities to replace multiprocessing queues for inter-stage communication
  • Implemented headless mode for deploying individual stages independently
  • Added dynamic port allocation and handshake protocol for stage coordination
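
Dynamic port allocation of the kind mentioned above is commonly done by binding to port 0 and reading back the port the OS assigned. This stdlib sketch illustrates the idea; it is not the PR's actual implementation.

```python
import socket

def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an ephemeral port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]  # the port the OS actually assigned
```

Note that the returned port can be taken by another process between this call and the real bind, which is why ZMQ also offers `Socket.bind_to_random_port` to bind and report the port in one step.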

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 47 comments.

| File | Description |
| --- | --- |
| vllm_omni/entrypoints/zmq_utils.py | New file providing the ZMQ queue wrapper and handshake utilities for stage communication |
| vllm_omni/entrypoints/omni_stage.py | Modified to support both ZMQ and multiprocessing queues; added cleanup handlers and queue spec support |
| vllm_omni/entrypoints/omni.py | Added ZMQ context management, a handshake server for stage coordination, and dynamic port allocation |
| vllm_omni/entrypoints/cli/serve.py | Added headless mode and stage-id CLI arguments for independent stage deployment |
| vllm_omni/entrypoints/async_omni.py | Updated cleanup handlers to support ZMQ queues |
| pyproject.toml | Added pyzmq>=25.0.0 dependency |
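
As a rough sketch of the ZMQ-queue idea behind `zmq_utils.py` (names and endpoints here are illustrative, assuming `pyzmq` is installed), a PUSH/PULL socket pair can stand in for a multiprocessing queue between stages:

```python
import zmq

# One shared context. "inproc" keeps this demo inside a single process;
# cross-process stage deployment would use TCP endpoints instead.
ctx = zmq.Context.instance()

sender = ctx.socket(zmq.PUSH)
sender.bind("inproc://stage-queue")   # upstream stage: put()

receiver = ctx.socket(zmq.PULL)
receiver.connect("inproc://stage-queue")  # downstream stage: get()

# put/get roughly mirroring a multiprocessing.Queue
sender.send_pyobj({"stage": 0, "payload": "hidden states"})
item = receiver.recv_pyobj()
```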


@wuhang2014 wuhang2014 force-pushed the stagecli branch 5 times, most recently from 4e7aff3 to 9e39c1f on February 5, 2026 10:25
@wuhang2014 wuhang2014 marked this pull request as ready for review February 5, 2026 10:27
@chatgpt-codex-connector (bot) left a comment:

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ff2d5c10ba


@hsliuustc0106 hsliuustc0106 requested a review from Copilot February 5, 2026 15:05
wuhang2014 and others added 23 commits February 24, 2026 09:40
Signed-off-by: wuhang <wuhang6@huawei.com>
Signed-off-by: wuhang <whlbx@hotmail.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@princepride (Collaborator) commented:

@hsliuustc0106 ready for merge?

@hsliuustc0106 hsliuustc0106 merged commit 36b8f80 into vllm-project:main Feb 24, 2026
7 checks passed
lishunyang12 added a commit to lishunyang12/vllm-omni that referenced this pull request Feb 24, 2026

Labels: high priority (high priority issue, needs to be done asap), ready (label to trigger buildkite CI)


8 participants