[Bugfix] Allow to enable HSDP alone by gcanlin · Pull Request #1567 · vllm-project/vllm-omni

gcanlin · 2026-02-28T08:31:31Z


Purpose

For standalone HSDP (all other parallelism dimensions are 1, but fully_shard_degree > 1), use fully_shard_degree as dit_parallel_size. This ensures that orthogonal rank generation works correctly for HSDP workers.
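The rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in parallel_state.py: the function name `effective_dit_parallel_size` and the `other_degrees` argument are hypothetical, and the assumption that dit_parallel_size is the product of the non-HSDP degrees is mine.

```python
from math import prod
from typing import Iterable


def effective_dit_parallel_size(other_degrees: Iterable[int], fully_shard_degree: int) -> int:
    """Return the size used for rank generation (hypothetical helper).

    other_degrees: degrees of every non-HSDP parallel dimension
    (assumed here to multiply into dit_parallel_size).
    """
    dit_parallel_size = prod(other_degrees)
    # Standalone HSDP: nothing else is parallelized, but sharding is requested.
    is_standalone_hsdp = dit_parallel_size == 1 and fully_shard_degree > 1
    if is_standalone_hsdp:
        return fully_shard_degree
    return dit_parallel_size


# Standalone HSDP with a shard size of 4, as in the test plan below.
print(effective_dit_parallel_size([1, 1, 1], fully_shard_degree=4))
```

When any other dimension is greater than 1, the sketch leaves dit_parallel_size untouched, so only the standalone case changes.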

Test Plan

vllm serve Wan-AI/Wan2.2-TI2V-5B-Diffusers --omni --port 8091 --use-hsdp --hsdp-shard-size 4

Test Result

[Stage-0] INFO 02-28 08:29:42 [diffusers_loader.py:301] Loading weights took 2.36 seconds
[Stage-0] INFO 02-28 08:29:42 [hsdp.py:128] HSDP Inference: replicate_size=1, shard_size=4, world_size=4, rank=0, fs_world_size=4, fs_rank=0
[Stage-0] INFO 02-28 08:29:42 [diffusers_loader.py:301] Loading weights took 2.17 seconds
[Stage-0] INFO 02-28 08:29:42 [hsdp.py:128] HSDP Inference: replicate_size=1, shard_size=4, world_size=4, rank=2, fs_world_size=4, fs_rank=2
[Stage-0] INFO 02-28 08:29:42 [diffusers_loader.py:301] Loading weights took 2.32 seconds
[Stage-0] INFO 02-28 08:29:42 [hsdp.py:128] HSDP Inference: replicate_size=1, shard_size=4, world_size=4, rank=1, fs_world_size=4, fs_rank=1
[Stage-0] INFO 02-28 08:29:42 [diffusers_loader.py:301] Loading weights took 2.20 seconds
[Stage-0] INFO 02-28 08:29:42 [hsdp.py:128] HSDP Inference: replicate_size=1, shard_size=4, world_size=4, rank=3, fs_world_size=4, fs_rank=3
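The log lines above are consistent with a two-level (replicate, shard) layout in which consecutive global ranks fall into the same shard group. A small sketch of that mapping follows; the row-major layout and the helper name `hsdp_coords` are assumptions for illustration, not taken from hsdp.py.

```python
def hsdp_coords(rank: int, replicate_size: int, shard_size: int) -> tuple[int, int]:
    """Map a global rank to (replicate_rank, fs_rank).

    Assumes a row-major (replicate, shard) mesh, so that
    world_size = replicate_size * shard_size.
    """
    world_size = replicate_size * shard_size
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} is outside world_size {world_size}")
    # Quotient selects the replica group, remainder the position inside the shard group.
    replicate_rank, fs_rank = divmod(rank, shard_size)
    return replicate_rank, fs_rank


# With replicate_size=1 and shard_size=4, fs_rank equals the global rank,
# which matches the rank/fs_rank pairs in the log.
for rank in range(4):
    print(rank, hsdp_coords(rank, replicate_size=1, shard_size=4))
```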


Signed-off-by: gcanlin <canlinguosdu@gmail.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 74fbf6fa6f


vllm_omni/diffusion/distributed/parallel_state.py


hsliuustc0106 · 2026-03-01T13:21:43Z

vllm_omni/diffusion/distributed/parallel_state.py

+    is_standalone_hsdp = dit_parallel_size == 1 and fully_shard_degree > 1
+
+    # For standalone HSDP: use (fully_shard_degree * hsdp_replicate_size) as dit_parallel_size
+    # This ensures orthogonal rank generation works correctly for all HSDP workers


Missing regression test for standalone HSDP mode. Consider adding a unit test that verifies is_standalone_hsdp detection and dit_parallel_size calculation with various parallel configs (standalone HSDP, HSDP+SP, HSDP+TP) to prevent future regressions.

Thanks for the reminder! Done now.
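A regression test along the lines suggested in this thread might look like the sketch below. The helper `resolve_dit_parallel_size` is a hypothetical stand-in modeled on the diff fragment and its comment (which multiplies fully_shard_degree by hsdp_replicate_size), and the case values are illustrative. The tests actually added in commit 23a7992 are not shown on this page.

```python
def resolve_dit_parallel_size(dit_parallel_size, fully_shard_degree, hsdp_replicate_size=1):
    """Hypothetical stand-in for the standalone-HSDP branch in parallel_state.py."""
    is_standalone_hsdp = dit_parallel_size == 1 and fully_shard_degree > 1
    if is_standalone_hsdp:
        # Per the diff comment: cover every HSDP worker, replicas included.
        return is_standalone_hsdp, fully_shard_degree * hsdp_replicate_size
    return is_standalone_hsdp, dit_parallel_size


# (label, dit_parallel_size, fully_shard_degree, hsdp_replicate_size, expected)
CASES = [
    ("standalone HSDP", 1, 4, 1, (True, 4)),
    ("standalone HSDP with replication", 1, 4, 2, (True, 8)),
    ("HSDP + SP", 4, 4, 1, (False, 4)),
    ("HSDP + TP", 2, 2, 1, (False, 2)),
    ("no HSDP", 1, 1, 1, (False, 1)),
]


def test_standalone_hsdp_detection():
    for label, dit, shard, replicate, expected in CASES:
        assert resolve_dit_parallel_size(dit, shard, replicate) == expected, label


test_standalone_hsdp_detection()
```

Keeping the cases in a table makes it easy to add further combinations without touching the assertion logic.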


gcanlin added 2 commits February 28, 2026 08:17

[Bugfix] Allow to enable HSDP alone

c181085

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

refactor

74fbf6f

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

gcanlin requested a review from hsliuustc0106 as a code owner February 28, 2026 08:31

chatgpt-codex-connector bot reviewed Feb 28, 2026


gcanlin changed the title ~~[Bugfix] Allow to enable HSDP alone~~ [WIP][Bugfix] Allow to enable HSDP alone Feb 28, 2026

gcanlin marked this pull request as draft February 28, 2026 08:37

gcanlin added 2 commits February 28, 2026 09:22

fix

eb01df6

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

fix replication

e6fc29c

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

gcanlin changed the title ~~[WIP][Bugfix] Allow to enable HSDP alone~~ [Bugfix] Allow to enable HSDP alone Feb 28, 2026

gcanlin marked this pull request as ready for review February 28, 2026 16:59

hsliuustc0106 reviewed Mar 1, 2026


gcanlin added 2 commits March 2, 2026 09:23

add tests

23a7992

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

fix lint

db0c1a8

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

gcanlin wants to merge 6 commits into vllm-project:main from gcanlin:hsdp-bugfix
