[Bugfix] Allow to enable HSDP alone#1567
Conversation
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 74fbf6fa6f
```python
is_standalone_hsdp = dit_parallel_size == 1 and fully_shard_degree > 1

# For standalone HSDP: use (fully_shard_degree * hsdp_replicate_size) as dit_parallel_size
# This ensures orthogonal rank generation works correctly for all HSDP workers
```
Missing regression test for standalone HSDP mode. Consider adding a unit test that verifies is_standalone_hsdp detection and dit_parallel_size calculation with various parallel configs (standalone HSDP, HSDP+SP, HSDP+TP) to prevent future regressions.
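Such a regression test might look like the following sketch. The helper `resolve_dit_parallel_size` is hypothetical — it mirrors the detection logic from the diff rather than importing vllm-omni internals, whose actual function names and signatures may differ:

```python
def resolve_dit_parallel_size(dit_parallel_size: int,
                              fully_shard_degree: int,
                              hsdp_replicate_size: int) -> int:
    """Mirror of the PR's logic: standalone HSDP widens dit_parallel_size."""
    is_standalone_hsdp = dit_parallel_size == 1 and fully_shard_degree > 1
    if is_standalone_hsdp:
        # Standalone HSDP: cover all HSDP workers.
        return fully_shard_degree * hsdp_replicate_size
    return dit_parallel_size


def test_standalone_hsdp():
    # No other parallelism, shard degree 4, replicate size 2 -> 8 workers.
    assert resolve_dit_parallel_size(1, 4, 2) == 8


def test_hsdp_with_other_parallelism():
    # HSDP combined with SP/TP: dit_parallel_size already > 1, keep it.
    assert resolve_dit_parallel_size(4, 4, 2) == 4


def test_no_hsdp():
    # No sharding at all: nothing changes.
    assert resolve_dit_parallel_size(1, 1, 1) == 1
```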
Thanks for the reminder! Done now.
Purpose
For standalone HSDP: when all other parallelism dimensions are 1 but fully_shard_degree > 1, use fully_shard_degree * hsdp_replicate_size as dit_parallel_size. This ensures orthogonal rank generation works correctly for all HSDP workers.
Test Plan
Test Result