Conversation
Pull request overview
This PR fixes a bug where preempted prompts were incorrectly classified as decode requests, causing runtime errors. The fix introduces a dedicated method to properly identify prompt requests by considering both computed tokens and scheduled tokens, which is especially important for handling preempted sequences.
Changes:
- Added `_is_prompt()` method to accurately determine whether a request is a prompt, accounting for preempted prompts
- Refactored prompt/decode classification logic to use the new helper method
- Added preemption handling test to CI
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| vllm_gaudi/v1/worker/hpu_model_runner.py | Implements _is_prompt() helper method and refactors prompt/decode classification logic to fix preempted prompt handling |
| tests/full_tests/preemption.py | Adds new test file to verify preemption handling with high memory pressure conditions |
| tests/full_tests/ci_gsm8k_tests.sh | Adds preemption test function to CI test suite |
Force-pushed from b0f4a6e to 06b9b4a.
✅ CI Passed: all checks passed successfully against the referenced vllm commit.
Force-pushed from f414a28 to a456bef.
✅ CI Passed: all checks passed successfully against the referenced vllm commit.
Force-pushed from 6bcc307 to 2b0f48c.
The preempted prompts might fail to match the `num_computed_tokens < num_prompt_tokens` test and be treated as decodes, which then causes a runtime error.
- Add `_is_prompt()` to check whether a request is a prompt.
- Consider `num_scheduled_tokens` to handle the preempted prompts.
- Add a test for preemption handling to the CI.
---------
Signed-off-by: Youlei Yang <youlei.yang@intel.com>
a86d05d to
f1bf91d
Compare
Motivation
The preempted prompts might fail to meet the `num_computed_tokens < num_prompt_tokens` test and be treated as decodes, which then causes a runtime error.
Changes
- Added `_is_prompt()` to check whether a request is a prompt.
- Considered `num_scheduled_tokens` to handle the preempted prompts.