fix preempted prompts#928

Merged
czhu15 merged 5 commits intovllm-project:aicefrom
yangulei:preempted_prompt_aice
Feb 5, 2026

Conversation

Collaborator

@yangulei yangulei commented Feb 4, 2026

Motivation

Preempted prompts might fail to match the `num_computed_tokens < num_prompt_tokens` test and be treated as decode requests, causing a runtime error.

Changes

  • Add `_is_prompt()` to check whether a request is in the prompt phase.
  • Consider `num_scheduled_tokens` when handling preempted prompts.
  • Add a preemption-handling test to the CI.
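To illustrate the idea (a simplified sketch inferred from this description, not the actual `hpu_model_runner.py` implementation, which also consults speculative-decode tokens): a classifier that relies only on `num_computed_tokens < num_prompt_tokens` can misclassify a preempted prompt, while additionally consulting the scheduled-token count catches it.

```python
def is_prompt(num_computed_tokens: int, num_prompt_tokens: int,
              num_scheduled_tokens: int) -> bool:
    """Hypothetical, simplified prompt-vs-decode classification."""
    # Ordinary prefill: some prompt tokens are still uncomputed.
    if num_computed_tokens < num_prompt_tokens:
        return True
    # Preempted prompt: the computed-token bookkeeping may no longer
    # satisfy the test above, but a decode step schedules only a single
    # token, so a larger scheduled chunk signals prefill recomputation.
    return num_scheduled_tokens > 1
```

Under this sketch, a preempted request with a multi-token scheduled chunk is routed back to the prompt path instead of triggering the decode-path runtime error.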

Signed-off-by: Youlei Yang <youlei.yang@intel.com>
Copilot AI review requested due to automatic review settings February 4, 2026 08:57
Collaborator Author

yangulei commented Feb 4, 2026

Porting #830

Contributor

Copilot AI left a comment


Pull request overview

This PR fixes a bug where preempted prompts were incorrectly classified as decode requests, causing runtime errors. The fix introduces a dedicated method to properly identify prompt vs decode requests by considering scheduled tokens alongside computed tokens.

Changes:

  • Added _is_prompt() method to correctly identify prompt requests, including preempted ones
  • Refactored prompt/decode classification logic to use the new method
  • Added preemption handling test to CI pipeline

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
vllm_gaudi/v1/worker/hpu_model_runner.py Implements _is_prompt() method and refactors prompt/decode classification logic to fix preemption handling
tests/full_tests/preemption.py Adds new test script to verify preemption handling works correctly
tests/full_tests/ci_gsm8k_tests.sh Integrates preemption test into CI pipeline
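The CI hook itself is not shown in this excerpt; wiring along these lines (file names from the table above, invocation details assumed) would run the new script and fail the pipeline on error:

```shell
# Hypothetical addition to tests/full_tests/ci_gsm8k_tests.sh; the actual
# flags and environment setup used by the PR are not shown on this page.
echo "Running preemption handling test"
python tests/full_tests/preemption.py
if [ $? -ne 0 ]; then
    echo "Error: preemption test failed" >&2
    exit 1
fi
echo "Preemption test passed"
```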


Comment on lines 1492 to 1497

```python
def _is_prompt(self, i: int, scheduler_output: "SchedulerOutput") -> bool:
    req_id = self.input_batch.req_ids[i]
    num_computed_tokens = int(self.input_batch.num_computed_tokens_cpu[i])
    num_prompt_tokens = int(self.input_batch.num_prompt_tokens[i])
    num_scheduled_tokens = scheduler_output.num_scheduled_tokens.get(req_id)
    spec_decode_tokens = scheduler_output.scheduled_spec_decode_tokens.get(req_id)
```

Copilot AI Feb 4, 2026


The method could fail without a helpful error message if req_id is None or if dictionary lookups return None. Consider adding validation and raising descriptive errors when required values are missing.

```python
if self._is_prompt(i, scheduler_output):
    break

# This is decode
```
Collaborator


> This is decode

Why remove this comment?

Collaborator Author


It's a comment for the older statement `assert num_scheduled_tokens == 1`, which is no longer relevant to the current code.

```python
for layer in model.language_model.model.layers:
    if "ChunkedLocalAttention" in layer.self_attn.attn.get_attn_backend().__name__:
        layer.self_attn.attn.impl.is_chunked_attention = True
except Exception:
```
Collaborator


When would there be an exception? Passing on an exception without any warning is dangerous in most cases. I suggest either adding a warning message or making sure there is no exception.

Collaborator Author


Oh, sorry. It's redundant code from cherry-picking and isn't relevant to this PR.
Fixed.
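For reference, a common alternative to silently swallowing exceptions with a bare `except Exception: pass` is to log them (generic sketch, not code from this PR; `apply_optional_patch` is a hypothetical name):

```python
import logging

logger = logging.getLogger(__name__)

def apply_optional_patch(patch_fn) -> bool:
    """Run a best-effort patch, logging instead of silently ignoring errors."""
    try:
        patch_fn()
        return True
    except Exception as exc:
        # A bare `except Exception: pass` hides failures; at minimum,
        # record what went wrong so it shows up in the logs.
        logger.warning("optional patch skipped: %s", exc)
        return False
```

This keeps the best-effort behavior while leaving a trace when the patch does not apply.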

```python
except Exception:
    pass

def _is_prompt(self, i: int, scheduler_output: "SchedulerOutput") -> bool:
```
Collaborator


I suggest renaming `i` to a more meaningful name in the function.

Collaborator Author


Done, thanks~

Signed-off-by: Youlei Yang <youlei.yang@intel.com>
Collaborator

@czhu15 czhu15 left a comment


LGTM

@czhu15 czhu15 merged commit c7e9143 into vllm-project:aice Feb 5, 2026
1 check passed
czhu15 pushed a commit that referenced this pull request Feb 9, 2026
yangulei added a commit to yangulei/vllm-gaudi that referenced this pull request Feb 26, 2026