Conversation
Signed-off-by: Youlei Yang <youlei.yang@intel.com>
Porting #830
Pull request overview
This PR fixes a bug where preempted prompts were incorrectly classified as decode requests, causing runtime errors. The fix introduces a dedicated method to properly identify prompt vs decode requests by considering scheduled tokens alongside computed tokens.
Changes:
- Added `_is_prompt()` method to correctly identify prompt requests, including preempted ones
- Refactored prompt/decode classification logic to use the new method
- Added preemption handling test to CI pipeline
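The classification fix can be illustrated with a small, self-contained sketch. This is a simplified stand-alone function, not the actual vLLM code: the function name `is_prompt` and the exact return condition are assumptions based on the PR description (a preempted request may have already computed all its prompt tokens, so the scheduler-side token count is what distinguishes a prefill-like recompute from a normal single-token decode step).

```python
def is_prompt(num_computed_tokens: int,
              num_prompt_tokens: int,
              num_scheduled_tokens: int,
              num_spec_decode_tokens: int = 0) -> bool:
    # Original criterion: the request is still inside its prompt.
    if num_computed_tokens < num_prompt_tokens:
        return True
    # Preempted prompt: all prompt tokens were computed before preemption,
    # but the scheduler re-schedules more than one token (a prefill-like
    # recompute), so it must still be treated as a prompt. A normal decode
    # step schedules exactly one token (plus any speculative tokens).
    return num_scheduled_tokens > 1 + num_spec_decode_tokens

# A normal decode step: one scheduled token -> decode.
assert is_prompt(100, 10, 1) is False
# A preempted prompt: computed tokens exceed prompt tokens, yet many
# tokens are scheduled for recompute -> prompt.
assert is_prompt(100, 10, 64) is True
# An in-progress prefill -> prompt by the original criterion.
assert is_prompt(4, 10, 6) is True
```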
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| vllm_gaudi/v1/worker/hpu_model_runner.py | Implements _is_prompt() method and refactors prompt/decode classification logic to fix preemption handling |
| tests/full_tests/preemption.py | Adds new test script to verify preemption handling works correctly |
| tests/full_tests/ci_gsm8k_tests.sh | Integrates preemption test into CI pipeline |
```python
def _is_prompt(self, i: int, scheduler_output: "SchedulerOutput") -> bool:
    req_id = self.input_batch.req_ids[i]
    num_computed_tokens = int(self.input_batch.num_computed_tokens_cpu[i])
    num_prompt_tokens = int(self.input_batch.num_prompt_tokens[i])
    num_scheduled_tokens = scheduler_output.num_scheduled_tokens.get(req_id)
    spec_decode_tokens = scheduler_output.scheduled_spec_decode_tokens.get(req_id)
```
The method could fail without a helpful error message if req_id is None or if dictionary lookups return None. Consider adding validation and raising descriptive errors when required values are missing.
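One possible shape for the validation this comment suggests (a sketch only; `lookup_scheduled_tokens` is a hypothetical stand-alone helper, with plain Python containers standing in for the batch and scheduler structures in the diff):

```python
def lookup_scheduled_tokens(req_ids, i, num_scheduled_tokens):
    """Fail fast with a descriptive error instead of letting a None
    propagate into later arithmetic."""
    req_id = req_ids[i]
    if req_id is None:
        raise ValueError(f"request at batch index {i} has no request id")
    tokens = num_scheduled_tokens.get(req_id)
    if tokens is None:
        raise KeyError(
            f"request {req_id!r} missing from num_scheduled_tokens; "
            "was it scheduled in this step?")
    return tokens
```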
```python
if self._is_prompt(i, scheduler_output):
    break

# This is decode
```
`# This is decode`

Why remove this comment?
It's a comment for the older statement `assert num_scheduled_tokens == 1`, which is no longer relevant to the current code.
```python
for layer in model.language_model.model.layers:
    if "ChunkedLocalAttention" in layer.self_attn.attn.get_attn_backend().__name__:
        layer.self_attn.attn.impl.is_chunked_attention = True
except Exception:
```
When will there be an exception?
Silently passing on an exception is dangerous in most cases. I suggest either adding a warning message or making sure there is no exception.
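A sketch of what the reviewer is asking for, assuming the code were kept: narrow the `except` and log instead of swallowing the error. `enable_chunked_attention` is a hypothetical wrapper around the loop from the diff, not the actual code (which the author removed below).

```python
import logging

logger = logging.getLogger(__name__)

def enable_chunked_attention(model):
    try:
        for layer in model.language_model.model.layers:
            backend = layer.self_attn.attn.get_attn_backend().__name__
            if "ChunkedLocalAttention" in backend:
                layer.self_attn.attn.impl.is_chunked_attention = True
    except AttributeError as e:
        # Narrow exception type, and make the failure visible instead
        # of a bare `pass`.
        logger.warning("Skipping chunked-attention setup: %s", e)
```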
Oh, sorry. It's redundant code from cherry-picking and not relevant to this PR.
Fixed.
```python
except Exception:
    pass

def _is_prompt(self, i: int, scheduler_output: "SchedulerOutput") -> bool:
```
Suggest renaming `i` to a more meaningful name in the function.
Signed-off-by: Youlei Yang <youlei.yang@intel.com>
### Motivation
Preempted prompts might fail to match the `num_computed_tokens < num_prompt_tokens` test and be treated as decode, causing a runtime error.
### Changes
- Add `_is_prompt()` to check whether a request is a prompt.
- Consider `num_scheduled_tokens` to handle preempted prompts.
- Add a preemption-handling test to the CI.
---------
Signed-off-by: Youlei Yang <youlei.yang@intel.com>