fix preempted prompts#830

Open
yangulei wants to merge 1 commit into vllm-project:main from yangulei:preempted_prompt

Conversation


@yangulei yangulei commented Jan 16, 2026

Motivation

Preempted prompts might fail the `num_computed_tokens < num_prompt_tokens` check and be treated as decode requests, causing a runtime error.

Changes

  • Add `_is_prompt()` to check whether a request is a prompt.
  • Consider `num_scheduled_tokens` to handle preempted prompts.
  • Add a test for preemption handling to the CI.
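Based on the PR description, the classification change can be sketched as below. This is a standalone sketch, not the actual `vllm_gaudi` implementation: the free-function form, the argument names, and the `num_scheduled_tokens > 1` heuristic for recomputation are all assumptions inferred from the description.

```python
def is_prompt(num_computed_tokens: int,
              num_prompt_tokens: int,
              num_scheduled_tokens: int) -> bool:
    """Decide whether a request should take the prompt (prefill) path.

    A request still prefilling its original prompt satisfies
    ``num_computed_tokens < num_prompt_tokens``. A preempted request,
    however, may be recomputing tokens it had already generated, so its
    computed-token count can reach or exceed the prompt length while the
    current step is still a prefill-style pass. In that case the number of
    tokens scheduled this step (greater than one) reveals the prefill.
    """
    if num_computed_tokens < num_prompt_tokens:
        return True  # ordinary (possibly chunked) prefill
    # Preempted request under recomputation: decode steps schedule exactly
    # one token, so a larger batch of scheduled tokens implies prefill.
    return num_scheduled_tokens > 1
```

With this formulation, a fresh prompt (`0 < 10`) and a preempted request recomputing past its prompt (`12 >= 10` but 5 tokens scheduled) both classify as prompts, while a normal decode step (one scheduled token) does not.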


Copilot AI left a comment


Pull request overview

This PR fixes a bug where preempted prompts were incorrectly classified as decode requests, causing runtime errors. The fix introduces a dedicated method to properly identify prompt requests by considering both computed tokens and scheduled tokens, which is especially important for handling preempted sequences.

Changes:

  • Added _is_prompt() method to accurately determine if a request is a prompt, accounting for preempted prompts
  • Refactored prompt/decode classification logic to use the new helper method
  • Added preemption handling test to CI

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

  • vllm_gaudi/v1/worker/hpu_model_runner.py — Implements the _is_prompt() helper method and refactors the prompt/decode classification logic to fix preempted prompt handling
  • tests/full_tests/preemption.py — Adds a new test file to verify preemption handling under high memory pressure
  • tests/full_tests/ci_gsm8k_tests.sh — Adds a preemption test function to the CI test suite


@yangulei yangulei force-pushed the preempted_prompt branch 4 times, most recently from b0f4a6e to 06b9b4a Compare January 19, 2026 09:56
@github-actions
Copy link

✅ CI Passed

All checks passed successfully against the following vllm commit:
6218034dd7f9a56596e4fd8c8c8fc1d8011ed9c2


@yangulei yangulei force-pushed the preempted_prompt branch 2 times, most recently from 6bcc307 to 2b0f48c Compare February 3, 2026 01:31
@yangulei yangulei mentioned this pull request Feb 4, 2026
@adobrzyn adobrzyn self-assigned this Feb 5, 2026
Preempted prompts might fail the `num_computed_tokens <
num_prompt_tokens` check and be treated as decode requests, causing a
runtime error.

- add `_is_prompt()` to check whether a request is a prompt.
- consider `num_scheduled_tokens` to handle preempted prompts.
- add a test for preemption handling to the CI.

---------

Signed-off-by: Youlei Yang <youlei.yang@intel.com>
@yangulei yangulei force-pushed the preempted_prompt branch 2 times, most recently from a86d05d to f1bf91d Compare February 26, 2026 08:32