
feat: Implement PoC state management and priority handling across components#17

Merged
gmorgachev merged 1 commit into gonka-ai:gm/fix-poc-priority from qdanik:fix/poc-chat-priority on Feb 22, 2026

Conversation


@qdanik qdanik commented Feb 21, 2026

This pull request implements a unified PoC (Proof of Compute) priority mode for both the async and multiprocessing engines, ensuring that PoC actions always take precedence over chat requests. It removes legacy chat-priority modes, introduces explicit session management for PoC, and updates test coverage to reflect the new behavior. The changes provide clear guard conditions for PoC actions and enforce rejection of chat requests while PoC is active.

Unified PoC priority mode and session management:

  • Both async_llm_engine.py and multiprocessing.engine now enforce PoC priority: all chat requests are aborted when PoC is active, and PoC actions are rejected if guard conditions are not met. The legacy POC_ENABLE_CHAT_PRIORITY mode and related environment variable checks are removed. [1] [2] [3] [4] [5]
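The priority rule described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the class and method names (`PocEngineState`, `submit_chat`, `activate_poc`) are hypothetical stand-ins for the engine-side logic.

```python
class PocEngineState:
    """Sketch of the unified PoC-priority rule (hypothetical names;
    the PR's real engine code is not reproduced on this page)."""

    def __init__(self):
        self.poc_active = False
        self.chat_queue = []

    def submit_chat(self, request_id):
        # Chat requests are rejected outright while PoC is active.
        if self.poc_active:
            return False
        self.chat_queue.append(request_id)
        return True

    def activate_poc(self):
        # Entering PoC mode aborts every queued chat request.
        self.poc_active = True
        aborted = list(self.chat_queue)
        self.chat_queue.clear()
        return aborted
```

The key property is that PoC activation both drains the chat queue and flips the flag that causes later chat submissions to be refused.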

Explicit PoC session control:

  • Added support for start_session and end_session actions in async_llm_engine.py, which set and clear the PoC active flag, abort chat requests, and allow chat to resume after PoC ends. [1] [2]
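A rough sketch of the session actions, assuming a simple mutable state object; the handler name and return shape here are illustrative, not the actual `async_llm_engine.py` API.

```python
from types import SimpleNamespace


def handle_poc_action(state, action):
    """Sketch of start_session / end_session handling (hypothetical
    helper; the real engine code is not shown on this page)."""
    if action == "start_session":
        state.poc_active = True       # set the PoC active flag
        state.chat_queue.clear()      # abort in-flight chat requests
        return {"status": "ok"}
    if action == "end_session":
        state.poc_active = False      # chat may resume after PoC ends
        return {"status": "ok"}
    return {"status": "error", "reason": f"unknown action: {action}"}
```

Usage: `handle_poc_action(state, "start_session")` flips the flag and drops queued chat; `"end_session"` clears the flag so chat submissions succeed again.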

Chat request rejection during PoC:

  • Both engines now reject chat requests while PoC is active, with tests verifying that chat RPCs are rejected and not added to the engine queue. [1] [2] [3]

Comprehensive guard conditions and skip reasons:

  • PoC actions return explicit skip reasons for pending input, unfinished chat, or engine step in progress, with test coverage for each guard condition. [1] [2]
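The three guard conditions named above can be expressed as a single predicate that either clears the action or returns a skip reason. The function name and reason strings below are hypothetical; the PR's exact strings are not visible here.

```python
def poc_skip_reason(pending_input, unfinished_chat, step_in_progress):
    """Return None when a PoC action may proceed, else an explicit
    skip reason (illustrative strings, hypothetical helper)."""
    if pending_input:
        return "pending_input"
    if unfinished_chat:
        return "unfinished_chat"
    if step_in_progress:
        return "engine_step_in_progress"
    return None
```

Returning a reason string rather than a bare boolean is what lets the tests assert on each guard condition individually.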

Test suite refactor and coverage improvements:

  • Tests in test_coexist.py are updated to reflect unified PoC priority, session control, and guard conditions. Legacy chat-priority tests are removed, and new tests are added for session actions, chat rejection, and skip reasons. [1] [2] [3] [4]
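A test of the rejection behavior might look roughly like this; `FakeEngine` is a hypothetical stand-in, since the real tests in test_coexist.py exercise the actual engines.

```python
class FakeEngine:
    """Stand-in for the engine under test (hypothetical)."""

    def __init__(self, poc_active=False):
        self.poc_active = poc_active
        self.queue = []

    def add_chat_rpc(self, request_id):
        if self.poc_active:
            return False          # chat RPC rejected during PoC
        self.queue.append(request_id)
        return True


def test_chat_rpc_rejected_while_poc_active():
    engine = FakeEngine(poc_active=True)
    assert engine.add_chat_rpc("req-1") is False
    assert engine.queue == []     # nothing reached the engine queue
```

The important assertion is the second one: a rejected request must not merely fail, it must leave no trace in the engine queue.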

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@gmorgachev

It seems set_poc_active(True) is per-process, so we must set it to True in routes.py as well, not only in the engine.

Merging; I'll fix that inside the gm/fix-poc-priority branch.
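The per-process pitfall the reviewer points out can be illustrated without actually spawning processes: each object below stands in for the module-level state of a separate OS process (the engine process vs. the API-server process running routes.py). All names are hypothetical.

```python
class ProcessLocalState:
    """Each OS process holds its own copy of module-level state, so a
    flag set in one process is invisible to the other (illustrative)."""

    def __init__(self):
        self.poc_active = False

    def set_poc_active(self, value):
        self.poc_active = value


engine_proc = ProcessLocalState()   # state inside the engine process
routes_proc = ProcessLocalState()   # state inside the API-server process

engine_proc.set_poc_active(True)
# routes_proc.poc_active is still False: the API-server process must
# call set_poc_active(True) itself, hence the change needed in routes.py.
```

With Python's `multiprocessing`, a module-level flag behaves exactly like this: the child process gets its own copy, so mutations never propagate back, unless shared state (e.g. `multiprocessing.Value`) or an explicit RPC is used.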

@gmorgachev gmorgachev merged commit 375635a into gonka-ai:gm/fix-poc-priority Feb 22, 2026
2 checks passed