feat: Implement PoC state management and priority handling across components #17
It seems set_poc_active(True) is per-process, so we must also set it to True in routes.py, not only in the engine. Merging; I'll do that inside the gm/fix-poc-priority branch.
This pull request implements a unified PoC (Proof of Compute) priority mode for both the async and multiprocessing engines, ensuring that PoC actions always take precedence over chat requests. It removes legacy chat-priority modes, introduces explicit session management for PoC, and updates test coverage to reflect the new behavior. The changes provide clear guard conditions for PoC actions and enforce rejection of chat requests while PoC is active.
Unified PoC priority mode and session management:
`async_llm_engine.py` and `multiprocessing.engine` now enforce PoC priority: all chat requests are aborted when PoC is active, and PoC actions are rejected if guard conditions are not met. The legacy `POC_ENABLE_CHAT_PRIORITY` mode and related environment variable checks are removed. [1] [2] [3] [4] [5]
Explicit PoC session control:
Adds `start_session` and `end_session` actions in `async_llm_engine.py`, which set and clear the PoC active flag, abort chat requests, and allow chat to resume after PoC ends. [1] [2]
Chat request rejection during PoC:
Comprehensive guard conditions and skip reasons:
Test suite refactor and coverage improvements:
Tests in `test_coexist.py` are updated to reflect unified PoC priority, session control, and guard conditions. Legacy chat-priority tests are removed, and new tests are added for session actions, chat rejection, and skip reasons. [1] [2] [3] [4]
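
For readers skimming the summary, the behavior described above (a session sets the PoC active flag and aborts chat, chat is rejected while PoC is active, and PoC actions are skipped when a guard fails) can be sketched roughly as follows. This is a toy model, not the code in this PR: `PocGate`, `PocActiveError`, `submit_chat`, `poc_action`, and the skip-reason string are hypothetical names chosen for illustration, and the single guard shown (a session must be active) is an assumption about what the guard conditions might look like.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


class PocActiveError(RuntimeError):
    """Hypothetical error for a chat request that arrives while PoC is active."""


@dataclass
class PocGate:
    """Toy stand-in for engine-side PoC state; not the vLLM implementation."""

    poc_active: bool = False
    pending_chat: Set[str] = field(default_factory=set)

    def submit_chat(self, request_id: str) -> None:
        # Chat is rejected outright while a PoC session is active.
        if self.poc_active:
            raise PocActiveError(f"chat request {request_id} rejected: PoC is active")
        self.pending_chat.add(request_id)

    def start_session(self) -> List[str]:
        # Set the flag first, then abort every in-flight chat request.
        self.poc_active = True
        aborted = sorted(self.pending_chat)
        self.pending_chat.clear()
        return aborted

    def end_session(self) -> None:
        # Clearing the flag lets chat requests be accepted again.
        self.poc_active = False

    def poc_action(self, name: str) -> Optional[str]:
        # Return None if the action may run, otherwise a skip reason.
        # The guard and its message are assumptions made for this sketch.
        if not self.poc_active:
            return "no active PoC session"
        return None


gate = PocGate()
gate.submit_chat("chat-1")
aborted = gate.start_session()      # "chat-1" is aborted here
skip = gate.poc_action("generate")  # None: the session guard passes
gate.end_session()
gate.submit_chat("chat-2")          # accepted again after PoC ends
```

Note that a flag held like this lives in one process only, which matches the review comment about also setting it in `routes.py`.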