v1: Offloading connector #22595
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Code Review
This PR introduces a new offloading connector. The implementation is extensive and adds a lot of new components. My review found several critical issues that need to be addressed. These include a race condition in the tests, a critical assertion that would crash workers on transfer failures, a resource leak due to unjoined threads, and an incorrect list slicing that would lead to errors. These issues affect both the correctness of the new feature and the reliability of its tests.
vllm/distributed/kv_transfer/kv_connector/v1/offloading_connector.py
4b24d03 to 4fca175
mark, will take a look and review after this PR gets stable.
4fca175 to 8d7a0d7
8d7a0d7 to 866a51c
This commit adds a new offloading component, composed of:
1. A scheduler-side OffloadingManager (abstract), which kicks off KV data transfers and keeps track of offloaded data.
2. A worker-side OffloadingQueueManager, which asynchronously manages KV transfers.
Signed-off-by: Or Ozeri <[email protected]>
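To make the scheduler/worker split in this commit message concrete, here is a minimal sketch. It is not the PR's code: the class names `SchedulerSideManager`, `InMemoryManager`, and `WorkerSideQueue`, along with every method on them, are hypothetical stand-ins chosen for illustration. The worker side reports a failed transfer as a result rather than asserting, and joins its thread on shutdown.

```python
import queue
import threading
from abc import ABC, abstractmethod


class SchedulerSideManager(ABC):
    """Hypothetical stand-in for an abstract scheduler-side manager."""

    @abstractmethod
    def lookup(self, block_hashes: list[int]) -> int:
        """Return how many leading blocks are already offloaded."""

    @abstractmethod
    def mark_stored(self, block_hashes: list[int]) -> None:
        """Record that the given blocks finished offloading."""


class InMemoryManager(SchedulerSideManager):
    """Toy implementation that tracks offloaded block hashes in a set."""

    def __init__(self) -> None:
        self._stored: set[int] = set()

    def lookup(self, block_hashes: list[int]) -> int:
        hits = 0
        for block_hash in block_hashes:
            if block_hash not in self._stored:
                break  # only a contiguous prefix counts as a hit
            hits += 1
        return hits

    def mark_stored(self, block_hashes: list[int]) -> None:
        self._stored.update(block_hashes)


class WorkerSideQueue:
    """Hypothetical worker-side queue running transfers on one thread."""

    def __init__(self, transfer_fn) -> None:
        self._transfer_fn = transfer_fn
        self._jobs: queue.Queue = queue.Queue()
        self._finished: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, job_id: int, block_hashes: list[int]) -> None:
        self._jobs.put((job_id, block_hashes))

    def _run(self) -> None:
        while True:
            item = self._jobs.get()
            if item is None:  # sentinel: stop the thread
                break
            job_id, block_hashes = item
            try:
                self._transfer_fn(block_hashes)
                ok = True
            except Exception:
                ok = False  # report the failure instead of crashing
            self._finished.put((job_id, ok))

    def get_finished(self) -> list[tuple[int, bool]]:
        """Drain and return (job_id, success) pairs without blocking."""
        done = []
        while True:
            try:
                done.append(self._finished.get_nowait())
            except queue.Empty:
                return done

    def shutdown(self) -> None:
        self._jobs.put(None)
        self._thread.join()  # join so the thread is not leaked
```

In this sketch the scheduler side only does bookkeeping, while the worker side owns the actual data movement and hands completion status back through `get_finished()`.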
This commit moves the request block hashes from the KVCacheManager to the Request object itself. In particular, this will allow connectors to access the request block hashes. Signed-off-by: Or Ozeri <[email protected]>
This commit adds a new scheduler-side connector API to collect KV cache events. Additionally, we add a medium field to KV events, to allow distinguishing KV events on different mediums (e.g. blocks stored on cpu, disk, or gpu (default)). Signed-off-by: Or Ozeri <[email protected]>
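As an illustration of how a medium field can distinguish events, the sketch below tags each event with an optional medium and treats a missing value as GPU, matching the default named in the commit message. The `BlockStoredEvent` dataclass and both helper functions are hypothetical and are not the event types defined in the PR.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BlockStoredEvent:
    """Hypothetical KV event; `medium=None` is read as the default medium."""

    block_hashes: tuple[int, ...]
    medium: Optional[str] = None


def effective_medium(event: BlockStoredEvent, default: str = "gpu") -> str:
    """Resolve the medium, falling back to the assumed default of GPU."""
    return event.medium if event.medium is not None else default


def group_by_medium(events: list[BlockStoredEvent]) -> dict[str, list[int]]:
    """Collect block hashes per medium so consumers can filter by location."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for event in events:
        grouped[effective_medium(event)].extend(event.block_hashes)
    return dict(grouped)
```

Keeping the field optional means events produced before the field existed still resolve to a medium without any change on the producer side.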
This commit introduces a new OffloadingConnector for offloading blocks of KV data via a generic interface. Signed-off-by: Or Ozeri <[email protected]>
866a51c to 4872976
This PR adds an offloading connector that delegates to a generic API introduced in #19848.
The actual implementation of this API is built using a factory which is currently empty.
A small follow-up PR will register a CPU implementation based on #20075 (scheduler-side implementation) and #21448 (worker-side implementation).
Part of RFC #19854.
Depends on PRs #19728, #19848, #19737.
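For readers unfamiliar with the pattern, a factory that starts empty and is populated by later registrations could look roughly like the sketch below. The `BackendFactory` class, its methods, and the `"cpu"` key are hypothetical and do not reflect the PR's actual factory.

```python
from typing import Callable


class BackendFactory:
    """Hypothetical registry that maps a backend name to a constructor."""

    # Starts empty; implementations are registered separately.
    _registry: dict[str, Callable[[], object]] = {}

    @classmethod
    def register(cls, name: str, loader: Callable[[], object]) -> None:
        if name in cls._registry:
            raise ValueError(f"backend {name!r} is already registered")
        cls._registry[name] = loader

    @classmethod
    def create(cls, name: str) -> object:
        try:
            loader = cls._registry[name]
        except KeyError:
            raise ValueError(f"unknown backend {name!r}") from None
        return loader()
```

With an empty registry, any `create()` call fails with a clear error, so the connector can be merged first and a concrete backend can be registered by a later change.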