Open
Description
Native GPU - CPU Offloading (North/South)
Fast, native KV-cache offloading between GPU and CPU memory within a single vLLM instance. It is provided out of the box through vLLM’s KVConnector abstraction and integrates with KVEvents.
vLLM Issues
vLLM PRs
- v1: Offloading connector vllm-project/vllm#22595
- v1/offloading: Add worker-side CPU support vllm-project/vllm#21448
- v1: Introduce LRU-based CPU offloading management vllm-project/vllm#20075
- v1: Introduce an offloading component vllm-project/vllm#19848
- v1: Support KV events from connectors vllm-project/vllm#19737
- v1: Pass KVConnectorOutput to scheduler-side vllm-project/vllm#22157
- [v1] Move block_hashes from KVCacheManager to Request.block_hashes (#19728) vllm-project/vllm#19728
- [KVConnector] Aggregate finished requests on the scheduler vllm-project/vllm#19555
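
For readers unfamiliar with the idea behind the LRU-based CPU offloading management PR listed above, here is a minimal, illustrative sketch of an LRU store for offloaded blocks keyed by block hash. It is not vLLM's implementation or API: the class `LRUOffloadCache` and its `store`/`load` methods are hypothetical names, blocks are modeled as plain `bytes`, and the list of evicted hashes returned by `store` only hints at the kind of information a connector might report as KV events.

```python
from __future__ import annotations

from collections import OrderedDict
from typing import Optional


class LRUOffloadCache:
    """Toy CPU-side store for offloaded KV blocks (hypothetical, not vLLM code)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # Insertion order doubles as recency order: oldest entry first.
        self._blocks: OrderedDict[int, bytes] = OrderedDict()

    def store(self, block_hash: int, data: bytes) -> list[int]:
        """Offload a block and return the hashes evicted to make room."""
        evicted: list[int] = []
        if block_hash in self._blocks:
            # Already offloaded: refresh its recency and contents.
            self._blocks.move_to_end(block_hash)
            self._blocks[block_hash] = data
            return evicted
        while len(self._blocks) >= self.capacity:
            # Drop the least recently used block.
            old_hash, _ = self._blocks.popitem(last=False)
            evicted.append(old_hash)
        self._blocks[block_hash] = data
        return evicted

    def load(self, block_hash: int) -> Optional[bytes]:
        """Return the block if present and mark it as most recently used."""
        data = self._blocks.get(block_hash)
        if data is not None:
            self._blocks.move_to_end(block_hash)
        return data
```

With a capacity of 2, storing blocks 1 and 2, loading block 1, and then storing block 3 evicts block 2, because the load made block 1 the most recently used entry.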
cc @njhill (reviewer)