vLLM Native CPU Offloading Connector #67

## Native GPU - CPU Offloading (North/South)

Fast, native KV caching within a single vLLM instance. It is provided out of the box through vLLM’s KVConnector abstraction and integrates with KVEvents.
- [[Doc] N/S KV Caching in vLLM](https://docs.google.com/document/d/1uZXPHDn9PoTxDBBNc9UEW7AyCGpSuY7t4ynsBSuPUSQ/edit?usp=sharing)
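
To make the north/south idea concrete, here is a minimal toy sketch of two-tier block caching: blocks evicted from a small "GPU" tier are offloaded to a larger "CPU" tier instead of being dropped, and a later hit in the CPU tier reloads the block. This is an illustration only and does not use or mirror vLLM's code; the class `TwoTierBlockCache`, its methods, the LRU policy, and the event names are all hypothetical, and the `events` list merely stands in for the kind of notifications a KVEvents integration might publish.

```python
from collections import OrderedDict


class TwoTierBlockCache:
    """Toy model of GPU -> CPU KV block offloading (hypothetical, not vLLM's API)."""

    def __init__(self, gpu_capacity: int, cpu_capacity: int):
        self.gpu_capacity = gpu_capacity
        self.cpu_capacity = cpu_capacity
        # Both tiers map block_hash -> payload, ordered least- to most-recently used.
        self.gpu = OrderedDict()
        self.cpu = OrderedDict()
        # (event_name, block_hash) tuples; an assumed stand-in for cache events.
        self.events = []

    def store(self, block_hash, payload):
        """Place a block in the GPU tier, offloading LRU blocks if it overflows."""
        self.cpu.pop(block_hash, None)  # a block lives in at most one tier
        self.gpu[block_hash] = payload
        self.gpu.move_to_end(block_hash)
        while len(self.gpu) > self.gpu_capacity:
            victim, data = self.gpu.popitem(last=False)
            self._offload(victim, data)

    def _offload(self, block_hash, payload):
        """Move an evicted GPU block to the CPU tier, dropping the CPU LRU if full."""
        self.cpu[block_hash] = payload
        self.cpu.move_to_end(block_hash)
        self.events.append(("offloaded", block_hash))
        while len(self.cpu) > self.cpu_capacity:
            dropped, _ = self.cpu.popitem(last=False)
            self.events.append(("removed", dropped))

    def lookup(self, block_hash):
        """Return the payload, reloading it from the CPU tier on a GPU miss."""
        if block_hash in self.gpu:
            self.gpu.move_to_end(block_hash)
            return self.gpu[block_hash]
        if block_hash in self.cpu:
            payload = self.cpu.pop(block_hash)
            self.events.append(("reloaded", block_hash))
            self.store(block_hash, payload)
            return payload
        return None  # miss in both tiers: the block would have to be recomputed
```

With `gpu_capacity=2`, storing three blocks pushes the oldest one to the CPU tier, and looking it up again brings it back while offloading the next-oldest GPU block. A real connector would move tensors asynchronously and key blocks by prefix hashes; none of that is modeled here.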

vLLM Issues
- https://github.com/vllm-project/vllm/issues/19854

vLLM PRs
- https://github.com/vllm-project/vllm/pull/22595
- https://github.com/vllm-project/vllm/pull/21448
- https://github.com/vllm-project/vllm/pull/20075
- https://github.com/vllm-project/vllm/pull/19848
- https://github.com/vllm-project/vllm/pull/19737
- https://github.com/vllm-project/vllm/pull/22157
- https://github.com/vllm-project/vllm/pull/19728
- https://github.com/vllm-project/vllm/pull/19555 

cc @njhill (reviewer)
