Shared CPU Memory Connector for KV Transfer

This project demonstrates how to implement an external connector for KV (Key-Value) transfer in vLLM, using shared CPU memory. It's particularly useful for scenarios where GPU memory is limited and cannot hold the transfer buffer.

This connector is adapted from the SharedStorageMemoryConnector provided by vLLM.

Why Shared CPU Memory?

By utilizing CPU shared memory for KV transfer, this approach:

Enables memory sharing between processes without requiring additional GPU memory.
Supports low-GPU-memory environments while maintaining performance.

Setup

Prefill Node

PYTHONPATH=$PWD CUDA_VISIBLE_DEVICES=0 vllm serve Qwen/Qwen3-0.6B \
    --port 8200 \
    --max-model-len 2048 \
    --gpu-memory-utilization 0.8 \
    --trust-remote-code \
    --kv-transfer-config \
    '{
        "kv_connector": "SharedCPUMemoryConnector",
        "kv_rank": 0,
        "kv_parallel_size": 2,
        "kv_role": "kv_producer",
        "kv_connector_module_path": "cpu_memory_connector"
    }'

Decode Node

PYTHONPATH=$PWD CUDA_VISIBLE_DEVICES=1 vllm serve Qwen/Qwen3-0.6B \
    --port 8200 \
    --max-model-len 2048 \
    --gpu-memory-utilization 0.8 \
    --trust-remote-code \
    --kv-transfer-config \
    '{
        "kv_connector": "SharedCPUMemoryConnector",
        "kv_rank": 1,
        "kv_parallel_size": 2,
        "kv_role": "kv_consumer",
        "kv_connector_module_path": "cpu_memory_connector"
    }'

Notes

Ensure that both the prefill and decode nodes are configured consistently, particularly in terms of kv_parallel_size.
The kv_rank should be unique for each process (e.g., 0 for producer, 1 for consumer).
This connector must be placed in a Python module or script accessible via the PYTHONPATH.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
.gitignore		.gitignore
README.md		README.md
__init__.py		__init__.py
cpu_memory_connector.py		cpu_memory_connector.py
disagg_proxy_server.py		disagg_proxy_server.py
run_d.sh		run_d.sh
run_p.sh		run_p.sh
test.sh		test.sh
test_cpu_memory_connector.py		test_cpu_memory_connector.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Shared CPU Memory Connector for KV Transfer

Why Shared CPU Memory?

Setup

Prefill Node

Decode Node

Notes

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Shared CPU Memory Connector for KV Transfer

Why Shared CPU Memory?

Setup

Prefill Node

Decode Node

Notes

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages