Merged
37 changes: 0 additions & 37 deletions .github/workflows/unifiedcache_test.yml
@@ -18,40 +18,3 @@ jobs:

  call-lint:
    uses: ./.github/workflows/pre-commit.yml
-
-  unit-test:
-    needs: call-lint
-    name: Run Unittests
-    runs-on: ubuntu-latest
-    steps:
-      - name: Free disk space
-        run: |
-          sudo rm -rf /usr/share/dotnet
-          sudo rm -rf /opt/ghc
-          sudo rm -rf "/usr/local/share/boost"
-          sudo rm -rf "$AGENT_TOOLSDIRECTORY"
-          docker system prune -af
-          df -h
-
-      - name: Checkout unified-cache-management repo
-        uses: actions/checkout@v4
-
-      - name: Run unit test inside vLLM container
-        run: |
-          docker run --rm \
-            -e VLLM_USE_PRECOMPILED=1 \
-            -e PLATFORM=cuda \
-            -v ${{ github.workspace }}:/workspace/unified-cache-management \
-            -w /workspace/unified-cache-management \
-            --entrypoint /bin/bash \
-            vllm/vllm-openai:v0.9.2 \
-            -c "
-              set -euo pipefail
-              pip install -v -e . --no-build-isolation
-              cd \$(pip show vllm | grep Location | awk '{print \$2}') &&
-              git apply /workspace/unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-adapt-pc.patch
-              git apply /workspace/unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-adapt-aggre.patch
-              git apply /workspace/unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-adapt-sparse.patch
-              cd /workspace/unified-cache-management
-              python3 -m unittest discover -s test
-            "
2 changes: 1 addition & 1 deletion docs/source/getting-started/installation_gpu.md
@@ -51,7 +51,7 @@ export PLATFORM=cuda
pip install -v -e . --no-build-isolation
```

-**Note:** Patches are now applied automatically via dynamic patching when you import the unified-cache-management package. You no longer need to manually apply patches using `git apply`. The patches are automatically applied when you use the `UnifiedCacheConnectorV1` connector.
+**Note:** Patches are now applied automatically via dynamic patching when you import the unified-cache-management package. You no longer need to manually apply patches using `git apply`. The patches are automatically applied when you use the `UCMConnector` connector.


## Setup from docker
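Since the patch workflow changed, a minimal offline-inference sketch of the renamed connector may help; it is illustrative only and not part of this PR. The model path, the `kv_role` choice, and the bare `ucm_connector_name` entry are assumptions; store-specific options (truncated in the doc hunks below) would live in the same `kv_connector_extra_config` dict.

```python
# Sketch only: the renamed UCMConnector wired up through vLLM's Python API.
# Loading the connector from the ucm package is what triggers the dynamic
# patching described in the note above; no manual `git apply` step is needed.
from vllm import LLM
from vllm.config import KVTransferConfig

ktc = KVTransferConfig(
    kv_connector="UCMConnector",
    kv_connector_module_path="ucm.integration.vllm.ucm_connector",
    kv_role="kv_both",
    kv_connector_extra_config={
        "ucm_connector_name": "UcmNfsStore",
        # Store-specific settings (elided in the hunks below) go here as well.
    },
)

# Placeholder model path; any locally available model works.
llm = LLM(model="/home/models/Qwen2.5-7B-Instruct", kv_transfer_config=ktc)
print(llm.generate(["Hello, UCM!"])[0].outputs[0].text)
```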
8 changes: 4 additions & 4 deletions docs/source/user-guide/pd-disaggregation/1p1d.md
@@ -26,8 +26,8 @@ vllm serve /home/models/Qwen2.5-7B-Instruct \
--block-size 128 \
--kv-transfer-config \
'{
"kv_connector": "UnifiedCacheConnectorV1",
"kv_connector_module_path": "ucm.integration.vllm.uc_connector",
"kv_connector": "UCMConnector",
"kv_connector_module_path": "ucm.integration.vllm.ucm_connector",
"kv_role": "kv_producer",
"kv_connector_extra_config": {
"ucm_connector_name": "UcmNfsStore",
@@ -55,8 +55,8 @@ vllm serve /home/models/Qwen2.5-7B-Instruct \
--block-size 128 \
--kv-transfer-config \
'{
"kv_connector": "UnifiedCacheConnectorV1",
"kv_connector_module_path": "ucm.integration.vllm.uc_connector",
"kv_connector": "UCMConnector",
"kv_connector_module_path": "ucm.integration.vllm.ucm_connector",
"kv_role": "kv_consumer",
"kv_connector_extra_config": {
"ucm_connector_name": "UcmNfsStore",
8 changes: 4 additions & 4 deletions docs/source/user-guide/pd-disaggregation/npgd.md
@@ -33,8 +33,8 @@ vllm serve /home/models/Qwen2.5-7B-Instruct \
--dtype bfloat16 \
--kv-transfer-config \
'{
"kv_connector": "UnifiedCacheConnectorV1",
"kv_connector_module_path": "ucm.integration.vllm.uc_connector",
"kv_connector": "UCMConnector",
"kv_connector_module_path": "ucm.integration.vllm.ucm_connector",
"kv_role": "kv_producer",
"kv_connector_extra_config": {
"ucm_connector_name": "UcmNfsStore",
@@ -63,8 +63,8 @@ vllm serve /home/models/Qwen2.5-7B-Instruct \
--dtype bfloat16 \
--kv-transfer-config \
'{
"kv_connector": "UnifiedCacheConnectorV1",
"kv_connector_module_path": "ucm.integration.vllm.uc_connector",
"kv_connector": "UCMConnector",
"kv_connector_module_path": "ucm.integration.vllm.ucm_connector",
"kv_role": "kv_consumer",
"kv_connector_extra_config": {
"ucm_connector_name": "UcmNfsStore",
16 changes: 8 additions & 8 deletions docs/source/user-guide/pd-disaggregation/xpyd.md
@@ -26,8 +26,8 @@ vllm serve /home/models/Qwen2.5-7B-Instruct \
--block-size 128 \
--kv-transfer-config \
'{
"kv_connector": "UnifiedCacheConnectorV1",
"kv_connector_module_path": "ucm.integration.vllm.uc_connector",
"kv_connector": "UCMConnector",
"kv_connector_module_path": "ucm.integration.vllm.ucm_connector",
"kv_role": "kv_producer",
"kv_connector_extra_config": {
"ucm_connector_name": "UcmNfsStore",
@@ -54,8 +54,8 @@ vllm serve /home/models/Qwen2.5-7B-Instruct \
--block-size 128 \
--kv-transfer-config \
'{
"kv_connector": "UnifiedCacheConnectorV1",
"kv_connector_module_path": "ucm.integration.vllm.uc_connector",
"kv_connector": "UCMConnector",
"kv_connector_module_path": "ucm.integration.vllm.ucm_connector",
"kv_role": "kv_producer",
"kv_connector_extra_config": {
"ucm_connector_name": "UcmNfsStore",
@@ -83,8 +83,8 @@ vllm serve /home/models/Qwen2.5-7B-Instruct \
--block-size 128 \
--kv-transfer-config \
'{
"kv_connector": "UnifiedCacheConnectorV1",
"kv_connector_module_path": "ucm.integration.vllm.uc_connector",
"kv_connector": "UCMConnector",
"kv_connector_module_path": "ucm.integration.vllm.ucm_connector",
"kv_role": "kv_consumer",
"kv_connector_extra_config": {
"ucm_connector_name": "UcmNfsStore",
@@ -110,8 +110,8 @@ vllm serve /home/models/Qwen2.5-7B-Instruct \
--block-size 128 \
--kv-transfer-config \
'{
"kv_connector": "UnifiedCacheConnectorV1",
"kv_connector_module_path": "ucm.integration.vllm.uc_connector",
"kv_connector": "UCMConnector",
"kv_connector_module_path": "ucm.integration.vllm.ucm_connector",
"kv_role": "kv_consumer",
"kv_connector_extra_config": {
"ucm_connector_name": "UcmNfsStore",
4 changes: 2 additions & 2 deletions docs/source/user-guide/prefix-cache/nfs_store.md
@@ -135,8 +135,8 @@ vllm serve /home/models/Qwen2.5-14B-Instruct \
--port 7800 \
--kv-transfer-config \
'{
"kv_connector": "UnifiedCacheConnectorV1",
"kv_connector_module_path": "ucm.integration.vllm.uc_connector",
"kv_connector": "UCMConnector",
"kv_connector_module_path": "ucm.integration.vllm.ucm_connector",
"kv_role": "kv_both",
"kv_connector_extra_config": {"UCM_CONFIG_FILE": "/workspace/unified-cache-management/examples/ucm_config_example.yaml"}
}'
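For contrast with the inline configuration sketched earlier, here is a hypothetical Python equivalent of the file-based setup in the hunk above; the YAML path, model path, and `kv_role` are copied from the hunk, everything else is assumed.

```python
# Sketch only: the same UCMConnector, configured through a UCM YAML config file
# rather than inline connector options, mirroring the `vllm serve` command above.
from vllm import LLM
from vllm.config import KVTransferConfig

ktc = KVTransferConfig(
    kv_connector="UCMConnector",
    kv_connector_module_path="ucm.integration.vllm.ucm_connector",
    kv_role="kv_both",
    kv_connector_extra_config={
        "UCM_CONFIG_FILE": "/workspace/unified-cache-management/examples/ucm_config_example.yaml"
    },
)

llm = LLM(model="/home/models/Qwen2.5-14B-Instruct", kv_transfer_config=ktc)
```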
4 changes: 2 additions & 2 deletions docs/source/user-guide/sparse-attention/gsa.md
@@ -88,7 +88,7 @@ Similar to UCM's `offline_inference_esa.py` examples. We only need to specify `u
...
ktc = KVTransferConfig(
kv_connector=name,
-kv_connector_module_path="ucm.integration.vllm.uc_connector",
+kv_connector_module_path="ucm.integration.vllm.ucm_connector",
kv_role="kv_both",
kv_connector_extra_config={
"ucm_connector_name": "UcmNfsStore",
@@ -121,7 +121,7 @@ vllm serve /home/models/DeepSeek-R1-Distill-Qwen-32B \
--kv-transfer-config \
'{
"kv_connector": name,
"kv_connector_module_path": "ucm.integration.vllm.uc_connector",
"kv_connector_module_path": "ucm.integration.vllm.ucm_connector",
"kv_role": "kv_both",
"kv_connector_extra_config": {
"ucm_connector_name": "UcmNfsStore",
2 changes: 1 addition & 1 deletion examples/ucm_config_example.yaml
@@ -32,7 +32,7 @@ load_only_first_rank: false
# GSA: {}


-# Whether to use layerwise loading/saving (optional, default: True for UnifiedCacheConnectorV1)
+# Whether to use layerwise loading/saving (optional, default: True for UCMConnector)
# use_layerwise: true
# hit_ratio: 0.9
