Merged
27 changes: 27 additions & 0 deletions docs/source/getting-started/quickstart_vllm.md
Original file line number Diff line number Diff line change
@@ -77,6 +77,33 @@ Download the pre-built `vllm/vllm-openai:v0.9.2` docker image and build unified-
pip install -v -e . --no-build-isolation
```

3. Apply vLLM Integration Patches (Required)

To enable Unified Cache Management (UCM) integration with vLLM, you must **manually apply the corresponding vLLM patch**.

Navigate to the vLLM source directory:
```bash
cd <path_to_vllm>
```
Apply the patch that matches your development needs:

- Full UCM integration (recommended):
```bash
git apply unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-adapt.patch
```

- Sparse attention only:
```bash
git apply unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-adapt-sparse.patch
```

- ReRoPE support only:
```bash
git apply unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-adapt-rerope.patch
```

If you are working on **sparse attention** or **ReRoPE** independently, applying only the corresponding patch is sufficient.
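
The patch choice above can be scripted. The following is a minimal sketch and not part of UCM: the `PATCHES` keys and the `git_apply_commands` helper are hypothetical names introduced here for illustration, and the relative patch directory is copied from the commands above. It builds a `git apply --check` dry run followed by the real `git apply`, so a patch that does not match the checked-out vLLM tree fails before any file is modified.

```python
from pathlib import PurePosixPath

# Patch directory exactly as written in the commands above (relative to the
# directory in which `git apply` is run).
PATCH_DIR = PurePosixPath(
    "unified-cache-management/ucm/integration/vllm/patch/0.9.2"
)

# Hypothetical short names for the three patches listed above.
PATCHES = {
    "full": "vllm-adapt.patch",
    "sparse": "vllm-adapt-sparse.patch",
    "rerope": "vllm-adapt-rerope.patch",
}


def git_apply_commands(need: str) -> list[list[str]]:
    """Return a dry-run command followed by the real apply command."""
    if need not in PATCHES:
        raise ValueError(
            f"unknown patch {need!r}; expected one of {sorted(PATCHES)}"
        )
    patch = str(PATCH_DIR / PATCHES[need])
    return [
        ["git", "apply", "--check", patch],  # dry run: modifies nothing
        ["git", "apply", patch],             # the actual apply
    ]


if __name__ == "__main__":
    for cmd in git_apply_commands("full"):
        print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, cwd=<path_to_vllm>, check=True)`, so the second command only runs if the dry run succeeds.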


### Option 3: Install by pip
27 changes: 26 additions & 1 deletion docs/source/getting-started/quickstart_vllm_ascend.md
@@ -12,7 +12,7 @@ We offer 3 options to install UCM.

### Option 1: Build from source

-Follow commands below to install unified-cache-management from source code:
+1. Follow the commands below to install unified-cache-management from source code:
**Note:** The sparse module is not compiled by default. To enable it, set the environment variable `export ENABLE_SPARSE=TRUE` before you build.
```bash
# Replace <branch_or_tag_name> with the branch or tag name needed
@@ -23,6 +23,31 @@ pip install -v -e . --no-build-isolation
cd ..
```

2. Apply vLLM and vLLM-Ascend Integration Patches (Required)
To enable Unified Cache Management (UCM) integration, you need to apply patches to both vLLM and vLLM-Ascend source trees.

**Step 1:** Apply the vLLM Patch

First, apply the standard vLLM integration patch in the vLLM source directory:

```bash
cd <path_to_vllm>
git apply unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-adapt.patch
```

**Step 2:** Apply the vLLM-Ascend Patch

Then, switch to the vLLM-Ascend source directory and apply the Ascend-specific patch:

```bash
cd <path_to_vllm_ascend>
git apply unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-ascend-adapt.patch
```

**Note:**
The ReRoPE algorithm is not supported on Ascend at the moment.
Only the standard UCM integration is applicable for vLLM-Ascend.
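
The two steps above differ only in working directory and patch file. As a minimal sketch (the helper `ascend_patch_plan` is a hypothetical name, not part of UCM, and the relative patch directory is copied from the commands above), the order can be made explicit as a list of working directory and command pairs:

```python
# Relative patch directory, copied from the commands above.
PATCH_DIR = "unified-cache-management/ucm/integration/vllm/patch/0.9.2"


def ascend_patch_plan(
    vllm_dir: str, vllm_ascend_dir: str
) -> list[tuple[str, list[str]]]:
    """Return (working directory, command) pairs in the order given above."""
    return [
        # Step 1: standard vLLM patch, applied inside the vLLM tree.
        (vllm_dir, ["git", "apply", f"{PATCH_DIR}/vllm-adapt.patch"]),
        # Step 2: Ascend-specific patch, applied inside the vLLM-Ascend tree.
        (
            vllm_ascend_dir,
            ["git", "apply", f"{PATCH_DIR}/vllm-ascend-adapt.patch"],
        ),
    ]


if __name__ == "__main__":
    plan = ascend_patch_plan("<path_to_vllm>", "<path_to_vllm_ascend>")
    for cwd, cmd in plan:
        print(f"(cd {cwd} && {' '.join(cmd)})")
```

Each pair can be run with `subprocess.run(cmd, cwd=cwd, check=True)`, which stops at the first patch that fails to apply.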


### Option 2: Install by pip
Install by pip or find the pre-built wheels on [PyPI](https://pypi.org/project/uc-manager/).
2 changes: 1 addition & 1 deletion examples/offline_inference_kvcomphbm.py
@@ -77,7 +77,7 @@ def build_llm_with_uc(module_path: str, name: str, model: str):
},
}
],
-"ucm_sparse_config": {"GSA": {}},
+"ucm_sparse_config": {"KvCompOnDevice": {}},
},
)
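
The change above swaps the sparse method selected under `ucm_sparse_config`. As a minimal sketch, the hypothetical helper `sparse_config` below (not part of UCM) builds that entry for a given method name with an empty options dict, which is the shape used in this diff. Only the two method names that appear in this hunk are listed, and others may exist.

```python
# Method names that appear in this hunk; this is not an exhaustive list.
KNOWN_METHODS = ("GSA", "KvCompOnDevice")


def sparse_config(method: str) -> dict:
    """Build the `ucm_sparse_config` entry with an empty options dict."""
    if method not in KNOWN_METHODS:
        raise ValueError(
            f"unknown sparse method {method!r}; expected one of {KNOWN_METHODS}"
        )
    return {"ucm_sparse_config": {method: {}}}


if __name__ == "__main__":
    print(sparse_config("KvCompOnDevice"))
```

The returned dict is meant to be merged into the surrounding connector configuration shown in the example script.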

3 changes: 1 addition & 2 deletions examples/ucm_config_example.yaml
@@ -31,8 +31,7 @@ load_only_first_rank: false
# Or for GSA:
# GSA: {}
# Or for KvCompOnDevice:
-# KvCompOnDevice:
-# "kvcompOnDevice_config_path": "workspace/unified-cache-management/ucm/sparse/kvcomp/configs/kvcomp_qwen3_32B_config.json"
+# KvCompOnDevice: {}


# Whether to use layerwise loading/saving (optional, default: True for UCMConnector)