Commit ddfc936
[feat] Cherry-pick updates from 0.2.0-release to develop (patches and docs) (#623)
# Purpose

All patches have been updated. Integration can now be completed by applying a single patch: `git apply vllm-adapt.patch`

At the same time, the workflow still supports independent development and maintenance using separate patches, such as `vllm-adapt-sparse.patch` or `vllm-rerope-adapt.patch`.

# Modifications

Does this PR introduce _any_ user-facing change?

- Remove redundant patches: `vllm-adapt-aggre.patch` and `vllm-adapt-pc.patch`
- Update `vllm-adapt.patch`, `vllm-adapt-sparse.patch`, and `vllm-rerope-adapt.patch`

Co-authored-by: wangxin <1848802892@qq.com>
1 parent c98f134 commit ddfc936
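The single-patch workflow described in the commit message comes down to a `git apply` in the target source tree; dry-running with `--check` first avoids a half-applied tree. A self-contained sketch in a throwaway repo (the `fix.patch` name and file contents are made up for the demo; substitute your real vLLM checkout and `vllm-adapt.patch`):

```shell
# Self-contained demo in a temporary repo; patch name is hypothetical.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo && cd demo
printf 'hello\n' > a.txt
git add a.txt
git -c user.email=demo@example.com -c user.name=demo commit -qm init

printf 'hello\nworld\n' > a.txt   # make a change...
git diff > ../fix.patch           # ...capture it as a patch...
git checkout -q -- a.txt          # ...and restore the original file

git apply --check ../fix.patch    # dry-run: fails loudly if it would not apply
git apply ../fix.patch            # now apply for real
grep world a.txt
```

Against a real checkout the same shape is `cd <path_to_vllm> && git apply --check <patch> && git apply <patch>`.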

File tree

10 files changed: +2423 −1304 lines changed


docs/source/getting-started/quickstart_vllm.md

Lines changed: 27 additions & 0 deletions
@@ -77,6 +77,33 @@ Download the pre-built `vllm/vllm-openai:v0.9.2` docker image and build unified-

3. Apply vLLM integration patches (required)

   To enable Unified Cache Management (UCM) integration with vLLM, you must **manually apply the corresponding vLLM patch**.

   Navigate to the vLLM source directory:

   ```bash
   cd <path_to_vllm>
   ```

   Then apply the patch that matches your development needs:

   - Full UCM integration (recommended):

     ```bash
     git apply unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-adapt.patch
     ```

   - Sparse attention only:

     ```bash
     git apply unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-adapt-sparse.patch
     ```

   - ReRoPE support only:

     ```bash
     git apply unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-adapt-rerope.patch
     ```

   If you are working on **sparse attention** or **ReRoPE** independently, applying only the corresponding patch is sufficient.

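The three options above map mode names to patch files one-to-one. A small hypothetical helper (`pick_patch` is not part of UCM; the directory and patch names are taken from the commands above) makes the choice explicit:

```shell
# Hypothetical helper mapping a mode name to the matching 0.9.2 patch file.
PATCH_DIR=unified-cache-management/ucm/integration/vllm/patch/0.9.2

pick_patch() {
  case "$1" in
    full)   echo "$PATCH_DIR/vllm-adapt.patch" ;;
    sparse) echo "$PATCH_DIR/vllm-adapt-sparse.patch" ;;
    rerope) echo "$PATCH_DIR/vllm-adapt-rerope.patch" ;;
    *)      echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

pick_patch sparse
```

From the vLLM source directory this would be used as, e.g., `git apply "$(pick_patch sparse)"`.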
### Option 3: Install by pip

docs/source/getting-started/quickstart_vllm_ascend.md

Lines changed: 26 additions & 1 deletion
@@ -12,7 +12,7 @@ We offer 3 options to install UCM.

### Option 1: Build from source

-Follow commands below to install unified-cache-management from source code:
+1. Follow the commands below to install unified-cache-management from source:

**Note:** The sparse module is not compiled by default. To enable it, set the environment variable `export ENABLE_SPARSE=TRUE` before you build.

```bash
# Replace <branch_or_tag_name> with the branch or tag name needed
```

@@ -23,6 +23,31 @@

```bash
pip install -v -e . --no-build-isolation
cd ..
```

2. Apply vLLM and vLLM-Ascend integration patches (required)

   To enable Unified Cache Management (UCM) integration, you need to apply patches to both the vLLM and vLLM-Ascend source trees.

   **Step 1:** Apply the vLLM patch

   First, apply the standard vLLM integration patch in the vLLM source directory:

   ```bash
   cd <path_to_vllm>
   git apply unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-adapt.patch
   ```

   **Step 2:** Apply the vLLM-Ascend patch

   Then, switch to the vLLM-Ascend source directory and apply the Ascend-specific patch:

   ```bash
   cd <path_to_vllm_ascend>
   git apply unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-ascend-adapt.patch
   ```

   **Note:** The ReRoPE algorithm is not currently supported on Ascend; only the standard UCM integration applies to vLLM-Ascend.

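The two steps above must run in order (vLLM first, then vLLM-Ascend). A minimal sketch of that sequencing — `run` merely echoes its arguments here so the sketch is self-contained; in practice you would replace it with the real `cd`/`git apply` commands, and the `<path_to_...>` placeholders come straight from the docs:

```shell
# Ordered two-step patch flow; `run` is a stand-in that echoes its arguments.
PATCH_ROOT=unified-cache-management/ucm/integration/vllm/patch/0.9.2

run() { echo "+ $*"; }

# Step 1: vLLM patch, applied inside the vLLM checkout
run cd "<path_to_vllm>"
run git apply "$PATCH_ROOT/vllm-adapt.patch"

# Step 2: Ascend-specific patch, applied inside the vLLM-Ascend checkout
run cd "<path_to_vllm_ascend>"
run git apply "$PATCH_ROOT/vllm-ascend-adapt.patch"
```

Keeping the steps in one script (with `set -e` in real use) ensures you never end up with only the Ascend patch applied.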
### Option 2: Install by pip
2853
Install by pip or find the pre-build wheels on [Pypi](https://pypi.org/project/uc-manager/).

examples/offline_inference_kvcomphbm.py

Lines changed: 1 addition & 1 deletion
@@ -77,7 +77,7 @@ def build_llm_with_uc(module_path: str, name: str, model: str):
             },
         }
     ],
-    "ucm_sparse_config": {"GSA": {}},
+    "ucm_sparse_config": {"KvCompOnDevice": {}},
 },
 )

examples/ucm_config_example.yaml

Lines changed: 1 addition & 2 deletions
@@ -31,8 +31,7 @@ load_only_first_rank: false
 # Or for GSA:
 #   GSA: {}
 # Or for KvCompOnDevice:
-#   KvCompOnDevice:
-#     "kvcompOnDevice_config_path": "workspace/unified-cache-management/ucm/sparse/kvcomp/configs/kvcomp_qwen3_32B_config.json"
+#   KvCompOnDevice: {}

 # Whether to use layerwise loading/saving (optional, default: True for UCMConnector)
