ModelEngine-Group · yuanzhg078 · Jan 5, 2026 · Jan 4, 2026 · Jan 5, 2026 · Jan 5, 2026
@@ -77,6 +77,33 @@ Download the pre-built `vllm/vllm-openai:v0.9.2` docker image and build unified-
     pip install -v -e . --no-build-isolation
     ```
 
+3. Apply vLLM Integration Patches (Required)
+
+    To enable Unified Cache Management (UCM) integration with vLLM, you must **manually apply the corresponding vLLM patch**.
+
+    Navigate to the vLLM source directory:
+    ```bash
+    cd <path_to_vllm>
+    ```
+    Apply the patch that matches your development needs:
+
+    - Full UCM integration (recommended):
+    ```bash
+    git apply unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-adapt.patch
+    ```
+
+    - Sparse attention only:
+    ```bash
+    git apply unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-adapt-sparse.patch
+    ```
+
+    - ReRoPE support only:
+    ```bash
+    git apply unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-adapt-rerope.patch
+    ```
+
+    If you are working on **sparse attention** or **ReRoPE** independently, applying only the corresponding patch is sufficient.
 
 
 ### Option 3: Install by pip

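A note on the patch step in the hunk above: `git apply` fails if the vLLM checkout does not match the patch, so a dry run with `git apply --check` before the real apply may be worth documenting. Below is a minimal sketch, assuming `git` is on the PATH and the patch file is passed as an absolute path. The helper name `apply_patch_checked` is hypothetical and not part of UCM.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of UCM): dry-run a patch, then apply it.
apply_patch_checked() {
    local repo_dir="$1"    # source tree to patch, e.g. <path_to_vllm>
    local patch_file="$2"  # absolute path to the .patch file

    # --check only tests whether the patch applies cleanly; no files are modified.
    if git -C "$repo_dir" apply --check "$patch_file"; then
        git -C "$repo_dir" apply "$patch_file"
    else
        echo "Patch does not apply cleanly: $patch_file" >&2
        return 1
    fi
}
```

It would be called as `apply_patch_checked <path_to_vllm> <absolute_path_to_patch>`, where the second argument points at one of the three patch files listed above. An absolute path is used because `git -C` changes the working directory before the patch path is resolved.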
@@ -12,7 +12,7 @@ We offer 3 options to install UCM.
 
 ### Option 1: Build from source
 
-Follow commands below to install unified-cache-management from source code:
+1. Follow the commands below to install unified-cache-management from source:
 **Note:** The sparse module is not compiled by default. To enable it, run `export ENABLE_SPARSE=TRUE` before you build.
 ```bash
 # Replace <branch_or_tag_name> with the branch or tag name needed
@@ -23,6 +23,31 @@ pip install -v -e . --no-build-isolation
 cd ..
 ```
 
+2. Apply vLLM and vLLM-Ascend Integration Patches (Required)
+To enable Unified Cache Management (UCM) integration, you need to apply patches to both vLLM and vLLM-Ascend source trees.
+
+**Step 1:** Apply the vLLM Patch
+
+First, apply the standard vLLM integration patch in the vLLM source directory:
+
+```bash
+cd <path_to_vllm>
+git apply unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-adapt.patch
+```
+
+**Step 2:** Apply the vLLM-Ascend Patch
+
+Then, switch to the vLLM-Ascend source directory and apply the Ascend-specific patch:
+
+```bash
+cd <path_to_vllm_ascend>
+git apply unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-ascend-adapt.patch
+```
+
+**Note:** The ReRoPE algorithm is not currently supported on Ascend. Only the standard UCM integration is applicable to vLLM-Ascend.
+
 
 ### Option 2: Install by pip
 Install with pip, or find the pre-built wheels on [PyPI](https://pypi.org/project/uc-manager/).

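Since the Ascend hunk above marks both patches as required, it may help to show readers how to confirm that a patch is already in place before they launch. One way is `git apply --reverse --check`, which succeeds only when the patch could be cleanly reverted, meaning its changes are present in the tree. A minimal sketch, assuming `git` is available and the patch path is absolute. The function name `is_patch_applied` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of UCM): report whether a patch is already applied.
is_patch_applied() {
    local repo_dir="$1"    # e.g. <path_to_vllm> or <path_to_vllm_ascend>
    local patch_file="$2"  # absolute path to the .patch file

    # A patch can be cleanly reversed only if its changes are present in the tree.
    # --check makes this a dry run, so nothing is modified.
    git -C "$repo_dir" apply --reverse --check "$patch_file" 2>/dev/null
}
```

The function returns exit status 0 when the patch is applied and non-zero otherwise, so it can guard a launch script with `is_patch_applied <path_to_vllm> <absolute_path_to_patch> || exit 1`.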
@@ -77,7 +77,7 @@ def build_llm_with_uc(module_path: str, name: str, model: str):
                     },
                 }
             ],
-            "ucm_sparse_config": {"GSA": {}},
+            "ucm_sparse_config": {"KvCompOnDevice": {}},
         },
     )
 

@@ -31,8 +31,7 @@ load_only_first_rank: false
   # Or for GSA:
   # GSA: {}
   # Or for KvCompOnDevice:
-  # KvCompOnDevice:
-  #   "kvcompOnDevice_config_path": "workspace/unified-cache-management/ucm/sparse/kvcomp/configs/kvcomp_qwen3_32B_config.json"
+  # KvCompOnDevice: {}
 
 
 # Whether to use layerwise loading/saving (optional, default: True for UCMConnector)
@@ Expand Up @@
                         },
                     }
                 ],
-                "ucm_sparse_config": {"GSA": {}},
+                "ucm_sparse_config": {"KvCompOnDevice": {}},
             },
         )