Commit ddfc936
[feat] Cherry-pick updates from 0.2.0-release to develop (patches and docs) (#623)
# Purpose

All patches have been updated. Integration can now be completed by applying a single patch: `git apply vllm-adapt.patch`

At the same time, the workflow still supports independent development and maintenance using separate patches, such as `vllm-adapt-sparse.patch` or `vllm-rerope-adapt.patch`.

# Modifications

Does this PR introduce _any_ user-facing change?

- Remove redundant patches: `vllm-adapt-aggre.patch` and `vllm-adapt-pc.patch`
- Update `vllm-adapt.patch`, `vllm-adapt-sparse.patch`, and `vllm-rerope-adapt.patch`

Co-authored-by: wangxin <1848802892@qq.com>
1 parent c98f134 commit ddfc936
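The single-patch workflow described in the commit message comes down to a `git apply` in the target source tree; dry-running with `--check` first avoids a half-applied tree. A self-contained sketch in a throwaway repo (the `fix.patch` name and file contents are made up for the demo; substitute your real vLLM checkout and `vllm-adapt.patch`):

```shell
# Self-contained demo in a temporary repo; patch name is hypothetical.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo && cd demo
printf 'hello\n' > a.txt
git add a.txt
git -c user.email=demo@example.com -c user.name=demo commit -qm init

printf 'hello\nworld\n' > a.txt   # make a change...
git diff > ../fix.patch           # ...capture it as a patch...
git checkout -q -- a.txt          # ...and restore the original file

git apply --check ../fix.patch    # dry-run: fails loudly if it would not apply
git apply ../fix.patch            # now apply for real
grep world a.txt
```

Against a real checkout the same shape is `cd <path_to_vllm> && git apply --check <patch> && git apply <patch>`.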

File tree

10 files changed: +2423 −1304 lines changed


docs/source/getting-started/quickstart_vllm.md

Lines changed: 27 additions & 0 deletions
@@ -77,6 +77,33 @@ Download the pre-built `vllm/vllm-openai:v0.9.2` docker image and build unified-

3. Apply vLLM integration patches (required)

   To enable Unified Cache Management (UCM) integration with vLLM, you must **manually apply the corresponding vLLM patch**.

   Navigate to the vLLM source directory:

   ```bash
   cd <path_to_vllm>
   ```

   Then apply the patch that matches your development needs:

   - Full UCM integration (recommended):

     ```bash
     git apply unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-adapt.patch
     ```

   - Sparse attention only:

     ```bash
     git apply unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-adapt-sparse.patch
     ```

   - ReRoPE support only:

     ```bash
     git apply unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-adapt-rerope.patch
     ```

   If you are working on **sparse attention** or **ReRoPE** independently, applying only the corresponding patch is sufficient.

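The three options above map mode names to patch files one-to-one. A small hypothetical helper (`pick_patch` is not part of UCM; the directory and patch names are taken from the commands above) makes the choice explicit:

```shell
# Hypothetical helper mapping a mode name to the matching 0.9.2 patch file.
PATCH_DIR=unified-cache-management/ucm/integration/vllm/patch/0.9.2

pick_patch() {
  case "$1" in
    full)   echo "$PATCH_DIR/vllm-adapt.patch" ;;
    sparse) echo "$PATCH_DIR/vllm-adapt-sparse.patch" ;;
    rerope) echo "$PATCH_DIR/vllm-adapt-rerope.patch" ;;
    *)      echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

pick_patch sparse
```

From the vLLM source directory this would be used as, e.g., `git apply "$(pick_patch sparse)"`.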
### Option 3: Install by pip

docs/source/getting-started/quickstart_vllm_ascend.md

Lines changed: 26 additions & 1 deletion
@@ -12,7 +12,7 @@ We offer 3 options to install UCM.

### Option 1: Build from source

-Follow commands below to install unified-cache-management from source code:
+1. Follow the commands below to install unified-cache-management from source:

**Note:** The sparse module is not compiled by default. To enable it, set the environment variable `export ENABLE_SPARSE=TRUE` before you build.

```bash
# Replace <branch_or_tag_name> with the branch or tag name needed
```

@@ -23,6 +23,31 @@

```bash
pip install -v -e . --no-build-isolation
cd ..
```

2. Apply vLLM and vLLM-Ascend integration patches (required)

   To enable Unified Cache Management (UCM) integration, you need to apply patches to both the vLLM and vLLM-Ascend source trees.

   **Step 1:** Apply the vLLM patch

   First, apply the standard vLLM integration patch in the vLLM source directory:

   ```bash
   cd <path_to_vllm>
   git apply unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-adapt.patch
   ```

   **Step 2:** Apply the vLLM-Ascend patch

   Then, switch to the vLLM-Ascend source directory and apply the Ascend-specific patch:

   ```bash
   cd <path_to_vllm_ascend>
   git apply unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-ascend-adapt.patch
   ```

   **Note:** The ReRoPE algorithm is not currently supported on Ascend; only the standard UCM integration applies to vLLM-Ascend.

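The two steps above must run in order (vLLM first, then vLLM-Ascend). A minimal sketch of that sequencing — `run` merely echoes its arguments here so the sketch is self-contained; in practice you would replace it with the real `cd`/`git apply` commands, and the `<path_to_...>` placeholders come straight from the docs:

```shell
# Ordered two-step patch flow; `run` is a stand-in that echoes its arguments.
PATCH_ROOT=unified-cache-management/ucm/integration/vllm/patch/0.9.2

run() { echo "+ $*"; }

# Step 1: vLLM patch, applied inside the vLLM checkout
run cd "<path_to_vllm>"
run git apply "$PATCH_ROOT/vllm-adapt.patch"

# Step 2: Ascend-specific patch, applied inside the vLLM-Ascend checkout
run cd "<path_to_vllm_ascend>"
run git apply "$PATCH_ROOT/vllm-ascend-adapt.patch"
```

Keeping the steps in one script (with `set -e` in real use) ensures you never end up with only the Ascend patch applied.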
### Option 2: Install by pip
2853
Install by pip or find the pre-build wheels on [Pypi](https://pypi.org/project/uc-manager/).

examples/offline_inference_kvcomphbm.py

Lines changed: 1 addition & 1 deletion
@@ -77,7 +77,7 @@ def build_llm_with_uc(module_path: str, name: str, model: str):
             },
         }
     ],
-    "ucm_sparse_config": {"GSA": {}},
+    "ucm_sparse_config": {"KvCompOnDevice": {}},
 },
 )

examples/ucm_config_example.yaml

Lines changed: 1 addition & 2 deletions
@@ -31,8 +31,7 @@ load_only_first_rank: false
 # Or for GSA:
 #   GSA: {}
 # Or for KvCompOnDevice:
-#   KvCompOnDevice:
-#     "kvcompOnDevice_config_path": "workspace/unified-cache-management/ucm/sparse/kvcomp/configs/kvcomp_qwen3_32B_config.json"
+#   KvCompOnDevice: {}

 # Whether to use layerwise loading/saving (optional, default: True for UCMConnector)
