Fix docs: split Quick Start links, add kv_connector_module_path, update example config path

lijiachen19 · lijiachen19 · commit 33d19c51796b · 2025-12-09T19:26:52.000-08:00
diff --git a/README.md b/README.md
@@ -68,7 +68,7 @@ in either a local filesystem for single-machine scenarios or through NFS mount p
 
 ## Quick Start
 
-please refer to [Quick Start](https://ucm.readthedocs.io/en/latest/getting-started/quick_start.html).
+Please refer to [Quick Start for vLLM](https://ucm.readthedocs.io/en/latest/getting-started/quickstart_vllm.html) and [Quick Start for vLLM-Ascend](https://ucm.readthedocs.io/en/latest/getting-started/quickstart_vllm_ascend.html).
 
 ---
 
diff --git a/docs/source/getting-started/quickstart_vllm.md b/docs/source/getting-started/quickstart_vllm.md
@@ -163,12 +163,14 @@ vllm serve Qwen/Qwen2.5-14B-Instruct \
 --kv-transfer-config \
 '{
     "kv_connector": "UCMConnector",
+    "kv_connector_module_path": "ucm.integration.vllm.ucm_connector",
     "kv_role": "kv_both",
-    "kv_connector_extra_config": {"UCM_CONFIG_FILE": "/vllm-workspace/unified-cache-management/examples/ucm_config_example.yaml"}
+    "kv_connector_extra_config": {"UCM_CONFIG_FILE": "/workspace/unified-cache-management/examples/ucm_config_example.yaml"}
 }'
 ```
+**⚠️ The parameter `--no-enable-prefix-caching` is intended for SSD performance testing only. Remove it for production.**
 
-**⚠️ Make sure to replace `"/vllm-workspace/unified-cache-management/examples/ucm_config_example.yaml"` with your actual config file path.**
+**⚠️ Make sure to replace `"/workspace/unified-cache-management/examples/ucm_config_example.yaml"` with your actual config file path.**
 
 
 If you see log as below:
diff --git a/docs/source/getting-started/quickstart_vllm_ascend.md b/docs/source/getting-started/quickstart_vllm_ascend.md
@@ -131,12 +131,14 @@ vllm serve Qwen/Qwen2.5-14B-Instruct \
 --kv-transfer-config \
 '{
     "kv_connector": "UCMConnector",
+    "kv_connector_module_path": "ucm.integration.vllm.ucm_connector",
     "kv_role": "kv_both",
-    "kv_connector_extra_config": {"UCM_CONFIG_FILE": "/vllm-workspace/unified-cache-management/examples/ucm_config_example.yaml"}
+    "kv_connector_extra_config": {"UCM_CONFIG_FILE": "/workspace/unified-cache-management/examples/ucm_config_example.yaml"}
 }'
 ```
+**⚠️ The parameter `--no-enable-prefix-caching` is intended for SSD performance testing only. Remove it for production.**
 
-**⚠️ Make sure to replace `"/vllm-workspace/unified-cache-management/examples/ucm_config_example.yaml"` with your actual config file path.**
+**⚠️ Make sure to replace `"/workspace/unified-cache-management/examples/ucm_config_example.yaml"` with your actual config file path.**
 
 
 If you see log as below:
diff --git a/docs/source/user-guide/prefix-cache/nfs_store.md b/docs/source/user-guide/prefix-cache/nfs_store.md
@@ -109,8 +109,6 @@ Explanation:
 
 ## Launching Inference
 
-### Offline Inference
-
 In this guide, we describe **online inference** using vLLM with the UCM connector, deployed as an OpenAI-compatible server. For best performance with UCM, it is recommended to set `block_size` to 128.
 
 To start the vLLM server with the Qwen/Qwen2.5-14B-Instruct model, run:
@@ -129,6 +127,7 @@ vllm serve Qwen/Qwen2.5-14B-Instruct \
 '{
     "kv_connector": "UCMConnector",
     "kv_role": "kv_both",
+    "kv_connector_module_path": "ucm.integration.vllm.ucm_connector",
     "kv_connector_extra_config": {"UCM_CONFIG_FILE": "/vllm-workspace/unified-cache-management/examples/ucm_config_example.yaml"}
 }'
 ```
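
Review note: the `--kv-transfer-config` value is hand-written JSON inside a single-quoted shell string, so a typo in a key or path is easy to miss. The sketch below is one possible way to generate that argument programmatically instead. The four JSON keys and their values are taken from the diff above; the helper names `build_kv_transfer_config` and `build_serve_command` are hypothetical and are not part of UCM or vLLM, and the command only includes the flags shown in this hunk (the docs pass additional flags that are outside the diff context).

```python
import json
import shlex


def build_kv_transfer_config(ucm_config_file: str) -> str:
    """Serialize the connector settings from the diff as a JSON string.

    Hypothetical helper: only the keys and values shown in the diff are used.
    """
    config = {
        "kv_connector": "UCMConnector",
        "kv_connector_module_path": "ucm.integration.vllm.ucm_connector",
        "kv_role": "kv_both",
        "kv_connector_extra_config": {"UCM_CONFIG_FILE": ucm_config_file},
    }
    return json.dumps(config)


def build_serve_command(model: str, ucm_config_file: str) -> str:
    """Build a shell-safe `vllm serve` command line (hypothetical helper)."""
    args = [
        "vllm",
        "serve",
        model,
        "--kv-transfer-config",
        build_kv_transfer_config(ucm_config_file),
    ]
    # shlex.join quotes each argument, so the JSON survives the shell intact.
    return shlex.join(args)


if __name__ == "__main__":
    # Replace the path with your actual UCM config file path.
    print(
        build_serve_command(
            "Qwen/Qwen2.5-14B-Instruct",
            "/workspace/unified-cache-management/examples/ucm_config_example.yaml",
        )
    )
```

Generating the JSON with `json.dumps` guarantees the argument is syntactically valid, and passing the config path as a parameter makes it harder to leave the example path in place by accident.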