Commit f5205ce

[Docs] Modify the kvaware routing doc (#652)
Signed-off-by: Rui Zhang <[email protected]>
1 parent 073df4e commit f5205ce

1 file changed, +3 -3 lines changed

docs/source/use_cases/kv-cache-aware-routing.rst

Lines changed: 3 additions & 3 deletions
@@ -1,7 +1,7 @@
 KV Cache Aware Routing
 ======================
 
-This tutorial demonstrates how to use KV cache aware routing in the vLLM Production Stack. KV cache aware routing ensures that subsequent requests with the same prompt prefix are routed to the same instance, maximizing KV cache utilization and improving performance.
+In this tutorial, you'll learn how to enable and use KV cache aware routing in the vLLM Production Stack. With KV cache aware routing, incoming requests are routed to the instance with the highest KV cache hit rate, which helps maximize cache efficiency and boost overall performance. Unlike prefix aware routing—which always sends requests with the same prefix to the same instance, even if the cache has been evicted—KV cache aware routing prioritizes cache hits to optimize resource usage.
 
 Table of Contents
 -----------------
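To make the difference described in the updated introduction concrete, here is a minimal Python sketch of the two policies. It is not the Production Stack router's implementation: the block size, the chained block hashing, and every name (`Instance`, `block_hashes`, `route_prefix_aware`, `route_kv_aware`, `BLOCK_SIZE`) are hypothetical assumptions for illustration. Prefix aware routing hashes the prompt prefix to a fixed instance, while KV cache aware routing scores each instance by the share of the prompt its cached blocks cover and picks the best one.

```python
import hashlib
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # hypothetical tokens per KV block (assumption for this sketch)


def block_hashes(tokens: list[int]) -> list[str]:
    """Chain-hash each full block so a block's hash depends on its whole prefix.

    A trailing partial block is ignored, since it would not be cached as a full block.
    """
    hashes, prev = [], ""
    full_len = len(tokens) - len(tokens) % BLOCK_SIZE
    for start in range(0, full_len, BLOCK_SIZE):
        block = tokens[start:start + BLOCK_SIZE]
        prev = hashlib.sha256((prev + repr(block)).encode()).hexdigest()
        hashes.append(prev)
    return hashes


@dataclass
class Instance:
    name: str
    cached: set[str] = field(default_factory=set)  # hashes of blocks currently resident

    def hit_rate(self, tokens: list[int]) -> float:
        """Fraction of prompt tokens covered by the longest run of cached leading blocks."""
        if not tokens:
            return 0.0
        hit_blocks = 0
        for h in block_hashes(tokens):
            if h not in self.cached:
                break
            hit_blocks += 1
        return hit_blocks * BLOCK_SIZE / len(tokens)


def route_prefix_aware(instances: list[Instance], tokens: list[int],
                       prefix_len: int = BLOCK_SIZE) -> Instance:
    """Sticky: the same prefix always maps to the same instance, cached or not."""
    digest = hashlib.sha256(repr(tokens[:prefix_len]).encode()).digest()
    return instances[int.from_bytes(digest[:8], "big") % len(instances)]


def route_kv_aware(instances: list[Instance], tokens: list[int]) -> Instance:
    """Pick the instance with the highest estimated KV cache hit rate (first wins ties)."""
    return max(instances, key=lambda inst: inst.hit_rate(tokens))
```

For example, if instance `b` holds the first two blocks of a 12-token prompt, `route_kv_aware` picks `b` (hit rate 8/12). Because `route_prefix_aware` never looks at `cached`, clearing `b`'s cache does not change its choice, whereas `route_kv_aware` moves to whichever instance still covers the most of the prompt.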
@@ -78,7 +78,7 @@ Then, send another request with the same prompt prefix:
 "max_tokens": 100
 }'
 
-You should observe that the second request is routed to the same instance as the first request. This is because the KV cache aware router detects that the second request shares a prefix with the first request and routes it to the same instance to maximize KV cache utilization.
+You should observe that the second request is routed to the same instance as the first request. This is because the KV cache aware router detects that the second request has a higher KV cache hit rate in the instance of the first request and routes it to the same instance to maximize KV cache utilization.
 
 Step 4: Clean Up
 -----------------
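The two-request scenario in the changed Step 3 paragraph can be simulated in the same hedged way. In this sketch, the names `SimInstance` and `route` are hypothetical, and token-level prefix matching with an LRU capacity of two prompts is an assumption, not how vLLM or the router actually tracks KV blocks. Serving a request leaves its prompt cached on the chosen instance, so a follow-up request that shares the prefix scores a higher hit rate there and is routed back to it.

```python
from collections import OrderedDict


class SimInstance:
    """Hypothetical instance model: remembers the token sequences of recent prompts."""

    def __init__(self, name: str, capacity: int = 2):
        self.name = name
        self.capacity = capacity   # max prompts kept before LRU eviction
        self.cache = OrderedDict()  # prompt tuple -> None, least recently used first

    def hit_rate(self, prompt: list[str]) -> float:
        """Longest common prefix with any cached prompt, as a fraction of the prompt."""
        best = 0
        for cached in self.cache:
            n = 0
            while n < min(len(cached), len(prompt)) and cached[n] == prompt[n]:
                n += 1
            best = max(best, n)
        return best / len(prompt) if prompt else 0.0

    def serve(self, prompt: list[str]) -> None:
        """Serving leaves the prompt's KV entries cached; evict the least recently used."""
        key = tuple(prompt)
        self.cache[key] = None
        self.cache.move_to_end(key)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)


def route(instances: list[SimInstance], prompt: list[str]) -> SimInstance:
    """Send the prompt to the instance with the highest hit rate, then serve it there."""
    target = max(instances, key=lambda inst: inst.hit_rate(prompt))
    target.serve(prompt)
    return target
```

With two empty instances, the first request lands on the first instance by tie-break; a second request sharing a six-token prefix then scores 6/8 on that instance and 0 elsewhere, so it follows the first request. Once enough unrelated prompts evict that entry, the score drops to 0 and the request is free to go anywhere.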
@@ -98,4 +98,4 @@ In this tutorial, we've demonstrated how to:
 2. Set up port forwarding to access the router
 3. Test the KV cache aware routing functionality
 
-The KV cache aware routing feature helps improve performance by ensuring that requests with shared prefixes are routed to the same instance, maximizing KV cache utilization.
+The KV cache aware routing feature helps improve performance by ensuring that requests will be routed to the instance with the highest KV cache hit rate, maximizing KV cache utilization.