README.md
+15 -15 (15 additions & 15 deletions)
@@ -47,24 +47,24 @@ graph TD
  G[...]
  end

- A -->|"1. Score(prompt, pods)"| B
- B -->|2. Query Index| C
- B -->|3. Return Scores| A
+ A -->|"(1) Score(prompt, pods)"| B
+ B -->|"(2) Query Index"| C
+ B -->|"(3) Return Scores"| A

- E -->|A. Emit KVEvents| D
- F -->|A. Emit KVEvents| D
- D -->|B. Update Index| C
+ E -->|"(A) Emit KVEvents"| D
+ F -->|"(A) Emit KVEvents"| D
+ D -->|"(B) Update Index"| C
  ```
- _Note: 1-3 represent the Read Path for scoring pods, while A-B represent the Write Path for ingesting KVEvents._
+ **Read Path:**
+ - (1) **Scoring Request**: A scheduler asks the **KVCache Indexer** to score a set of pods for a given prompt
+ - (2) **Index Query**: The indexer calculates the necessary KV-block keys from the prompt and queries the **KV-Block Index** to see which pods have those blocks
+ - (3) **Return Scores**: The indexer returns a map of pods and their corresponding KV-cache-hit scores to the scheduler

- 1. **Scoring Request**: A scheduler asks the **KVCache Indexer** to score a set of pods for a given prompt
- 2. **Index Query**: The indexer calculates the necessary KV-block keys from the prompt and queries the **KV-Block Index** to see which pods have those blocks
- 3. **Return Scores**: The indexer returns a map of pods and their corresponding KV-cache-hit scores to the scheduler
- 4. **Event Ingestion**: As vLLM pods create or evict KV-blocks, they emit `KVEvents` containing metadata about these changes
- 5. **Index Update**: The **Event Subscriber** consumes these events and updates the **KV-Block Index** in near-real-time
+ **Write Path:**
+ - (A) **Event Ingestion**: As vLLM pods create or evict KV-blocks, they emit `KVEvents` containing metadata about these changes
+ - (B) **Index Update**: The **Event Subscriber** consumes these events and updates the **KV-Block Index** in near-real-time

- * For a more detailed breakdown, please see the high-level [Architecture Document](docs/architecture.md).
- * For configuration details, see the [Configuration Document](docs/configuration.md).
+ > For a more detailed breakdown, please see the high-level [Architecture](docs/architecture.md) and the [Configuration](docs/configuration.md) docs.

  -----

@@ -75,4 +75,4 @@ _Note: 1-3 represent the Read Path for scoring pods, while A-B represent the Wri
  A reference implementation of how to integrate the `kvcache.Indexer` into a scheduler like the `llm-d-inference-scheduler`
  * [**KV-Events**](examples/kv_events/README.md):
- Demonstrates how the KV-Cache Manager handles KV-Events through both an offline example with a dummy ZMQ publisher and an online example using a vLLM Helm chart.
+ Demonstrates how the KV-Cache Manager handles KV-Events through both an offline example with a dummy ZMQ publisher and an online example using a vLLM Helm chart.
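
For readers of this change, the Read Path (1)-(3) and Write Path (A)-(B) described in the README can be illustrated with a small in-memory sketch. This is not the project's implementation: `ToyIndex`, `block_keys`, the `"stored"`/`"removed"` event kinds, the block size, the chained SHA-256 keys, and the "count of consecutive prefix blocks" score are all assumptions made for illustration. It also assumes the prompt has already been tokenized.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4  # tokens per KV-block; illustrative value, not a project default


def block_keys(tokens, block_size=BLOCK_SIZE):
    """Derive one key per full block; each key also covers the preceding prefix."""
    keys, parent = [], ""
    full = len(tokens) - len(tokens) % block_size  # ignore a trailing partial block
    for start in range(0, full, block_size):
        chunk = ",".join(map(str, tokens[start:start + block_size]))
        parent = hashlib.sha256(f"{parent}|{chunk}".encode()).hexdigest()
        keys.append(parent)
    return keys


class ToyIndex:
    """Hypothetical stand-in for a KV-block index: block key -> pods holding it."""

    def __init__(self):
        self._pods_by_key = defaultdict(set)

    def apply_event(self, pod, kind, keys):
        """Write path (A)-(B): apply a create/evict event reported by a pod."""
        for key in keys:
            if kind == "stored":
                self._pods_by_key[key].add(pod)
            elif kind == "removed":
                self._pods_by_key[key].discard(pod)

    def score(self, tokens, pods):
        """Read path (1)-(3): count consecutive prefix blocks each pod holds."""
        scores = {pod: 0 for pod in pods}
        active = set(pods)
        for key in block_keys(tokens):
            active &= self._pods_by_key.get(key, set())
            if not active:
                break  # no candidate pod holds this block, so the prefix ends here
            for pod in active:
                scores[pod] += 1
        return scores


index = ToyIndex()
prompt = list(range(10))  # 10 tokens -> 2 full blocks with BLOCK_SIZE = 4
keys = block_keys(prompt)
index.apply_event("pod-a", "stored", keys)      # pod-a holds both blocks
index.apply_event("pod-b", "stored", keys[:1])  # pod-b holds only the first
print(index.score(prompt, ["pod-a", "pod-b", "pod-c"]))
# {'pod-a': 2, 'pod-b': 1, 'pod-c': 0}
```

Chaining each key to its predecessor means a hit on block N implies the same prefix up to N, which is why the sketch stops scoring a pod at its first miss. A real scorer may weigh hits differently.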