docs/components/router/standalone-indexer.md (140 additions, 5 deletions)
@@ -7,7 +7,10 @@ subtitle: Run the KV cache indexer as an independent HTTP service for querying b

 ## Overview

-The standalone KV indexer (`dynamo-kv-indexer`) is a lightweight HTTP binary that subscribes to ZMQ KV event streams from workers, maintains a radix tree of cached blocks, and exposes HTTP endpoints for querying and managing workers.
+The standalone KV indexer (`dynamo-kv-indexer`) is a lightweight binary that maintains a radix tree of cached blocks and exposes HTTP endpoints for querying and managing workers. It supports two operational modes:
+
+- **Standalone mode** (default): Subscribes to ZMQ KV event streams directly from workers. No Dynamo runtime dependencies required.
+- **Dynamo runtime mode** (`--dynamo-runtime`): Integrates with the Dynamo runtime for automatic worker discovery via MDC, KV event ingestion via the event plane (NATS or ZMQ), and serves indexer queries over the request plane for remote frontends.
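
As a rough illustration of the two modes, the sketch below wraps the choice in a small shell helper that only prints the command line it would run. The helper name `indexer_cmd` is hypothetical. The flags are ones that appear elsewhere in this document (`--dynamo-runtime`, `--namespace`, `--port`), but which flags each mode actually requires is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints (does not execute) an indexer command line.
# Flag names come from this document; which flags each mode requires is assumed.
indexer_cmd() {
  local mode="$1" port="${2:-8090}" namespace="${3:-my-namespace}"
  case "$mode" in
    standalone)
      # Default mode: direct ZMQ subscription, no runtime flags.
      echo "dynamo-kv-indexer --port ${port}"
      ;;
    runtime)
      # Runtime mode: worker discovery and events come from the Dynamo runtime.
      echo "dynamo-kv-indexer --dynamo-runtime --namespace ${namespace} --port ${port}"
      ;;
    *)
      echo "unknown mode: ${mode}" >&2
      return 1
      ;;
  esac
}

indexer_cmd standalone
indexer_cmd runtime 9000 prod
```

Printing the command instead of running it keeps the sketch usable on a machine where the binary is not installed.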

 This is distinct from the [Standalone Router](../../../components/src/dynamo/router/README.md), which is a full routing service. The standalone indexer provides only the indexing and query layer without routing logic.

@@ -23,14 +26,17 @@ The indexer maintains one radix tree per `(model_name, tenant_id)` pair. Workers

 ## Compatibility

-The standalone indexer works with any engine that publishes KV cache events over ZMQ in the expected msgpack format. This includes bare vLLM and SGLang engines, which emit ZMQ KV events natively — no Dynamo-specific wrapper is required.
+In standalone mode, the indexer works with any engine that publishes KV cache events over ZMQ in the expected msgpack format. This includes bare vLLM and SGLang engines, which emit ZMQ KV events natively — no Dynamo-specific wrapper is required.
+
+In Dynamo runtime mode, the indexer discovers workers automatically via MDC and receives KV events through the event plane. It also registers a query endpoint on the request plane, allowing frontends to query overlap scores remotely without needing direct HTTP access.

 ## Use Cases

 - **Debugging**: Inspect the radix tree state to verify which blocks are cached on which workers.
 - **State verification**: Confirm that the indexer's view of KV cache state matches the router's internal state (used in integration tests).
 - **Custom routing**: Build external routing logic that queries the indexer for overlap scores and makes its own worker selection decisions.
 - **Monitoring**: Observe KV cache distribution across workers without running a full router.
+- **Remote indexing**: In Dynamo runtime mode, frontends can offload KV cache indexing to a dedicated service and query it over the request plane.

 ## P2P Recovery

@@ -75,18 +81,56 @@ Peers can be registered at startup via `--peers` or dynamically via the HTTP API

 ## Building

-The binary is a feature-gated target in the `dynamo-kv-router` crate:
+The binary is a feature-gated target in the `dynamo-kv-router` crate. The available cargo features control which capabilities are compiled in:
+In runtime mode, workers are discovered automatically via MDC. The `--workers` flag can still be used to register additional static workers alongside discovered ones.
+
 | Flag | Default | Description |
 |------|---------|-------------|
 | `--block-size` | (none) | KV cache block size for initial `--workers` (required when `--workers` is set) |
-Returns metrics in Prometheus text exposition format. Available when the binary is built with the `metrics` feature (enabled by default via `standalone-indexer`).
+Returns metrics in Prometheus text exposition format. Available when the binary is built with the `metrics` or `indexer-runtime` feature.

 ```bash
 curl http://localhost:8090/metrics
 ```
@@ -313,13 +361,44 @@ If no `replay_endpoint` is configured, gaps are logged as warnings but not recov

 The sequence counter (`last_seq`) persists across unregister/register cycles, so re-registering a worker after a gap will trigger replay on the first batch received by the new listener.
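
The interaction between a persisted `last_seq` and re-registration can be sketched as follows. This is a minimal illustration, not the indexer's implementation. It assumes that sequence numbers increase by one per batch and that a gap means the incoming number is greater than `last_seq + 1`. The function name `missing_range` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of gap detection against a persisted sequence counter.
# Assumption: each batch carries a sequence number that increases by one.
# Prints "first last" for the missing range, or nothing if there is no gap.
missing_range() {
  local last_seq="$1" incoming="$2"
  if [ "$incoming" -gt $((last_seq + 1)) ]; then
    echo "$((last_seq + 1)) $((incoming - 1))"
  fi
}

last_seq=41                     # survives the unregister/register cycle
missing_range "$last_seq" 45    # first batch seen by the new listener
missing_range "$last_seq" 42    # contiguous batch, prints nothing
```

In this example the worker was last seen at sequence 41 and the first batch after re-registration carries 45, so batches 42 through 44 are the ones a replay would need to cover when a `replay_endpoint` is configured.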

+## Dynamo Runtime Mode
+
+When started with `--dynamo-runtime`, the indexer integrates with the Dynamo distributed runtime:
+
+### Worker Discovery
+
+The indexer watches MDC (Model Discovery Catalog) for worker additions and removals. When a worker registers with MDC, the indexer automatically creates an indexer for its model and block size. Workers discovered via MDC are tracked separately from those registered via `--workers` or the `/register` HTTP API — a worker cannot be registered through both paths simultaneously.
+
+### Event Plane Subscription
+
+Instead of connecting directly to ZMQ PUB sockets on each worker, the indexer subscribes to KV events through the Dynamo event plane. The transport (NATS or ZMQ) is determined by the `DYNAMO_EVENT_TRANSPORT` environment variable. Events are routed to the appropriate indexer based on the worker ID.
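
A launch script might check the transport setting before starting the indexer. The sketch below is illustrative only. The document names `DYNAMO_EVENT_TRANSPORT` and the two transports, but the accepted spellings (`nats`, `zmq`) and the behavior when the variable is unset are assumptions, and `check_transport` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the event plane transport.
# Assumption: the accepted values are "nats" and "zmq" (case-insensitive here).
check_transport() {
  local value
  value=$(printf '%s' "${1:-}" | tr '[:upper:]' '[:lower:]')
  case "$value" in
    nats|zmq) echo "$value" ;;
    "")       echo "DYNAMO_EVENT_TRANSPORT is not set" >&2; return 1 ;;
    *)        echo "unsupported transport: $1" >&2; return 1 ;;
  esac
}

# Validate the current environment without aborting the script on failure.
check_transport "${DYNAMO_EVENT_TRANSPORT:-}" || echo "transport check failed"
```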
+
+### Request Plane Query Endpoint
+
+The indexer registers a query endpoint on the Dynamo request plane, allowing frontends to send `IndexerQueryRequest` messages containing a model name, namespace, and block hashes. The indexer looks up the appropriate radix tree and returns overlap scores. This enables frontends to use a remote indexer for KV-aware routing without direct HTTP access.
+
+### Example
+
+```bash
+# Start the indexer with runtime integration
+dynamo-kv-indexer --dynamo-runtime \
+  --namespace my-namespace \
+  --component-name kv-indexer \
+  --worker-component backend \
+  --port 8090 --threads 4
+```
+
+The HTTP API remains fully available in runtime mode. Static workers can be added via `--workers` alongside discovered workers.
+
 ## Limitations

-- **ZMQ only**: Workers must publish KV events via ZMQ PUB sockets. The standalone indexer does not subscribe to NATS event streams.
+- **Standalone mode is ZMQ only**: In standalone mode, workers must publish KV events via ZMQ PUB sockets. Build with `indexer-runtime` and use `--dynamo-runtime` to receive events via the event plane (NATS or ZMQ).
 - **No routing logic**: The indexer only maintains the radix tree and answers queries. It does not track active blocks, manage request lifecycle, or perform worker selection.