Merged
Changes from 22 commits
Commits
33 commits
a39c26f
remove native kv-indexer binary, use maturin-built one everywhere
PeaBrane Mar 13, 2026
e6cf7cc
silence machete's clap false positive
PeaBrane Mar 13, 2026
36d09c2
Merge branch 'main' into rupei/nuke-native-kv-indexer-binary
PeaBrane Mar 13, 2026
04b9aed
remove test-endpoints feature gate, always expose pause/resume listen…
PeaBrane Mar 13, 2026
c32605c
include dynamo-kv-indexer binary in maturin wheel build
PeaBrane Mar 13, 2026
59a9ecc
fix(kv-indexer): keep runtime in the maturin lane
PeaBrane Mar 15, 2026
64869f3
chore(kv-router): untangle runtime from metrics
PeaBrane Mar 15, 2026
a584cd1
docs(kv-indexer): restore runtime guide
PeaBrane Mar 15, 2026
cdc3670
fix(kv-indexer): drop the extra native bin
PeaBrane Mar 15, 2026
9fba762
ci: teach filters about indexer
PeaBrane Mar 15, 2026
bffee2d
chore: peel off indexer launcher follow-up
PeaBrane Mar 15, 2026
523ce18
Revert "chore: peel off indexer launcher follow-up"
PeaBrane Mar 15, 2026
2c7a8b8
chore(kv-indexer): remove launcher and test path
PeaBrane Mar 15, 2026
85f6a55
Merge branch 'rupei/nuke-native-kv-indexer-binary' into rupei/kv-inde…
PeaBrane Mar 15, 2026
83cad0b
chore(bindings): hush cargo-machete
PeaBrane Mar 15, 2026
a5a8c93
chore(bindings): drop stale kv-indexer feature flags
PeaBrane Mar 15, 2026
6796dc1
Merge branch 'rupei/nuke-native-kv-indexer-binary' into rupei/kv-inde…
PeaBrane Mar 15, 2026
670923f
fix(bindings): restore kv-indexer build path
PeaBrane Mar 15, 2026
d7d763d
chore(bindings): drop stale machete ignore
PeaBrane Mar 15, 2026
083c104
chore(bindings): refresh python lockfile
PeaBrane Mar 15, 2026
e55dd2b
Merge branch 'rupei/nuke-native-kv-indexer-binary' into rupei/kv-inde…
PeaBrane Mar 15, 2026
b01ff51
chore(bindings): refresh child lockfile
PeaBrane Mar 15, 2026
cfb6889
test(router): drop stray mocker matrix
PeaBrane Mar 15, 2026
99ec893
Merge branch 'rupei/nuke-native-kv-indexer-binary' into rupei/kv-inde…
PeaBrane Mar 15, 2026
237a365
Merge remote-tracking branch 'origin/main' into rupei/kv-indexer-pyth…
PeaBrane Mar 15, 2026
a1b5a68
Merge branch 'main' into rupei/kv-indexer-python-launcher-pr
PeaBrane Mar 15, 2026
323fc1e
fix(indexer): handle cli parse errors
PeaBrane Mar 15, 2026
414b2ce
Merge remote-tracking branch 'origin/rupei/kv-indexer-python-launcher…
PeaBrane Mar 15, 2026
00d34d0
chore(indexer): move launcher into bindings
PeaBrane Mar 15, 2026
813480c
docs(indexer): sync standalone guide
PeaBrane Mar 15, 2026
098ba3f
fix(indexer): call manual workers manual
PeaBrane Mar 15, 2026
389525c
fix(bindings): gate pyerr import and refresh locks
PeaBrane Mar 15, 2026
0dd9b99
ci(indexer): build launcher-capable wheel
PeaBrane Mar 15, 2026
2 changes: 2 additions & 0 deletions .github/filters.yaml
@@ -88,6 +88,7 @@ core:
- 'lib/**'
- 'tests/**'
- 'components/src/dynamo/router/**'
- 'components/src/dynamo/indexer/**'
- 'components/src/dynamo/mocker/**'
- 'components/src/dynamo/frontend/**'
- 'components/src/dynamo/common/**'
@@ -157,6 +158,7 @@ frontend:
- 'container/deps/*'
- 'container/compliance/**'
- 'components/src/dynamo/router/**'
- 'components/src/dynamo/indexer/**'
- 'components/src/dynamo/mocker/**'
- 'components/src/dynamo/frontend/**'
- 'components/src/dynamo/common/**'
4 changes: 0 additions & 4 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -43,7 +43,7 @@ dynamo-config = { path = "lib/config", version = "1.0.0" }
dynamo-tokens = { path = "lib/tokens", version = "1.0.0" }
dynamo-memory = { path = "lib/memory", version = "1.0.0" }
dynamo-mocker = { path = "lib/mocker", version = "1.0.0" }
dynamo-kv-router = { path = "lib/kv-router", version = "1.0.0", features = ["metrics"] }
dynamo-kv-router = { path = "lib/kv-router", version = "1.0.0", features = ["metrics", "runtime-protocols"] }
dynamo-async-openai = { path = "lib/async-openai", version = "1.0.0", features = ["byot"] }
dynamo-parsers = { path = "lib/parsers", version = "1.0.0" }

2 changes: 2 additions & 0 deletions components/src/dynamo/indexer/__init__.py
@@ -0,0 +1,2 @@
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
7 changes: 7 additions & 0 deletions components/src/dynamo/indexer/__main__.py
@@ -0,0 +1,7 @@
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

from dynamo.indexer.main import main

if __name__ == "__main__":
    raise SystemExit(main())
22 changes: 22 additions & 0 deletions components/src/dynamo/indexer/main.py
@@ -0,0 +1,22 @@
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

import os
import sys
from collections.abc import Sequence

os.environ.setdefault("DYNAMO_SKIP_PYTHON_LOG_INIT", "1")

from dynamo.llm import run_kv_indexer


def main(argv: Sequence[str] | None = None) -> int:
    args = list(sys.argv[1:] if argv is None else argv)
    try:
        run_kv_indexer(args)
    except Exception as exc:
        if "-h" in args or "--help" in args:
            print(exc)
            return 0
        raise
    return 0
97 changes: 63 additions & 34 deletions docs/components/router/standalone-indexer.md
@@ -7,12 +7,12 @@ subtitle: Run the KV cache indexer as an independent HTTP service for querying b

## Overview

The standalone KV indexer (`dynamo-kv-indexer`) is a lightweight binary that maintains a radix tree of cached blocks and exposes HTTP endpoints for querying and managing workers. It supports two operational modes:
The standalone KV indexer (`python -m dynamo.indexer`) is a lightweight service that maintains a radix tree of cached blocks and exposes HTTP endpoints for querying and managing workers. It supports two operational modes:

- **Standalone mode** (default): Subscribes to ZMQ KV event streams directly from workers. No Dynamo runtime dependencies required.
- **Dynamo runtime mode** (`--dynamo-runtime`): Integrates with the Dynamo runtime for automatic worker discovery via MDC, KV event ingestion via the event plane (NATS or ZMQ), and serves indexer queries over the request plane for remote frontends.
- **Standalone mode** (default): subscribes to ZMQ KV event streams directly from workers. No Dynamo runtime dependencies required.
- **Dynamo runtime mode** (`--dynamo-runtime`): integrates with the Dynamo runtime for automatic worker discovery via MDC, KV event ingestion via the event plane (NATS or ZMQ), and overlap queries over the request plane for remote frontends.

This is distinct from the [Standalone Router](https://github.com/ai-dynamo/dynamo/blob/main/components/src/dynamo/router/README.md), which is a full routing service. The standalone indexer provides only the indexing and query layer without routing logic.
This is distinct from the [Standalone Router](../../../components/src/dynamo/router/README.md), which is a full routing service. The standalone indexer provides only the indexing and query layer without routing logic.

The HTTP API follows the [Mooncake KV Indexer RFC](https://github.com/kvcache-ai/Mooncake/issues/1403) conventions.

@@ -56,11 +56,11 @@ If no peers are reachable, the indexer starts with an empty state.

```bash
# Replica A (first instance, no peers)
dynamo-kv-indexer --port 8090 --block-size 16 \
python -m dynamo.indexer --port 8090 --block-size 16 \
--workers "1=tcp://worker1:5557,2=tcp://worker2:5558"

# Replica B (recovers from A on startup)
dynamo-kv-indexer --port 8091 --block-size 16 \
python -m dynamo.indexer --port 8091 --block-size 16 \
--workers "1=tcp://worker1:5557,2=tcp://worker2:5558" \
--peers "http://localhost:8090"
```
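
The replica commands above can also be assembled programmatically. The following is a minimal sketch that only uses the flags shown in this example (`--port`, `--block-size`, `--workers`, `--peers`); `build_indexer_argv` is a hypothetical helper, not part of the package.

```python
from typing import Optional


def build_indexer_argv(
    port: int,
    block_size: int,
    workers: dict[int, str],
    peers: Optional[list[str]] = None,
) -> list[str]:
    """Build the argument list for `python -m dynamo.indexer` (hypothetical helper)."""
    argv = ["--port", str(port), "--block-size", str(block_size)]
    if workers:
        # Workers are passed as comma-separated "instance_id=endpoint" pairs.
        pairs = ",".join(f"{wid}={endpoint}" for wid, endpoint in workers.items())
        argv += ["--workers", pairs]
    if peers:
        # Peers are passed as a comma-separated list of indexer URLs.
        argv += ["--peers", ",".join(peers)]
    return argv


workers = {1: "tcp://worker1:5557", 2: "tcp://worker2:5558"}
replica_a = build_indexer_argv(8090, 16, workers)
replica_b = build_indexer_argv(8091, 16, workers, peers=["http://localhost:8090"])
```

The resulting list can be prefixed with `[sys.executable, "-m", "dynamo.indexer"]` and handed to `subprocess.Popen` to start each replica.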
@@ -81,52 +81,50 @@ Peers can be registered at startup via `--peers` or dynamically via the HTTP API

## Building

The binary is a feature-gated target in the `dynamo-kv-router` crate. The available cargo features control which capabilities are compiled in:
The service is exposed through the Python package after building the bindings with maturin. Feature flags control which capabilities are compiled in:

| Feature | Description |
|---------|-------------|
| `standalone-indexer` | Core standalone indexer library (HTTP server, ZMQ listeners, P2P recovery) |
| `metrics` | Prometheus metrics (`/metrics` endpoint, request/worker gauges) |
| `indexer-bin` | CLI binary target |
| `indexer-runtime` | Dynamo runtime integration (discovery, event plane, request plane) |
| `test-endpoints` | Test-only endpoints (`/test/pause_listener`, `/test/resume_listener`) |
| `kv-indexer` | Core standalone indexer binary (HTTP API, ZMQ listeners, P2P recovery) |
| `kv-indexer-metrics` | Optional `/metrics` endpoint |
| `kv-indexer-runtime` | Dynamo runtime integration (`--dynamo-runtime`, discovery, event plane, request plane) |

### Standalone build (no runtime dependency)
### Standalone build

```bash
cargo build -p dynamo-kv-router --features indexer-bin --bin dynamo-kv-indexer
cd lib/bindings/python && VIRTUAL_ENV=../../.venv ../../.venv/bin/maturin develop --uv --features kv-indexer
```

This produces a binary with no `dynamo-runtime` dependency. It supports ZMQ event listeners, HTTP API, and P2P recovery.
After installation, launch the service with `python -m dynamo.indexer`.

### Standalone build with metrics

```bash
cargo build -p dynamo-kv-router --features indexer-bin,metrics --bin dynamo-kv-indexer
cd lib/bindings/python && VIRTUAL_ENV=../../.venv ../../.venv/bin/maturin develop --uv --features kv-indexer,kv-indexer-metrics
```

Adds Prometheus metrics support (`/metrics` endpoint). Pulls in `dynamo-runtime` for the metrics implementation.
This keeps the default `kv-indexer` build lean while still allowing Prometheus metrics when needed.

### Runtime-enabled build

```bash
cargo build -p dynamo-kv-router --features indexer-bin,indexer-runtime --bin dynamo-kv-indexer
cd lib/bindings/python && VIRTUAL_ENV=../../.venv ../../.venv/bin/maturin develop --uv --features kv-indexer,kv-indexer-runtime
```

Enables the `--dynamo-runtime` CLI flag for MDC discovery, event plane subscription, and request plane query endpoint. Includes metrics.
This enables the `--dynamo-runtime` CLI flag for MDC discovery, event-plane subscription, and request-plane queries. It also includes the metrics endpoint.

## CLI

### Standalone mode (default)

```bash
dynamo-kv-indexer --port 8090 [--threads 4] [--block-size 16 --model-name my-model --tenant-id default --workers "1=tcp://host:5557,2:1=tcp://host:5558"] [--peers "http://peer1:8090,http://peer2:8091"]
python -m dynamo.indexer --port 8090 [--threads 4] [--block-size 16 --model-name my-model --tenant-id default --workers "1=tcp://host:5557,2:1=tcp://host:5558"] [--peers "http://peer1:8090,http://peer2:8091"]
```

### Dynamo runtime mode (requires `indexer-runtime` feature)
### Dynamo runtime mode

```bash
dynamo-kv-indexer --dynamo-runtime --namespace default --component-name kv-indexer --worker-component backend --port 8090 [--threads 4]
python -m dynamo.indexer --dynamo-runtime --namespace default --component-name kv-indexer --worker-component backend --port 8090 [--threads 4]
```

In runtime mode, workers are discovered automatically via MDC. The `--workers` flag can still be used to register additional static workers alongside discovered ones.
@@ -140,10 +138,10 @@ In runtime mode, workers are discovered automatically via MDC. The `--workers` f
| `--model-name` | `default` | Model name for initial `--workers` |
| `--tenant-id` | `default` | Tenant ID for initial `--workers` |
| `--peers` | (none) | Comma-separated peer indexer URLs for P2P recovery on startup |
| `--dynamo-runtime` | `false` | Enable Dynamo runtime integration (requires `indexer-runtime` feature) |
| `--namespace` | `default` | Dynamo namespace to register the indexer component under (runtime mode) |
| `--component-name` | `kv-indexer` | Component name for this indexer in the Dynamo runtime (runtime mode) |
| `--worker-component` | `backend` | Component name that workers register under, for event plane subscription (runtime mode) |
| `--dynamo-runtime` | `false` | Enable Dynamo runtime integration (requires `kv-indexer-runtime`) |
| `--namespace` | `default` | Dynamo namespace to register the indexer component under |
| `--component-name` | `kv-indexer` | Component name for this indexer in the Dynamo runtime |
| `--worker-component` | `backend` | Component name that workers register under for event-plane subscription |

## HTTP API

@@ -157,7 +155,7 @@ curl http://localhost:8090/health

### `GET /metrics` — Prometheus metrics

Returns metrics in Prometheus text exposition format. Available when the binary is built with the `metrics` or `indexer-runtime` feature.
Returns metrics in Prometheus text exposition format. Available when the binary is built with the `kv-indexer-metrics` or `kv-indexer-runtime` feature.

```bash
curl http://localhost:8090/metrics
@@ -170,10 +168,12 @@ curl http://localhost:8090/metrics
| `dynamo_kvindexer_errors_total` | Counter | `endpoint`, `status_class` | HTTP error responses (4xx/5xx) |
| `dynamo_kvindexer_models` | Gauge | — | Number of active model+tenant indexers |
| `dynamo_kvindexer_workers` | Gauge | — | Number of registered worker instances |
| `dynamo_kvindexer_listeners` | Gauge | `status` | Number of ZMQ listeners by status (`pending`, `active`, `paused`, `failed`) |

### `POST /register` — Register an endpoint

Register a ZMQ endpoint for an instance. Each call creates or reuses the indexer for the given `(model_name, tenant_id)` pair.
Registration is non-blocking: if the worker is not up yet, the listener is accepted in `pending` state and transitions to `active` once the initial ZMQ connection succeeds.

```bash
# Single model, default tenant
@@ -245,9 +245,38 @@ curl http://localhost:8090/workers

Returns:
```json
[{"instance_id": 1, "endpoints": {"0": "tcp://127.0.0.1:5557", "1": "tcp://127.0.0.1:5558"}}]
[
{
"instance_id": 1,
"source": "zmq",
"status": "active",
"endpoints": {
"0": "tcp://127.0.0.1:5557",
"1": "tcp://127.0.0.1:5558"
},
"listeners": {
"0": {
"endpoint": "tcp://127.0.0.1:5557",
"status": "active"
},
"1": {
"endpoint": "tcp://127.0.0.1:5558",
"status": "active"
}
}
},
{
"instance_id": 2,
"source": "discovery",
"status": "active",
"endpoints": {},
"listeners": {}
}
]
```

For ZMQ-managed workers, `status` is aggregated across listeners with priority `failed > pending > active > paused`. Each listener entry may also expose a `last_error` field when the most recent startup or recv-loop attempt failed.
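
The aggregation rule above, together with the non-blocking registration described for `POST /register`, suggests a simple client-side readiness check. The sketch below is illustrative only: `aggregate_status` and `wait_until_active` are hypothetical helpers, the behaviour for workers without listeners is left undefined (`None`), and treating `failed` as terminal is an assumption rather than documented behaviour.

```python
import time
from typing import Callable, Optional

# Priority order quoted from the guide: failed > pending > active > paused.
STATUS_PRIORITY = ("failed", "pending", "active", "paused")


def aggregate_status(listeners: dict) -> Optional[str]:
    """Return the highest-priority status among a worker's listeners."""
    seen = {entry["status"] for entry in listeners.values()}
    for status in STATUS_PRIORITY:
        if status in seen:
            return status
    # No listeners (e.g. a discovery-sourced worker): nothing to aggregate.
    return None


def wait_until_active(
    fetch_workers: Callable[[], list],
    instance_id: int,
    timeout_s: float = 10.0,
    interval_s: float = 0.5,
) -> bool:
    """Poll a /workers-shaped response until the instance reports `active`."""
    deadline = time.monotonic() + timeout_s
    while True:
        for worker in fetch_workers():
            if worker.get("instance_id") != instance_id:
                continue
            if worker.get("status") == "active":
                return True
            if worker.get("status") == "failed":
                # Assumption: a failed worker is not worth waiting for.
                return False
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

In practice `fetch_workers` could wrap `json.load(urllib.request.urlopen("http://localhost:8090/workers"))`; it is passed in as a callable here so the polling logic can be tested without a running indexer.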

### `POST /query` — Query overlap for token IDs

Given raw token IDs, compute block hashes and return per-instance overlap scores (in matched tokens):
@@ -367,7 +396,7 @@ When started with `--dynamo-runtime`, the indexer integrates with the Dynamo dis

### Worker Discovery

The indexer watches MDC (Model Discovery Catalog) for worker additions and removals. When a worker registers with MDC, the indexer automatically creates an indexer for its model and block size. Workers discovered via MDC are tracked separately from those registered via `--workers` or the `/register` HTTP API a worker cannot be registered through both paths simultaneously.
The indexer watches MDC (Model Discovery Catalog) for worker additions and removals. When a worker registers with MDC, the indexer automatically creates an indexer for its model and block size. Workers discovered via MDC are tracked separately from those registered via `--workers` or the `/register` HTTP API; a worker cannot be registered through both paths simultaneously.

### Event Plane Subscription

@@ -381,7 +410,7 @@ The indexer registers a query endpoint on the Dynamo request plane, allowing fro

```bash
# Start the indexer with runtime integration
dynamo-kv-indexer --dynamo-runtime \
python -m dynamo.indexer --dynamo-runtime \
--namespace my-namespace \
--component-name kv-indexer \
--worker-component backend \
@@ -392,7 +421,7 @@ The HTTP API remains fully available in runtime mode. Static workers can be adde

## Limitations

- **Standalone mode is ZMQ only**: In standalone mode, workers must publish KV events via ZMQ PUB sockets. Build with `indexer-runtime` and use `--dynamo-runtime` to receive events via the event plane (NATS or ZMQ).
- **Standalone mode is ZMQ only**: In standalone mode, workers must publish KV events via ZMQ PUB sockets. Build with `kv-indexer-runtime` and use `--dynamo-runtime` to receive events via the event plane (NATS or ZMQ).
- **No routing logic**: The indexer only maintains the radix tree and answers queries. It does not track active blocks, manage request lifecycle, or perform worker selection.

## Architecture
@@ -410,7 +439,7 @@ graph TD
REG[Worker Registry]
ZMQ[ZMQ SUB Listeners]
IDX["Indexer Map<br/>(model, tenant) → Radix Tree"]
HTTP[HTTP API<br/>/query /dump /register /metrics /health]
HTTP[HTTP API<br/>/query /dump /register /health]
end

CLIENT[External Client]
@@ -453,7 +482,7 @@ graph TD
REG[Worker Registry]
IDX["Indexer Map<br/>(model, tenant) → Radix Tree"]
QE[Query Endpoint]
HTTP[HTTP API<br/>/query /dump /register]
HTTP[HTTP API<br/>/query /dump /register /metrics]
end

FRONTEND[Frontend / Router]
@@ -511,4 +540,4 @@ sequenceDiagram
- **[Mooncake KV Indexer RFC](https://github.com/kvcache-ai/Mooncake/issues/1403)**: Community API standardization for KV cache indexers
- **[Router Guide](router-guide.md)**: Full KV router configuration and tuning
- **[Router Design](../../design-docs/router-design.md)**: Architecture and event transport modes
- **[Standalone Router](https://github.com/ai-dynamo/dynamo/blob/main/components/src/dynamo/router/README.md)**: Full routing service (routes requests to workers)
- **[Standalone Router](../../../components/src/dynamo/router/README.md)**: Full routing service (routes requests to workers)
7 changes: 2 additions & 5 deletions lib/bindings/python/Cargo.toml
@@ -25,6 +25,8 @@ crate-type = ["cdylib", "rlib"]
default = []
media-ffmpeg = ["dynamo-llm/media-ffmpeg"]
kv-indexer = ["dep:dynamo-kv-router", "dep:clap", "dep:tracing-subscriber"]
kv-indexer-runtime = ["kv-indexer", "dynamo-kv-router/indexer-runtime"]
kv-indexer-metrics = ["kv-indexer", "dynamo-kv-router/metrics"]

[dependencies]
dynamo-runtime = { path = "../../runtime" }
@@ -76,9 +78,4 @@ dynamo-llm = { path = "../../llm" }
[target.'cfg(not(target_os = "linux"))'.dependencies]
dynamo-llm = { path = "../../llm", default-features = false }

[[bin]]
name = "dynamo-kv-indexer"
path = "rust/bin/kv_indexer.rs"
required-features = ["kv-indexer"]

[dev-dependencies]
61 changes: 0 additions & 61 deletions lib/bindings/python/rust/bin/kv_indexer.rs

This file was deleted.
