
Control Plane for vLLM Semantic Router #85

@rootfs



Is your feature request related to a problem? Please describe.
A Kubernetes-native control plane that automatically tunes the vLLM Semantic Router by selecting and applying the best router config (the ConfigMap-backed `config.yaml`) based on live metrics and SLOs. The control plane continuously evaluates workload characteristics by running PromQL queries against Prometheus: prompt and response length distributions, the mix of classified categories, time per output token (TPOT), time to first token (TTFT), latency, and error rates. It then applies the chosen ConfigMap either through GitOps with Argo CD or directly via the Kubernetes API, supporting multiple routers and safe rollout and rollback.
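As a rough illustration of the metrics side, the Python sketch below builds p95 and error-rate PromQL queries and checks observed values against SLO thresholds. The metric names (`router_ttft_seconds`, `router_tpot_seconds`, `router_requests_total`, `router_request_errors_total`), the `router` label, the 5-minute window, and the `SLO` fields are hypothetical placeholders, not the router's actual metrics; the sketch only assumes latencies are exported as Prometheus histograms.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical metric names: the router's real Prometheus metrics may differ.
TTFT_HIST = "router_ttft_seconds"
TPOT_HIST = "router_tpot_seconds"
REQUESTS_TOTAL = "router_requests_total"
ERRORS_TOTAL = "router_request_errors_total"


def _selector(router: Optional[str]) -> str:
    # Optional label selector scoping the query to one router (hypothetical label).
    return f'{{router="{router}"}}' if router else ""


def p95_query(histogram: str, window: str = "5m", router: Optional[str] = None) -> str:
    """PromQL for the p95 of a histogram metric over a rate window."""
    return (
        "histogram_quantile(0.95, "
        f"sum by (le) (rate({histogram}_bucket{_selector(router)}[{window}])))"
    )


def error_rate_query(window: str = "5m", router: Optional[str] = None) -> str:
    """PromQL for the fraction of requests that errored over the window."""
    sel = _selector(router)
    return (
        f"sum(rate({ERRORS_TOTAL}{sel}[{window}])) / "
        f"sum(rate({REQUESTS_TOTAL}{sel}[{window}]))"
    )


@dataclass(frozen=True)
class SLO:
    ttft_p95_s: float      # max acceptable p95 time to first token, seconds
    tpot_p95_s: float      # max acceptable p95 time per output token, seconds
    max_error_rate: float  # max acceptable error fraction


def violations(observed: Dict[str, float], slo: SLO) -> List[str]:
    """Names of the SLOs the observed values break; an empty list means pass."""
    failed = []
    if observed["ttft_p95"] > slo.ttft_p95_s:
        failed.append("ttft_p95")
    if observed["tpot_p95"] > slo.tpot_p95_s:
        failed.append("tpot_p95")
    if observed["error_rate"] > slo.max_error_rate:
        failed.append("error_rate")
    return failed
```

For example, `p95_query(TTFT_HIST, router="router-a")` yields `histogram_quantile(0.95, sum by (le) (rate(router_ttft_seconds_bucket{router="router-a"}[5m])))`, which could be sent to the Prometheus HTTP query API.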

Describe the solution you'd like

```mermaid
flowchart TD
  A[Prometheus] -->|PromQL| B(Control Plane)
  B --> C{Features: prompt/resp dist, category mix, TPOT, SLOs}
  C --> D[Identify Candidate ConfigMaps]
  D -->|best| E{GitOps?}
  E -- Yes --> F[Create/Update ConfigMap manifest]
  F --> G[Argo CD]
  G --> H[Patch Router Config]
  E -- No --> H

  H --> I[Canary Observe: p95, TPOT, errors]
  I -->|SLO Pass| J[Promote to Stable for all routers in scope]
  I -->|SLO Fail| K[Rollback & Penalize Candidate]
  J --> L[Update CRD Status & Emit Events]
  K --> L
  L --> B

  subgraph Fleet
    H --> R1[Router A]
    H --> R2[Router B]
    H --> R3[Router C]
  end
```
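One pass of the loop in the flowchart can be sketched as a single reconcile function. This is a minimal, hypothetical Python sketch: `Candidate`, `reconcile`, the score-minus-penalty ranking, and the `apply_config` / `canary_violations` callables are illustrative names, not part of vLLM Semantic Router, Argo CD, or any Kubernetes client. `apply_config` stands in for either branch of the "GitOps?" decision (commit a ConfigMap manifest for Argo CD to sync, or patch the ConfigMap directly), and the returned event strings stand in for CRD status updates and Kubernetes events.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Candidate:
    """A candidate router config (a config.yaml stored in a ConfigMap)."""
    name: str
    score: float          # fitness for current workload features; higher is better
    penalty: float = 0.0  # accumulated from failed canaries


@dataclass
class ReconcileResult:
    chosen: str           # config that is stable after this pass
    promoted: bool        # True if a new config was promoted fleet-wide
    events: List[str] = field(default_factory=list)


def reconcile(
    candidates: List[Candidate],
    stable: str,
    apply_config: Callable[[str, List[str]], None],  # hypothetical: apply config to routers
    canary_violations: Callable[[str], List[str]],   # hypothetical: SLOs broken during canary
    canary_routers: List[str],
    all_routers: List[str],
    penalty_step: float = 1.0,
) -> ReconcileResult:
    # Pick the best candidate, discounting ones that failed canaries before.
    best = max(candidates, key=lambda c: c.score - c.penalty)
    if best.name == stable:
        return ReconcileResult(stable, False, ["NoChange"])

    events = [f"CanaryStarted:{best.name}"]
    apply_config(best.name, canary_routers)
    failed = canary_violations(best.name)
    if failed:
        # SLO fail: restore the stable config on the canaries and penalize.
        apply_config(stable, canary_routers)
        best.penalty += penalty_step
        events.append(f"RolledBack:{best.name}:{','.join(failed)}")
        return ReconcileResult(stable, False, events)

    # SLO pass: promote to every router in scope.
    apply_config(best.name, all_routers)
    events.append(f"Promoted:{best.name}")
    return ReconcileResult(best.name, True, events)
```

Injecting the apply and observe steps as callables keeps the selection, rollback, and penalty logic testable without a cluster; a real controller would also need to persist penalties across restarts, for example in the CRD status.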
