Is your feature request related to a problem? Please describe.
The request is for a Kubernetes-native control plane that automatically tunes the vLLM Semantic Router by selecting and applying the best router config (a ConfigMap-backed config.yaml) based on live metrics and SLOs. The control plane continuously evaluates workload characteristics via PromQL queries against Prometheus: prompt and response length distributions, the mix of classified categories, time per output token (TPOT), time to first token (TTFT), latency, and error rates. It then applies the chosen ConfigMap either through GitOps with Argo CD or directly via the Kubernetes API, and it supports multiple routers with safe rollout and rollback.
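As a rough illustration of the metric-collection step, the sketch below issues an instant query against the Prometheus HTTP API (`/api/v1/query`) and extracts the first sample. The helper names, the base URL, and the example metric name are assumptions made for illustration; the router's real metric names are not specified in this issue.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical PromQL for p95 latency; the metric name is an assumption.
EXAMPLE_P95_QUERY = (
    "histogram_quantile(0.95, "
    "sum(rate(router_request_duration_seconds_bucket[5m])) by (le))"
)


def build_query_url(base_url: str, promql: str) -> str:
    """Build the URL for a Prometheus instant query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def first_sample_value(payload: dict) -> Optional[float]:
    """Return the first sample of an instant-vector response, or None if empty."""
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {payload.get('error')}")
    result = payload["data"]["result"]
    if not result:
        return None
    # Each vector sample is [unix_timestamp, "value-as-string"].
    return float(result[0]["value"][1])


def query_instant(base_url: str, promql: str, timeout: float = 5.0) -> Optional[float]:
    """Run an instant query over HTTP and return the first sample value."""
    with urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return first_sample_value(json.load(resp))
```

A control plane would likely run one such query per feature (p95 latency, TPOT, TTFT, error rate) on each evaluation cycle and compare the results against the configured SLOs.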
Describe the solution you'd like
```mermaid
flowchart TD
    A[Prometheus] -->|PromQL| B(Control Plane)
    B --> C{Features: prompt/resp dist, category mix, TPOT, SLOs}
    C --> D[Identify Candidates ConfigMap]
    D -->|best| E{GitOps?}
    E -- Yes --> F[Create/Update ConfigMap manifest]
    F --> G[Argo CD]
    G --> H[Patch Router Config]
    E -- No --> H
    H --> I[Canary Observe: p95, TPOT, errors]
    I -->|SLO Pass| J[Promote to Stable for all routers in scope]
    I -->|SLO Fail| K[Rollback & Penalize Candidate]
    J --> L[Update CRD Status & Emit Events]
    K --> L
    L --> B
    subgraph Fleet
        H --> R1[Router A]
        H --> R2[Router B]
        H --> R3[Router C]
    end
```
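
One pass through the loop in the flowchart could look roughly like the following sketch. It is a minimal illustration under stated assumptions: the `Candidate`, `Slo`, and `CanaryMetrics` types, the numeric score, the penalty step, and the `observe` callback (standing in for "patch the canary router and collect metrics") are all hypothetical and do not come from the router's codebase.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    name: str             # hypothetical ConfigMap name
    score: float          # hypothetical fitness score derived from workload features
    penalty: float = 0.0  # accumulated after failed canaries


@dataclass(frozen=True)
class Slo:
    max_p95_s: float
    max_tpot_s: float
    max_error_rate: float


@dataclass(frozen=True)
class CanaryMetrics:
    p95_s: float
    tpot_s: float
    error_rate: float


def pick_best(candidates: List[Candidate]) -> Candidate:
    """Choose the candidate with the highest score after penalties."""
    return max(candidates, key=lambda c: c.score - c.penalty)


def slo_pass(m: CanaryMetrics, slo: Slo) -> bool:
    """True when every canary metric is within its SLO bound."""
    return (
        m.p95_s <= slo.max_p95_s
        and m.tpot_s <= slo.max_tpot_s
        and m.error_rate <= slo.max_error_rate
    )


def reconcile_once(
    candidates: List[Candidate],
    stable: str,
    observe: Callable[[Candidate], CanaryMetrics],
    slo: Slo,
    penalty_step: float = 1.0,
) -> Tuple[str, str]:
    """Run one loop iteration and return (stable config name, outcome)."""
    best = pick_best(candidates)
    if best.name == stable:
        return stable, "noop"
    metrics = observe(best)  # canary rollout and observation happen here
    if slo_pass(metrics, slo):
        return best.name, "promoted"
    best.penalty += penalty_step  # make the failed candidate less attractive
    return stable, "rolled_back"
```

Penalizing a failed candidate instead of discarding it keeps it eligible later if the workload shifts, while preventing the loop from retrying the same failing config on every cycle. Updating CRD status and emitting events would follow from the returned outcome.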