vllm-project · JaredforReal · Oct 13, 2025
@@ -35,7 +35,20 @@ Example mappings:
 ## Profiles
 
 - `testing` : enables `mock-vllm` and `llm-katan`
-- `llm-katan` : enables only `llm-katan`
+- `llm-katan` : only `llm-katan`
+
+## Services and Ports
+
+These host ports are exposed when you bring the stack up:
+
+- Dashboard: http://localhost:8700 (Semantic Router Dashboard)
+- Envoy proxy: http://localhost:8801
+- Envoy admin: http://localhost:19000
+- Grafana: http://localhost:3000 (default login: admin/admin)
+- Prometheus: http://localhost:9090
+- Open WebUI: http://localhost:3001
+- Mock vLLM (testing profile): http://localhost:8000
+- LLM Katan (testing/llm-katan profiles): http://localhost:8002
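
A quick reachability check for the always-on ports listed above could look like the following. This is a minimal sketch, not something shipped in the repository: the names `ENDPOINTS`, `probe`, and `check_all` are hypothetical, and it assumes `curl` is installed. The two profile-specific ports (8000 and 8002) are left out because they only exist when a profile is enabled.

```bash
# URLs are copied from the list above; everything else is illustrative.
ENDPOINTS="dashboard=http://localhost:8700
envoy-proxy=http://localhost:8801
envoy-admin=http://localhost:19000
grafana=http://localhost:3000
prometheus=http://localhost:9090
open-webui=http://localhost:3001"

probe() {
  # Succeeds on any HTTP response (even 404); fails only when the
  # connection itself fails or the 3-second timeout expires.
  curl -s -o /dev/null --max-time 3 "$1"
}

check_all() {
  for entry in $ENDPOINTS; do
    name="${entry%%=*}"   # text before the first '='
    url="${entry#*=}"     # text after the first '='
    if probe "$url"; then
      echo "up   $name $url"
    else
      echo "down $name $url"
    fi
  done
}
```

Calling `check_all` after `docker compose ... up` prints one `up` or `down` line per service.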
 
 ## Quick Start
 
@@ -71,6 +84,8 @@ docker compose -f deploy/docker-compose/docker-compose.yml --profile testing up
 docker compose -f deploy/docker-compose/docker-compose.yml down
 ```
 
+After the stack is healthy, open the Dashboard at http://localhost:8700.
+
 ## Overrides
 
 You can place a `docker-compose.override.yml` at repo root and combine:
@@ -130,18 +145,3 @@ All services join the `semantic-network` bridge network with a fixed subnet to m
 
 - Local observability only: `tools/observability/docker-compose.obs.yml`
 - Tracing stack: `tools/tracing/docker-compose.tracing.yaml`
-
-## Related Stacks
-
-- Local observability only: `tools/observability/docker-compose.obs.yml`
-- Tracing stack (standalone, dev): `tools/tracing/docker-compose.tracing.yaml`
-
-## Tracing & Grafana
-
-- Jaeger UI: http://localhost:16686
-- Grafana: http://localhost:3000 (admin/admin)
-  - Prometheus datasource (default) for metrics
-  - Jaeger datasource for exploring traces (search service `vllm-semantic-router`)
-
-By default, the router container uses `config/config.tracing.yaml` (enabled tracing, exporter to Jaeger).
-Override with `CONFIG_FILE=/app/config/config.yaml` if you don’t want tracing.
@@ -1,6 +1,6 @@
 # Semantic Router Kubernetes Deployment
 
-This directory contains Kubernetes manifests for deploying the Semantic Router using Kustomize.
+Kustomize manifests for deploying the Semantic Router and its observability stack (Prometheus, Grafana, Dashboard, optional Open WebUI + Pipelines) on Kubernetes.
 
 ## Architecture
 
@@ -12,8 +12,9 @@ The deployment consists of:
   - **Init Container**: Downloads/copies model files to persistent volume
   - **Main Container**: Runs the semantic router service
 - **Services**:
-  - Main service exposing gRPC port (50051), Classification API (8080), and metrics port (9190)
-  - Separate metrics service for monitoring
+  - Main service exposing gRPC (50051), Classification API (8080), and metrics (9190)
+  - Separate metrics service for monitoring (`semantic-router-metrics`)
+  - Observability services (Grafana, Prometheus, Dashboard, optional Open WebUI)
 
 ## Ports
 
@@ -23,17 +24,40 @@ The deployment consists of:
 
 ## Quick Start
 
-### Standard Kubernetes Deployment
+### Deploy Core (Router)
 
 ```bash
 kubectl apply -k deploy/kubernetes/
 
 # Check deployment status
-kubectl get pods -l app=semantic-router -n semantic-router
-kubectl get services -l app=semantic-router -n semantic-router
+kubectl get pods -l app=semantic-router -n vllm-semantic-router-system
+kubectl get services -l app=semantic-router -n vllm-semantic-router-system
 
 # View logs
-kubectl logs -l app=semantic-router -n semantic-router -f
+kubectl logs -l app=semantic-router -n vllm-semantic-router-system -f
 ```
+
+### Add Observability (Prometheus + Grafana + Dashboard + Playground)
+
+```bash
+kubectl apply -k deploy/kubernetes/observability/
+```
+
+Port-forward to UIs (local dev; each command runs in the foreground, so use a separate terminal per command):
+
+```bash
+kubectl port-forward -n vllm-semantic-router-system svc/prometheus 9090:9090
+kubectl port-forward -n vllm-semantic-router-system svc/grafana 3000:3000
+kubectl port-forward -n vllm-semantic-router-system svc/semantic-router-dashboard 8700:80
+kubectl port-forward -n vllm-semantic-router-system svc/openwebui 3001:8080
+```
+
+Then open:
+
+- Prometheus → http://localhost:9090
+- Grafana → http://localhost:3000
+- Dashboard → http://localhost:8700
+- Open WebUI (Playground) → http://localhost:3001
 
 ### Kind (Kubernetes in Docker) Deployment
@@ -86,20 +110,20 @@ kubectl wait --for=condition=Ready nodes --all --timeout=300s
 kubectl apply -k deploy/kubernetes/
 
 # Wait for deployment to be ready
-kubectl wait --for=condition=Available deployment/semantic-router -n semantic-router --timeout=600s
+kubectl wait --for=condition=Available deployment/semantic-router -n vllm-semantic-router-system --timeout=600s
 ```
 
 **Step 3: Check deployment status**
 
 ```bash
 # Check pods
-kubectl get pods -n semantic-router -o wide
+kubectl get pods -n vllm-semantic-router-system -o wide
 
 # Check services
-kubectl get services -n semantic-router
+kubectl get services -n vllm-semantic-router-system
 
 # View logs
-kubectl logs -l app=semantic-router -n semantic-router -f
+kubectl logs -l app=semantic-router -n vllm-semantic-router-system -f
 ```
 
 #### Resource Requirements for Kind
@@ -131,19 +155,30 @@ make port-forward-grpc
 
 # Access metrics
 make port-forward-metrics
+
+# Access Dashboard / Grafana / Open WebUI
+kubectl port-forward -n vllm-semantic-router-system svc/semantic-router-dashboard 8700:80
+kubectl port-forward -n vllm-semantic-router-system svc/grafana 3000:3000
+kubectl port-forward -n vllm-semantic-router-system svc/openwebui 3001:8080
 ```
 
 Or using kubectl directly:
 
 ```bash
 # Access Classification API (HTTP REST)
-kubectl port-forward -n semantic-router svc/semantic-router 8080:8080
+kubectl port-forward -n vllm-semantic-router-system svc/semantic-router 8080:8080
 
 # Access gRPC API
-kubectl port-forward -n semantic-router svc/semantic-router 50051:50051
+kubectl port-forward -n vllm-semantic-router-system svc/semantic-router 50051:50051
 
 # Access metrics
-kubectl port-forward -n semantic-router svc/semantic-router-metrics 9190:9190
+kubectl port-forward -n vllm-semantic-router-system svc/semantic-router-metrics 9190:9190
+
+# Access Prometheus/Grafana/Dashboard/Open WebUI
+kubectl port-forward -n vllm-semantic-router-system svc/prometheus 9090:9090
+kubectl port-forward -n vllm-semantic-router-system svc/grafana 3000:3000
+kubectl port-forward -n vllm-semantic-router-system svc/semantic-router-dashboard 8700:80
+kubectl port-forward -n vllm-semantic-router-system svc/openwebui 3001:8080
 ```
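
Because every `kubectl port-forward` above blocks its terminal, it can be convenient to start them together in the background. The sketch below is one possible way to do that, using the service names and port pairs from the commands above. The names `FORWARDS`, `forward_all`, and `stop_all` are hypothetical, and the script is not part of the repository.

```bash
NAMESPACE="${NAMESPACE:-vllm-semantic-router-system}"

# service:localPort:servicePort, taken from the kubectl commands above
FORWARDS="prometheus:9090:9090
grafana:3000:3000
semantic-router-dashboard:8700:80
openwebui:3001:8080"

PIDS=""

forward_all() {
  for spec in $FORWARDS; do
    svc="${spec%%:*}"     # part before the first colon
    ports="${spec#*:}"    # local:remote pair after it
    kubectl port-forward -n "$NAMESPACE" "svc/$svc" "$ports" &
    PIDS="$PIDS $!"       # remember the background PID for cleanup
  done
}

stop_all() {
  # Word splitting of $PIDS is intentional: one PID per argument.
  [ -n "$PIDS" ] && kill $PIDS 2>/dev/null
  PIDS=""
}
```

A typical bash session would call `forward_all`, register `trap stop_all EXIT`, and then `wait` until interrupted.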
 
 #### Testing the Deployment
@@ -313,7 +348,10 @@ Edit the `resources` section in `deployment.yaml` accordingly.
 - `namespace.yaml` - Dedicated namespace for the application
 - `config.yaml` - Application configuration
 - `tools_db.json` - Tools database for semantic routing
-- `kustomization.yaml` - Kustomize configuration for easy deployment
+- `kustomization.yaml` - Kustomize configuration for core deployment
+- `observability/` - Prometheus, Grafana, Dashboard, and optional Open WebUI + Pipelines manifests (the directory has its own `kustomization.yaml`)
+
+For detailed observability setup and screenshots, see `deploy/kubernetes/observability/README.md`.
 
 ### Development Tools
 

@@ -10,6 +10,9 @@ This guide adds a production-ready Prometheus + Grafana stack to the existing Se
 |--------------|---------|-----------|
 | Prometheus   | Scrapes Semantic Router metrics and stores them with persistent retention | `prometheus/` (`rbac.yaml`, `configmap.yaml`, `deployment.yaml`, `pvc.yaml`, `service.yaml`)|
 | Grafana      | Visualizes metrics using the bundled LLM Router dashboard and a pre-configured Prometheus datasource | `grafana/` (`secret.yaml`, `configmap-*.yaml`, `deployment.yaml`, `pvc.yaml`, `service.yaml`)|
+| Dashboard    | Unified UI that links to the Router and Prometheus, embeds Grafana, and reads the Router config | `dashboard/` (`configmap.yaml`, `deployment.yaml`, `service.yaml`)|
+| Open WebUI | Playground UI for interacting with the router via a Manifold Pipeline | `openwebui/` (`deployment.yaml`, `service.yaml`)|
+| Pipelines | Executes the `vllm_semantic_router_pipe.py` manifold for Open WebUI | `pipelines/deployment.yaml` (includes a ConfigMap with the pipeline code) |
 | Ingress (optional) | Exposes the UIs outside the cluster | `ingress.yaml`|
 | Dashboard provisioning | Automatically loads `deploy/llm-router-dashboard.json` into Grafana | `grafana/configmap-dashboard.yaml`|
 
@@ -110,7 +113,7 @@ Verify pods:
 kubectl get pods -n vllm-semantic-router-system
 ```
 
-You should see `prometheus-...` and `grafana-...` pods in `Running` state.
+You should see `prometheus-...`, `grafana-...`, and `semantic-router-dashboard-...` pods in `Running` state.
 
 ### 5.3. Integration with the core deployment
 
@@ -133,9 +136,11 @@ You should see `prometheus-...` and `grafana-...` pods in `Running` state.
   ```bash
   kubectl port-forward svc/prometheus 9090:9090 -n vllm-semantic-router-system
   kubectl port-forward svc/grafana 3000:3000 -n vllm-semantic-router-system
+  kubectl port-forward svc/semantic-router-dashboard 8700:80 -n vllm-semantic-router-system
+  kubectl port-forward svc/openwebui 3001:8080 -n vllm-semantic-router-system
   ```
 
-  Prometheus → http://localhost:9090, Grafana → http://localhost:3000
+  Prometheus → http://localhost:9090, Grafana → http://localhost:3000, Dashboard → http://localhost:8700, Open WebUI → http://localhost:3001
 
 - **Ingress (production)** – Customize `ingress.yaml` with real domains, TLS secrets, and your ingress class before applying. Replace `*.example.com` and configure HTTPS certificates via cert-manager or your provider.
 
@@ -145,6 +150,7 @@ You should see `prometheus-...` and `grafana-...` pods in `Running` state.
 2. Query `rate(llm_model_completion_tokens_total[5m])` – should return data after traffic.
 3. Open Grafana, log in with the admin credentials, and confirm the **LLM Router Metrics** dashboard exists under the *Semantic Router* folder.
 4. Generate traffic to Semantic Router (classification or routing requests). Key panels should start populating:
    - Prompt Category counts
    - Token usage rate per model
    - Routing modifications between models
+5. Playground: open Open WebUI (via port-forward or ingress), select the `vllm-semantic-router/auto` model (provided by the Manifold pipeline), and send prompts. The Dashboard Monitoring page should reflect the traffic, and the pipeline displays VSR decision headers inline.
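
The PromQL check in step 2 can also be run from a terminal against the Prometheus HTTP API instead of the web UI. The helper below is a hedged sketch: `prom_query` and `PROM_URL` are hypothetical names, and it assumes the `svc/prometheus` port-forward to `localhost:9090` is active and that `curl` is installed.

```bash
PROM_URL="${PROM_URL:-http://localhost:9090}"

prom_query() {
  # usage: prom_query '<PromQL expression>'
  # -G sends the URL-encoded expression as a GET query string.
  curl -sG "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}
```

For example, `prom_query 'rate(llm_model_completion_tokens_total[5m])'` returns a JSON document. Its `data.result` array is expected to stay empty until the router has served some traffic.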

@@ -0,0 +1,14 @@
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: semantic-router-dashboard-config
+  labels:
+    app: semantic-router-dashboard
+    app.kubernetes.io/part-of: semantic-router
+    app.kubernetes.io/component: observability
+data:
+  TARGET_GRAFANA_URL: http://grafana.vllm-semantic-router-system.svc.cluster.local:3000
+  TARGET_PROMETHEUS_URL: http://prometheus.vllm-semantic-router-system.svc.cluster.local:9090
+  TARGET_ROUTER_API_URL: http://semantic-router.vllm-semantic-router-system.svc.cluster.local:8080
+  TARGET_ROUTER_METRICS_URL: http://semantic-router-metrics.vllm-semantic-router-system.svc.cluster.local:9190/metrics
+  TARGET_OPENWEBUI_URL: http://openwebui.vllm-semantic-router-system.svc.cluster.local:8080
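
The `TARGET_*` values above follow the in-cluster DNS pattern `<service>.<namespace>.svc.cluster.local`. When deploying to a different namespace, each URL has to change to match. The helper below is only an illustration of that pattern: `svc_url` is a hypothetical name, and it assumes the cluster uses the default `cluster.local` domain.

```bash
svc_url() {
  # usage: svc_url <service> <namespace> <port> [path]
  # Assumes the default cluster domain "cluster.local".
  printf 'http://%s.%s.svc.cluster.local:%s%s\n' "$1" "$2" "$3" "${4:-}"
}

# Reproduces two of the ConfigMap values above:
svc_url grafana vllm-semantic-router-system 3000
svc_url semantic-router-metrics vllm-semantic-router-system 9190 /metrics
```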
@@ -0,0 +1,60 @@
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: semantic-router-dashboard
+  labels:
+    app: semantic-router-dashboard
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: semantic-router-dashboard
+  template:
+    metadata:
+      labels:
+        app: semantic-router-dashboard
+    spec:
+      containers:
+        - name: dashboard
+          image: ghcr.io/vllm-project/semantic-router/dashboard:latest
+          imagePullPolicy: IfNotPresent
+          args: ["-port=8700", "-static=/app/frontend", "-config=/app/config/config.yaml"]
+          env:
+            - name: TARGET_GRAFANA_URL
+              valueFrom:
+                configMapKeyRef:
+                  name: semantic-router-dashboard-config
+                  key: TARGET_GRAFANA_URL
+            - name: TARGET_PROMETHEUS_URL
+              valueFrom:
+                configMapKeyRef:
+                  name: semantic-router-dashboard-config
+                  key: TARGET_PROMETHEUS_URL
+            - name: TARGET_ROUTER_API_URL
+              valueFrom:
+                configMapKeyRef:
+                  name: semantic-router-dashboard-config
+                  key: TARGET_ROUTER_API_URL
+            - name: TARGET_ROUTER_METRICS_URL
+              valueFrom:
+                configMapKeyRef:
+                  name: semantic-router-dashboard-config
+                  key: TARGET_ROUTER_METRICS_URL
+            - name: TARGET_OPENWEBUI_URL
+              valueFrom:
+                configMapKeyRef:
+                  name: semantic-router-dashboard-config
+                  key: TARGET_OPENWEBUI_URL
+            - name: ROUTER_CONFIG_PATH
+              value: /app/config/config.yaml
+          ports:
+            - name: http
+              containerPort: 8700
+          volumeMounts:
+            - name: router-config
+              mountPath: /app/config
+              readOnly: true
+      volumes:
+        - name: router-config
+          configMap:
+            name: semantic-router-config
@@ -0,0 +1,14 @@
+apiVersion: v1
+kind: Service
+metadata:
+  name: semantic-router-dashboard
+  labels:
+    app: semantic-router-dashboard
+spec:
+  type: ClusterIP
+  selector:
+    app: semantic-router-dashboard
+  ports:
+    - name: http
+      port: 80
+      targetPort: http
@@ -51,3 +51,59 @@ spec:
                 name: prometheus
                 port:
                   name: http
+
+---
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  name: dashboard
+  labels:
+    app: semantic-router-dashboard
+  annotations:
+    kubernetes.io/ingress.class: nginx
+    nginx.ingress.kubernetes.io/backend-protocol: HTTP
+    nginx.ingress.kubernetes.io/ssl-redirect: "true"
+spec:
+  tls:
+    - hosts:
+        - dashboard.example.com
+      secretName: dashboard-tls
+  rules:
+    - host: dashboard.example.com
+      http:
+        paths:
+          - path: /
+            pathType: Prefix
+            backend:
+              service:
+                name: semantic-router-dashboard
+                port:
+                  name: http
+
+---
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  name: openwebui
+  labels:
+    app: openwebui
+  annotations:
+    kubernetes.io/ingress.class: nginx
+    nginx.ingress.kubernetes.io/backend-protocol: HTTP
+    nginx.ingress.kubernetes.io/ssl-redirect: "true"
+spec:
+  tls:
+    - hosts:
+        - openwebui.example.com
+      secretName: openwebui-tls
+  rules:
+    - host: openwebui.example.com
+      http:
+        paths:
+          - path: /
+            pathType: Prefix
+            backend:
+              service:
+                name: openwebui
+                port:
+                  name: http
@@ -19,4 +19,18 @@ resources:
   - grafana/configmap-dashboard.yaml
   - grafana/deployment.yaml
   - grafana/service.yaml
+  - dashboard/configmap.yaml
+  - dashboard/deployment.yaml
+  - dashboard/service.yaml
+  - pipelines/deployment.yaml
+  - openwebui/deployment.yaml
   - ingress.yaml
+
+# Generate ConfigMaps from source files
+generatorOptions:
+  disableNameSuffixHash: true
+
+configMapGenerator:
+  - name: openwebui-pipelines-config
+    files:
+      - vllm_semantic_router_pipe.py=pipelines/vllm_semantic_router_pipe.py
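
With `disableNameSuffixHash: true`, the generated ConfigMap keeps the literal name `openwebui-pipelines-config`. Without it, kustomize would append a content-hash suffix to the name. One way to confirm this before applying is to render the kustomization and list the ConfigMap names. The function below is a sketch under assumptions: `rendered_configmap_names` is a hypothetical helper, and it relies on `kind:` being emitted before `metadata:` in each rendered document, which is how kustomize typically orders keys.

```bash
rendered_configmap_names() {
  # usage: rendered_configmap_names <kustomization-dir>
  kubectl kustomize "$1" | awk '
    /^---$/              { in_cm = 0 }   # a new YAML document starts
    /^kind: ConfigMap$/  { in_cm = 1 }
    in_cm && /^  name: / { print $2 }    # metadata.name (two-space indent)
  '
}
```

Running `rendered_configmap_names deploy/kubernetes/observability/` should then list `openwebui-pipelines-config` without a hash suffix, alongside the other ConfigMaps in the stack.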