32 changes: 16 additions & 16 deletions deploy/docker-compose/README.md
Original file line number Diff line number Diff line change
@@ -35,7 +35,20 @@ Example mappings:
## Profiles

- `testing` : enables `mock-vllm` and `llm-katan`
- `llm-katan` : enables only `llm-katan`
- `llm-katan` : only `llm-katan`

## Services and Ports

These host ports are exposed when you bring the stack up:

- Dashboard: http://localhost:8700 (Semantic Router Dashboard)
- Envoy proxy: http://localhost:8801
- Envoy admin: http://localhost:19000
- Grafana: http://localhost:3000 (admin/admin)
- Prometheus: http://localhost:9090
- Open WebUI: http://localhost:3001
- Mock vLLM (testing profile): http://localhost:8000
- LLM Katan (testing/llm-katan profiles): http://localhost:8002
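
A quick way to confirm these ports are reachable is sketched below. It is a minimal sketch, not part of the repository: the function names `stack_ports`, `probe_port`, and `check_stack` are hypothetical, and it only covers the ports that do not depend on a profile. A port counts as "up" if anything answers HTTP on it, regardless of status code.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the repo): probe the documented host ports.

# Print "name port" pairs for the always-on endpoints listed above.
stack_ports() {
  cat <<'EOF'
dashboard 8700
envoy 8801
envoy-admin 19000
grafana 3000
prometheus 9090
open-webui 3001
EOF
}

# Print "up" if anything answers HTTP on the given localhost port, else "down".
probe_port() {
  if curl -s -o /dev/null --max-time 2 "http://localhost:$1"; then
    echo up
  else
    echo down
  fi
}

# Print one status line per documented endpoint.
check_stack() {
  stack_ports | while read -r name port; do
    printf '%-12s %-6s %s\n' "$name" "$port" "$(probe_port "$port")"
  done
}
```

Source the file and run `check_stack` after `docker compose ... up` to get one status line per service.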

## Quick Start

@@ -71,6 +84,8 @@ docker compose -f deploy/docker-compose/docker-compose.yml --profile testing up
docker compose -f deploy/docker-compose/docker-compose.yml down
```

After the stack is healthy, open the Dashboard at http://localhost:8700.
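
If you script the startup, "healthy" can be approximated by polling the Dashboard URL until it answers. The sketch below assumes the Dashboard root path returns a non-error status once it is ready; `wait_for_url` is a hypothetical helper name, and the retry count and delay are arbitrary defaults.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until it returns a non-error HTTP status.
# Usage: wait_for_url URL [attempts] [delay_seconds]
wait_for_url() {
  local url="$1"
  local attempts="${2:-30}"
  local delay="${3:-2}"
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl fail on HTTP status >= 400; -s keeps it quiet.
    if curl -fs -o /dev/null --max-time 2 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `wait_for_url http://localhost:8700 && echo "Dashboard is up"` blocks for up to about a minute with the defaults.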

## Overrides

You can place a `docker-compose.override.yml` at repo root and combine:
@@ -130,18 +145,3 @@ All services join the `semantic-network` bridge network with a fixed subnet to m

- Local observability only: `tools/observability/docker-compose.obs.yml`
- Tracing stack: `tools/tracing/docker-compose.tracing.yaml`

## Related Stacks

- Local observability only: `tools/observability/docker-compose.obs.yml`
- Tracing stack (standalone, dev): `tools/tracing/docker-compose.tracing.yaml`

## Tracing & Grafana

- Jaeger UI: http://localhost:16686
- Grafana: http://localhost:3000 (admin/admin)
- Prometheus datasource (default) for metrics
- Jaeger datasource for exploring traces (search service `vllm-semantic-router`)

By default, the router container uses `config/config.tracing.yaml` (enabled tracing, exporter to Jaeger).
Override with `CONFIG_FILE=/app/config/config.yaml` if you don’t want tracing.
68 changes: 53 additions & 15 deletions deploy/kubernetes/README.md
@@ -1,6 +1,6 @@
# Semantic Router Kubernetes Deployment

This directory contains Kubernetes manifests for deploying the Semantic Router using Kustomize.
Kustomize manifests for deploying the Semantic Router and its observability stack (Prometheus, Grafana, Dashboard, optional Open WebUI + Pipelines) on Kubernetes.

## Architecture

@@ -12,8 +12,9 @@ The deployment consists of:
- **Init Container**: Downloads/copies model files to persistent volume
- **Main Container**: Runs the semantic router service
- **Services**:
- Main service exposing gRPC port (50051), Classification API (8080), and metrics port (9190)
- Separate metrics service for monitoring
- Main service exposing gRPC (50051), Classification API (8080), and metrics (9190)
- Separate metrics service for monitoring (`semantic-router-metrics`)
- Observability services (Grafana, Prometheus, Dashboard, optional Open WebUI)

## Ports

@@ -23,17 +24,40 @@ The deployment consists of:

## Quick Start

### Standard Kubernetes Deployment
### Deploy Core (Router)

```bash
kubectl apply -k deploy/kubernetes/

# Check deployment status
kubectl get pods -l app=semantic-router -n semantic-router
kubectl get services -l app=semantic-router -n semantic-router
kubectl get pods -l app=semantic-router -n vllm-semantic-router-system
kubectl get services -l app=semantic-router -n vllm-semantic-router-system

# View logs
kubectl logs -l app=semantic-router -n semantic-router -f
kubectl logs -l app=semantic-router -n vllm-semantic-router-system -f
```

### Add Observability (Prometheus + Grafana + Dashboard + Playground)

```bash
kubectl apply -k deploy/kubernetes/observability/
```

Port-forward to UIs (local dev):

```bash
kubectl port-forward -n vllm-semantic-router-system svc/prometheus 9090:9090
kubectl port-forward -n vllm-semantic-router-system svc/grafana 3000:3000
kubectl port-forward -n vllm-semantic-router-system svc/semantic-router-dashboard 8700:80
kubectl port-forward -n vllm-semantic-router-system svc/openwebui 3001:8080
```

Then open:

- Prometheus → http://localhost:9090
- Grafana → http://localhost:3000
- Dashboard → http://localhost:8700
- Open WebUI (Playground) → http://localhost:3001

### Kind (Kubernetes in Docker) Deployment
@@ -86,20 +110,20 @@ kubectl wait --for=condition=Ready nodes --all --timeout=300s
kubectl apply -k deploy/kubernetes/

# Wait for deployment to be ready
kubectl wait --for=condition=Available deployment/semantic-router -n semantic-router --timeout=600s
kubectl wait --for=condition=Available deployment/semantic-router -n vllm-semantic-router-system --timeout=600s
```

**Step 3: Check deployment status**

```bash
# Check pods
kubectl get pods -n semantic-router -o wide
kubectl get pods -n vllm-semantic-router-system -o wide

# Check services
kubectl get services -n semantic-router
kubectl get services -n vllm-semantic-router-system

# View logs
kubectl logs -l app=semantic-router -n semantic-router -f
kubectl logs -l app=semantic-router -n vllm-semantic-router-system -f
```

#### Resource Requirements for Kind
@@ -131,19 +155,30 @@ make port-forward-grpc

# Access metrics
make port-forward-metrics

# Access Dashboard / Grafana / Open WebUI
kubectl port-forward -n vllm-semantic-router-system svc/semantic-router-dashboard 8700:80
kubectl port-forward -n vllm-semantic-router-system svc/grafana 3000:3000
kubectl port-forward -n vllm-semantic-router-system svc/openwebui 3001:8080
```

Or using kubectl directly:

```bash
# Access Classification API (HTTP REST)
kubectl port-forward -n semantic-router svc/semantic-router 8080:8080
kubectl port-forward -n vllm-semantic-router-system svc/semantic-router 8080:8080

# Access gRPC API
kubectl port-forward -n semantic-router svc/semantic-router 50051:50051
kubectl port-forward -n vllm-semantic-router-system svc/semantic-router 50051:50051

# Access metrics
kubectl port-forward -n semantic-router svc/semantic-router-metrics 9190:9190
kubectl port-forward -n vllm-semantic-router-system svc/semantic-router-metrics 9190:9190

# Access Prometheus/Grafana/Dashboard/Open WebUI
kubectl port-forward -n vllm-semantic-router-system svc/prometheus 9090:9090
kubectl port-forward -n vllm-semantic-router-system svc/grafana 3000:3000
kubectl port-forward -n vllm-semantic-router-system svc/semantic-router-dashboard 8700:80
kubectl port-forward -n vllm-semantic-router-system svc/openwebui 3001:8080
```
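
Each `kubectl port-forward` above blocks its terminal, so running all four UI forwards means four shells. One possible workaround is sketched below: start them in the background and stop them together on Ctrl-C. The service names and port pairs are copied from the commands above; the names `FORWARDS`, `forward_cmd`, and `forward_all` are hypothetical and not provided by the repository or its Makefile.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the UI port-forwards together and clean them up on exit.
NAMESPACE="vllm-semantic-router-system"
# service:local-port:remote-port triples, taken from the kubectl commands above.
FORWARDS="prometheus:9090:9090 grafana:3000:3000 semantic-router-dashboard:8700:80 openwebui:3001:8080"

# Print the kubectl command for one triple (useful as a dry run).
forward_cmd() {
  local svc="${1%%:*}"
  local ports="${1#*:}"
  echo "kubectl port-forward -n $NAMESPACE svc/$svc $ports"
}

# Start every forward in the background, then block until interrupted.
forward_all() {
  local spec
  local pids=""
  for spec in $FORWARDS; do
    kubectl port-forward -n "$NAMESPACE" "svc/${spec%%:*}" "${spec#*:}" &
    pids="$pids $!"
  done
  # Stop all background forwards when the script is interrupted or exits.
  trap "kill $pids 2>/dev/null" INT TERM EXIT
  wait
}
```

Run `forward_cmd grafana:3000:3000` to preview a single command, or `forward_all` to start everything at once.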

#### Testing the Deployment
@@ -313,7 +348,10 @@ Edit the `resources` section in `deployment.yaml` accordingly.
- `namespace.yaml` - Dedicated namespace for the application
- `config.yaml` - Application configuration
- `tools_db.json` - Tools database for semantic routing
- `kustomization.yaml` - Kustomize configuration for easy deployment
- `kustomization.yaml` - Kustomize configuration for core deployment
- `observability/` - Prometheus, Grafana, Dashboard, optional Open WebUI + Pipelines (with its own `kustomization.yaml`)

For detailed observability setup and screenshots, see `deploy/kubernetes/observability/README.md`.

### Development Tools

10 changes: 8 additions & 2 deletions deploy/kubernetes/observability/README.md
@@ -10,6 +10,9 @@ This guide adds a production-ready Prometheus + Grafana stack to the existing Se
|--------------|---------|-----------|
| Prometheus | Scrapes Semantic Router metrics and stores them with persistent retention | `prometheus/` (`rbac.yaml`, `configmap.yaml`, `deployment.yaml`, `pvc.yaml`, `service.yaml`)|
| Grafana | Visualizes metrics using the bundled LLM Router dashboard and a pre-configured Prometheus datasource | `grafana/` (`secret.yaml`, `configmap-*.yaml`, `deployment.yaml`, `pvc.yaml`, `service.yaml`)|
| Dashboard | Unified UI that links Router, Prometheus, and embeds Grafana; reads Router config | `dashboard/` (`configmap.yaml`, `deployment.yaml`, `service.yaml`)|
| Open WebUI | Playground UI for interacting with the router via a Manifold Pipeline | `openwebui/` (`deployment.yaml`, `service.yaml`)|
| Pipelines | Executes the `vllm_semantic_router_pipe.py` manifold for Open WebUI | `pipelines/deployment.yaml` (includes a ConfigMap with the pipeline code) |
| Ingress (optional) | Exposes the UIs outside the cluster | `ingress.yaml`|
| Dashboard provisioning | Automatically loads `deploy/llm-router-dashboard.json` into Grafana | `grafana/configmap-dashboard.yaml`|

@@ -110,7 +113,7 @@ Verify pods:
kubectl get pods -n vllm-semantic-router-system
```

You should see `prometheus-...` and `grafana-...` pods in `Running` state.
You should see `prometheus-...`, `grafana-...`, and `semantic-router-dashboard-...` pods in `Running` state.
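
Instead of re-running `kubectl get pods` by hand, you can block until each rollout finishes. This is a sketch under assumptions: `semantic-router-dashboard` is the Deployment name from the manifest in this change, while `prometheus` and `grafana` are inferred from the pod name prefixes and may differ in your cluster. The helper names `rollout_cmd` and `wait_for_rollouts` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait for the observability Deployments to finish rolling out.
NAMESPACE="vllm-semantic-router-system"
# Assumed Deployment names; adjust if your manifests use different ones.
DEPLOYMENTS="prometheus grafana semantic-router-dashboard"

# Print the command used for one Deployment (dry run).
rollout_cmd() {
  echo "kubectl rollout status deployment/$1 -n $NAMESPACE --timeout=${2:-300s}"
}

# Wait for every Deployment in turn; stop at the first one that fails or times out.
wait_for_rollouts() {
  local d
  for d in $DEPLOYMENTS; do
    kubectl rollout status "deployment/$d" -n "$NAMESPACE" --timeout="${1:-300s}" || return 1
  done
}
```

`wait_for_rollouts 600s` returns non-zero as soon as one Deployment fails to become ready within the timeout.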

### 5.3. Integration with the core deployment

@@ -133,9 +136,11 @@ You should see `prometheus-...` and `grafana-...` pods in `Running` state.
```bash
kubectl port-forward svc/prometheus 9090:9090 -n vllm-semantic-router-system
kubectl port-forward svc/grafana 3000:3000 -n vllm-semantic-router-system
kubectl port-forward svc/semantic-router-dashboard 8700:80 -n vllm-semantic-router-system
kubectl port-forward svc/openwebui 3001:8080 -n vllm-semantic-router-system
```

Prometheus → http://localhost:9090, Grafana → http://localhost:3000
Prometheus → http://localhost:9090, Grafana → http://localhost:3000, Dashboard → http://localhost:8700, Open WebUI → http://localhost:3001

- **Ingress (production)** – Customize `ingress.yaml` with real domains, TLS secrets, and your ingress class before applying. Replace `*.example.com` and configure HTTPS certificates via cert-manager or your provider.

@@ -145,6 +150,7 @@ You should see `prometheus-...` and `grafana-...` pods in `Running` state.
2. Query `rate(llm_model_completion_tokens_total[5m])` – should return data after traffic.
3. Open Grafana, log in with the admin credentials, and confirm the **LLM Router Metrics** dashboard exists under the *Semantic Router* folder.
4. Generate traffic to Semantic Router (classification or routing requests). Key panels should start populating:
5. Playground: open Open WebUI (port-forward or ingress), select the `vllm-semantic-router/auto` model (from the Manifold pipeline), and send prompts. The Dashboard Monitoring page should reflect traffic, and the pipeline will display VSR decision headers inline.
- Prompt Category counts
- Token usage rate per model
- Routing modifications between models
14 changes: 14 additions & 0 deletions deploy/kubernetes/observability/dashboard/configmap.yaml
@@ -0,0 +1,14 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: semantic-router-dashboard-config
labels:
app: semantic-router-dashboard
app.kubernetes.io/part-of: semantic-router
app.kubernetes.io/component: observability
data:
TARGET_GRAFANA_URL: http://grafana.vllm-semantic-router-system.svc.cluster.local:3000
TARGET_PROMETHEUS_URL: http://prometheus.vllm-semantic-router-system.svc.cluster.local:9090
TARGET_ROUTER_API_URL: http://semantic-router.vllm-semantic-router-system.svc.cluster.local:8080
TARGET_ROUTER_METRICS_URL: http://semantic-router-metrics.vllm-semantic-router-system.svc.cluster.local:9190/metrics
TARGET_OPENWEBUI_URL: http://openwebui.vllm-semantic-router-system.svc.cluster.local:8080
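
Review note: the `TARGET_*` values above all follow the standard in-cluster DNS form `<service>.<namespace>.svc.cluster.local:<port>`. If the stack is deployed to a different namespace, every URL has to change together. The sketch below shows how the values are composed; `svc_url` is a hypothetical helper for illustration, not something the manifests use.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build an in-cluster service URL.
# Usage: svc_url SERVICE NAMESPACE PORT [PATH]
svc_url() {
  local svc="$1"
  local ns="$2"
  local port="$3"
  local path="${4:-}"
  echo "http://${svc}.${ns}.svc.cluster.local:${port}${path}"
}
```

For example, `svc_url semantic-router-metrics vllm-semantic-router-system 9190 /metrics` reproduces the `TARGET_ROUTER_METRICS_URL` value.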
60 changes: 60 additions & 0 deletions deploy/kubernetes/observability/dashboard/deployment.yaml
@@ -0,0 +1,60 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: semantic-router-dashboard
labels:
app: semantic-router-dashboard
spec:
replicas: 1
selector:
matchLabels:
app: semantic-router-dashboard
template:
metadata:
labels:
app: semantic-router-dashboard
spec:
containers:
- name: dashboard
image: ghcr.io/vllm-project/semantic-router/dashboard:latest
imagePullPolicy: IfNotPresent
args: ["-port=8700", "-static=/app/frontend", "-config=/app/config/config.yaml"]
env:
- name: TARGET_GRAFANA_URL
valueFrom:
configMapKeyRef:
name: semantic-router-dashboard-config
key: TARGET_GRAFANA_URL
- name: TARGET_PROMETHEUS_URL
valueFrom:
configMapKeyRef:
name: semantic-router-dashboard-config
key: TARGET_PROMETHEUS_URL
- name: TARGET_ROUTER_API_URL
valueFrom:
configMapKeyRef:
name: semantic-router-dashboard-config
key: TARGET_ROUTER_API_URL
- name: TARGET_ROUTER_METRICS_URL
valueFrom:
configMapKeyRef:
name: semantic-router-dashboard-config
key: TARGET_ROUTER_METRICS_URL
- name: TARGET_OPENWEBUI_URL
valueFrom:
configMapKeyRef:
name: semantic-router-dashboard-config
key: TARGET_OPENWEBUI_URL
- name: ROUTER_CONFIG_PATH
value: /app/config/config.yaml
ports:
- name: http
containerPort: 8700
volumeMounts:
- name: router-config
mountPath: /app/config
readOnly: true
volumes:
- name: router-config
configMap:
name: semantic-router-config
14 changes: 14 additions & 0 deletions deploy/kubernetes/observability/dashboard/service.yaml
@@ -0,0 +1,14 @@
apiVersion: v1
kind: Service
metadata:
name: semantic-router-dashboard
labels:
app: semantic-router-dashboard
spec:
type: ClusterIP
selector:
app: semantic-router-dashboard
ports:
- name: http
port: 80
targetPort: http
56 changes: 56 additions & 0 deletions deploy/kubernetes/observability/ingress.yaml
@@ -51,3 +51,59 @@ spec:
name: prometheus
port:
name: http

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: dashboard
labels:
app: semantic-router-dashboard
annotations:
kubernetes.io/ingress.class: nginx
nginx.ingress.kubernetes.io/backend-protocol: HTTP
nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
tls:
- hosts:
- dashboard.example.com
secretName: dashboard-tls
rules:
- host: dashboard.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: semantic-router-dashboard
port:
name: http

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: openwebui
labels:
app: openwebui
annotations:
kubernetes.io/ingress.class: nginx
nginx.ingress.kubernetes.io/backend-protocol: HTTP
nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
tls:
- hosts:
- openwebui.example.com
secretName: openwebui-tls
rules:
- host: openwebui.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: openwebui
port:
name: http
14 changes: 14 additions & 0 deletions deploy/kubernetes/observability/kustomization.yaml
@@ -19,4 +19,18 @@ resources:
- grafana/configmap-dashboard.yaml
- grafana/deployment.yaml
- grafana/service.yaml
- dashboard/configmap.yaml
- dashboard/deployment.yaml
- dashboard/service.yaml
- pipelines/deployment.yaml
- openwebui/deployment.yaml
- ingress.yaml

# Generate ConfigMaps from source files
generatorOptions:
disableNameSuffixHash: true

configMapGenerator:
- name: openwebui-pipelines-config
files:
- vllm_semantic_router_pipe.py=pipelines/vllm_semantic_router_pipe.py