Openshift observability #381

yossiovadia · 2025-10-09T19:41:00Z

Adding observability to the OpenShift deployment.
It is part of the existing deploy-to-openshift script.

Metrics:
├─ llm_category_classifications_count{category="math"}
├─ llm_model_routing_modifications_total{source="auto",target="Model-A"}
├─ llm_request_errors_total{reason="pii_policy_denied"}
├─ llm_request_errors_total{reason="jailbreak_block"}
├─ llm_reasoning_decisions_total{enabled="true",effort="high"}
├─ llm_model_completion_latency_seconds
├─ llm_model_ttft_seconds
├─ llm_model_tpot_seconds
└─ llm_model_cost_total{currency="USD"}

Sample output:

netlify · 2025-10-09T19:41:07Z

✅ Deploy Preview for vllm-semantic-router ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | `6125b5a` |
| 🔍 Latest deploy log | https://app.netlify.com/projects/vllm-semantic-router/deploys/68e80fcf51412b000774213d |
| 😎 Deploy Preview | https://deploy-preview-381--vllm-semantic-router.netlify.app |

github-actions · 2025-10-09T19:41:12Z

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 `deploy`

Owners: @rootfs, @Xunzhuo
Files changed:

deploy/openshift/demo-routing-test.sh
deploy/openshift/observability/README.md
deploy/openshift/observability/grafana/configmap-dashboard.yaml
deploy/openshift/observability/grafana/configmap-datasource.yaml
deploy/openshift/observability/grafana/configmap-provisioning.yaml
deploy/openshift/observability/grafana/deployment.yaml
deploy/openshift/observability/grafana/pvc.yaml
deploy/openshift/observability/grafana/route.yaml
deploy/openshift/observability/grafana/secret.yaml
deploy/openshift/observability/grafana/service.yaml
deploy/openshift/observability/kustomization.yaml
deploy/openshift/observability/prometheus/configmap.yaml
deploy/openshift/observability/prometheus/deployment.yaml
deploy/openshift/observability/prometheus/pvc.yaml
deploy/openshift/observability/prometheus/rbac.yaml
deploy/openshift/observability/prometheus/route.yaml
deploy/openshift/observability/prometheus/service.yaml
deploy/openshift/config-openshift.yaml
deploy/openshift/deploy-to-openshift.sh

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

yossiovadia · 2025-10-09T19:45:55Z

➜ openshift git:(openshift-observability) ✗ oc get pods -n vllm-semantic-router-system
NAME                               READY   STATUS      RESTARTS   AGE
grafana-ff4df9ffc-qzmll            1/1     Running     0          84m
llm-katan-1-build                  0/1     Completed   0          92m
llm-katan-2-build                  0/1     Completed   0          87m
prometheus-5bd5bc7788-z2j6k        1/1     Running     0          84m
semantic-router-6647fccd6c-cnm4j   4/4     Running     0          87m

Copilot

Pull Request Overview

This PR adds comprehensive observability capabilities to the OpenShift deployment of the semantic router, providing monitoring and visualization through Prometheus and Grafana. The implementation includes a complete observability stack with pre-configured dashboards for tracking LLM metrics, model routing, PII protection, and jailbreak detection.

- Adds full Prometheus and Grafana observability stack for semantic router monitoring
- Implements comprehensive Grafana dashboard with 12 panels tracking LLM metrics
- Enhances deployment script with observability-only and cleanup-observability options
- Includes demo testing script for validating all observability scenarios

Reviewed Changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| deploy/openshift/observability/ | Complete observability stack with Prometheus, Grafana configs and RBAC |
| deploy/openshift/deploy-to-openshift.sh | Enhanced with observability deployment and management functions |
| deploy/openshift/demo-routing-test.sh | New comprehensive test script for observability validation |
| deploy/openshift/config-openshift.yaml | Updated PII policy configuration for consistent security testing |
| deploy/openshift/observability/README.md | Detailed documentation for observability stack usage |

Copilot · 2025-10-09T20:13:02Z

deploy/openshift/deploy-to-openshift.sh

+            --with-observability)
+                WITH_OBSERVABILITY="true"
+                shift
+                ;;


The --with-observability option is defined but not documented in the usage function, while --no-observability is documented but not implemented. This creates inconsistent command-line interface behavior.

Copilot · 2025-10-09T20:13:02Z

deploy/openshift/deploy-to-openshift.sh

 DRY_RUN="false"
 PORT_FORWARD="false"
 PORT_FORWARD_PORTS="8080:8080 8000:8000 8001:8001 50051:50051 8801:8801 19000:19000"
+WITH_OBSERVABILITY="true"


The default value for WITH_OBSERVABILITY is set to 'true' but there's no --no-observability option implementation to disable it, creating a potential inconsistency with the documented --no-observability flag.
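
One way to address both comments above is to implement `--no-observability` next to the existing `--with-observability` case and list both in the usage text. The following is a minimal sketch, not the script's actual code: the `parse_args` and `usage` function names and the help wording are assumptions made for illustration, while `WITH_OBSERVABILITY`, its default of `"true"`, and the `--with-observability` case come from the diff.

```shell
#!/usr/bin/env bash
# Sketch only: a pared-down option parser, not the real deploy-to-openshift.sh.

WITH_OBSERVABILITY="true"   # default shown in the PR diff

# Hypothetical usage text that documents both flags.
usage() {
    cat <<'EOF'
Usage: deploy-to-openshift.sh [options]
  --with-observability   Deploy Prometheus + Grafana (default)
  --no-observability     Skip the observability stack
  -h, --help             Show this help
EOF
}

# Hypothetical parser: every documented flag has a matching case branch.
parse_args() {
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --with-observability)
                WITH_OBSERVABILITY="true"
                shift
                ;;
            --no-observability)
                WITH_OBSERVABILITY="false"
                shift
                ;;
            -h|--help)
                usage
                return 0
                ;;
            *)
                echo "Unknown option: $1" >&2
                usage >&2
                return 1
                ;;
        esac
    done
}
```

With this shape, `parse_args --no-observability` leaves `WITH_OBSERVABILITY` set to `false`, and an unrecognized flag returns a non-zero status instead of being silently ignored.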

Copilot · 2025-10-09T20:13:03Z

deploy/openshift/observability/grafana/secret.yaml

+type: Opaque
+stringData:
+  admin-user: admin
+  admin-password: admin


Using default credentials 'admin/admin' in production deployments poses a security risk. Consider using stronger default passwords or generating random credentials during deployment.

Suggested change

-  admin-password: admin
+  # WARNING: Replace the following password with a strong, unique value for production deployments!
+  admin-password: "A7f9kL2pQ8zX5vB1cD3eR6tY0wS4uH9j"
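
The other option the comment mentions, generating random credentials during deployment, could look roughly like the sketch below. It is an illustration under stated assumptions rather than part of this PR: `generate_password` and `render_grafana_secret` are hypothetical helpers, and the Secret name `grafana-admin` is a placeholder because the real `metadata.name` is not shown in the snippet. Only the `admin-user` and `admin-password` keys are taken from `secret.yaml`.

```shell
#!/usr/bin/env bash
# Sketch only: generate a Grafana admin password at deploy time.

# Print a random alphanumeric password (default length 32).
# Reads a bounded chunk of /dev/urandom and keeps only [A-Za-z0-9].
generate_password() {
    local length="${1:-32}"
    head -c 4096 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | head -c "$length"
}

# Print a Secret manifest with the key names used in secret.yaml.
# "grafana-admin" is a placeholder name, not taken from the PR.
render_grafana_secret() {
    local password="$1"
    cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: grafana-admin
type: Opaque
stringData:
  admin-user: admin
  admin-password: "${password}"
EOF
}
```

A deployment script could then pipe the rendered manifest to the cluster, for example `render_grafana_secret "$(generate_password)" | oc apply -n vllm-semantic-router-system -f -`, so that no fixed password is committed to the repository.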

* feat(openshift): add observability stack (Prometheus + Grafana)

Add comprehensive observability monitoring for OpenShift deployments including:
- Prometheus for metrics collection with 15-day retention
- Grafana with pre-configured LLM Router dashboard
- Model routing tracking (auto -> Model-A/B selection)
- PII protection monitoring (violations by type)
- Jailbreak detection and blocking metrics
- Performance metrics (TTFT, TPOT, latency, tokens, cost)

New deployment flags:
- --with-observability: Deploy observability with semantic-router
- --observability-only: Deploy only observability stack
- --cleanup-observability: Remove only observability components

All manifests under deploy/openshift/observability/ with kustomize support.
OpenShift-compatible security contexts (no runAsNonRoot, capabilities dropped).

Dashboard includes 12 panels tracking:
- Prompt categories
- Model routing rate (source -> target)
- PII/Jailbreak refusal rates by model
- Token usage, latency percentiles, costs
- Security effectiveness (combined refusal %)

Resolves monitoring requirements for model selection visibility and content safety tracking in OpenShift environments.

Signed-off-by: Yossi Ovadia <[email protected]>

* feat(config): simplify for demo - strict PII, 2 categories, better model names

Changes for cleaner observability demo:

PII Policy:
- Both models now strict (allow_by_default: false)
- Only EMAIL_ADDRESS allowed for both coding-model and general-model
- Makes PII violations easier to demonstrate consistently

Model Renaming:
- Model-A → coding-model (optimized for code/algorithms)
- Model-B → general-model (general knowledge/business)
- More intuitive names for demo purposes

Categories Simplified (15 → 2):
- coding: routes to coding-model (score 0.95, reasoning enabled)
- general: routes to general-model (score 0.9)
- Clearer routing behavior for demonstrations

This configuration makes it easier to demonstrate:
1. Model routing based on category classification
2. PII detection and blocking (both models strict)
3. Jailbreak protection
4. Observability metrics in Grafana

No Go code changes - config-only updates.

Signed-off-by: Yossi Ovadia <[email protected]>

* feat(grafana): relabel auto→semantic-router and update dashboard title

- Add label_replace() to all panels to show "auto" as "semantic-router"
- Update dashboard title to reflect new model names (coding-model, general-model)
- All metrics now display consistent model naming across panels
- Fixes confusion between "auto" routing and actual model names

Affected panels:
- Token Usage Rate by Model
- Model Routing Rate (source_model and target_model)
- Model Completion Latency (p95, p50/p90/p99)
- TTFT/TPOT by Model
- Reasoning Rate by Model
- Model Cost Rate
- Refusal Rates by Model (PII + Jailbreak)
- Refusal Rate Percentage
- Total Cost by Model

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>

* feat(config): restore 15 categories for richer demo experience

- Reverted from 2 categories back to full 15 categories
- Kept model name changes: coding-model, general-model (not Model-A/B)
- Kept strict PII policy for both models (only EMAIL allowed)
- Categories now route to appropriate models:
  * coding-model: biology, chemistry, history, other, economics, math, physics, computer science, engineering
  * general-model: business, law, psychology, health, philosophy

This provides a much better demo showing the rich classification capabilities, even though the classifier model needs retraining.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>

* feat(config): revert to Model-A and Model-B naming

- Changed back from coding-model/general-model to Model-A/Model-B
- Kept 15 categories for rich demo experience
- Kept strict PII policy for both models (only EMAIL allowed)
- Updated Grafana dashboard title to reflect Model-A & Model-B
- Dashboard label relabeling still shows "semantic-router" for "auto"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>

* feat(demo): enhance test script with Model-B traffic and better coverage

- Moved script to deploy/openshift/ folder
- Added Model-B prompts (psychology, business, health, philosophy, law)
- Send 10 jailbreak attempts (better visibility in Grafana)
- Send 10 PII test prompts (various PII types)
- Use chat completions instead of just classification (triggers routing)
- Updated help text to reflect Model-A/Model-B naming
- All tests now send requests in parallel for better performance

This ensures both Model-A and Model-B appear in Grafana dashboards.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>

* fix(grafana): prevent refusal rate panels from showing stale data

Issue: Refusal Rates and Refusal Rate Percentage panels kept showing increasing values even when no traffic was present.

Root cause: rate() returns empty results when no activity in the time window, but Grafana was showing last non-zero values or interpolating.

Fix:
- Added 'or vector(0)' to refusal rate queries to explicitly return 0 when no errors in the time window
- Added 'or vector(1)' to denominator to prevent division by zero
- Added interval and intervalFactor parameters for better scraping

Affected panels:
- Refusal Rates by Model (time series)
- Refusal Rate Percentage by Model (bar gauge)

Now panels correctly drop to 0 when traffic stops.

Signed-off-by: Yossi Ovadia <[email protected]>

* feat: enable observability by default with HTTPS and improved dashboard layout

- Enable observability (Prometheus + Grafana) by default in deployment
- Add HTTPS/TLS termination to Grafana and Prometheus routes with auto-redirect
- Reorganize Grafana dashboard panels by function:
  * Semantic-router features on top (category, routing, refusal, reasoning)
  * Performance metrics in middle (latency, TTFT, TPOT, tokens)
  * Cost metrics at bottom (cost rate, total cost)
- Update deployment script help text to reflect observability enabled by default
- Fix dashboard YAML indentation for proper embedding

Signed-off-by: Yossi Ovadia <[email protected]>

* fix: apply pre-commit markdown formatting fixes

- Fix blank lines around code fences
- Remove multiple consecutive blank lines
- Ensure proper spacing around lists

Signed-off-by: Yossi Ovadia <[email protected]>

* fix: update deployment output URLs to HTTPS and correct demo script path

Signed-off-by: Yossi Ovadia <[email protected]>

---------

Signed-off-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>
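
The refusal-rate fix described in the commit message above (falling back to `vector(0)` in the numerator and `vector(1)` in the denominator) can be illustrated with a small helper that assembles such a PromQL expression. This is a sketch under assumptions, not the dashboard's actual query: `refusal_pct_query` is a hypothetical helper, and the total-requests counter passed as the second argument (`llm_model_requests_total` in the example) is a made-up name, since the PR text does not name the denominator metric. Only `llm_request_errors_total` and its `reason` label come from the metric list in the description.

```shell
#!/usr/bin/env bash
# Sketch only: build a refusal-percentage PromQL expression as a string.
# Arguments:
#   $1  value of the "reason" label, e.g. pii_policy_denied
#   $2  name of a total-requests counter (hypothetical; not named in the PR)
#   $3  rate window (default 5m)
refusal_pct_query() {
    local reason="$1" total_metric="$2" window="${3:-5m}"
    local errors="sum(rate(llm_request_errors_total{reason=\"${reason}\"}[${window}]))"
    local total="sum(rate(${total_metric}[${window}]))"
    # "or vector(0)" supplies 0 when the error series has no samples in the window;
    # "or vector(1)" supplies a denominator when the total series is absent.
    printf '100 * (%s or vector(0)) / (%s or vector(1))\n' "$errors" "$total"
}
```

Calling `refusal_pct_query pii_policy_denied llm_model_requests_total 5m` prints a single expression that evaluates to 0 when neither series has samples. Note that `or` only fills in for an absent series: a denominator that is present but equal to 0 would still produce NaN.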

yossiovadia and others added 10 commits October 9, 2025 08:51

fix: apply pre-commit markdown formatting fixes

dfc95da

- Fix blank lines around code fences
- Remove multiple consecutive blank lines
- Ensure proper spacing around lists

Signed-off-by: Yossi Ovadia <[email protected]>

fix: update deployment output URLs to HTTPS and correct demo script path

6125b5a

Signed-off-by: Yossi Ovadia <[email protected]>

yossiovadia requested review from Xunzhuo and rootfs as code owners October 9, 2025 19:41

github-actions bot assigned rootfs and Xunzhuo Oct 9, 2025

rootfs requested a review from Copilot October 9, 2025 20:11

Copilot AI reviewed Oct 9, 2025

rootfs approved these changes Oct 9, 2025

rootfs merged commit b7f5c61 into vllm-project:main Oct 9, 2025
9 checks passed

rootfs mentioned this pull request Oct 9, 2025

Openshift openweb UI integration #383

Closed