Openshift observability #381
Conversation
feat(openshift): add observability stack (Prometheus + Grafana)

Add comprehensive observability monitoring for OpenShift deployments, including:

- Prometheus for metrics collection with 15-day retention
- Grafana with a pre-configured LLM Router dashboard
- Model routing tracking (auto -> Model-A/B selection)
- PII protection monitoring (violations by type)
- Jailbreak detection and blocking metrics
- Performance metrics (TTFT, TPOT, latency, tokens, cost)

New deployment flags:

- --with-observability: deploy observability with semantic-router
- --observability-only: deploy only the observability stack
- --cleanup-observability: remove only the observability components

All manifests live under deploy/openshift/observability/ with kustomize support, using OpenShift-compatible security contexts (no runAsNonRoot, capabilities dropped).

The dashboard includes 12 panels tracking:

- Prompt categories
- Model routing rate (source -> target)
- PII/jailbreak refusal rates by model
- Token usage, latency percentiles, costs
- Security effectiveness (combined refusal %)

Resolves monitoring requirements for model selection visibility and content safety tracking in OpenShift environments.

Signed-off-by: Yossi Ovadia <[email protected]>
feat(config): simplify for demo - strict PII, 2 categories, better model names

Changes for a cleaner observability demo:

PII policy:
- Both models now strict (allow_by_default: false)
- Only EMAIL_ADDRESS allowed for both coding-model and general-model
- Makes PII violations easier to demonstrate consistently

Model renaming:
- Model-A → coding-model (optimized for code/algorithms)
- Model-B → general-model (general knowledge/business)
- More intuitive names for demo purposes

Categories simplified (15 → 2):
- coding: routes to coding-model (score 0.95, reasoning enabled)
- general: routes to general-model (score 0.9)
- Clearer routing behavior for demonstrations

This configuration makes it easier to demonstrate:
1. Model routing based on category classification
2. PII detection and blocking (both models strict)
3. Jailbreak protection
4. Observability metrics in Grafana

No Go code changes - config-only updates.

Signed-off-by: Yossi Ovadia <[email protected]>
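A minimal sketch of what the strict per-model PII policy described above might look like in config-openshift.yaml. The key names here are illustrative assumptions based on the commit message, not the exact schema of the router config:

```yaml
# Illustrative sketch only; exact key names may differ in config-openshift.yaml.
model_config:
  coding-model:
    pii_policy:
      allow_by_default: false      # strict: deny every PII type...
      pii_types_allowed:
        - EMAIL_ADDRESS            # ...except email addresses
  general-model:
    pii_policy:
      allow_by_default: false
      pii_types_allowed:
        - EMAIL_ADDRESS
```

With allow_by_default set to false for both models, any detected PII type other than EMAIL_ADDRESS triggers a refusal, which is what makes violations easy to demonstrate consistently in the dashboards.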
feat(grafana): relabel auto→semantic-router and update dashboard title

- Add label_replace() to all panels to show "auto" as "semantic-router"
- Update dashboard title to reflect the new model names (coding-model, general-model)
- All metrics now display consistent model naming across panels
- Fixes confusion between "auto" routing and actual model names

Affected panels:
- Token Usage Rate by Model
- Model Routing Rate (source_model and target_model)
- Model Completion Latency (p95, p50/p90/p99)
- TTFT/TPOT by Model
- Reasoning Rate by Model
- Model Cost Rate
- Refusal Rates by Model (PII + Jailbreak)
- Refusal Rate Percentage
- Total Cost by Model

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
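A sketch of the label_replace() pattern described above. The metric name is taken from this PR's metrics list; whether the label is called source or source_model varies within this PR, so treat the label name as an assumption:

```promql
# Show the "auto" routing source as "semantic-router" in panel legends.
# label_replace(v, dst_label, replacement, src_label, regex) rewrites the
# dst label only on series whose src label fully matches the regex.
label_replace(
  sum(rate(llm_model_routing_modifications_total[5m])) by (source, target),
  "source", "semantic-router", "source", "auto"
)
```

Series whose source label is anything other than "auto" pass through unchanged, so real model names remain intact.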
feat(config): restore 15 categories for richer demo experience

- Reverted from 2 categories back to full 15 categories
- Kept model name changes: coding-model, general-model (not Model-A/B)
- Kept strict PII policy for both models (only EMAIL allowed)
- Categories now route to appropriate models:
* coding-model: biology, chemistry, history, other, economics, math,
physics, computer science, engineering
* general-model: business, law, psychology, health, philosophy
This provides a much better demo showing the rich classification
capabilities, even though the classifier model needs retraining.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
feat(config): revert to Model-A and Model-B naming

- Changed back from coding-model/general-model to Model-A/Model-B
- Kept 15 categories for rich demo experience
- Kept strict PII policy for both models (only EMAIL allowed)
- Updated Grafana dashboard title to reflect Model-A & Model-B
- Dashboard label relabeling still shows "semantic-router" for "auto"

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
feat(demo): enhance test script with Model-B traffic and better coverage

- Moved script to deploy/openshift/ folder
- Added Model-B prompts (psychology, business, health, philosophy, law)
- Send 10 jailbreak attempts (better visibility in Grafana)
- Send 10 PII test prompts (various PII types)
- Use chat completions instead of just classification (triggers routing)
- Updated help text to reflect Model-A/Model-B naming
- All tests now send requests in parallel for better performance

This ensures both Model-A and Model-B appear in Grafana dashboards.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
fix(grafana): prevent refusal rate panels from showing stale data

Issue: the Refusal Rates and Refusal Rate Percentage panels kept showing increasing values even when no traffic was present.

Root cause: rate() returns empty results when there is no activity in the time window, but Grafana was showing the last non-zero values or interpolating.

Fix:
- Added 'or vector(0)' to refusal rate queries to explicitly return 0 when there are no errors in the time window
- Added 'or vector(1)' to the denominator to prevent division by zero
- Added interval and intervalFactor parameters for better scraping

Affected panels:
- Refusal Rates by Model (time series)
- Refusal Rate Percentage by Model (bar gauge)

Panels now correctly drop to 0 when traffic stops.

Signed-off-by: Yossi Ovadia <[email protected]>
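The 'or vector(0)' / 'or vector(1)' pattern from the fix above can be sketched in PromQL. The error metric name comes from this PR; the denominator metric is an assumption for illustration, not the exact query used in the dashboard:

```promql
# Refusal rate percentage that drops to 0 instead of going stale.
# "or vector(0)" substitutes an explicit 0 when rate() returns no series;
# "or vector(1)" keeps the denominator non-empty to avoid division by zero.
(
  sum(rate(llm_request_errors_total{reason=~"pii_policy_denied|jailbreak_block"}[5m]))
  or vector(0)
)
/
(
  sum(rate(llm_model_routing_modifications_total[5m]))
  or vector(1)
)
```

Without the vector() fallbacks, an empty result from rate() leaves a gap in the series, and depending on panel settings Grafana may carry the last non-zero value forward, which is exactly the stale-data symptom described above.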
feat: enable observability by default with HTTPS and improved dashboard layout

- Enable observability (Prometheus + Grafana) by default in deployment
- Add HTTPS/TLS termination to Grafana and Prometheus routes with auto-redirect
- Reorganize Grafana dashboard panels by function:
  * Semantic-router features on top (category, routing, refusal, reasoning)
  * Performance metrics in the middle (latency, TTFT, TPOT, tokens)
  * Cost metrics at the bottom (cost rate, total cost)
- Update deployment script help text to reflect observability enabled by default
- Fix dashboard YAML indentation for proper embedding

Signed-off-by: Yossi Ovadia <[email protected]>
fix: apply pre-commit markdown formatting fixes

- Fix blank lines around code fences
- Remove multiple consecutive blank lines
- Ensure proper spacing around lists

Signed-off-by: Yossi Ovadia <[email protected]>
fix: update deployment output URLs to HTTPS and correct demo script path

Signed-off-by: Yossi Ovadia <[email protected]>
➜ openshift git:(openshift-observability) ✗ oc get pods -n vllm-semantic-router-system
Pull Request Overview
This PR adds comprehensive observability capabilities to the OpenShift deployment of the semantic router, providing monitoring and visualization through Prometheus and Grafana. The implementation includes a complete observability stack with pre-configured dashboards for tracking LLM metrics, model routing, PII protection, and jailbreak detection.
- Adds full Prometheus and Grafana observability stack for semantic router monitoring
- Implements comprehensive Grafana dashboard with 12 panels tracking LLM metrics
- Enhances deployment script with observability-only and cleanup-observability options
- Includes demo testing script for validating all observability scenarios
Reviewed Changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.
Summary per file:
| File | Description |
|---|---|
| deploy/openshift/observability/ | Complete observability stack with Prometheus, Grafana configs and RBAC |
| deploy/openshift/deploy-to-openshift.sh | Enhanced with observability deployment and management functions |
| deploy/openshift/demo-routing-test.sh | New comprehensive test script for observability validation |
| deploy/openshift/config-openshift.yaml | Updated PII policy configuration for consistent security testing |
| deploy/openshift/observability/README.md | Detailed documentation for observability stack usage |
    --with-observability)
        WITH_OBSERVABILITY="true"
        shift
        ;;
Copilot AI · Oct 9, 2025
The --with-observability option is defined but not documented in the usage function, while --no-observability is documented but not implemented. This creates inconsistent command-line interface behavior.
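A minimal sketch of how the missing branch could be added so the documented flag works. The variable and flag names mirror this PR's deploy-to-openshift.sh, but the surrounding parser is simplified and the function name is hypothetical:

```shell
# Hypothetical, simplified version of the script's flag parser with the
# missing --no-observability branch added.
parse_flags() {
    WITH_OBSERVABILITY="true"   # default, per this PR
    while [ $# -gt 0 ]; do
        case "$1" in
            --with-observability) WITH_OBSERVABILITY="true" ;;
            --no-observability)   WITH_OBSERVABILITY="false" ;;
        esac
        shift
    done
}
```

With both branches present, the usage text and the implementation agree: the default stays enabled, and --no-observability gives users an explicit opt-out.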
    DRY_RUN="false"
    PORT_FORWARD="false"
    PORT_FORWARD_PORTS="8080:8080 8000:8000 8001:8001 50051:50051 8801:8801 19000:19000"
    WITH_OBSERVABILITY="true"
Copilot AI · Oct 9, 2025
The default value for WITH_OBSERVABILITY is set to 'true' but there's no --no-observability option implementation to disable it, creating a potential inconsistency with the documented --no-observability flag.
    type: Opaque
    stringData:
      admin-user: admin
      admin-password: admin
Copilot AI · Oct 9, 2025
Using default credentials 'admin/admin' in production deployments poses a security risk. Consider using stronger default passwords or generating random credentials during deployment.
Suggested change:

    - admin-password: admin
    + # WARNING: Replace the following password with a strong, unique value for production deployments!
    + admin-password: "A7f9kL2pQ8zX5vB1cD3eR6tY0wS4uH9j"
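Rather than committing any fixed password, the deploy script could generate one at deploy time. This is a hedged sketch: the secret name, key names, and namespace below are assumptions modeled on this PR, not its exact manifests:

```shell
# Generate a random Grafana admin password at deploy time instead of
# shipping a hardcoded default. Uses /dev/urandom, which is available on
# any Linux deploy host.
GRAFANA_ADMIN_PASSWORD="$(head -c 24 /dev/urandom | base64 | tr -d '/+=' | head -c 24)"

# Hypothetical secret creation (names are illustrative):
# oc create secret generic grafana-admin-credentials \
#   --namespace vllm-semantic-router-system \
#   --from-literal=admin-user=admin \
#   --from-literal=admin-password="$GRAFANA_ADMIN_PASSWORD"
```

The generated value never lands in the repository, which addresses the reviewer's concern without forcing every user to hand-edit the manifest.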
Adds observability to the OpenShift deployment.
It is part of the existing deploy-to-openshift script.
Metrics:
├─ llm_category_classifications_count{category="math"}
├─ llm_model_routing_modifications_total{source="auto",target="Model-A"}
├─ llm_request_errors_total{reason="pii_policy_denied"}
├─ llm_request_errors_total{reason="jailbreak_block"}
├─ llm_reasoning_decisions_total{enabled="true",effort="high"}
├─ llm_model_completion_latency_seconds
├─ llm_model_ttft_seconds
├─ llm_model_tpot_seconds
└─ llm_model_cost_total{currency="USD"}
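The counters above are exposed in the standard Prometheus text exposition format. As a sketch, here is a small pure-Python parser for that format applied to made-up sample values; in a real deployment you would fetch the router's /metrics endpoint (the port is deployment-specific) instead of using a hardcoded string:

```python
import re

# Made-up sample values in Prometheus text exposition format, using the
# metric names listed above.
SAMPLE = """\
llm_category_classifications_count{category="math"} 42
llm_model_routing_modifications_total{source="auto",target="Model-A"} 17
llm_request_errors_total{reason="pii_policy_denied"} 3
llm_request_errors_total{reason="jailbreak_block"} 5
"""

LINE_RE = re.compile(r'^(?P<name>\w+)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$')

def parse_metrics(text):
    """Return (name, labels-dict, float value) tuples for each sample line.

    Naive parser: assumes no commas or escaped quotes inside label values,
    which holds for the simple counters shown here.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        m = LINE_RE.match(line)
        if not m:
            continue
        labels = {}
        if m.group('labels'):
            for pair in m.group('labels').split(','):
                key, _, value = pair.partition('=')
                labels[key] = value.strip('"')
        samples.append((m.group('name'), labels, float(m.group('value'))))
    return samples

# Total refusals = PII denials + jailbreak blocks.
errors = sum(v for name, _, v in parse_metrics(SAMPLE)
             if name == 'llm_request_errors_total')
print(errors)  # → 8.0
```

This is the same aggregation the dashboard's refusal panels perform, just done client-side for inspection.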
Sample output: