@yossiovadia
Collaborator

Adding observability to the OpenShift deployment.
It is part of the existing deploy-to-openshift script.

Metrics:
├─ llm_category_classifications_count{category="math"}
├─ llm_model_routing_modifications_total{source="auto",target="Model-A"}
├─ llm_request_errors_total{reason="pii_policy_denied"}
├─ llm_request_errors_total{reason="jailbreak_block"}
├─ llm_reasoning_decisions_total{enabled="true",effort="high"}
├─ llm_model_completion_latency_seconds
├─ llm_model_ttft_seconds
├─ llm_model_tpot_seconds
└─ llm_model_cost_total{currency="USD"}
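
As a rough illustration of how these counters could be inspected outside Grafana, the sketch below sums `llm_request_errors_total` per `reason` from Prometheus text-format output. The function name `errors_by_reason`, the `METRICS_URL` variable, and the sample values are hypothetical and not part of this PR; it also assumes samples carry no trailing timestamp.

```shell
#!/bin/sh
# Sketch only: aggregate llm_request_errors_total by its "reason" label.
# Reads Prometheus text exposition on stdin, prints "<reason> <sum>" per reason.
errors_by_reason() {
    awk '
        /^llm_request_errors_total[{]/ {
            # Pull the value of the reason="..." label out of the sample line.
            if (match($0, /reason="[^"]*"/)) {
                reason = substr($0, RSTART + 8, RLENGTH - 9)
                total[reason] += $NF   # last field is the sample value
            }
        }
        END { for (r in total) print r, total[r] }
    ' | sort
}

# Made-up sample input for demonstration; real data would come from e.g.
#   curl -s "$METRICS_URL" | errors_by_reason
errors_by_reason <<'EOF'
llm_request_errors_total{reason="pii_policy_denied"} 3
llm_request_errors_total{reason="jailbreak_block"} 2
llm_request_errors_total{reason="pii_policy_denied",model="Model-A"} 4
EOF
```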

Sample output:

image

yossiovadia and others added 10 commits October 9, 2025 08:51
Add comprehensive observability monitoring for OpenShift deployments including:
- Prometheus for metrics collection with 15-day retention
- Grafana with pre-configured LLM Router dashboard
- Model routing tracking (auto -> Model-A/B selection)
- PII protection monitoring (violations by type)
- Jailbreak detection and blocking metrics
- Performance metrics (TTFT, TPOT, latency, tokens, cost)

New deployment flags:
- --with-observability: Deploy observability with semantic-router
- --observability-only: Deploy only observability stack
- --cleanup-observability: Remove only observability components

All manifests under deploy/openshift/observability/ with kustomize support.
OpenShift-compatible security contexts (no runAsNonRoot, capabilities dropped).

Dashboard includes 12 panels tracking:
- Prompt categories
- Model routing rate (source -> target)
- PII/Jailbreak refusal rates by model
- Token usage, latency percentiles, costs
- Security effectiveness (combined refusal %)

Resolves monitoring requirements for model selection visibility and
content safety tracking in OpenShift environments.

Signed-off-by: Yossi Ovadia <[email protected]>
…del names

Changes for cleaner observability demo:

PII Policy:
- Both models now strict (allow_by_default: false)
- Only EMAIL_ADDRESS allowed for both coding-model and general-model
- Makes PII violations easier to demonstrate consistently

Model Renaming:
- Model-A → coding-model (optimized for code/algorithms)
- Model-B → general-model (general knowledge/business)
- More intuitive names for demo purposes

Categories Simplified (15 → 2):
- coding: routes to coding-model (score 0.95, reasoning enabled)
- general: routes to general-model (score 0.9)
- Clearer routing behavior for demonstrations

This configuration makes it easier to demonstrate:
1. Model routing based on category classification
2. PII detection and blocking (both models strict)
3. Jailbreak protection
4. Observability metrics in Grafana

No Go code changes - config-only updates.

Signed-off-by: Yossi Ovadia <[email protected]>
- Add label_replace() to all panels to show "auto" as "semantic-router"
- Update dashboard title to reflect new model names (coding-model, general-model)
- All metrics now display consistent model naming across panels
- Fixes confusion between "auto" routing and actual model names

Affected panels:
- Token Usage Rate by Model
- Model Routing Rate (source_model and target_model)
- Model Completion Latency (p95, p50/p90/p99)
- TTFT/TPOT by Model
- Reasoning Rate by Model
- Model Cost Rate
- Refusal Rates by Model (PII + Jailbreak)
- Refusal Rate Percentage
- Total Cost by Model
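
For readers unfamiliar with `label_replace()`, a minimal sketch of the relabeling idea follows. It builds a PromQL string in which the label value `auto` is displayed as `semantic-router`. The helper name `relabel_auto`, the `source_model` label, and the 5m window are assumptions; the actual dashboard queries may differ.

```shell
#!/bin/sh
# Sketch only: wrap a PromQL expression so that the label value "auto"
# is rewritten to "semantic-router" for display.
# PromQL signature: label_replace(v, dst_label, replacement, src_label, regex)
relabel_auto() {
    # $1 = PromQL expression, $2 = label to rewrite in place
    printf 'label_replace(%s, "%s", "semantic-router", "%s", "auto")' "$1" "$2" "$2"
}

# Hypothetical panel query built on a metric named in the PR description.
relabel_auto 'sum by (source_model) (rate(llm_model_routing_modifications_total[5m]))' source_model
echo
```

Because Prometheus anchors the regex, only series whose label is exactly `auto` are rewritten; all other series pass through unchanged.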

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
- Reverted from 2 categories back to full 15 categories
- Kept model name changes: coding-model, general-model (not Model-A/B)
- Kept strict PII policy for both models (only EMAIL allowed)
- Categories now route to appropriate models:
  * coding-model: biology, chemistry, history, other, economics, math,
    physics, computer science, engineering
  * general-model: business, law, psychology, health, philosophy

This provides a much better demo showing the rich classification
capabilities, even though the classifier model needs retraining.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
- Changed back from coding-model/general-model to Model-A/Model-B
- Kept 15 categories for rich demo experience
- Kept strict PII policy for both models (only EMAIL allowed)
- Updated Grafana dashboard title to reflect Model-A & Model-B
- Dashboard label relabeling still shows "semantic-router" for "auto"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
- Moved script to deploy/openshift/ folder
- Added Model-B prompts (psychology, business, health, philosophy, law)
- Send 10 jailbreak attempts (better visibility in Grafana)
- Send 10 PII test prompts (various PII types)
- Use chat completions instead of just classification (triggers routing)
- Updated help text to reflect Model-A/Model-B naming
- All tests now send requests in parallel for better performance

This ensures both Model-A and Model-B appear in Grafana dashboards.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
Issue: Refusal Rates and Refusal Rate Percentage panels kept showing
increasing values even when no traffic was present.

Root cause: rate() returns empty results when no activity in the time
window, but Grafana was showing last non-zero values or interpolating.

Fix:
- Added 'or vector(0)' to refusal rate queries to explicitly return 0
  when no errors in the time window
- Added 'or vector(1)' to denominator to prevent division by zero
- Added interval and intervalFactor parameters for better scraping

Affected panels:
- Refusal Rates by Model (time series)
- Refusal Rate Percentage by Model (bar gauge)

Now panels correctly drop to 0 when traffic stops.
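
The shape of the fix can be sketched as below: each side of the ratio gets an explicit fallback so an empty result becomes a number. The helper `or_default`, the denominator metric name `llm_model_requests_total`, and the 5m window are hypothetical; only `llm_request_errors_total` and its `reason` values come from the PR description.

```shell
#!/bin/sh
# Sketch only: build a refusal-percentage query with explicit fallbacks.
or_default() {
    # $1 = PromQL expression, $2 = value to use when $1 returns no series.
    # Parentheses are needed because "or" has the lowest operator precedence.
    printf '(%s) or vector(%s)' "$1" "$2"
}

# Numerator falls back to 0 (no refusals seen in the window).
refused=$(or_default 'sum(rate(llm_request_errors_total{reason=~"pii_policy_denied|jailbreak_block"}[5m]))' 0)
# Denominator falls back to 1 so the division always has a right-hand side.
total=$(or_default 'sum(rate(llm_model_requests_total[5m]))' 1)

printf '100 * (%s) / (%s)\n' "$refused" "$total"
```

Note that `or vector(1)` only covers the case where the denominator returns no series at all; if the series exists with a rate of exactly 0, the division can still yield NaN. The fallback also only lines up cleanly when the left-hand side has no labels, as with a plain `sum(...)`.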

Signed-off-by: Yossi Ovadia <[email protected]>
…rd layout

- Enable observability (Prometheus + Grafana) by default in deployment
- Add HTTPS/TLS termination to Grafana and Prometheus routes with auto-redirect
- Reorganize Grafana dashboard panels by function:
  * Semantic-router features on top (category, routing, refusal, reasoning)
  * Performance metrics in middle (latency, TTFT, TPOT, tokens)
  * Cost metrics at bottom (cost rate, total cost)
- Update deployment script help text to reflect observability enabled by default
- Fix dashboard YAML indentation for proper embedding

Signed-off-by: Yossi Ovadia <[email protected]>
- Fix blank lines around code fences
- Remove multiple consecutive blank lines
- Ensure proper spacing around lists

Signed-off-by: Yossi Ovadia <[email protected]>
@netlify

netlify bot commented Oct 9, 2025

Deploy Preview for vllm-semantic-router ready!

Name Link
🔨 Latest commit 6125b5a
🔍 Latest deploy log https://app.netlify.com/projects/vllm-semantic-router/deploys/68e80fcf51412b000774213d
😎 Deploy Preview https://deploy-preview-381--vllm-semantic-router.netlify.app

@github-actions

github-actions bot commented Oct 9, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 deploy

Owners: @rootfs, @Xunzhuo
Files changed:

  • deploy/openshift/demo-routing-test.sh
  • deploy/openshift/observability/README.md
  • deploy/openshift/observability/grafana/configmap-dashboard.yaml
  • deploy/openshift/observability/grafana/configmap-datasource.yaml
  • deploy/openshift/observability/grafana/configmap-provisioning.yaml
  • deploy/openshift/observability/grafana/deployment.yaml
  • deploy/openshift/observability/grafana/pvc.yaml
  • deploy/openshift/observability/grafana/route.yaml
  • deploy/openshift/observability/grafana/secret.yaml
  • deploy/openshift/observability/grafana/service.yaml
  • deploy/openshift/observability/kustomization.yaml
  • deploy/openshift/observability/prometheus/configmap.yaml
  • deploy/openshift/observability/prometheus/deployment.yaml
  • deploy/openshift/observability/prometheus/pvc.yaml
  • deploy/openshift/observability/prometheus/rbac.yaml
  • deploy/openshift/observability/prometheus/route.yaml
  • deploy/openshift/observability/prometheus/service.yaml
  • deploy/openshift/config-openshift.yaml
  • deploy/openshift/deploy-to-openshift.sh

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

@yossiovadia
Collaborator Author

yossiovadia commented Oct 9, 2025

➜ openshift git:(openshift-observability) ✗ oc get pods -n vllm-semantic-router-system
NAME                               READY   STATUS      RESTARTS   AGE
grafana-ff4df9ffc-qzmll            1/1     Running     0          84m
llm-katan-1-build                  0/1     Completed   0          92m
llm-katan-2-build                  0/1     Completed   0          87m
prometheus-5bd5bc7788-z2j6k        1/1     Running     0          84m
semantic-router-6647fccd6c-cnm4j   4/4     Running     0          87m

@rootfs rootfs requested a review from Copilot October 9, 2025 20:11
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds comprehensive observability capabilities to the OpenShift deployment of the semantic router, providing monitoring and visualization through Prometheus and Grafana. The implementation includes a complete observability stack with pre-configured dashboards for tracking LLM metrics, model routing, PII protection, and jailbreak detection.

  • Adds full Prometheus and Grafana observability stack for semantic router monitoring
  • Implements comprehensive Grafana dashboard with 12 panels tracking LLM metrics
  • Enhances deployment script with observability-only and cleanup-observability options
  • Includes demo testing script for validating all observability scenarios

Reviewed Changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| deploy/openshift/observability/ | Complete observability stack with Prometheus, Grafana configs and RBAC |
| deploy/openshift/deploy-to-openshift.sh | Enhanced with observability deployment and management functions |
| deploy/openshift/demo-routing-test.sh | New comprehensive test script for observability validation |
| deploy/openshift/config-openshift.yaml | Updated PII policy configuration for consistent security testing |
| deploy/openshift/observability/README.md | Detailed documentation for observability stack usage |

Comment on lines +214 to +217
        --with-observability)
            WITH_OBSERVABILITY="true"
            shift
            ;;

Copilot AI Oct 9, 2025

The --with-observability option is defined but not documented in the usage function, while --no-observability is documented but not implemented. This creates inconsistent command-line interface behavior.

DRY_RUN="false"
PORT_FORWARD="false"
PORT_FORWARD_PORTS="8080:8080 8000:8000 8001:8001 50051:50051 8801:8801 19000:19000"
WITH_OBSERVABILITY="true"

Copilot AI Oct 9, 2025

The default value for WITH_OBSERVABILITY is set to 'true' but there's no --no-observability option implementation to disable it, creating a potential inconsistency with the documented --no-observability flag.
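
One way the gap described in the review comments could be closed is sketched below: keep the `true` default and add a `--no-observability` branch next to the existing `--with-observability` one. The `parse_args` wrapper and the error handling are illustrative assumptions, not the script's actual structure.

```shell
#!/bin/sh
# Sketch only: option parsing with observability on by default and an
# explicit opt-out flag. Variable name and default match the diff above.
WITH_OBSERVABILITY="true"

parse_args() {
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --with-observability)
                WITH_OBSERVABILITY="true"
                shift
                ;;
            --no-observability)
                # The branch the review comments say is documented but missing.
                WITH_OBSERVABILITY="false"
                shift
                ;;
            *)
                echo "Unknown option: $1" >&2
                return 1
                ;;
        esac
    done
}

# Typical use inside the deploy script would be:  parse_args "$@"
```

Keeping both flags leaves `--with-observability` as a harmless no-op for existing callers while giving the documented opt-out real behavior.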

type: Opaque
stringData:
  admin-user: admin
  admin-password: admin

Copilot AI Oct 9, 2025

Using default credentials 'admin/admin' in production deployments poses a security risk. Consider using stronger default passwords or generating random credentials during deployment.

Suggested change
admin-password: admin
# WARNING: Replace the following password with a strong, unique value for production deployments!
admin-password: "A7f9kL2pQ8zX5vB1cD3eR6tY0wS4uH9j"
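
As an alternative to committing any fixed password, the "generate random credentials during deployment" option mentioned in the comment could look roughly like this. The helper name `gen_password` and the secret name in the trailing comment are hypothetical, and the sketch assumes `head -c` and `base64` are available on the deploy host.

```shell
#!/bin/sh
# Sketch only: generate a random Grafana admin password at deploy time
# instead of storing one in the manifest.
gen_password() {
    # 24 random bytes encode to exactly 32 base64 characters (no padding).
    head -c 24 /dev/urandom | base64 | tr -d '\n'
}

pw=$(gen_password)
echo "generated a ${#pw}-character password"

# The value could then be fed to the cluster without touching git, e.g.:
#   oc create secret generic grafana-admin \
#       --from-literal=admin-user=admin \
#       --from-literal=admin-password="$pw"
```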

@rootfs rootfs merged commit b7f5c61 into vllm-project:main Oct 9, 2025
9 checks passed
joyful-ii-V-I pushed a commit to joyful-ii-V-I/semantic-router that referenced this pull request Oct 13, 2025
* feat(openshift): add observability stack (Prometheus + Grafana)

Add comprehensive observability monitoring for OpenShift deployments including:
- Prometheus for metrics collection with 15-day retention
- Grafana with pre-configured LLM Router dashboard
- Model routing tracking (auto -> Model-A/B selection)
- PII protection monitoring (violations by type)
- Jailbreak detection and blocking metrics
- Performance metrics (TTFT, TPOT, latency, tokens, cost)

New deployment flags:
- --with-observability: Deploy observability with semantic-router
- --observability-only: Deploy only observability stack
- --cleanup-observability: Remove only observability components

All manifests under deploy/openshift/observability/ with kustomize support.
OpenShift-compatible security contexts (no runAsNonRoot, capabilities dropped).

Dashboard includes 12 panels tracking:
- Prompt categories
- Model routing rate (source -> target)
- PII/Jailbreak refusal rates by model
- Token usage, latency percentiles, costs
- Security effectiveness (combined refusal %)

Resolves monitoring requirements for model selection visibility and
content safety tracking in OpenShift environments.

Signed-off-by: Yossi Ovadia <[email protected]>

* feat(config): simplify for demo - strict PII, 2 categories, better model names

Changes for cleaner observability demo:

PII Policy:
- Both models now strict (allow_by_default: false)
- Only EMAIL_ADDRESS allowed for both coding-model and general-model
- Makes PII violations easier to demonstrate consistently

Model Renaming:
- Model-A → coding-model (optimized for code/algorithms)
- Model-B → general-model (general knowledge/business)
- More intuitive names for demo purposes

Categories Simplified (15 → 2):
- coding: routes to coding-model (score 0.95, reasoning enabled)
- general: routes to general-model (score 0.9)
- Clearer routing behavior for demonstrations

This configuration makes it easier to demonstrate:
1. Model routing based on category classification
2. PII detection and blocking (both models strict)
3. Jailbreak protection
4. Observability metrics in Grafana

No Go code changes - config-only updates.

Signed-off-by: Yossi Ovadia <[email protected]>

* feat(grafana): relabel auto→semantic-router and update dashboard title

- Add label_replace() to all panels to show "auto" as "semantic-router"
- Update dashboard title to reflect new model names (coding-model, general-model)
- All metrics now display consistent model naming across panels
- Fixes confusion between "auto" routing and actual model names

Affected panels:
- Token Usage Rate by Model
- Model Routing Rate (source_model and target_model)
- Model Completion Latency (p95, p50/p90/p99)
- TTFT/TPOT by Model
- Reasoning Rate by Model
- Model Cost Rate
- Refusal Rates by Model (PII + Jailbreak)
- Refusal Rate Percentage
- Total Cost by Model

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>

* feat(config): restore 15 categories for richer demo experience

- Reverted from 2 categories back to full 15 categories
- Kept model name changes: coding-model, general-model (not Model-A/B)
- Kept strict PII policy for both models (only EMAIL allowed)
- Categories now route to appropriate models:
  * coding-model: biology, chemistry, history, other, economics, math,
    physics, computer science, engineering
  * general-model: business, law, psychology, health, philosophy

This provides a much better demo showing the rich classification
capabilities, even though the classifier model needs retraining.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>

* feat(config): revert to Model-A and Model-B naming

- Changed back from coding-model/general-model to Model-A/Model-B
- Kept 15 categories for rich demo experience
- Kept strict PII policy for both models (only EMAIL allowed)
- Updated Grafana dashboard title to reflect Model-A & Model-B
- Dashboard label relabeling still shows "semantic-router" for "auto"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>

* feat(demo): enhance test script with Model-B traffic and better coverage

- Moved script to deploy/openshift/ folder
- Added Model-B prompts (psychology, business, health, philosophy, law)
- Send 10 jailbreak attempts (better visibility in Grafana)
- Send 10 PII test prompts (various PII types)
- Use chat completions instead of just classification (triggers routing)
- Updated help text to reflect Model-A/Model-B naming
- All tests now send requests in parallel for better performance

This ensures both Model-A and Model-B appear in Grafana dashboards.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>

* fix(grafana): prevent refusal rate panels from showing stale data

Issue: Refusal Rates and Refusal Rate Percentage panels kept showing
increasing values even when no traffic was present.

Root cause: rate() returns empty results when no activity in the time
window, but Grafana was showing last non-zero values or interpolating.

Fix:
- Added 'or vector(0)' to refusal rate queries to explicitly return 0
  when no errors in the time window
- Added 'or vector(1)' to denominator to prevent division by zero
- Added interval and intervalFactor parameters for better scraping

Affected panels:
- Refusal Rates by Model (time series)
- Refusal Rate Percentage by Model (bar gauge)

Now panels correctly drop to 0 when traffic stops.

Signed-off-by: Yossi Ovadia <[email protected]>

* feat: enable observability by default with HTTPS and improved dashboard layout

- Enable observability (Prometheus + Grafana) by default in deployment
- Add HTTPS/TLS termination to Grafana and Prometheus routes with auto-redirect
- Reorganize Grafana dashboard panels by function:
  * Semantic-router features on top (category, routing, refusal, reasoning)
  * Performance metrics in middle (latency, TTFT, TPOT, tokens)
  * Cost metrics at bottom (cost rate, total cost)
- Update deployment script help text to reflect observability enabled by default
- Fix dashboard YAML indentation for proper embedding

Signed-off-by: Yossi Ovadia <[email protected]>

* fix: apply pre-commit markdown formatting fixes

- Fix blank lines around code fences
- Remove multiple consecutive blank lines
- Ensure proper spacing around lists

Signed-off-by: Yossi Ovadia <[email protected]>

* fix: update deployment output URLs to HTTPS and correct demo script path

Signed-off-by: Yossi Ovadia <[email protected]>

---------

Signed-off-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>