Openshift openweb UI integration #383
Add comprehensive observability monitoring for OpenShift deployments including:
- Prometheus for metrics collection with 15-day retention
- Grafana with pre-configured LLM Router dashboard
- Model routing tracking (auto -> Model-A/B selection)
- PII protection monitoring (violations by type)
- Jailbreak detection and blocking metrics
- Performance metrics (TTFT, TPOT, latency, tokens, cost)
New deployment flags:
- --with-observability: Deploy observability with semantic-router
- --observability-only: Deploy only observability stack
- --cleanup-observability: Remove only observability components
All manifests under deploy/openshift/observability/ with kustomize support.
OpenShift-compatible security contexts (no runAsNonRoot, capabilities dropped).
Dashboard includes 12 panels tracking:
- Prompt categories
- Model routing rate (source -> target)
- PII/Jailbreak refusal rates by model
- Token usage, latency percentiles, costs
- Security effectiveness (combined refusal %)
Resolves monitoring requirements for model selection visibility and
content safety tracking in OpenShift environments.
Signed-off-by: Yossi Ovadia <[email protected]>
…del names
Changes for cleaner observability demo:
PII Policy:
- Both models now strict (allow_by_default: false)
- Only EMAIL_ADDRESS allowed for both coding-model and general-model
- Makes PII violations easier to demonstrate consistently
Model Renaming:
- Model-A → coding-model (optimized for code/algorithms)
- Model-B → general-model (general knowledge/business)
- More intuitive names for demo purposes
Categories Simplified (15 → 2):
- coding: routes to coding-model (score 0.95, reasoning enabled)
- general: routes to general-model (score 0.9)
- Clearer routing behavior for demonstrations
This configuration makes it easier to demonstrate:
1. Model routing based on category classification
2. PII detection and blocking (both models strict)
3. Jailbreak protection
4. Observability metrics in Grafana
No Go code changes - config-only updates.
Signed-off-by: Yossi Ovadia <[email protected]>
- Add label_replace() to all panels to show "auto" as "semantic-router"
- Update dashboard title to reflect new model names (coding-model, general-model)
- All metrics now display consistent model naming across panels
- Fixes confusion between "auto" routing and actual model names
Affected panels:
- Token Usage Rate by Model
- Model Routing Rate (source_model and target_model)
- Model Completion Latency (p95, p50/p90/p99)
- TTFT/TPOT by Model
- Reasoning Rate by Model
- Model Cost Rate
- Refusal Rates by Model (PII + Jailbreak)
- Refusal Rate Percentage
- Total Cost by Model
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
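As a rough illustration of the relabeling described in this commit, the sketch below wraps a PromQL expression in `label_replace()` so that the label value "auto" is displayed as "semantic-router". The helper name `relabel_auto` and the metric name in the usage line are hypothetical; the actual dashboard queries are not shown in this PR conversation. Only the `label_replace(v, dst_label, replacement, src_label, regex)` argument order is taken from PromQL itself.

```python
def relabel_auto(expr: str, label: str = "model") -> str:
    """Wrap a PromQL expression so the value "auto" of `label` shows as "semantic-router".

    label_replace() only rewrites series whose source label fully matches the
    regex ("auto" here); every other series passes through unchanged.
    """
    return f'label_replace({expr}, "{label}", "semantic-router", "{label}", "auto")'


# Hypothetical metric name, used only for illustration.
query = relabel_auto("sum(rate(example_tokens_total[5m])) by (model)")
print(query)
```

For the routing panel, which the commit says uses `source_model` and `target_model`, the same helper could presumably be applied once per label, e.g. `relabel_auto(expr, label="source_model")`.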
- Reverted from 2 categories back to full 15 categories
- Kept model name changes: coding-model, general-model (not Model-A/B)
- Kept strict PII policy for both models (only EMAIL allowed)
- Categories now route to appropriate models:
* coding-model: biology, chemistry, history, other, economics, math,
physics, computer science, engineering
* general-model: business, law, psychology, health, philosophy
This provides a much better demo showing the rich classification
capabilities, even though the classifier model needs retraining.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
- Changed back from coding-model/general-model to Model-A/Model-B
- Kept 15 categories for rich demo experience
- Kept strict PII policy for both models (only EMAIL allowed)
- Updated Grafana dashboard title to reflect Model-A & Model-B
- Dashboard label relabeling still shows "semantic-router" for "auto"
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
- Moved script to deploy/openshift/ folder
- Added Model-B prompts (psychology, business, health, philosophy, law)
- Send 10 jailbreak attempts (better visibility in Grafana)
- Send 10 PII test prompts (various PII types)
- Use chat completions instead of just classification (triggers routing)
- Updated help text to reflect Model-A/Model-B naming
- All tests now send requests in parallel for better performance
This ensures both Model-A and Model-B appear in Grafana dashboards.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
Issue: Refusal Rates and Refusal Rate Percentage panels kept showing
increasing values even when no traffic was present.
Root cause: rate() returns empty results when no activity in the time
window, but Grafana was showing last non-zero values or interpolating.
Fix:
- Added 'or vector(0)' to refusal rate queries to explicitly return 0
when no errors in the time window
- Added 'or vector(1)' to denominator to prevent division by zero
- Added interval and intervalFactor parameters for better scraping
Affected panels:
- Refusal Rates by Model (time series)
- Refusal Rate Percentage by Model (bar gauge)
Now panels correctly drop to 0 when traffic stops.
Signed-off-by: Yossi Ovadia <[email protected]>
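The fallback pattern from this fix can be sketched as a small query builder. This is a minimal sketch, not the dashboard's actual query: the function name and both metric names are hypothetical, and the sketch aggregates with a plain `sum()` (no `by (model)`) so that both sides of the division carry the same empty label set as `vector(0)` and `vector(1)`.

```python
def refusal_rate_percent(refusals_metric: str, requests_metric: str, window: str = "5m") -> str:
    """Build a PromQL refusal-percentage query with explicit fallbacks.

    `or vector(0)` supplies 0 when the numerator returns no series;
    `or vector(1)` supplies 1 when the denominator returns no series,
    so the panel shows 0 instead of an empty or stale result.
    """
    numerator = f"(sum(rate({refusals_metric}[{window}])) or vector(0))"
    denominator = f"(sum(rate({requests_metric}[{window}])) or vector(1))"
    return f"100 * {numerator} / {denominator}"


# Hypothetical metric names, used only for illustration.
print(refusal_rate_percent("example_refusals_total", "example_requests_total"))
```

Note that `or` only substitutes the fallback when the left-hand side returns no series at all; a series that exists but has a rate of 0 is passed through as-is.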
…rd layout
- Enable observability (Prometheus + Grafana) by default in deployment
- Add HTTPS/TLS termination to Grafana and Prometheus routes with auto-redirect
- Reorganize Grafana dashboard panels by function:
* Semantic-router features on top (category, routing, refusal, reasoning)
* Performance metrics in middle (latency, TTFT, TPOT, tokens)
* Cost metrics at bottom (cost rate, total cost)
- Update deployment script help text to reflect observability enabled by default
- Fix dashboard YAML indentation for proper embedding
Signed-off-by: Yossi Ovadia <[email protected]>
- Fix blank lines around code fences
- Remove multiple consecutive blank lines
- Ensure proper spacing around lists
Signed-off-by: Yossi Ovadia <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
Add complete OpenWebUI deployment for OpenShift integration:
- OpenWebUI deployment manifests with OpenShift security contexts
- Automated deployment script with prerequisite validation
- Safe uninstall script with single confirmation prompt
- Internal service discovery (no hardcoded URLs)
- Integration with Envoy proxy for model load balancing
- Persistent storage for user data and configurations
- HTTPS external access via OpenShift routes
- Support for auto, Model-A, and Model-B endpoints
Files added:
- deploy/openshift/openwebui/deployment.yaml
- deploy/openshift/openwebui/service.yaml
- deploy/openshift/openwebui/route.yaml
- deploy/openshift/openwebui/pvc.yaml
- deploy/openshift/openwebui/kustomization.yaml
- deploy/openshift/openwebui/deploy-openwebui-on-openshift.sh
- deploy/openshift/openwebui/uninstall-openwebui.sh
- deploy/openshift/openwebui/README.md
Features:
- Zero-config setup with automatic model discovery
- OpenShift-compatible security contexts
- Rich user feedback with colored output
- Complete validation and connectivity testing
- Safe cleanup with data preservation options
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]>
@yossiovadia it looks like, among these commits, only the open webui commit 486fa9b is new; the others were already merged in #381.

Adds an automated way to deploy OpenWebUI on OpenShift alongside the semantic-router pod, in the same namespace.
➜ openshift git:(openshift-observability) oc get pods -n vllm-semantic-router-system
NAME                               READY   STATUS      RESTARTS   AGE
grafana-ff4df9ffc-qzmll            1/1     Running     0          99m
llm-katan-1-build                  0/1     Completed   0          106m
llm-katan-2-build                  0/1     Completed   0          102m
openwebui-8db67977b-tlvrz          1/1     Running     0          111s
prometheus-5bd5bc7788-z2j6k        1/1     Running     0          99m
semantic-router-6647fccd6c-cnm4j   4/4     Running     0          102m
It also automatically configures OpenWebUI with the right semantic-router endpoint (e.g. http://semantic-router.vllm-semantic-router-system.svc.cluster.local:8801/v1).
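The example endpoint follows the usual in-cluster service DNS form `<service>.<namespace>.svc.cluster.local`. The sketch below shows how such a URL could be composed from its parts; the helper `service_url` is hypothetical and is not taken from the deployment script, which is not shown in this conversation.

```python
def service_url(service: str, namespace: str, port: int, path: str = "/v1") -> str:
    """Compose an in-cluster HTTP URL for a Kubernetes/OpenShift Service.

    Assumes the default cluster domain "cluster.local"; clusters configured
    with a different domain would need that suffix changed.
    """
    return f"http://{service}.{namespace}.svc.cluster.local:{port}{path}"


# Values taken from the example endpoint in the description above.
print(service_url("semantic-router", "vllm-semantic-router-system", 8801))
```

Deriving the URL from the service name and namespace, rather than hardcoding it, matches the "internal service discovery (no hardcoded URLs)" goal stated in the commit message.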
Install and uninstall scripts are included.
Release Notes: No