Openshift openweb UI integration #383

yossiovadia · 2025-10-09T20:34:18Z

Adding automatic way to dpeloy openshift alongside to the semantic-router pod , under same namespace.

➜ openshift git:(openshift-observability) oc get pods -n vllm-semantic-router-system
NAME READY STATUS RESTARTS AGE
grafana-ff4df9ffc-qzmll 1/1 Running 0 99m
llm-katan-1-build 0/1 Completed 0 106m
llm-katan-2-build 0/1 Completed 0 102m
openwebui-8db67977b-tlvrz 1/1 Running 0 111s
prometheus-5bd5bc7788-z2j6k 1/1 Running 0 99m
semantic-router-6647fccd6c-cnm4j 4/4 Running 0 102m

it also automatically configures the openwebUI with the right semantic-router endpoint ( e.g http://semantic-router.vllm-semantic-router-system.svc.cluster.local:8801/v1 )

has install and uninstall scripts.

Release Notes: No

Add comprehensive observability monitoring for OpenShift deployments including: - Prometheus for metrics collection with 15-day retention - Grafana with pre-configured LLM Router dashboard - Model routing tracking (auto -> Model-A/B selection) - PII protection monitoring (violations by type) - Jailbreak detection and blocking metrics - Performance metrics (TTFT, TPOT, latency, tokens, cost) New deployment flags: - --with-observability: Deploy observability with semantic-router - --observability-only: Deploy only observability stack - --cleanup-observability: Remove only observability components All manifests under deploy/openshift/observability/ with kustomize support. OpenShift-compatible security contexts (no runAsNonRoot, capabilities dropped). Dashboard includes 12 panels tracking: - Prompt categories - Model routing rate (source -> target) - PII/Jailbreak refusal rates by model - Token usage, latency percentiles, costs - Security effectiveness (combined refusal %) Resolves monitoring requirements for model selection visibility and content safety tracking in OpenShift environments. Signed-off-by: Yossi Ovadia <[email protected]>

…del names Changes for cleaner observability demo: PII Policy: - Both models now strict (allow_by_default: false) - Only EMAIL_ADDRESS allowed for both coding-model and general-model - Makes PII violations easier to demonstrate consistently Model Renaming: - Model-A → coding-model (optimized for code/algorithms) - Model-B → general-model (general knowledge/business) - More intuitive names for demo purposes Categories Simplified (15 → 2): - coding: routes to coding-model (score 0.95, reasoning enabled) - general: routes to general-model (score 0.9) - Clearer routing behavior for demonstrations This configuration makes it easier to demonstrate: 1. Model routing based on category classification 2. PII detection and blocking (both models strict) 3. Jailbreak protection 4. Observability metrics in Grafana No Go code changes - config-only updates. Signed-off-by: Yossi Ovadia <[email protected]>

- Add label_replace() to all panels to show "auto" as "semantic-router" - Update dashboard title to reflect new model names (coding-model, general-model) - All metrics now display consistent model naming across panels - Fixes confusion between "auto" routing and actual model names Affected panels: - Token Usage Rate by Model - Model Routing Rate (source_model and target_model) - Model Completion Latency (p95, p50/p90/p99) - TTFT/TPOT by Model - Reasoning Rate by Model - Model Cost Rate - Refusal Rates by Model (PII + Jailbreak) - Refusal Rate Percentage - Total Cost by Model 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]>

- Reverted from 2 categories back to full 15 categories - Kept model name changes: coding-model, general-model (not Model-A/B) - Kept strict PII policy for both models (only EMAIL allowed) - Categories now route to appropriate models: * coding-model: biology, chemistry, history, other, economics, math, physics, computer science, engineering * general-model: business, law, psychology, health, philosophy This provides a much better demo showing the rich classification capabilities, even though the classifier model needs retraining. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]>

- Changed back from coding-model/general-model to Model-A/Model-B - Kept 15 categories for rich demo experience - Kept strict PII policy for both models (only EMAIL allowed) - Updated Grafana dashboard title to reflect Model-A & Model-B - Dashboard label relabeling still shows "semantic-router" for "auto" 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]>

- Moved script to deploy/openshift/ folder - Added Model-B prompts (psychology, business, health, philosophy, law) - Send 10 jailbreak attempts (better visibility in Grafana) - Send 10 PII test prompts (various PII types) - Use chat completions instead of just classification (triggers routing) - Updated help text to reflect Model-A/Model-B naming - All tests now send requests in parallel for better performance This ensures both Model-A and Model-B appear in Grafana dashboards. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]>

Issue: Refusal Rates and Refusal Rate Percentage panels kept showing increasing values even when no traffic was present. Root cause: rate() returns empty results when no activity in the time window, but Grafana was showing last non-zero values or interpolating. Fix: - Added 'or vector(0)' to refusal rate queries to explicitly return 0 when no errors in the time window - Added 'or vector(1)' to denominator to prevent division by zero - Added interval and intervalFactor parameters for better scraping Affected panels: - Refusal Rates by Model (time series) - Refusal Rate Percentage by Model (bar gauge) Now panels correctly drop to 0 when traffic stops. Signed-off-by: Yossi Ovadia <[email protected]>

…rd layout - Enable observability (Prometheus + Grafana) by default in deployment - Add HTTPS/TLS termination to Grafana and Prometheus routes with auto-redirect - Reorganize Grafana dashboard panels by function: * Semantic-router features on top (category, routing, refusal, reasoning) * Performance metrics in middle (latency, TTFT, TPOT, tokens) * Cost metrics at bottom (cost rate, total cost) - Update deployment script help text to reflect observability enabled by default - Fix dashboard YAML indentation for proper embedding Signed-off-by: Yossi Ovadia <[email protected]>

- Fix blank lines around code fences - Remove multiple consecutive blank lines - Ensure proper spacing around lists Signed-off-by: Yossi Ovadia <[email protected]>

Signed-off-by: Yossi Ovadia <[email protected]>

Add complete OpenWebUI deployment for OpenShift integration: - OpenWebUI deployment manifests with OpenShift security contexts - Automated deployment script with prerequisite validation - Safe uninstall script with single confirmation prompt - Internal service discovery (no hardcoded URLs) - Integration with Envoy proxy for model load balancing - Persistent storage for user data and configurations - HTTPS external access via OpenShift routes - Support for auto, Model-A, and Model-B endpoints Files added: - deploy/openshift/openwebui/deployment.yaml - deploy/openshift/openwebui/service.yaml - deploy/openshift/openwebui/route.yaml - deploy/openshift/openwebui/pvc.yaml - deploy/openshift/openwebui/kustomization.yaml - deploy/openshift/openwebui/deploy-openwebui-on-openshift.sh - deploy/openshift/openwebui/uninstall-openwebui.sh - deploy/openshift/openwebui/README.md Features: - Zero-config setup with automatic model discovery - OpenShift-compatible security contexts - Rich user feedback with colored output - Complete validation and connectivity testing - Safe cleanup with data preservation options 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]>

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]>

netlify · 2025-10-09T20:35:54Z

✅ Deploy Preview for vllm-semantic-router ready!

Name	Link
🔨 Latest commit	`08ce5c1`
🔍 Latest deploy log	https://app.netlify.com/projects/vllm-semantic-router/deploys/68e81cbbf55c6200089c7b44
😎 Deploy Preview	https://deploy-preview-383--vllm-semantic-router.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

github-actions · 2025-10-09T20:36:22Z

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 `deploy`

Owners: @rootfs, @Xunzhuo
Files changed:

deploy/openshift/openwebui/README.md
deploy/openshift/openwebui/deploy-openwebui-on-openshift.sh
deploy/openshift/openwebui/deployment.yaml
deploy/openshift/openwebui/kustomization.yaml
deploy/openshift/openwebui/pvc.yaml
deploy/openshift/openwebui/route.yaml
deploy/openshift/openwebui/service.yaml
deploy/openshift/openwebui/uninstall-openwebui.sh

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

rootfs · 2025-10-09T20:38:22Z

@yossiovadia it looks among these commits, the open webui commit 486fa9b is new, the others are already merged in #381

yossiovadia and others added 12 commits October 9, 2025 08:51

fix: apply pre-commit markdown formatting fixes

dfc95da

- Fix blank lines around code fences - Remove multiple consecutive blank lines - Ensure proper spacing around lists Signed-off-by: Yossi Ovadia <[email protected]>

fix: update deployment output URLs to HTTPS and correct demo script path

6125b5a

Signed-off-by: Yossi Ovadia <[email protected]>

fix: apply pre-commit markdown formatting fixes

b9d6cdd

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]>

yossiovadia requested review from Xunzhuo and rootfs as code owners October 9, 2025 20:34

Merge branch 'main' into openshift-openwebUI-integration

08ce5c1

github-actions bot assigned rootfs and Xunzhuo Oct 9, 2025

yossiovadia closed this Oct 9, 2025

yossiovadia deleted the openshift-openwebUI-integration branch October 9, 2025 20:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Openshift openweb UI integration #383

Openshift openweb UI integration #383

Uh oh!

yossiovadia commented Oct 9, 2025

Uh oh!

netlify bot commented Oct 9, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Oct 9, 2025

Uh oh!

rootfs commented Oct 9, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Openshift openweb UI integration #383

Openshift openweb UI integration #383

Uh oh!

Conversation

yossiovadia commented Oct 9, 2025

Uh oh!

netlify bot commented Oct 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for vllm-semantic-router ready!

Uh oh!

github-actions bot commented Oct 9, 2025

👥 vLLM Semantic Team Notification

📁 deploy

🎉 Thanks for your contributions!

Uh oh!

rootfs commented Oct 9, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

netlify bot commented Oct 9, 2025 •

edited

Loading

📁 `deploy`