
Commit 4084632

yossiovadia and claude authored and committed
Openshift observability (vllm-project#381)
* feat(openshift): add observability stack (Prometheus + Grafana)

Add comprehensive observability monitoring for OpenShift deployments, including:
- Prometheus for metrics collection with 15-day retention
- Grafana with a pre-configured LLM Router dashboard
- Model routing tracking (auto -> Model-A/B selection)
- PII protection monitoring (violations by type)
- Jailbreak detection and blocking metrics
- Performance metrics (TTFT, TPOT, latency, tokens, cost)

New deployment flags:
- --with-observability: Deploy observability with semantic-router
- --observability-only: Deploy only the observability stack
- --cleanup-observability: Remove only the observability components

All manifests are under deploy/openshift/observability/ with kustomize support. Security contexts are OpenShift-compatible (no runAsNonRoot, capabilities dropped).

Dashboard includes 12 panels tracking:
- Prompt categories
- Model routing rate (source -> target)
- PII/Jailbreak refusal rates by model
- Token usage, latency percentiles, costs
- Security effectiveness (combined refusal %)

Resolves monitoring requirements for model selection visibility and content safety tracking in OpenShift environments.

Signed-off-by: Yossi Ovadia <[email protected]>

* feat(config): simplify for demo - strict PII, 2 categories, better model names

Changes for a cleaner observability demo:

PII Policy:
- Both models now strict (allow_by_default: false)
- Only EMAIL_ADDRESS allowed for both coding-model and general-model
- Makes PII violations easier to demonstrate consistently

Model Renaming:
- Model-A → coding-model (optimized for code/algorithms)
- Model-B → general-model (general knowledge/business)
- More intuitive names for demo purposes

Categories Simplified (15 → 2):
- coding: routes to coding-model (score 0.95, reasoning enabled)
- general: routes to general-model (score 0.9)
- Clearer routing behavior for demonstrations

This configuration makes it easier to demonstrate:
1. Model routing based on category classification
2. PII detection and blocking (both models strict)
3. Jailbreak protection
4. Observability metrics in Grafana

No Go code changes - config-only updates.

Signed-off-by: Yossi Ovadia <[email protected]>

* feat(grafana): relabel auto→semantic-router and update dashboard title

- Add label_replace() to all panels to show "auto" as "semantic-router"
- Update dashboard title to reflect new model names (coding-model, general-model)
- All metrics now display consistent model naming across panels
- Fixes confusion between "auto" routing and actual model names

Affected panels:
- Token Usage Rate by Model
- Model Routing Rate (source_model and target_model)
- Model Completion Latency (p95, p50/p90/p99)
- TTFT/TPOT by Model
- Reasoning Rate by Model
- Model Cost Rate
- Refusal Rates by Model (PII + Jailbreak)
- Refusal Rate Percentage
- Total Cost by Model

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>

* feat(config): restore 15 categories for richer demo experience

- Reverted from 2 categories back to the full 15 categories
- Kept model name changes: coding-model, general-model (not Model-A/B)
- Kept strict PII policy for both models (only EMAIL allowed)
- Categories now route to appropriate models:
  * coding-model: biology, chemistry, history, other, economics, math, physics, computer science, engineering
  * general-model: business, law, psychology, health, philosophy

This provides a much better demo showing the rich classification capabilities, even though the classifier model needs retraining.
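The `feat(grafana)` relabeling above relies on PromQL's `label_replace(v, dst_label, replacement, src_label, regex)`, which sets `dst_label` to the expanded `replacement` whenever the fully anchored `regex` matches the value of `src_label`, and otherwise returns the series unchanged. The dashboard's actual queries are not part of this diff. Assuming a label named `model` (a hypothetical name), a panel would wrap its expression as `label_replace(<expr>, "model", "semantic-router", "model", "auto")`. The minimal Python sketch below only emulates that behavior on plain label dictionaries, to show why "auto" becomes "semantic-router" while real model names pass through untouched. It is not Prometheus code.

```python
import re
from typing import Dict, List

# For this sketch, a series is represented only by its label set.
Labels = Dict[str, str]


def label_replace(series: List[Labels], dst_label: str, replacement: str,
                  src_label: str, regex: str) -> List[Labels]:
    """Emulate PromQL label_replace() on a list of label sets (simplified)."""
    pattern = re.compile(regex)
    # Translate PromQL-style $1, $2 group references into Python's \g<1> form.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    result: List[Labels] = []
    for labels in series:
        new_labels = dict(labels)
        # PromQL anchors the regex, so it must match the entire value;
        # a missing source label is treated as an empty string.
        match = pattern.fullmatch(labels.get(src_label, ""))
        if match:
            value = match.expand(template)
            if value:
                new_labels[dst_label] = value
            else:
                # An empty label value is equivalent to the label being absent.
                new_labels.pop(dst_label, None)
        result.append(new_labels)
    return result


if __name__ == "__main__":
    # Hypothetical label sets as a "by model" panel might return them.
    panel_series = [{"model": "auto"}, {"model": "Model-A"}, {"model": "Model-B"}]
    relabeled = label_replace(panel_series, "model", "semantic-router", "model", "auto")
    print([s["model"] for s in relabeled])
```

Because the regex is anchored, a value such as "automatic" would not be rewritten, so only the router's own "auto" pseudo-model is renamed.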

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>

* feat(config): revert to Model-A and Model-B naming

- Changed back from coding-model/general-model to Model-A/Model-B
- Kept 15 categories for rich demo experience
- Kept strict PII policy for both models (only EMAIL allowed)
- Updated Grafana dashboard title to reflect Model-A & Model-B
- Dashboard label relabeling still shows "semantic-router" for "auto"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>

* feat(demo): enhance test script with Model-B traffic and better coverage

- Moved script to deploy/openshift/ folder
- Added Model-B prompts (psychology, business, health, philosophy, law)
- Send 10 jailbreak attempts (better visibility in Grafana)
- Send 10 PII test prompts (various PII types)
- Use chat completions instead of just classification (triggers routing)
- Updated help text to reflect Model-A/Model-B naming
- All tests now send requests in parallel for better performance

This ensures both Model-A and Model-B appear in Grafana dashboards.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>

* fix(grafana): prevent refusal rate panels from showing stale data

Issue: Refusal Rates and Refusal Rate Percentage panels kept showing increasing values even when no traffic was present.

Root cause: rate() returns empty results when there is no activity in the time window, but Grafana was showing the last non-zero values or interpolating.

Fix:
- Added 'or vector(0)' to refusal rate queries to explicitly return 0 when there are no errors in the time window
- Added 'or vector(1)' to the denominator to prevent division by zero
- Added interval and intervalFactor parameters for better scraping

Affected panels:
- Refusal Rates by Model (time series)
- Refusal Rate Percentage by Model (bar gauge)

Now panels correctly drop to 0 when traffic stops.

Signed-off-by: Yossi Ovadia <[email protected]>

* feat: enable observability by default with HTTPS and improved dashboard layout

- Enable observability (Prometheus + Grafana) by default in deployment
- Add HTTPS/TLS termination to Grafana and Prometheus routes with auto-redirect
- Reorganize Grafana dashboard panels by function:
  * Semantic-router features on top (category, routing, refusal, reasoning)
  * Performance metrics in the middle (latency, TTFT, TPOT, tokens)
  * Cost metrics at the bottom (cost rate, total cost)
- Update deployment script help text to reflect observability enabled by default
- Fix dashboard YAML indentation for proper embedding

Signed-off-by: Yossi Ovadia <[email protected]>

* fix: apply pre-commit markdown formatting fixes

- Fix blank lines around code fences
- Remove multiple consecutive blank lines
- Ensure proper spacing around lists

Signed-off-by: Yossi Ovadia <[email protected]>

* fix: update deployment output URLs to HTTPS and correct demo script path

Signed-off-by: Yossi Ovadia <[email protected]>

---------

Signed-off-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>
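The refusal-rate fix in the `fix(grafana)` entry works around how PromQL handles empty results: `rate()` over a window with no samples yields no series at all, and `<expr> or vector(0)` substitutes a single label-less sample of 0 when the left side is empty. The minimal Python sketch below models only that fallback arithmetic for one model, with `None` standing in for an empty query result. The function name `refusal_percentage` is hypothetical, and the dashboard's real queries are not part of this diff. The sketch also ignores PromQL vector matching: a label-less `vector(0)` or `vector(1)` sample only pairs with other label-less samples, so the effect inside a `by (model)` aggregation may differ.

```python
import math
from typing import Optional


def refusal_percentage(refusal_rate: Optional[float],
                       request_rate: Optional[float]) -> float:
    """Refusal percentage for one model; None models an empty PromQL result."""
    # "<refusals> or vector(0)": an empty numerator falls back to 0.
    numerator = 0.0 if refusal_rate is None else refusal_rate
    # "<requests> or vector(1)": an empty denominator falls back to 1.
    denominator = 1.0 if request_rate is None else request_rate
    if denominator == 0.0:
        # A present-but-zero denominator is not rescued by "or"; PromQL float
        # division gives NaN for 0/0 and +Inf for x/0 (x > 0) instead of raising.
        return math.nan if numerator == 0.0 else math.inf
    return 100.0 * numerator / denominator


if __name__ == "__main__":
    print(refusal_percentage(None, None))  # no traffic at all in the window
    print(refusal_percentage(0.5, 2.0))    # 0.5 refusals/s out of 2 requests/s
```

Under these assumptions, an idle window evaluates to 0% rather than leaving a gap that Grafana might fill with the last non-zero value.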
1 parent 59fe123 commit 4084632

19 files changed, +2730 -4 lines changed

deploy/openshift/config-openshift.yaml

Lines changed: 4 additions & 4 deletions
@@ -47,14 +47,14 @@ model_config:
     reasoning_family: "qwen3" # This model uses Qwen reasoning syntax
     preferred_endpoints: ["model-a-endpoint"]
     pii_policy:
-      allow_by_default: false # Strict PII blocking model
+      allow_by_default: false # Strict PII blocking
       pii_types_allowed: ["EMAIL_ADDRESS"] # Only allow emails
   "Model-B":
     reasoning_family: "qwen3" # This model uses Qwen reasoning syntax
     preferred_endpoints: ["model-b-endpoint"]
     pii_policy:
-      allow_by_default: true # Permissive PII model for safe routing
-      pii_types_allowed: ["EMAIL_ADDRESS", "PERSON", "GPE", "PHONE_NUMBER", "US_SSN", "CREDIT_CARD"]
+      allow_by_default: false # Strict PII blocking (changed from true)
+      pii_types_allowed: ["EMAIL_ADDRESS"] # Only allow emails (same as Model-A)
 
 # Classifier configuration
 classifier:
@@ -71,7 +71,7 @@ classifier:
     use_cpu: true
     pii_mapping_path: "models/pii_classifier_modernbert-base_presidio_token_model/pii_type_mapping.json"
 
-# Categories with new use_reasoning field structure
+# Categories - Full set of 15 categories for rich classification demo
 categories:
   - name: business
     system_prompt: "You are a senior business consultant and strategic advisor with expertise in corporate strategy, operations management, financial analysis, marketing, and organizational development. Provide practical, actionable business advice backed by proven methodologies and industry best practices. Consider market dynamics, competitive landscape, and stakeholder interests in your recommendations."
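After this change, Model-A and Model-B carry the same strict `pii_policy`. The minimal Python sketch below shows how such a policy could be evaluated. It rests on an assumption this diff does not confirm: that `pii_types_allowed` acts as an allow-list consulted only when `allow_by_default` is false, and that any detected type outside that list is blocked. `PIIPolicy` and `denied_pii_types` are hypothetical names for illustration, not the router's API.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class PIIPolicy:
    """Mirrors the pii_policy block in config-openshift.yaml (hypothetical type)."""
    allow_by_default: bool
    pii_types_allowed: List[str] = field(default_factory=list)


def denied_pii_types(policy: PIIPolicy, detected: Iterable[str]) -> List[str]:
    """Return the detected PII types this policy would block (assumed semantics)."""
    if policy.allow_by_default:
        # Assumption: a permissive policy lets every detected type through.
        return []
    allowed = set(policy.pii_types_allowed)
    # Strict policy: anything not explicitly allow-listed is denied.
    return [pii_type for pii_type in detected if pii_type not in allowed]


if __name__ == "__main__":
    # After this commit, both Model-A and Model-B use this strict policy.
    strict = PIIPolicy(allow_by_default=False, pii_types_allowed=["EMAIL_ADDRESS"])
    print(denied_pii_types(strict, ["EMAIL_ADDRESS", "US_SSN", "PHONE_NUMBER"]))
```

Under the same assumptions, the old permissive Model-B policy would have returned an empty list for this input. That difference would explain why the commit says the strict policy makes PII violations easier to demonstrate consistently.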
