-
Notifications
You must be signed in to change notification settings - Fork 276
feat: add OpenShift demo scripts and documentation #446
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
c6b2b60
feat: add OpenShift demo scripts and documentation
yossiovadia 7d55d4a
fix(envoy): use ORIGINAL_DST cluster for dynamic routing
yossiovadia b830969
feat(demo): add reasoning showcase test to OpenShift demo
yossiovadia af87950
fix: apply markdown linting fixes to demo documentation
yossiovadia 16eed3c
Merge branch 'main' into openshift-demo
yossiovadia File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,95 @@ | ||
| # Category to Model Mapping | ||
|
|
||
| **Configuration File:** [deploy/openshift/config-openshift.yaml](../config-openshift.yaml) | ||
|
|
||
| ## Model-A Categories (Default Model) | ||
|
|
||
| Model-A handles **9 categories** (primarily science and technical topics): | ||
|
|
||
| | Category | Score | Reasoning Enabled | Description | | ||
| |----------|-------|-------------------|-------------| | ||
| | **math** | 1.0 | ✅ Yes | Mathematics expert with step-by-step solutions | | ||
| | **economics** | 1.0 | ❌ No | Economics expert (micro, macro, policy) | | ||
| | **biology** | 0.9 | ❌ No | Biology expert (molecular, genetics, ecology) | | ||
| | **physics** | 0.7 | ✅ Yes | Physics expert with mathematical derivations | | ||
| | **history** | 0.7 | ❌ No | Historian across different periods and cultures | | ||
| | **engineering** | 0.7 | ❌ No | Engineering expert (mechanical, electrical, civil, etc.) | | ||
| | **other** | 0.7 | ❌ No | General helpful assistant (fallback) | | ||
| | **chemistry** | 0.6 | ✅ Yes | Chemistry expert with lab techniques | | ||
| | **computer science** | 0.6 | ❌ No | Computer science expert (algorithms, programming) | | ||
|
|
||
| --- | ||
|
|
||
| ## Model-B Categories | ||
|
|
||
| Model-B handles **5 categories** (primarily social sciences and humanities): | ||
|
|
||
| | Category | Score | Reasoning Enabled | Description | | ||
| |----------|-------|-------------------|-------------| | ||
| | **business** | 0.7 | ❌ No | Business consultant and strategic advisor | | ||
| | **psychology** | 0.6 | ❌ No | Psychology expert (cognitive, behavioral, mental health) | | ||
| | **health** | 0.5 | ❌ No | Health and medical information expert | | ||
| | **philosophy** | 0.5 | ❌ No | Philosophy expert (ethics, logic, metaphysics) | | ||
| | **law** | 0.4 | ❌ No | Legal expert (case law, statutory interpretation) | | ||
|
|
||
| --- | ||
|
|
||
| ## Prompts Routing (Tested & Verified) | ||
|
|
||
| These prompts have **100% classification accuracy** and route as follows: | ||
|
|
||
| | Category | Example Prompt | Routes To | Confidence | | ||
| |----------|---------------|-----------|------------| | ||
| | **Math** | "Is 17 a prime number?" | Model-B* | ~0.326 | | ||
| | **Chemistry** | "What are atoms made of?" | Model-B* | ~0.196 | | ||
| | **Chemistry** | "Explain oxidation and reduction" | Model-B* | ~0.200 | | ||
| | **Chemistry** | "Explain chemical equilibrium" | Model-B* | ~0.197 | | ||
| | **History** | "What were the main causes of World War I?" | Model-B* | ~0.218 | | ||
| | **History** | "What was the Cold War?" | Model-B* | ~0.219 | | ||
| | **Psychology** | "What is the nature vs nurture debate?" | Model-B | ~0.391 | | ||
| | **Psychology** | "What are the stages of grief?" | Model-B | ~0.403 | | ||
| | **Health** | "How to maintain a healthy lifestyle?" | Model-B | ~0.221 | | ||
| | **Health** | "What is a balanced diet?" | Model-B | ~0.268 | | ||
|
|
||
| --- | ||
|
|
||
| ## Reasoning Mode (Chain-of-Thought) | ||
|
|
||
| Categories with **reasoning enabled** use extended thinking for complex problems: | ||
|
|
||
| - ✅ **Math** (Model-A) - Step-by-step mathematical solutions | ||
| - ✅ **Chemistry** (Model-A) - Complex chemical reactions and analysis | ||
| - ✅ **Physics** (Model-A) - Mathematical derivations and proofs | ||
|
|
||
| --- | ||
|
|
||
| ## Default Behavior | ||
|
|
||
| - **Default Model:** Model-A | ||
| - **Fallback Category:** "other" (score: 0.7) | ||
| - **Unmatched queries** route to Model-A with the "other" category system prompt | ||
|
|
||
| ### Key Parameters: | ||
|
|
||
| - **name:** Category identifier | ||
| - **system_prompt:** Specialized prompt for this category | ||
| - **model_scores.model:** Target model (Model-A or Model-B) | ||
| - **model_scores.score:** Routing priority (0.0 to 1.0) | ||
| - **use_reasoning:** Enable extended thinking mode | ||
|
|
||
| --- | ||
|
|
||
| ## Confidence Scores Explained | ||
|
|
||
| **Why are confidence scores low (0.2-0.4)?** | ||
|
|
||
| 1. **Softmax across 14 categories** - Even the "winning" category may only get 20-40% probability | ||
| 2. **Relative, not absolute** - Scores are compared against other categories | ||
| 3. **Consistency matters** - Same prompt always gets same category (100% in our tests) | ||
| 4. **Highest score wins** - 0.326 for "math" means it beat all other 13 categories | ||
|
|
||
| **What's important:** | ||
|
|
||
| - ✅ Classification is **consistent** across multiple runs | ||
| - ✅ Same prompt → same category every time | ||
| - ✅ Confidence is **relative** to other categories, not absolute certainty |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,242 @@ | ||
| # Demo Scripts for Semantic Router | ||
|
|
||
| This directory contains demo scripts to showcase the semantic router capabilities. | ||
|
|
||
| ## Quick Demo Guide | ||
|
|
||
| ### 1. Live Log Viewer (Run in Terminal 1) | ||
|
|
||
| Shows real-time classification, routing, and security decisions: | ||
|
|
||
| ```bash | ||
| ./deploy/openshift/demo/live-semantic-router-logs.sh | ||
| ``` | ||
|
|
||
| **What it shows:** | ||
|
|
||
| - 📨 **Incoming requests** with user prompts | ||
| - 🛡️ **Security checks** (jailbreak detection) | ||
| - 🔍 **Classification** (category detection with confidence) | ||
| - 🎯 **Routing decisions** (which model was selected) | ||
| - 💾 **Cache hits** (semantic similarity matching) | ||
| - 🧠 **Reasoning mode** activation | ||
|
|
||
| **Tip:** Run this in a split terminal or separate window during your demo! | ||
|
|
||
| --- | ||
|
|
||
| ### 2. Interactive Demo (Run in Terminal 2) | ||
|
|
||
| Interactive menu-driven semantic router demo: | ||
|
|
||
| ```bash | ||
| python3 deploy/openshift/demo/demo-semantic-router.py | ||
| ``` | ||
|
|
||
| **Features:** | ||
|
|
||
| 1. **Single Classification** - Tests random prompt from golden set | ||
| 2. **All Classifications** - Tests all 10 golden prompts | ||
| 3. **PII Detection Test** - Tests personal information filtering | ||
| 4. **Jailbreak Detection Test** - Tests security filtering | ||
| 5. **Run All Tests** - Executes all tests sequentially | ||
|
|
||
| **Requirements:** | ||
|
|
||
| - ✅ Must be logged into OpenShift (`oc login`) | ||
| - URLs are discovered automatically from routes | ||
|
|
||
| **What it does:** | ||
|
|
||
| - Goes through Envoy (same path as OpenWebUI) | ||
| - Shows routing decisions and response previews | ||
| - **Appears in Grafana dashboard!** | ||
| - Interactive - choose what to test | ||
|
|
||
| --- | ||
|
|
||
| ## Demo Flow Suggestion | ||
|
|
||
| ### Setup (Before Demo) | ||
|
|
||
| ```bash | ||
| # Terminal 1: Start log viewer | ||
| ./deploy/openshift/demo/live-semantic-router-logs.sh | ||
|
|
||
| # Terminal 2: Ready to run classification test | ||
| # (don't run yet) | ||
|
|
||
| # Browser Tab 1: Open Grafana | ||
| # http://grafana-vllm-semantic-router-system.apps.cluster-pbd96.pbd96.sandbox5333.opentlc.com | ||
|
|
||
| # Browser Tab 2: Open OpenWebUI | ||
| # http://openwebui-vllm-semantic-router-system.apps.cluster-pbd96.pbd96.sandbox5333.opentlc.com | ||
| ``` | ||
|
|
||
| ### During Demo | ||
|
|
||
| 1. **Show the system overview** | ||
| - Explain semantic routing concept | ||
| - Show the architecture diagram | ||
|
|
||
| 2. **Run interactive demo** (Terminal 2) | ||
|
|
||
| ```bash | ||
| python3 deploy/openshift/demo/demo-semantic-router.py | ||
| ``` | ||
|
|
||
| Choose option 2 (All Classifications) | ||
|
|
||
| 3. **Point to live logs** (Terminal 1) | ||
| - Show real-time classification | ||
| - Highlight security checks (jailbreak: BENIGN) | ||
| - Show routing decisions (Model-A vs Model-B) | ||
| - Point out cache hits | ||
|
|
||
| 4. **Switch to Grafana** (Browser Tab 1) | ||
| - Show request metrics appearing | ||
| - Show classification category distribution | ||
| - Show model usage breakdown | ||
|
|
||
| 5. **Show OpenWebUI integration** (Browser Tab 2) | ||
| - Type one of the golden prompts | ||
| - Watch it appear in logs (Terminal 1) | ||
| - Show the same routing happening | ||
|
|
||
| --- | ||
|
|
||
| ## Key Talking Points | ||
|
|
||
| ### Classification Accuracy | ||
|
|
||
| - **10 golden prompts** with 100% accuracy | ||
| - Categories: Chemistry, History, Psychology, Health, Math | ||
| - Shows consistent classification behavior | ||
|
|
||
| ### Security Features | ||
|
|
||
| - **Jailbreak detection** on every request | ||
| - Shows "BENIGN" for safe requests | ||
| - Confidence scores displayed | ||
|
|
||
| ### Smart Routing | ||
|
|
||
| - Automatic model selection based on content | ||
| - Load balancing across Model-A and Model-B | ||
| - Routing decisions visible in logs | ||
|
|
||
| ### Performance | ||
|
|
||
| - **Semantic caching** reduces latency | ||
| - Cache hits shown in logs with similarity scores | ||
| - Sub-second response times | ||
|
|
||
| ### Observability | ||
|
|
||
| - Real-time logs with structured JSON | ||
| - Grafana metrics and dashboards | ||
| - Request tracing and debugging | ||
|
|
||
| --- | ||
|
|
||
| ## Troubleshooting | ||
|
|
||
| ### Log viewer shows no output | ||
|
|
||
| ```bash | ||
| # Check if semantic-router pod is running | ||
| oc get pods -n vllm-semantic-router-system | grep semantic-router | ||
|
|
||
| # Check logs manually | ||
| oc logs -n vllm-semantic-router-system deployment/semantic-router --tail=20 | ||
| ``` | ||
|
|
||
| ### Classification test fails | ||
|
|
||
| ```bash | ||
| # Verify Envoy route is accessible | ||
| curl http://envoy-http-vllm-semantic-router-system.apps.cluster-pbd96.pbd96.sandbox5333.opentlc.com/v1/models | ||
|
|
||
| # Check if models are ready | ||
| oc get pods -n vllm-semantic-router-system | ||
| ``` | ||
|
|
||
| ### Grafana doesn't show metrics | ||
|
|
||
| - Wait 15-30 seconds for metrics to appear | ||
| - Refresh the dashboard | ||
| - Check the time range (last 5 minutes) | ||
|
|
||
| --- | ||
|
|
||
| ## Cache Management | ||
|
|
||
| ### Check Cache Status | ||
|
|
||
| ```bash | ||
| ./deploy/openshift/demo/cache-management.sh status | ||
| ``` | ||
|
|
||
| Shows recent cache activity and cached queries. | ||
|
|
||
| ### Clear Cache (for demo) | ||
|
|
||
| ```bash | ||
| ./deploy/openshift/demo/cache-management.sh clear | ||
| ``` | ||
|
|
||
| Restarts semantic-router deployment to clear in-memory cache (~30 seconds). | ||
|
|
||
| ### Demo Cache Feature | ||
|
|
||
| **Workflow to show caching in action:** | ||
|
|
||
| 1. Clear the cache: | ||
|
|
||
| ```bash | ||
| ./deploy/openshift/demo/cache-management.sh clear | ||
| ``` | ||
|
|
||
| 2. Run classification test (first time - no cache): | ||
|
|
||
| ```bash | ||
| python3 deploy/openshift/demo/demo-semantic-router.py | ||
| ``` | ||
|
|
||
| Choose option 2 (All Classifications) | ||
| - Processing time: ~3-4 seconds per query | ||
| - Logs show queries going to model | ||
|
|
||
| 3. Run classification test again (second time - with cache): | ||
|
|
||
| ```bash | ||
| python3 deploy/openshift/demo/demo-semantic-router.py | ||
|
||
| ``` | ||
|
|
||
| Choose option 2 (All Classifications) again | ||
| - Processing time: ~400ms per query (10x faster!) | ||
| - Logs show "💾 CACHE HIT" for all queries | ||
| - Similarity scores ~0.99999 | ||
|
|
||
| **Key talking point:** Cache uses **semantic similarity**, not exact string matching! | ||
|
|
||
| --- | ||
|
|
||
| ## Files | ||
|
|
||
| - `live-semantic-router-logs.sh` - Envoy traffic log viewer (security, cache, routing) | ||
| - `live-classifier-logs.sh` - Classification API log viewer | ||
| - `demo-semantic-router.py` - Interactive demo with multiple test options | ||
| - `curl-examples.sh` - Quick classification examples (direct API) | ||
| - `cache-management.sh` - Cache management helper | ||
| - `CATEGORY-MODEL-MAPPING.md` - Category to model routing reference | ||
| - `demo-classification-results.json` - Test results (auto-generated) | ||
|
|
||
| --- | ||
|
|
||
| ## Notes | ||
|
|
||
| - The log viewer uses `oc logs --follow`, so it will run indefinitely until you press Ctrl+C | ||
| - Classification test takes ~60 seconds (10 prompts with 0.5s delay between each) | ||
| - All requests go through Envoy, triggering the full routing pipeline | ||
| - Grafana metrics update in real-time (with slight delay) | ||
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These references to running the demo script multiple times are correct, but the workflow description in lines 93-97 in cache-management.sh incorrectly references 'demo-classification-test.py' instead of 'demo-semantic-router.py'