Skip to content

Commit c6b2b60

Browse files
yossiovadiaclaude
andcommitted
feat: add OpenShift demo scripts and documentation
Add comprehensive demo toolkit for semantic router capabilities: - Interactive demo script (demo-semantic-router.py) with menu options: - Single classification (cache demo with fixed prompt) - All classifications (10 golden prompts) - PII detection test - Jailbreak detection test - Run all tests - Live log viewers: - live-semantic-router-logs.sh: Envoy traffic with routing decisions - live-classifier-logs.sh: Classification API activity - Demo utilities: - curl-examples.sh: Quick classification examples - cache-management.sh: Cache status and clearing - Documentation: - DEMO-README.md: Complete demo guide with setup instructions - CATEGORY-MODEL-MAPPING.md: Category to model routing reference All scripts use dynamic URL discovery from OpenShift routes (requires oc login). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]>
1 parent 7d3a9f8 commit c6b2b60

File tree

8 files changed

+1358
-0
lines changed

8 files changed

+1358
-0
lines changed
Lines changed: 97 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,97 @@
1+
# Category to Model Mapping
2+
3+
**Configuration File:** [deploy/openshift/config-openshift.yaml](../config-openshift.yaml)
4+
5+
## Model-A Categories (Default Model)
6+
7+
Model-A handles **9 categories** (primarily science and technical topics):
8+
9+
| Category | Score | Reasoning Enabled | Description |
10+
|----------|-------|-------------------|-------------|
11+
| **math** | 1.0 | ✅ Yes | Mathematics expert with step-by-step solutions |
12+
| **economics** | 1.0 | ❌ No | Economics expert (micro, macro, policy) |
13+
| **biology** | 0.9 | ❌ No | Biology expert (molecular, genetics, ecology) |
14+
| **physics** | 0.7 | ✅ Yes | Physics expert with mathematical derivations |
15+
| **history** | 0.7 | ❌ No | Historian across different periods and cultures |
16+
| **engineering** | 0.7 | ❌ No | Engineering expert (mechanical, electrical, civil, etc.) |
17+
| **other** | 0.7 | ❌ No | General helpful assistant (fallback) |
18+
| **chemistry** | 0.6 | ✅ Yes | Chemistry expert with lab techniques |
19+
| **computer science** | 0.6 | ❌ No | Computer science expert (algorithms, programming) |
20+
21+
---
22+
23+
## Model-B Categories
24+
25+
Model-B handles **5 categories** (primarily social sciences and humanities):
26+
27+
| Category | Score | Reasoning Enabled | Description |
28+
|----------|-------|-------------------|-------------|
29+
| **business** | 0.7 | ❌ No | Business consultant and strategic advisor |
30+
| **psychology** | 0.6 | ❌ No | Psychology expert (cognitive, behavioral, mental health) |
31+
| **health** | 0.5 | ❌ No | Health and medical information expert |
32+
| **philosophy** | 0.5 | ❌ No | Philosophy expert (ethics, logic, metaphysics) |
33+
| **law** | 0.4 | ❌ No | Legal expert (case law, statutory interpretation) |
34+
35+
---
36+
37+
## Prompts Routing (Tested & Verified)
38+
39+
These prompts have **100% classification accuracy** and route as follows:
40+
41+
| Category | Example Prompt | Routes To | Confidence |
42+
|----------|---------------|-----------|------------|
43+
| **Math** | "Is 17 a prime number?" | Model-B* | ~0.326 |
44+
| **Chemistry** | "What are atoms made of?" | Model-B* | ~0.196 |
45+
| **Chemistry** | "Explain oxidation and reduction" | Model-B* | ~0.200 |
46+
| **Chemistry** | "Explain chemical equilibrium" | Model-B* | ~0.197 |
47+
| **History** | "What were the main causes of World War I?" | Model-B* | ~0.218 |
48+
| **History** | "What was the Cold War?" | Model-B* | ~0.219 |
49+
| **Psychology** | "What is the nature vs nurture debate?" | Model-B | ~0.391 |
50+
| **Psychology** | "What are the stages of grief?" | Model-B | ~0.403 |
51+
| **Health** | "How to maintain a healthy lifestyle?" | Model-B | ~0.221 |
52+
| **Health** | "What is a balanced diet?" | Model-B | ~0.268 |
53+
54+
55+
---
56+
57+
## Reasoning Mode (Chain-of-Thought)
58+
59+
Categories with **reasoning enabled** use extended thinking for complex problems:
60+
61+
-**Math** (Model-A) - Step-by-step mathematical solutions
62+
-**Chemistry** (Model-A) - Complex chemical reactions and analysis
63+
-**Physics** (Model-A) - Mathematical derivations and proofs
64+
65+
---
66+
67+
## Default Behavior
68+
69+
- **Default Model:** Model-A
70+
- **Fallback Category:** "other" (score: 0.7)
71+
- **Unmatched queries** route to Model-A with the "other" category system prompt
72+
73+
74+
### Key Parameters:
75+
76+
- **name:** Category identifier
77+
- **system_prompt:** Specialized prompt for this category
78+
- **model_scores.model:** Target model (Model-A or Model-B)
79+
- **model_scores.score:** Routing priority (0.0 to 1.0)
80+
- **use_reasoning:** Enable extended thinking mode
81+
82+
---
83+
84+
85+
## Confidence Scores Explained
86+
87+
**Why are confidence scores low (0.2-0.4)?**
88+
89+
1. **Softmax across 14 categories** - Even the "winning" category may only get 20-40% probability
90+
2. **Relative, not absolute** - Scores are compared against other categories
91+
3. **Consistency matters** - Same prompt always gets same category (100% in our tests)
92+
4. **Highest score wins** - 0.326 for "math" means it beat all other 13 categories
93+
94+
**What's important:**
95+
- ✅ Classification is **consistent** across multiple runs
96+
- ✅ Same prompt → same category every time
97+
- ✅ Confidence is **relative** to other categories, not absolute certainty
Lines changed: 221 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,221 @@
1+
# Demo Scripts for Semantic Router
2+
3+
This directory contains demo scripts to showcase the semantic router capabilities.
4+
5+
## Quick Demo Guide
6+
7+
### 1. Live Log Viewer (Run in Terminal 1)
8+
9+
Shows real-time classification, routing, and security decisions:
10+
11+
```bash
12+
./deploy/openshift/demo/live-semantic-router-logs.sh
13+
```
14+
15+
**What it shows:**
16+
- 📨 **Incoming requests** with user prompts
17+
- 🛡️ **Security checks** (jailbreak detection)
18+
- 🔍 **Classification** (category detection with confidence)
19+
- 🎯 **Routing decisions** (which model was selected)
20+
- 💾 **Cache hits** (semantic similarity matching)
21+
- 🧠 **Reasoning mode** activation
22+
23+
**Tip:** Run this in a split terminal or separate window during your demo!
24+
25+
---
26+
27+
### 2. Interactive Demo (Run in Terminal 2)
28+
29+
Interactive menu-driven semantic router demo:
30+
31+
```bash
32+
python3 deploy/openshift/demo/demo-semantic-router.py
33+
```
34+
35+
**Features:**
36+
1. **Single Classification** - Tests random prompt from golden set
37+
2. **All Classifications** - Tests all 10 golden prompts
38+
3. **PII Detection Test** - Tests personal information filtering
39+
4. **Jailbreak Detection Test** - Tests security filtering
40+
5. **Run All Tests** - Executes all tests sequentially
41+
42+
**Requirements:**
43+
- ✅ Must be logged into OpenShift (`oc login`)
44+
- URLs are discovered automatically from routes
45+
46+
**What it does:**
47+
- Goes through Envoy (same path as OpenWebUI)
48+
- Shows routing decisions and response previews
49+
- **Appears in Grafana dashboard!**
50+
- Interactive - choose what to test
51+
52+
---
53+
54+
## Demo Flow Suggestion
55+
56+
### Setup (Before Demo)
57+
58+
```bash
59+
# Terminal 1: Start log viewer
60+
./deploy/openshift/demo/live-semantic-router-logs.sh
61+
62+
# Terminal 2: Ready to run classification test
63+
# (don't run yet)
64+
65+
# Browser Tab 1: Open Grafana
66+
# http://grafana-vllm-semantic-router-system.apps.cluster-pbd96.pbd96.sandbox5333.opentlc.com
67+
68+
# Browser Tab 2: Open OpenWebUI
69+
# http://openwebui-vllm-semantic-router-system.apps.cluster-pbd96.pbd96.sandbox5333.opentlc.com
70+
```
71+
72+
### During Demo
73+
74+
1. **Show the system overview**
75+
- Explain semantic routing concept
76+
- Show the architecture diagram
77+
78+
2. **Run interactive demo** (Terminal 2)
79+
```bash
80+
python3 deploy/openshift/demo/demo-semantic-router.py
81+
```
82+
Choose option 2 (All Classifications)
83+
84+
3. **Point to live logs** (Terminal 1)
85+
- Show real-time classification
86+
- Highlight security checks (jailbreak: BENIGN)
87+
- Show routing decisions (Model-A vs Model-B)
88+
- Point out cache hits
89+
90+
4. **Switch to Grafana** (Browser Tab 1)
91+
- Show request metrics appearing
92+
- Show classification category distribution
93+
- Show model usage breakdown
94+
95+
5. **Show OpenWebUI integration** (Browser Tab 2)
96+
- Type one of the golden prompts
97+
- Watch it appear in logs (Terminal 1)
98+
- Show the same routing happening
99+
100+
---
101+
102+
## Key Talking Points
103+
104+
### Classification Accuracy
105+
- **10 golden prompts** with 100% accuracy
106+
- Categories: Chemistry, History, Psychology, Health, Math
107+
- Shows consistent classification behavior
108+
109+
### Security Features
110+
- **Jailbreak detection** on every request
111+
- Shows "BENIGN" for safe requests
112+
- Confidence scores displayed
113+
114+
### Smart Routing
115+
- Automatic model selection based on content
116+
- Load balancing across Model-A and Model-B
117+
- Routing decisions visible in logs
118+
119+
### Performance
120+
- **Semantic caching** reduces latency
121+
- Cache hits shown in logs with similarity scores
122+
- Sub-second response times
123+
124+
### Observability
125+
- Real-time logs with structured JSON
126+
- Grafana metrics and dashboards
127+
- Request tracing and debugging
128+
129+
---
130+
131+
## Troubleshooting
132+
133+
### Log viewer shows no output
134+
```bash
135+
# Check if semantic-router pod is running
136+
oc get pods -n vllm-semantic-router-system | grep semantic-router
137+
138+
# Check logs manually
139+
oc logs -n vllm-semantic-router-system deployment/semantic-router --tail=20
140+
```
141+
142+
### Classification test fails
143+
```bash
144+
# Verify Envoy route is accessible
145+
curl http://envoy-http-vllm-semantic-router-system.apps.cluster-pbd96.pbd96.sandbox5333.opentlc.com/v1/models
146+
147+
# Check if models are ready
148+
oc get pods -n vllm-semantic-router-system
149+
```
150+
151+
### Grafana doesn't show metrics
152+
- Wait 15-30 seconds for metrics to appear
153+
- Refresh the dashboard
154+
- Check the time range (last 5 minutes)
155+
156+
---
157+
158+
## Cache Management
159+
160+
### Check Cache Status
161+
```bash
162+
./deploy/openshift/demo/cache-management.sh status
163+
```
164+
165+
Shows recent cache activity and cached queries.
166+
167+
### Clear Cache (for demo)
168+
```bash
169+
./deploy/openshift/demo/cache-management.sh clear
170+
```
171+
172+
Restarts semantic-router deployment to clear in-memory cache (~30 seconds).
173+
174+
### Demo Cache Feature
175+
176+
**Workflow to show caching in action:**
177+
178+
1. Clear the cache:
179+
```bash
180+
./deploy/openshift/demo/cache-management.sh clear
181+
```
182+
183+
2. Run classification test (first time - no cache):
184+
```bash
185+
python3 deploy/openshift/demo/demo-semantic-router.py
186+
```
187+
Choose option 2 (All Classifications)
188+
- Processing time: ~3-4 seconds per query
189+
- Logs show queries going to model
190+
191+
3. Run classification test again (second time - with cache):
192+
```bash
193+
python3 deploy/openshift/demo/demo-semantic-router.py
194+
```
195+
Choose option 2 (All Classifications) again
196+
- Processing time: ~400ms per query (10x faster!)
197+
- Logs show "💾 CACHE HIT" for all queries
198+
- Similarity scores ~0.99999
199+
200+
**Key talking point:** Cache uses **semantic similarity**, not exact string matching!
201+
202+
---
203+
204+
## Files
205+
206+
- `live-semantic-router-logs.sh` - Envoy traffic log viewer (security, cache, routing)
207+
- `live-classifier-logs.sh` - Classification API log viewer
208+
- `demo-semantic-router.py` - Interactive demo with multiple test options
209+
- `curl-examples.sh` - Quick classification examples (direct API)
210+
- `cache-management.sh` - Cache management helper
211+
- `CATEGORY-MODEL-MAPPING.md` - Category to model routing reference
212+
- `demo-classification-results.json` - Test results (auto-generated)
213+
214+
---
215+
216+
## Notes
217+
218+
- The log viewer uses `oc logs --follow`, so it will run indefinitely until you press Ctrl+C
219+
- Classification test takes ~60 seconds (10 prompts with 0.5s delay between each)
220+
- All requests go through Envoy, triggering the full routing pipeline
221+
- Grafana metrics update in real-time (with slight delay)

0 commit comments

Comments
 (0)