|
| 1 | +# Demo Scripts for Semantic Router |
| 2 | + |
| 3 | +This directory contains demo scripts to showcase the semantic router capabilities. |
| 4 | + |
| 5 | +## Quick Demo Guide |
| 6 | + |
| 7 | +### 1. Live Log Viewer (Run in Terminal 1) |
| 8 | + |
| 9 | +Shows real-time classification, routing, and security decisions: |
| 10 | + |
| 11 | +```bash |
| 12 | +./deploy/openshift/demo/live-semantic-router-logs.sh |
| 13 | +``` |
| 14 | + |
| 15 | +**What it shows:** |
| 16 | +- 📨 **Incoming requests** with user prompts |
| 17 | +- 🛡️ **Security checks** (jailbreak detection) |
| 18 | +- 🔍 **Classification** (category detection with confidence) |
| 19 | +- 🎯 **Routing decisions** (which model was selected) |
| 20 | +- 💾 **Cache hits** (semantic similarity matching) |
| 21 | +- 🧠 **Reasoning mode** activation |
| 22 | + |
| 23 | +**Tip:** Run this in a split terminal or separate window during your demo! |
| 24 | + |
| 25 | +--- |
| 26 | + |
| 27 | +### 2. Interactive Demo (Run in Terminal 2) |
| 28 | + |
| 29 | +Interactive menu-driven semantic router demo: |
| 30 | + |
| 31 | +```bash |
| 32 | +python3 deploy/openshift/demo/demo-semantic-router.py |
| 33 | +``` |
| 34 | + |
| 35 | +**Features:** |
| 36 | +1. **Single Classification** - Tests random prompt from golden set |
| 37 | +2. **All Classifications** - Tests all 10 golden prompts |
| 38 | +3. **PII Detection Test** - Tests personal information filtering |
| 39 | +4. **Jailbreak Detection Test** - Tests security filtering |
| 40 | +5. **Run All Tests** - Executes all tests sequentially |
| 41 | + |
| 42 | +**Requirements:** |
| 43 | +- ✅ Must be logged into OpenShift (`oc login`) |
| 44 | +- URLs are discovered automatically from routes |
| 45 | + |
| 46 | +**What it does:** |
| 47 | +- Goes through Envoy (same path as OpenWebUI) |
| 48 | +- Shows routing decisions and response previews |
| 49 | +- **Appears in Grafana dashboard!** |
| 50 | +- Interactive - choose what to test |
| 51 | + |
| 52 | +--- |
| 53 | + |
| 54 | +## Demo Flow Suggestion |
| 55 | + |
| 56 | +### Setup (Before Demo) |
| 57 | + |
| 58 | +```bash |
| 59 | +# Terminal 1: Start log viewer |
| 60 | +./deploy/openshift/demo/live-semantic-router-logs.sh |
| 61 | + |
| 62 | +# Terminal 2: Ready to run classification test |
| 63 | +# (don't run yet) |
| 64 | + |
| 65 | +# Browser Tab 1: Open Grafana |
| 66 | +# http://grafana-vllm-semantic-router-system.apps.cluster-pbd96.pbd96.sandbox5333.opentlc.com |
| 67 | + |
| 68 | +# Browser Tab 2: Open OpenWebUI |
| 69 | +# http://openwebui-vllm-semantic-router-system.apps.cluster-pbd96.pbd96.sandbox5333.opentlc.com |
| 70 | +``` |
| 71 | + |
| 72 | +### During Demo |
| 73 | + |
| 74 | +1. **Show the system overview** |
| 75 | + - Explain semantic routing concept |
| 76 | + - Show the architecture diagram |
| 77 | + |
| 78 | +2. **Run interactive demo** (Terminal 2) |
| 79 | + ```bash |
| 80 | + python3 deploy/openshift/demo/demo-semantic-router.py |
| 81 | + ``` |
| 82 | + Choose option 2 (All Classifications) |
| 83 | + |
| 84 | +3. **Point to live logs** (Terminal 1) |
| 85 | + - Show real-time classification |
| 86 | + - Highlight security checks (jailbreak: BENIGN) |
| 87 | + - Show routing decisions (Model-A vs Model-B) |
| 88 | + - Point out cache hits |
| 89 | + |
| 90 | +4. **Switch to Grafana** (Browser Tab 1) |
| 91 | + - Show request metrics appearing |
| 92 | + - Show classification category distribution |
| 93 | + - Show model usage breakdown |
| 94 | + |
| 95 | +5. **Show OpenWebUI integration** (Browser Tab 2) |
| 96 | + - Type one of the golden prompts |
| 97 | + - Watch it appear in logs (Terminal 1) |
| 98 | + - Show the same routing happening |
| 99 | + |
| 100 | +--- |
| 101 | + |
| 102 | +## Key Talking Points |
| 103 | + |
| 104 | +### Classification Accuracy |
| 105 | +- **10 golden prompts** with 100% accuracy |
| 106 | +- Categories: Chemistry, History, Psychology, Health, Math |
| 107 | +- Shows consistent classification behavior |
| 108 | + |
| 109 | +### Security Features |
| 110 | +- **Jailbreak detection** on every request |
| 111 | +- Shows "BENIGN" for safe requests |
| 112 | +- Confidence scores displayed |
| 113 | + |
| 114 | +### Smart Routing |
| 115 | +- Automatic model selection based on content |
| 116 | +- Load balancing across Model-A and Model-B |
| 117 | +- Routing decisions visible in logs |
| 118 | + |
| 119 | +### Performance |
| 120 | +- **Semantic caching** reduces latency |
| 121 | +- Cache hits shown in logs with similarity scores |
| 122 | +- Sub-second response times |
| 123 | + |
| 124 | +### Observability |
| 125 | +- Real-time logs with structured JSON |
| 126 | +- Grafana metrics and dashboards |
| 127 | +- Request tracing and debugging |
| 128 | + |
| 129 | +--- |
| 130 | + |
| 131 | +## Troubleshooting |
| 132 | + |
| 133 | +### Log viewer shows no output |
| 134 | +```bash |
| 135 | +# Check if semantic-router pod is running |
| 136 | +oc get pods -n vllm-semantic-router-system | grep semantic-router |
| 137 | + |
| 138 | +# Check logs manually |
| 139 | +oc logs -n vllm-semantic-router-system deployment/semantic-router --tail=20 |
| 140 | +``` |
| 141 | + |
| 142 | +### Classification test fails |
| 143 | +```bash |
| 144 | +# Verify Envoy route is accessible |
| 145 | +curl http://envoy-http-vllm-semantic-router-system.apps.cluster-pbd96.pbd96.sandbox5333.opentlc.com/v1/models |
| 146 | + |
| 147 | +# Check if models are ready |
| 148 | +oc get pods -n vllm-semantic-router-system |
| 149 | +``` |
| 150 | + |
| 151 | +### Grafana doesn't show metrics |
| 152 | +- Wait 15-30 seconds for metrics to appear |
| 153 | +- Refresh the dashboard |
| 154 | +- Check the time range (last 5 minutes) |
| 155 | + |
| 156 | +--- |
| 157 | + |
| 158 | +## Cache Management |
| 159 | + |
| 160 | +### Check Cache Status |
| 161 | +```bash |
| 162 | +./deploy/openshift/demo/cache-management.sh status |
| 163 | +``` |
| 164 | + |
| 165 | +Shows recent cache activity and cached queries. |
| 166 | + |
| 167 | +### Clear Cache (for demo) |
| 168 | +```bash |
| 169 | +./deploy/openshift/demo/cache-management.sh clear |
| 170 | +``` |
| 171 | + |
| 172 | +Restarts semantic-router deployment to clear in-memory cache (~30 seconds). |
| 173 | + |
| 174 | +### Demo Cache Feature |
| 175 | + |
| 176 | +**Workflow to show caching in action:** |
| 177 | + |
| 178 | +1. Clear the cache: |
| 179 | + ```bash |
| 180 | + ./deploy/openshift/demo/cache-management.sh clear |
| 181 | + ``` |
| 182 | + |
| 183 | +2. Run classification test (first time - no cache): |
| 184 | + ```bash |
| 185 | + python3 deploy/openshift/demo/demo-semantic-router.py |
| 186 | + ``` |
| 187 | + Choose option 2 (All Classifications) |
| 188 | + - Processing time: ~3-4 seconds per query |
| 189 | + - Logs show queries going to model |
| 190 | + |
| 191 | +3. Run classification test again (second time - with cache): |
| 192 | + ```bash |
| 193 | + python3 deploy/openshift/demo/demo-semantic-router.py |
| 194 | + ``` |
| 195 | + Choose option 2 (All Classifications) again |
| 196 | + - Processing time: ~400ms per query (10x faster!) |
| 197 | + - Logs show "💾 CACHE HIT" for all queries |
| 198 | + - Similarity scores ~0.99999 |
| 199 | + |
| 200 | +**Key talking point:** Cache uses **semantic similarity**, not exact string matching! |
| 201 | + |
| 202 | +--- |
| 203 | + |
| 204 | +## Files |
| 205 | + |
| 206 | +- `live-semantic-router-logs.sh` - Envoy traffic log viewer (security, cache, routing) |
| 207 | +- `live-classifier-logs.sh` - Classification API log viewer |
| 208 | +- `demo-semantic-router.py` - Interactive demo with multiple test options |
| 209 | +- `curl-examples.sh` - Quick classification examples (direct API) |
| 210 | +- `cache-management.sh` - Cache management helper |
| 211 | +- `CATEGORY-MODEL-MAPPING.md` - Category to model routing reference |
| 212 | +- `demo-classification-results.json` - Test results (auto-generated) |
| 213 | + |
| 214 | +--- |
| 215 | + |
| 216 | +## Notes |
| 217 | + |
| 218 | +- The log viewer uses `oc logs --follow`, so it will run indefinitely until you press Ctrl+C |
| 219 | +- Classification test takes ~60 seconds (10 prompts with 0.5s delay between each) |
| 220 | +- All requests go through Envoy, triggering the full routing pipeline |
| 221 | +- Grafana metrics update in real-time (with slight delay) |
0 commit comments