vllm-project · rootfs · Oct 16, 2025 · Oct 14, 2025 · Oct 15, 2025 · Oct 15, 2025
@@ -0,0 +1,95 @@
+# Category to Model Mapping
+
+**Configuration File:** [deploy/openshift/config-openshift.yaml](../config-openshift.yaml)
+
+## Model-A Categories (Default Model)
+
+Model-A handles **9 categories** (primarily science and technical topics):
+
+| Category | Score | Reasoning Enabled | Description |
+|----------|-------|-------------------|-------------|
+| **math** | 1.0 | ✅ Yes | Mathematics expert with step-by-step solutions |
+| **economics** | 1.0 | ❌ No | Economics expert (micro, macro, policy) |
+| **biology** | 0.9 | ❌ No | Biology expert (molecular, genetics, ecology) |
+| **physics** | 0.7 | ✅ Yes | Physics expert with mathematical derivations |
+| **history** | 0.7 | ❌ No | Historian across different periods and cultures |
+| **engineering** | 0.7 | ❌ No | Engineering expert (mechanical, electrical, civil, etc.) |
+| **other** | 0.7 | ❌ No | General helpful assistant (fallback) |
+| **chemistry** | 0.6 | ✅ Yes | Chemistry expert with lab techniques |
+| **computer science** | 0.6 | ❌ No | Computer science expert (algorithms, programming) |
+
+---
+
+## Model-B Categories
+
+Model-B handles **5 categories** (primarily social sciences and humanities):
+
+| Category | Score | Reasoning Enabled | Description |
+|----------|-------|-------------------|-------------|
+| **business** | 0.7 | ❌ No | Business consultant and strategic advisor |
+| **psychology** | 0.6 | ❌ No | Psychology expert (cognitive, behavioral, mental health) |
+| **health** | 0.5 | ❌ No | Health and medical information expert |
+| **philosophy** | 0.5 | ❌ No | Philosophy expert (ethics, logic, metaphysics) |
+| **law** | 0.4 | ❌ No | Legal expert (case law, statutory interpretation) |
+
+---
+
+## Prompt Routing (Tested & Verified)
+
+These 10 golden prompts were classified into the expected category on every test run (**100% classification accuracy**) and route as follows:
+
+| Category | Example Prompt | Routes To | Confidence |
+|----------|---------------|-----------|------------|
+| **Math** | "Is 17 a prime number?" | Model-B* | ~0.326 |
+| **Chemistry** | "What are atoms made of?" | Model-B* | ~0.196 |
+| **Chemistry** | "Explain oxidation and reduction" | Model-B* | ~0.200 |
+| **Chemistry** | "Explain chemical equilibrium" | Model-B* | ~0.197 |
+| **History** | "What were the main causes of World War I?" | Model-B* | ~0.218 |
+| **History** | "What was the Cold War?" | Model-B* | ~0.219 |
+| **Psychology** | "What is the nature vs nurture debate?" | Model-B | ~0.391 |
+| **Psychology** | "What are the stages of grief?" | Model-B | ~0.403 |
+| **Health** | "How to maintain a healthy lifestyle?" | Model-B | ~0.221 |
+| **Health** | "What is a balanced diet?" | Model-B | ~0.268 |
+
+---
+
+## Reasoning Mode (Chain-of-Thought)
+
+Categories with **reasoning enabled** use extended thinking for complex problems:
+
+- ✅ **Math** (Model-A) - Step-by-step mathematical solutions
+- ✅ **Chemistry** (Model-A) - Complex chemical reactions and analysis
+- ✅ **Physics** (Model-A) - Mathematical derivations and proofs
+
+---
+
+## Default Behavior
+
+- **Default Model:** Model-A
+- **Fallback Category:** "other" (score: 0.7)
+- **Unmatched queries** route to Model-A with the "other" category system prompt
+
+### Key Parameters
+
+Each category entry in the configuration file sets these fields:
+
+- **name:** Category identifier
+- **system_prompt:** Specialized prompt for this category
+- **model_scores.model:** Target model (Model-A or Model-B)
+- **model_scores.score:** Routing priority (0.0 to 1.0)
+- **use_reasoning:** Enable extended thinking mode
+
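As a rough illustration of how the fields above fit together, here is a minimal Python sketch. It is not the router's code: the dictionary layout (in particular `model_scores` as a list), the `route` helper, and the placeholder prompts are assumptions made for this example. The category names, scores, and reasoning flags are taken from the tables in this document.

```python
# Hypothetical mirror of three category entries. Field names follow the
# "Key Parameters" list; the list-of-dicts shape of model_scores is assumed.
CATEGORIES = [
    {
        "name": "math",
        "system_prompt": "Mathematics expert with step-by-step solutions",  # placeholder
        "model_scores": [{"model": "Model-A", "score": 1.0}],
        "use_reasoning": True,
    },
    {
        "name": "law",
        "system_prompt": "Legal expert (case law, statutory interpretation)",  # placeholder
        "model_scores": [{"model": "Model-B", "score": 0.4}],
        "use_reasoning": False,
    },
    {
        "name": "other",
        "system_prompt": "General helpful assistant (fallback)",  # placeholder
        "model_scores": [{"model": "Model-A", "score": 0.7}],
        "use_reasoning": False,
    },
]

FALLBACK_CATEGORY = "other"


def route(category_name, categories=CATEGORIES):
    """Return (model, use_reasoning, category) for a classified category name."""
    by_name = {entry["name"]: entry for entry in categories}
    # Unmatched categories fall back to the "other" entry.
    entry = by_name.get(category_name, by_name[FALLBACK_CATEGORY])
    # Pick the highest-scoring model listed for the category.
    best = max(entry["model_scores"], key=lambda item: item["score"])
    return best["model"], entry["use_reasoning"], entry["name"]


print(route("math"))    # ('Model-A', True, 'math')
print(route("law"))     # ('Model-B', False, 'law')
print(route("poetry"))  # ('Model-A', False, 'other')
```

The last call shows the fallback path: a category with no entry resolves to "other", which points at the default model.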
+---
+
+## Confidence Scores Explained
+
+**Why are confidence scores low (0.2-0.4)?**
+
+1. **Softmax across 14 categories** - Probability mass is split across all 14 categories, so even the winning category may get only 20-40% (a uniform guess would be 1/14 ≈ 0.07)
+2. **Relative, not absolute** - A score is meaningful only in comparison with the other categories' scores
+3. **Consistency matters** - The same prompt always gets the same category (100% in our tests)
+4. **Highest score wins** - 0.326 for "math" means math scored higher than each of the other 13 categories
+
+**What's important:**
+
+- ✅ Classification is **consistent** across multiple runs
+- ✅ Same prompt → same category every time
+- ✅ Confidence is **relative** to other categories, not absolute certainty
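
The softmax point above can be checked with a few lines of Python. This is a sketch with made-up logits (one category ahead of 13 equal runners-up), not output from the actual classifier. It only shows that a clear winner among 14 classes can still land in the 0.2-0.4 range.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical logits for 14 categories: index 0 is the winner.
logits = [2.0] + [0.0] * 13
probs = softmax(logits)
winner = max(range(len(probs)), key=probs.__getitem__)

print(f"winner index:     {winner}")
print(f"winner confidence: {probs[winner]:.3f}")
print(f"uniform baseline:  {1 / len(probs):.3f}")
```

Even though the winning logit is well ahead of every other category, its probability stays far below 1.0 because the remaining mass is shared among 13 alternatives.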
@@ -0,0 +1,242 @@
+# Demo Scripts for Semantic Router
+
+This directory contains demo scripts that showcase the semantic router's capabilities.
+
+## Quick Demo Guide
+
+### 1. Live Log Viewer (Run in Terminal 1)
+
+Shows real-time classification, routing, and security decisions:
+
+```bash
+./deploy/openshift/demo/live-semantic-router-logs.sh
+```
+
+**What it shows:**
+
+- 📨 **Incoming requests** with user prompts
+- 🛡️ **Security checks** (jailbreak detection)
+- 🔍 **Classification** (category detection with confidence)
+- 🎯 **Routing decisions** (which model was selected)
+- 💾 **Cache hits** (semantic similarity matching)
+- 🧠 **Reasoning mode** activation
+
+**Tip:** Run this in a split terminal or separate window during your demo!
+
+---
+
+### 2. Interactive Demo (Run in Terminal 2)
+
+Interactive menu-driven semantic router demo:
+
+```bash
+python3 deploy/openshift/demo/demo-semantic-router.py
+```
+
+**Features:**
+
+1. **Single Classification** - Tests a random prompt from the golden set
+2. **All Classifications** - Tests all 10 golden prompts
+3. **PII Detection Test** - Tests personal information filtering
+4. **Jailbreak Detection Test** - Tests security filtering
+5. **Run All Tests** - Executes all tests sequentially
+
+**Requirements:**
+
+- ✅ You must be logged in to OpenShift (`oc login`)
+- URLs are discovered automatically from OpenShift routes
+
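How the demo script discovers URLs is not shown here. One plausible approach, sketched below, is to parse the JSON printed by `oc get routes -n vllm-semantic-router-system -o json`. The `route_urls` helper and the sample route names and hosts are hypothetical, so treat this as an illustration, not a description of the script.

```python
import json


def route_urls(routes_json):
    """Map route name -> URL from the JSON printed by `oc get routes -o json`."""
    urls = {}
    for item in json.loads(routes_json).get("items", []):
        spec = item["spec"]
        # Routes with a TLS section are served over HTTPS, others over HTTP.
        scheme = "https" if spec.get("tls") else "http"
        urls[item["metadata"]["name"]] = f"{scheme}://{spec['host']}"
    return urls


# Hypothetical sample input; real route names and hosts depend on the cluster.
SAMPLE = json.dumps({
    "items": [
        {"metadata": {"name": "envoy-http"},
         "spec": {"host": "envoy.apps.example.com"}},
        {"metadata": {"name": "grafana"},
         "spec": {"host": "grafana.apps.example.com",
                  "tls": {"termination": "edge"}}},
    ]
})

print(route_urls(SAMPLE))
```

In practice the JSON string would come from running the `oc` command (for example via `subprocess.run`), which requires an active `oc login` session.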
+**What it does:**
+
+- Sends requests through Envoy (the same path OpenWebUI uses)
+- Shows routing decisions and response previews
+- Generates traffic that **appears in the Grafana dashboard**
+- Runs interactively, so you choose what to test
+
+---
+
+## Demo Flow Suggestion
+
+### Setup (Before Demo)
+
+```bash
+# Terminal 1: Start log viewer
+./deploy/openshift/demo/live-semantic-router-logs.sh
+
+# Terminal 2: Ready to run classification test
+# (don't run yet)
+
+# Browser Tab 1: Open Grafana (replace the host with your cluster's route host)
+# http://grafana-vllm-semantic-router-system.apps.cluster-pbd96.pbd96.sandbox5333.opentlc.com
+
+# Browser Tab 2: Open OpenWebUI (replace the host with your cluster's route host)
+# http://openwebui-vllm-semantic-router-system.apps.cluster-pbd96.pbd96.sandbox5333.opentlc.com
+```
+
+### During Demo
+
+1. **Show the system overview**
+   - Explain semantic routing concept
+   - Show the architecture diagram
+
+2. **Run interactive demo** (Terminal 2)
+
+   ```bash
+   python3 deploy/openshift/demo/demo-semantic-router.py
+   ```
+
+   Choose option 2 (All Classifications)
+
+3. **Point to live logs** (Terminal 1)
+   - Show real-time classification
+   - Highlight security checks (jailbreak: BENIGN)
+   - Show routing decisions (Model-A vs Model-B)
+   - Point out cache hits
+
+4. **Switch to Grafana** (Browser Tab 1)
+   - Show request metrics appearing
+   - Show classification category distribution
+   - Show model usage breakdown
+
+5. **Show OpenWebUI integration** (Browser Tab 2)
+   - Type one of the golden prompts
+   - Watch it appear in logs (Terminal 1)
+   - Show the same routing happening
+
+---
+
+## Key Talking Points
+
+### Classification Accuracy
+
+- **10 golden prompts** with 100% accuracy
+- Categories: Chemistry, History, Psychology, Health, Math
+- Shows consistent classification behavior
+
+### Security Features
+
+- **Jailbreak detection** on every request
+- Shows "BENIGN" for safe requests
+- Confidence scores displayed
+
+### Smart Routing
+
+- Automatic model selection based on content
+- Load balancing across Model-A and Model-B
+- Routing decisions visible in logs
+
+### Performance
+
+- **Semantic caching** reduces latency
+- Cache hits shown in logs with similarity scores
+- Sub-second response times on cache hits (~400ms in the cache demo below)
+
+### Observability
+
+- Real-time logs with structured JSON
+- Grafana metrics and dashboards
+- Request tracing and debugging
+
+---
+
+## Troubleshooting
+
+### Log viewer shows no output
+
+```bash
+# Check if semantic-router pod is running
+oc get pods -n vllm-semantic-router-system | grep semantic-router
+
+# Check logs manually
+oc logs -n vllm-semantic-router-system deployment/semantic-router --tail=20
+```
+
+### Classification test fails
+
+```bash
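+# Verify the Envoy route is accessible
+# (this host belongs to one sandbox cluster; list yours with `oc get routes -n vllm-semantic-router-system`)
+curl http://envoy-http-vllm-semantic-router-system.apps.cluster-pbd96.pbd96.sandbox5333.opentlc.com/v1/models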
+
+# Check if models are ready
+oc get pods -n vllm-semantic-router-system
+```
+
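The `curl` check above can also be scripted. The sketch below assumes the Envoy endpoint returns an OpenAI-style model list (a `data` array of objects with an `id` field). That response shape, the helper names, and the sample model names are assumptions for illustration.

```python
import json
import urllib.request


def model_ids(payload):
    """Extract model IDs from an OpenAI-style /v1/models response body."""
    return [model["id"] for model in payload.get("data", [])]


def fetch_model_ids(base_url, timeout=10):
    """GET <base_url>/v1/models and return the model IDs it lists."""
    url = base_url.rstrip("/") + "/v1/models"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return model_ids(json.load(response))


# Offline check with a hypothetical response body (model names are made up).
sample = {"object": "list", "data": [{"id": "Model-A"}, {"id": "Model-B"}]}
print(model_ids(sample))
```

Calling `fetch_model_ids("http://<your-envoy-route-host>")` against a live cluster raises an exception if the route is unreachable, which separates a routing problem from a model-readiness problem.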
+### Grafana doesn't show metrics
+
+- Wait 15-30 seconds for metrics to appear
+- Refresh the dashboard
+- Check the time range (last 5 minutes)
+
+---
+
+## Cache Management
+
+### Check Cache Status
+
+```bash
+./deploy/openshift/demo/cache-management.sh status
+```
+
+Shows recent cache activity and cached queries.
+
+### Clear Cache (for demo)
+
+```bash
+./deploy/openshift/demo/cache-management.sh clear
+```
+
+Restarts the semantic-router deployment to clear the in-memory cache (takes ~30 seconds).
+
+### Demo Cache Feature
+
+**Workflow to show caching in action:**
+
+1. Clear the cache:
+
+   ```bash
+   ./deploy/openshift/demo/cache-management.sh clear
+   ```
+
+2. Run classification test (first time - no cache):
+
+   ```bash
+   python3 deploy/openshift/demo/demo-semantic-router.py
+   ```
+
+   Choose option 2 (All Classifications)
+   - Processing time: ~3-4 seconds per query
+   - Logs show queries going to model
+
+3. Run classification test again (second time - with cache):
+
+   ```bash
+   python3 deploy/openshift/demo/demo-semantic-router.py
+   ```
+
+   Choose option 2 (All Classifications) again
+   - Processing time: ~400ms per query (roughly 8-10x faster)
+   - Logs show "💾 CACHE HIT" for all queries
+   - Similarity scores ~0.99999
+
+**Key talking point:** Cache uses **semantic similarity**, not exact string matching!
+
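To make the semantic-similarity point concrete, here is a minimal sketch of a similarity-based cache lookup. It is not the router's implementation: the `SemanticCache` class, the 0.8 threshold, the linear scan, and the three-dimensional toy vectors are all assumptions. A real deployment would use embeddings produced by a model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Toy in-memory cache keyed by embedding similarity, not by exact text."""

    def __init__(self, threshold=0.8):  # threshold value is an assumption
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def put(self, embedding, response):
        self.entries.append((embedding, response))

    def get(self, embedding):
        """Return (response, score); response is None on a cache miss."""
        best_score, best_response = 0.0, None
        for cached_embedding, response in self.entries:
            score = cosine_similarity(embedding, cached_embedding)
            if score > best_score:
                best_score, best_response = score, response
        if best_score >= self.threshold:
            return best_response, best_score
        return None, best_score


cache = SemanticCache()
cache.put([1.0, 0.0, 0.2], "cached answer")

# A slightly different vector (think: a rephrased prompt) still hits.
hit, hit_score = cache.get([0.98, 0.05, 0.21])
# An unrelated vector misses.
miss, miss_score = cache.get([0.0, 1.0, 0.0])
print(hit, round(hit_score, 3))
print(miss, round(miss_score, 3))
```

Re-running an identical prompt produces a near-identical embedding, which is consistent with the ~0.99999 similarity scores seen in the second demo run.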
+---
+
+## Files
+
+- `live-semantic-router-logs.sh` - Envoy traffic log viewer (security, cache, routing)
+- `live-classifier-logs.sh` - Classification API log viewer
+- `demo-semantic-router.py` - Interactive demo with multiple test options
+- `curl-examples.sh` - Quick classification examples (direct API)
+- `cache-management.sh` - Cache management helper
+- `CATEGORY-MODEL-MAPPING.md` - Category to model routing reference
+- `demo-classification-results.json` - Test results (auto-generated)
+
+---
+
+## Notes
+
+- The log viewer uses `oc logs --follow`, so it runs until you press Ctrl+C
+- Classification test takes ~60 seconds (10 prompts with 0.5s delay between each)
+- All requests go through Envoy, triggering the full routing pipeline
+- Grafana metrics update in near real time (expect a delay of 15-30 seconds)