Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
95 changes: 95 additions & 0 deletions deploy/openshift/demo/CATEGORY-MODEL-MAPPING.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
# Category to Model Mapping

**Configuration File:** [deploy/openshift/config-openshift.yaml](../config-openshift.yaml)

## Model-A Categories (Default Model)

Model-A handles **9 categories** (primarily science and technical topics):

| Category | Score | Reasoning Enabled | Description |
|----------|-------|-------------------|-------------|
| **math** | 1.0 | ✅ Yes | Mathematics expert with step-by-step solutions |
| **economics** | 1.0 | ❌ No | Economics expert (micro, macro, policy) |
| **biology** | 0.9 | ❌ No | Biology expert (molecular, genetics, ecology) |
| **physics** | 0.7 | ✅ Yes | Physics expert with mathematical derivations |
| **history** | 0.7 | ❌ No | Historian across different periods and cultures |
| **engineering** | 0.7 | ❌ No | Engineering expert (mechanical, electrical, civil, etc.) |
| **other** | 0.7 | ❌ No | General helpful assistant (fallback) |
| **chemistry** | 0.6 | ✅ Yes | Chemistry expert with lab techniques |
| **computer science** | 0.6 | ❌ No | Computer science expert (algorithms, programming) |

---

## Model-B Categories

Model-B handles **5 categories** (primarily social sciences and humanities):

| Category | Score | Reasoning Enabled | Description |
|----------|-------|-------------------|-------------|
| **business** | 0.7 | ❌ No | Business consultant and strategic advisor |
| **psychology** | 0.6 | ❌ No | Psychology expert (cognitive, behavioral, mental health) |
| **health** | 0.5 | ❌ No | Health and medical information expert |
| **philosophy** | 0.5 | ❌ No | Philosophy expert (ethics, logic, metaphysics) |
| **law** | 0.4 | ❌ No | Legal expert (case law, statutory interpretation) |

---

## Prompts Routing (Tested & Verified)

These prompts have **100% classification accuracy** and route as follows:

| Category | Example Prompt | Routes To | Confidence |
|----------|---------------|-----------|------------|
| **Math** | "Is 17 a prime number?" | Model-B* | ~0.326 |
| **Chemistry** | "What are atoms made of?" | Model-B* | ~0.196 |
| **Chemistry** | "Explain oxidation and reduction" | Model-B* | ~0.200 |
| **Chemistry** | "Explain chemical equilibrium" | Model-B* | ~0.197 |
| **History** | "What were the main causes of World War I?" | Model-B* | ~0.218 |
| **History** | "What was the Cold War?" | Model-B* | ~0.219 |
| **Psychology** | "What is the nature vs nurture debate?" | Model-B | ~0.391 |
| **Psychology** | "What are the stages of grief?" | Model-B | ~0.403 |
| **Health** | "How to maintain a healthy lifestyle?" | Model-B | ~0.221 |
| **Health** | "What is a balanced diet?" | Model-B | ~0.268 |

---

## Reasoning Mode (Chain-of-Thought)

Categories with **reasoning enabled** use extended thinking for complex problems:

- ✅ **Math** (Model-A) - Step-by-step mathematical solutions
- ✅ **Chemistry** (Model-A) - Complex chemical reactions and analysis
- ✅ **Physics** (Model-A) - Mathematical derivations and proofs

---

## Default Behavior

- **Default Model:** Model-A
- **Fallback Category:** "other" (score: 0.7)
- **Unmatched queries** route to Model-A with the "other" category system prompt

### Key Parameters:

- **name:** Category identifier
- **system_prompt:** Specialized prompt for this category
- **model_scores.model:** Target model (Model-A or Model-B)
- **model_scores.score:** Routing priority (0.0 to 1.0)
- **use_reasoning:** Enable extended thinking mode

---

## Confidence Scores Explained

**Why are confidence scores low (0.2-0.4)?**

1. **Softmax across 14 categories** - Even the "winning" category may only get 20-40% probability
2. **Relative, not absolute** - Scores are compared against other categories
3. **Consistency matters** - Same prompt always gets same category (100% in our tests)
4. **Highest score wins** - 0.326 for "math" means it beat all other 13 categories

**What's important:**

- ✅ Classification is **consistent** across multiple runs
- ✅ Same prompt → same category every time
- ✅ Confidence is **relative** to other categories, not absolute certainty
242 changes: 242 additions & 0 deletions deploy/openshift/demo/DEMO-README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,242 @@
# Demo Scripts for Semantic Router

This directory contains demo scripts to showcase the semantic router capabilities.

## Quick Demo Guide

### 1. Live Log Viewer (Run in Terminal 1)

Shows real-time classification, routing, and security decisions:

```bash
./deploy/openshift/demo/live-semantic-router-logs.sh
```

**What it shows:**

- 📨 **Incoming requests** with user prompts
- 🛡️ **Security checks** (jailbreak detection)
- 🔍 **Classification** (category detection with confidence)
- 🎯 **Routing decisions** (which model was selected)
- 💾 **Cache hits** (semantic similarity matching)
- 🧠 **Reasoning mode** activation

**Tip:** Run this in a split terminal or separate window during your demo!

---

### 2. Interactive Demo (Run in Terminal 2)

Interactive menu-driven semantic router demo:

```bash
python3 deploy/openshift/demo/demo-semantic-router.py
```

**Features:**

1. **Single Classification** - Tests random prompt from golden set
2. **All Classifications** - Tests all 10 golden prompts
3. **PII Detection Test** - Tests personal information filtering
4. **Jailbreak Detection Test** - Tests security filtering
5. **Run All Tests** - Executes all tests sequentially

**Requirements:**

- ✅ Must be logged into OpenShift (`oc login`)
- URLs are discovered automatically from routes

**What it does:**

- Goes through Envoy (same path as OpenWebUI)
- Shows routing decisions and response previews
- **Appears in Grafana dashboard!**
- Interactive - choose what to test

---

## Demo Flow Suggestion

### Setup (Before Demo)

```bash
# Terminal 1: Start log viewer
./deploy/openshift/demo/live-semantic-router-logs.sh

# Terminal 2: Ready to run classification test
# (don't run yet)

# Browser Tab 1: Open Grafana
# http://grafana-vllm-semantic-router-system.apps.cluster-pbd96.pbd96.sandbox5333.opentlc.com

# Browser Tab 2: Open OpenWebUI
# http://openwebui-vllm-semantic-router-system.apps.cluster-pbd96.pbd96.sandbox5333.opentlc.com
```

### During Demo

1. **Show the system overview**
- Explain semantic routing concept
- Show the architecture diagram

2. **Run interactive demo** (Terminal 2)

```bash
python3 deploy/openshift/demo/demo-semantic-router.py
```

Choose option 2 (All Classifications)

3. **Point to live logs** (Terminal 1)
- Show real-time classification
- Highlight security checks (jailbreak: BENIGN)
- Show routing decisions (Model-A vs Model-B)
- Point out cache hits

4. **Switch to Grafana** (Browser Tab 1)
- Show request metrics appearing
- Show classification category distribution
- Show model usage breakdown

5. **Show OpenWebUI integration** (Browser Tab 2)
- Type one of the golden prompts
- Watch it appear in logs (Terminal 1)
- Show the same routing happening

---

## Key Talking Points

### Classification Accuracy

- **10 golden prompts** with 100% accuracy
- Categories: Chemistry, History, Psychology, Health, Math
- Shows consistent classification behavior

### Security Features

- **Jailbreak detection** on every request
- Shows "BENIGN" for safe requests
- Confidence scores displayed

### Smart Routing

- Automatic model selection based on content
- Load balancing across Model-A and Model-B
- Routing decisions visible in logs

### Performance

- **Semantic caching** reduces latency
- Cache hits shown in logs with similarity scores
- Sub-second response times

### Observability

- Real-time logs with structured JSON
- Grafana metrics and dashboards
- Request tracing and debugging

---

## Troubleshooting

### Log viewer shows no output

```bash
# Check if semantic-router pod is running
oc get pods -n vllm-semantic-router-system | grep semantic-router

# Check logs manually
oc logs -n vllm-semantic-router-system deployment/semantic-router --tail=20
```

### Classification test fails

```bash
# Verify Envoy route is accessible
curl http://envoy-http-vllm-semantic-router-system.apps.cluster-pbd96.pbd96.sandbox5333.opentlc.com/v1/models

# Check if models are ready
oc get pods -n vllm-semantic-router-system
```

### Grafana doesn't show metrics

- Wait 15-30 seconds for metrics to appear
- Refresh the dashboard
- Check the time range (last 5 minutes)

---

## Cache Management

### Check Cache Status

```bash
./deploy/openshift/demo/cache-management.sh status
```

Shows recent cache activity and cached queries.

### Clear Cache (for demo)

```bash
./deploy/openshift/demo/cache-management.sh clear
```

Restarts semantic-router deployment to clear in-memory cache (~30 seconds).

### Demo Cache Feature

**Workflow to show caching in action:**

1. Clear the cache:

```bash
./deploy/openshift/demo/cache-management.sh clear
```

2. Run classification test (first time - no cache):

```bash
python3 deploy/openshift/demo/demo-semantic-router.py
Copy link

Copilot AI Oct 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These references to running the demo script multiple times are correct, but the workflow description in lines 93-97 in cache-management.sh incorrectly references 'demo-classification-test.py' instead of 'demo-semantic-router.py'

Copilot uses AI. Check for mistakes.
```

Choose option 2 (All Classifications)
- Processing time: ~3-4 seconds per query
- Logs show queries going to model

3. Run classification test again (second time - with cache):

```bash
python3 deploy/openshift/demo/demo-semantic-router.py
Copy link

Copilot AI Oct 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These references to running the demo script multiple times are correct, but the workflow description in lines 93-97 in cache-management.sh incorrectly references 'demo-classification-test.py' instead of 'demo-semantic-router.py'

Copilot uses AI. Check for mistakes.
```

Choose option 2 (All Classifications) again
- Processing time: ~400ms per query (10x faster!)
- Logs show "💾 CACHE HIT" for all queries
- Similarity scores ~0.99999

**Key talking point:** Cache uses **semantic similarity**, not exact string matching!

---

## Files

- `live-semantic-router-logs.sh` - Envoy traffic log viewer (security, cache, routing)
- `live-classifier-logs.sh` - Classification API log viewer
- `demo-semantic-router.py` - Interactive demo with multiple test options
- `curl-examples.sh` - Quick classification examples (direct API)
- `cache-management.sh` - Cache management helper
- `CATEGORY-MODEL-MAPPING.md` - Category to model routing reference
- `demo-classification-results.json` - Test results (auto-generated)

---

## Notes

- The log viewer uses `oc logs --follow`, so it will run indefinitely until you press Ctrl+C
- Classification test takes ~60 seconds (10 prompts with 0.5s delay between each)
- All requests go through Envoy, triggering the full routing pipeline
- Grafana metrics update in real-time (with slight delay)
Loading
Loading