Summary
I've been investigating some unexpected routing behavior in my E2E tests and wanted to share my findings. I'm not entirely sure if this is a configuration issue on my end or a potential bug, but the evidence seems worth discussing.
Observed Behavior
When testing category-based routing with `model: auto`, I'm seeing math queries consistently routed to Model-A instead of the expected Model-B, despite the configuration giving Model-B the higher score.
Evidence
1. Configuration (`config/testing/config.e2e.yaml`)

```yaml
categories:
  - name: math
    model_scores:
      - model: "Model-B"
        score: 1.0   # ← HIGHEST SCORE
      - model: "Model-A"
        score: 0.9

default_model: "Model-A"
threshold: 0.6
```

Expected behavior: Math queries should route to Model-B (score 1.0).
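To pin down what I expected, here is a minimal, self-contained sketch of those selection semantics (my reading of the config, not the router's actual `SelectBestModelForCategory` implementation; all names below are illustrative):

```go
package main

import "fmt"

// ModelScore mirrors a model_scores entry from the config above.
type ModelScore struct {
    Model string
    Score float64
}

// selectBestModel sketches the semantics I expect: if classification
// confidence clears the threshold, return the highest-scoring model
// for the category; otherwise fall back to the default model.
func selectBestModel(scores []ModelScore, confidence, threshold float64, defaultModel string) string {
    if confidence < threshold || len(scores) == 0 {
        return defaultModel
    }
    best := scores[0]
    for _, s := range scores[1:] {
        if s.Score > best.Score {
            best = s
        }
    }
    return best.Model
}

func main() {
    mathScores := []ModelScore{{"Model-B", 1.0}, {"Model-A", 0.9}}
    // A confidence of 0.886 (seen in the test below) clears the 0.6
    // threshold, so Model-B (score 1.0) should win.
    fmt.Println(selectBestModel(mathScores, 0.886, 0.6, "Model-A")) // Model-B
}
```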
2. Test Results - BEFORE
Running a minimal reproduction test with random math queries to avoid cache hits:
```
TEST 1: Direct Classification API (port 8080)
================================================================================
Query: What is 234 + 567?
Category: math
Confidence: 0.886
Above threshold (0.6): ✅ YES
Classification correct: ✅ YES

TEST 2: Envoy Routing with model='auto' (port 8801)
================================================================================
Query: What is 234 + 567?
Request model: auto
Response model: Model-A
X-VSR-Selected-Model header: Model-A
Expected: Model-B (score 1.0 in config)
Actual: Model-A
❌ FAIL: Incorrectly routed to Model-A instead of Model-B
```
Pattern: The Classification API correctly identifies math with high confidence (0.886 > the 0.6 threshold), but Envoy routing selects the wrong model.
3. Router Logs Analysis
During test execution, I noticed these logs from the ExtProc router:
```
🔧 DEBUG: Router using UNIFIED classifier (LoRA models)
🔧 DEBUG: Wired UnifiedClassifier to Classifier for delegation (initialized=true)
...
❌ ERROR: Traditional BERT classifier not initialized
⚠️ WARNING: Classifier fallback: using 'biology' as category (classifier not initialized)
🔧 DEBUG: SelectBestModelForCategory: category=biology, threshold=0.600000
🔧 DEBUG: No valid model found for category 'biology'
🔧 DEBUG: Using default model: Model-A
```
Observation: Even though the UnifiedClassifier (LoRA-based) was initialized, the router seems to be falling back to an uninitialized traditional BERT classifier, resulting in:
- Wrong category (`biology` instead of `math`)
- Fallback to the default model (`Model-A`)
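To make the log sequence concrete, here is a hypothetical, self-contained sketch of the fallback path those messages suggest (types and names are my guesses, not the router's actual internals):

```go
package main

import "fmt"

// legacyClassifier is an illustrative stand-in for the traditional
// BERT classifier; nothing here is taken from the real codebase.
type legacyClassifier struct {
    initialized bool // false in the failing run: the BERT model was never loaded
}

func (c *legacyClassifier) classifyCategory(text string) string {
    if !c.initialized {
        // Instead of delegating to the already-initialized
        // UnifiedClassifier, the fallback substitutes a hardcoded
        // category — matching the 'biology' WARNING above.
        return "biology"
    }
    return "math" // stand-in for the real BERT inference path
}

func main() {
    c := &legacyClassifier{initialized: false}
    category := c.classifyCategory("What is 234 + 567?")
    // 'biology' has no model_scores entry in the e2e config, so model
    // selection then falls through to the default model, Model-A.
    fmt.Println(category) // biology
}
```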
4. Architecture Investigation
Looking at the code, I noticed there are two classifier systems:

Classification API Server (`src/semantic-router/pkg/services/classification.go`):
- Uses `UnifiedClassifier` (LoRA-based models)
- Works correctly ✅

ExtProc Router (`src/semantic-router/pkg/extproc/router.go`):
- Originally used legacy `Classifier` (traditional BERT)
- May not have been wired to use the `UnifiedClassifier`
Suspected Root Cause
I think the issue might be that the ExtProc router is using a different classifier instance than the Classification API:
- Classification API (port 8080): uses the initialized `UnifiedClassifier` (LoRA-based) → correct category
- ExtProc Router (port 8801): uses the uninitialized legacy `Classifier` (traditional BERT) → wrong category → wrong model
Proposed Fix (Unverified)
I tried modifying `src/semantic-router/pkg/extproc/router.go` to wire the `UnifiedClassifier` from `ClassificationService`:

```go
// In NewOpenAIRouter:
if classificationSvc.HasUnifiedClassifier() {
    unifiedClassifier := classificationSvc.GetUnifiedClassifier()
    if unifiedClassifier != nil {
        classifier.UnifiedClassifier = unifiedClassifier
        logging.Infof("🔧 DEBUG: Wired UnifiedClassifier to Classifier for delegation")
    }
}
```

And added delegation in `src/semantic-router/pkg/classification/classifier.go`:
```go
func (c *Classifier) ClassifyCategoryWithEntropy(text string) (string, float64, entropy.ReasoningDecision, error) {
    // Try UnifiedClassifier (LoRA models) first - highest accuracy
    if c.UnifiedClassifier != nil {
        return c.classifyWithUnifiedClassifier(text)
    }
    // ... rest of original logic
}
```
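As a sanity check on the delegation pattern itself (independent of the real signatures, which also involve `entropy.ReasoningDecision`), here is a self-contained toy version; every name below is an illustrative stand-in:

```go
package main

import "fmt"

// Toy stand-ins for the real classifier types; illustrative only.
type unifiedClassifier struct{}

func (u *unifiedClassifier) classify(text string) (string, float64) {
    return "math", 0.886 // stand-in for LoRA inference
}

type classifier struct {
    Unified *unifiedClassifier // wired in from the classification service
}

func (c *classifier) classifyCategory(text string) (string, float64) {
    // Mirror the proposed delegation: prefer the unified classifier
    // whenever it has been wired in; fall back to legacy otherwise.
    if c.Unified != nil {
        return c.Unified.classify(text)
    }
    return "biology", 0.0 // the uninitialized-legacy fallback seen pre-fix
}

func main() {
    wired := &classifier{Unified: &unifiedClassifier{}}
    unwired := &classifier{} // Unified == nil → legacy fallback path
    fmt.Println(wired.classifyCategory("What is 789 + 123?"))   // math 0.886
    fmt.Println(unwired.classifyCategory("What is 789 + 123?")) // biology 0
}
```

With the nil guard in place, the legacy path only runs when no unified classifier was wired in, which is exactly the consistency the fix is after.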
Test Results - AFTER
```
TEST 1: Direct Classification API
================================================================================
Query: What is 789 + 123?
Category: math
Confidence: 0.896
Above threshold (0.6): ✅ YES

TEST 2: Envoy Routing with model='auto'
================================================================================
Query: What is 789 + 123?
Response model: Model-B
X-VSR-Selected-Model header: Model-B
Expected: Model-B
Actual: Model-B
✅ PASS: Correctly routed to Model-B
```
Questions
- Is this the intended behavior? Should ExtProc and the Classification API use the same classifier?
- If so, is my proposed fix the right approach, or is there a better way to ensure consistency?
- Could this be related to #430 ("Bug: Response model field does not match routing decision", category-based routing)?
Reproduction
Setup:

```bash
make run-router-e2e  # Starts Envoy, semantic-router, llm-katan
```

Test script:
```python
# /tmp/minimal_repro_test.py
import random
import requests

query = f"What is {random.randint(100, 999)} + {random.randint(100, 999)}?"

# Test 1: Classification API
response = requests.post(
    "http://localhost:8080/api/v1/classify/intent",
    json={"text": query},
)
result = response.json()
print(f"Category: {result['classification']['category']}")
print(f"Confidence: {result['classification']['confidence']:.3f}")

# Test 2: Envoy routing
response = requests.post(
    "http://localhost:8801/v1/chat/completions",
    json={
        "model": "auto",
        "messages": [{"role": "user", "content": query}],
    },
)
result = response.json()
print(f"Selected model: {result['model']}")
print(f"Expected: Model-B (score 1.0)")
```

Environment
- Config: `config/testing/config.e2e.yaml`
- Models: LoRA intent classifiers (`models/lora_intent_classifier_bert-base-uncased_model/`)
- Test: `e2e-tests/02-router-classification-test.py::test_category_classification`
I'd appreciate any guidance on whether this is expected behavior or if my analysis is on the right track. Happy to provide more details or test different approaches!