ExtProc router uses legacy Classifier instead of LoRA-based classifier #640

@yossiovadia

Summary

I've been investigating some unexpected routing behavior in my E2E tests and wanted to share my findings. I'm not entirely sure if this is a configuration issue on my end or a potential bug, but the evidence seems worth discussing.

Observed Behavior

When testing category-based routing with model: auto, I'm seeing math queries consistently routed to Model-A instead of the expected Model-B, despite the configuration showing Model-B has a higher score.

Evidence

1. Configuration (config/testing/config.e2e.yaml)

categories:
  - name: math
    model_scores:
      - model: "Model-B"
        score: 1.0          # ← HIGHEST SCORE
      - model: "Model-A" 
        score: 0.9
        
default_model: "Model-A"
threshold: 0.6

Expected behavior: Math queries should route to Model-B (score 1.0)
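
For reference, here is the selection rule I understand this config to encode, as a hypothetical Python sketch (the function and constant names are mine, not the router's API):

```python
# Hypothetical sketch of the selection rule implied by the config above;
# select_model and CATEGORY_SCORES are illustrative names, not the router's.
CATEGORY_SCORES = {"math": {"Model-B": 1.0, "Model-A": 0.9}}
DEFAULT_MODEL = "Model-A"
THRESHOLD = 0.6

def select_model(category: str, confidence: float) -> str:
    """Pick the highest-scoring model for the category, else the default."""
    if confidence < THRESHOLD or category not in CATEGORY_SCORES:
        return DEFAULT_MODEL
    scores = CATEGORY_SCORES[category]
    return max(scores, key=scores.get)

print(select_model("math", 0.886))  # Model-B
```

Under this rule, a math query at confidence 0.886 should always land on Model-B.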

2. Test Results - BEFORE

Running a minimal reproduction test with random math queries to avoid cache hits:

TEST 1: Direct Classification API (port 8080)
================================================================================
Query: What is 234 + 567?
Category: math
Confidence: 0.886
Above threshold (0.6): ✅ YES
Classification correct: ✅ YES

TEST 2: Envoy Routing with model='auto' (port 8801)
================================================================================
Query: What is 234 + 567?
Request model: auto
Response model: Model-A
X-VSR-Selected-Model header: Model-A

Expected: Model-B (score 1.0 in config)
Actual: Model-A

❌ FAIL: Incorrectly routed to Model-A instead of Model-B

Pattern: Classification API correctly identifies math with high confidence (0.886 > threshold 0.6), but Envoy routing selects wrong model.

3. Router Logs Analysis

During test execution, I noticed these logs from the ExtProc router:

🔧 DEBUG: Router using UNIFIED classifier (LoRA models)
🔧 DEBUG: Wired UnifiedClassifier to Classifier for delegation (initialized=true)
...
❌ ERROR: Traditional BERT classifier not initialized
⚠️  WARNING: Classifier fallback: using 'biology' as category (classifier not initialized)
🔧 DEBUG: SelectBestModelForCategory: category=biology, threshold=0.600000
🔧 DEBUG: No valid model found for category 'biology'
🔧 DEBUG: Using default model: Model-A

Observation: Even though the UnifiedClassifier (LoRA-based) was initialized, the router seems to be falling back to an uninitialized traditional BERT classifier, resulting in:

  1. Wrong category (biology instead of math)
  2. Fallback to default model (Model-A)
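
The failure chain suggested by these logs can be sketched as follows (a hypothetical Python sketch; the real router is Go, and all names here are mine):

```python
# Hypothetical sketch of the suspected failure chain: an uninitialized
# legacy classifier emits a bogus category, which then falls through to
# the default model because no model_scores exist for that category.
def classify(unified_classifier, legacy_initialized, text):
    """Return (category, confidence) following the suspected fallback order."""
    if unified_classifier is not None:
        return unified_classifier(text)  # LoRA path: what the API server uses
    if not legacy_initialized:
        # Matches the log: "Classifier fallback: using 'biology' as category"
        return "biology", 0.0
    raise NotImplementedError("legacy BERT path elided")

def route(category, confidence, threshold=0.6, default="Model-A"):
    """Score-based selection with fallback to the default model."""
    scores = {"math": {"Model-B": 1.0, "Model-A": 0.9}}
    if confidence < threshold or category not in scores:
        return default
    return max(scores[category], key=scores[category].get)

# ExtProc path as observed: no unified classifier, legacy not initialized
category, confidence = classify(None, False, "What is 234 + 567?")
print(category, route(category, confidence))  # biology Model-A
```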

4. Architecture Investigation

Looking at the code, I noticed there are two classifier systems:

Classification API Server (src/semantic-router/pkg/services/classification.go):

  • Uses UnifiedClassifier (LoRA-based models)
  • Works correctly ✅

ExtProc Router (src/semantic-router/pkg/extproc/router.go):

  • Originally used legacy Classifier (traditional BERT)
  • May not have been wired to use the UnifiedClassifier

Suspected Root Cause

I think the issue might be that the ExtProc router is using a different classifier instance than the Classification API:

  • Classification API (port 8080): Uses initialized UnifiedClassifier (LoRA-based) → correct category
  • ExtProc Router (port 8801): Uses uninitialized legacy Classifier (traditional BERT) → wrong category → wrong model
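
A minimal sketch of this two-instance hypothesis, assuming the two servers hold separate classifier objects (all class and variable names are illustrative, not from the codebase):

```python
# Hypothetical sketch: two servers holding different classifier instances
# diverge; sharing one instance makes them consistent by construction.
class UnifiedClassifier:
    def classify(self, text):
        return "math", 0.886  # LoRA path, as seen on port 8080

class LegacyClassifier:
    initialized = False
    def classify(self, text):
        if not self.initialized:
            return "biology", 0.0  # observed fallback on port 8801
        raise NotImplementedError("legacy BERT path elided")

api = UnifiedClassifier()      # Classification API's instance
extproc = LegacyClassifier()   # ExtProc's own (uninitialized) instance
assert api.classify("2+2")[0] != extproc.classify("2+2")[0]  # inconsistent

extproc = api  # proposed: both paths delegate to the same unified instance
assert api.classify("2+2") == extproc.classify("2+2")
```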

Proposed Fix (Unverified)

I tried modifying src/semantic-router/pkg/extproc/router.go to wire the UnifiedClassifier from ClassificationService:

// In NewOpenAIRouter:
if classificationSvc.HasUnifiedClassifier() {
    unifiedClassifier := classificationSvc.GetUnifiedClassifier()
    if unifiedClassifier != nil {
        classifier.UnifiedClassifier = unifiedClassifier
        logging.Infof("🔧 DEBUG: Wired UnifiedClassifier to Classifier for delegation")
    }
}

And added delegation in src/semantic-router/pkg/classification/classifier.go:

func (c *Classifier) ClassifyCategoryWithEntropy(text string) (string, float64, entropy.ReasoningDecision, error) {
    // Try UnifiedClassifier (LoRA models) first - highest accuracy
    if c.UnifiedClassifier != nil {
        return c.classifyWithUnifiedClassifier(text)
    }
    // ... rest of original logic
}

Test Results - AFTER

TEST 1: Direct Classification API
================================================================================
Query: What is 789 + 123?
Category: math
Confidence: 0.896
Above threshold (0.6): ✅ YES

TEST 2: Envoy Routing with model='auto'
================================================================================
Query: What is 789 + 123?
Response model: Model-B
X-VSR-Selected-Model header: Model-B

Expected: Model-B
Actual: Model-B

✅ PASS: Correctly routed to Model-B

Questions

  1. Is this the intended behavior? Should ExtProc and the Classification API use the same classifier?
  2. If so, is my proposed fix the right approach, or is there a better way to ensure consistency?
  3. Could this be related to #430 ("Bug: Response model field does not match routing decision", category-based routing)?

Reproduction

Setup:

make run-router-e2e  # Starts Envoy, semantic-router, llm-katan

Test script:

# /tmp/minimal_repro_test.py
import random
import requests

query = f"What is {random.randint(100, 999)} + {random.randint(100, 999)}?"

# Test 1: Classification API
response = requests.post(
    "http://localhost:8080/api/v1/classify/intent",
    json={"text": query}
)
result = response.json()
print(f"Category: {result['classification']['category']}")
print(f"Confidence: {result['classification']['confidence']:.3f}")

# Test 2: Envoy routing
response = requests.post(
    "http://localhost:8801/v1/chat/completions",
    json={
        "model": "auto",
        "messages": [{"role": "user", "content": query}]
    }
)
result = response.json()
print(f"Selected model: {result['model']}")
print("Expected: Model-B (score 1.0)")

Environment

  • Config: config/testing/config.e2e.yaml
  • Models: LoRA intent classifiers (models/lora_intent_classifier_bert-base-uncased_model/)
  • Test: e2e-tests/02-router-classification-test.py::test_category_classification

I'd appreciate any guidance on whether this is expected behavior or if my analysis is on the right track. Happy to provide more details or test different approaches!
