Summary
I've been investigating some unexpected routing behavior in my E2E tests and wanted to share my findings. I'm not entirely sure if this is a configuration issue on my end or a potential bug, but the evidence seems worth discussing.
Observed Behavior
When testing category-based routing with `model: auto`, I'm seeing math queries consistently routed to Model-A instead of the expected Model-B, despite the configuration giving Model-B the higher score.
Evidence
1. Configuration (`config/testing/config.e2e.yaml`)

```yaml
categories:
  - name: math
    model_scores:
      - model: "Model-B"
        score: 1.0   # ← HIGHEST SCORE
      - model: "Model-A"
        score: 0.9

default_model: "Model-A"
threshold: 0.6
```

Expected behavior: Math queries should route to Model-B (score 1.0).
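To pin down what I expected, here is a minimal, self-contained sketch of those selection semantics (my reading of the config, not the router's actual `SelectBestModelForCategory` implementation; all names below are illustrative):

```go
package main

import "fmt"

// ModelScore mirrors a model_scores entry from the config above.
type ModelScore struct {
    Model string
    Score float64
}

// selectBestModel sketches the semantics I expect: if classification
// confidence clears the threshold, return the highest-scoring model
// for the category; otherwise fall back to the default model.
func selectBestModel(scores []ModelScore, confidence, threshold float64, defaultModel string) string {
    if confidence < threshold || len(scores) == 0 {
        return defaultModel
    }
    best := scores[0]
    for _, s := range scores[1:] {
        if s.Score > best.Score {
            best = s
        }
    }
    return best.Model
}

func main() {
    mathScores := []ModelScore{{"Model-B", 1.0}, {"Model-A", 0.9}}
    // A confidence of 0.886 (seen in the test below) clears the 0.6
    // threshold, so Model-B (score 1.0) should win.
    fmt.Println(selectBestModel(mathScores, 0.886, 0.6, "Model-A")) // Model-B
}
```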
2. Test Results - BEFORE
Running a minimal reproduction test with random math queries to avoid cache hits:
```
TEST 1: Direct Classification API (port 8080)
================================================================================
Query: What is 234 + 567?
Category: math
Confidence: 0.886
Above threshold (0.6): ✅ YES
Classification correct: ✅ YES

TEST 2: Envoy Routing with model='auto' (port 8801)
================================================================================
Query: What is 234 + 567?
Request model: auto
Response model: Model-A
X-VSR-Selected-Model header: Model-A
Expected: Model-B (score 1.0 in config)
Actual: Model-A
❌ FAIL: Incorrectly routed to Model-A instead of Model-B
```
Pattern: The Classification API correctly identifies math with high confidence (0.886 > the 0.6 threshold), but Envoy routing selects the wrong model.
3. Router Logs Analysis
During test execution, I noticed these logs from the ExtProc router:
```
🔧 DEBUG: Router using UNIFIED classifier (LoRA models)
🔧 DEBUG: Wired UnifiedClassifier to Classifier for delegation (initialized=true)
...
❌ ERROR: Traditional BERT classifier not initialized
⚠️ WARNING: Classifier fallback: using 'biology' as category (classifier not initialized)
🔧 DEBUG: SelectBestModelForCategory: category=biology, threshold=0.600000
🔧 DEBUG: No valid model found for category 'biology'
🔧 DEBUG: Using default model: Model-A
```
Observation: Even though the UnifiedClassifier (LoRA-based) was initialized, the router seems to be falling back to an uninitialized traditional BERT classifier, resulting in:
- Wrong category (`biology` instead of `math`)
- Fallback to the default model (`Model-A`)
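To make the log sequence concrete, here is a hypothetical, self-contained sketch of the fallback path those messages suggest (types and names are my guesses, not the router's actual internals):

```go
package main

import "fmt"

// legacyClassifier is an illustrative stand-in for the traditional
// BERT classifier; nothing here is taken from the real codebase.
type legacyClassifier struct {
    initialized bool // false in the failing run: the BERT model was never loaded
}

func (c *legacyClassifier) classifyCategory(text string) string {
    if !c.initialized {
        // Instead of delegating to the already-initialized
        // UnifiedClassifier, the fallback substitutes a hardcoded
        // category — matching the 'biology' WARNING above.
        return "biology"
    }
    return "math" // stand-in for the real BERT inference path
}

func main() {
    c := &legacyClassifier{initialized: false}
    category := c.classifyCategory("What is 234 + 567?")
    // 'biology' has no model_scores entry in the e2e config, so model
    // selection then falls through to the default model, Model-A.
    fmt.Println(category) // biology
}
```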
4. Architecture Investigation
Looking at the code, I noticed there are two classifier systems:

Classification API Server (`src/semantic-router/pkg/services/classification.go`):
- Uses `UnifiedClassifier` (LoRA-based models)
- Works correctly ✅

ExtProc Router (`src/semantic-router/pkg/extproc/router.go`):
- Originally used legacy `Classifier` (traditional BERT)
- May not have been wired to use the `UnifiedClassifier`
Suspected Root Cause
I think the issue might be that the ExtProc router is using a different classifier instance than the Classification API:
- Classification API (port 8080): uses the initialized `UnifiedClassifier` (LoRA-based) → correct category
- ExtProc Router (port 8801): uses the uninitialized legacy `Classifier` (traditional BERT) → wrong category → wrong model
Proposed Fix (Unverified)
I tried modifying `src/semantic-router/pkg/extproc/router.go` to wire the `UnifiedClassifier` from `ClassificationService`:

```go
// In NewOpenAIRouter:
if classificationSvc.HasUnifiedClassifier() {
    unifiedClassifier := classificationSvc.GetUnifiedClassifier()
    if unifiedClassifier != nil {
        classifier.UnifiedClassifier = unifiedClassifier
        logging.Infof("🔧 DEBUG: Wired UnifiedClassifier to Classifier for delegation")
    }
}
```

And added delegation in `src/semantic-router/pkg/classification/classifier.go`:
```go
func (c *Classifier) ClassifyCategoryWithEntropy(text string) (string, float64, entropy.ReasoningDecision, error) {
    // Try UnifiedClassifier (LoRA models) first - highest accuracy
    if c.UnifiedClassifier != nil {
        return c.classifyWithUnifiedClassifier(text)
    }
    // ... rest of original logic
}
```
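As a sanity check on the delegation pattern itself (independent of the real signatures, which also involve `entropy.ReasoningDecision`), here is a self-contained toy version; every name below is an illustrative stand-in:

```go
package main

import "fmt"

// Toy stand-ins for the real classifier types; illustrative only.
type unifiedClassifier struct{}

func (u *unifiedClassifier) classify(text string) (string, float64) {
    return "math", 0.886 // stand-in for LoRA inference
}

type classifier struct {
    Unified *unifiedClassifier // wired in from the classification service
}

func (c *classifier) classifyCategory(text string) (string, float64) {
    // Mirror the proposed delegation: prefer the unified classifier
    // whenever it has been wired in; fall back to legacy otherwise.
    if c.Unified != nil {
        return c.Unified.classify(text)
    }
    return "biology", 0.0 // the uninitialized-legacy fallback seen pre-fix
}

func main() {
    wired := &classifier{Unified: &unifiedClassifier{}}
    unwired := &classifier{} // Unified == nil → legacy fallback path
    fmt.Println(wired.classifyCategory("What is 789 + 123?"))   // math 0.886
    fmt.Println(unwired.classifyCategory("What is 789 + 123?")) // biology 0
}
```

With the nil guard in place, the legacy path only runs when no unified classifier was wired in, which is exactly the consistency the fix is after.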
Test Results - AFTER
```
TEST 1: Direct Classification API
================================================================================
Query: What is 789 + 123?
Category: math
Confidence: 0.896
Above threshold (0.6): ✅ YES

TEST 2: Envoy Routing with model='auto'
================================================================================
Query: What is 789 + 123?
Response model: Model-B
X-VSR-Selected-Model header: Model-B
Expected: Model-B
Actual: Model-B
✅ PASS: Correctly routed to Model-B
```
Questions
- Is this the intended behavior? Should ExtProc and the Classification API use the same classifier?
- If so, is my proposed fix the right approach, or is there a better way to ensure consistency?
- Could this be related to #430 ("Bug: Response model field does not match routing decision", category-based routing)?
Reproduction
Setup:

```bash
make run-router-e2e  # Starts Envoy, semantic-router, llm-katan
```

Test script:
```python
# /tmp/minimal_repro_test.py
import random
import requests

query = f"What is {random.randint(100, 999)} + {random.randint(100, 999)}?"

# Test 1: Classification API
response = requests.post(
    "http://localhost:8080/api/v1/classify/intent",
    json={"text": query},
)
result = response.json()
print(f"Category: {result['classification']['category']}")
print(f"Confidence: {result['classification']['confidence']:.3f}")

# Test 2: Envoy routing
response = requests.post(
    "http://localhost:8801/v1/chat/completions",
    json={
        "model": "auto",
        "messages": [{"role": "user", "content": query}],
    },
)
result = response.json()
print(f"Selected model: {result['model']}")
print(f"Expected: Model-B (score 1.0)")
```

Environment
- Config: `config/testing/config.e2e.yaml`
- Models: LoRA intent classifiers (`models/lora_intent_classifier_bert-base-uncased_model/`)
- Test: `e2e-tests/02-router-classification-test.py::test_category_classification`
I'd appreciate any guidance on whether this is expected behavior or if my analysis is on the right track. Happy to provide more details or test different approaches!