Support Semantic Chain for Flexible Multi-Stage Model Routing

### Is your feature request related to a problem? Please describe.

The current semantic routing implementation only supports category-based model selection using a fine-tuned ModernBERT classifier. This single-stage approach has critical limitations:

1. **No Customizable Routing Pipeline**: Cannot define custom routing stages or their execution order
2. **Limited Flexibility**: Cannot combine multiple routing criteria (keywords + similarity + categories) to progressively narrow down model candidates
3. **No Multi-Stage Filtering**: Cannot first filter models by keywords, then by semantic similarity, then select the best one using category classification
4. **Inflexible Combination Logic**: Cannot choose between sequential filtering or parallel intersection of routing stages
5. **Fixed Category Set**: Constrained to predefined MMLU-Pro categories from the training dataset (14 categories)
6. **No Custom Categories**: Cannot define domain-specific categories without retraining the classifier

**Real-World Scenario**:
A user wants to route queries about "Kubernetes security best practices" to specialized models. The ideal workflow would be:

1. **Stage 1 - Keyword Filtering**: Match keywords "Kubernetes" → [k8s-expert, devops-model, cloud-model, security-model]
2. **Stage 2 - Similarity Matching**: Match custom category "cloud-native-security" by semantic similarity → [k8s-expert, security-model, cloud-model]
3. **Stage 3 - Category Classification**: Classify as "computer science" → [k8s-expert, devops-model, cloud-model]
4. **Combination**: Take intersection of all three stages → [k8s-expert, cloud-model]
5. **Final Selection**: Select best model by score → k8s-expert (score: 0.95)

Currently, this is impossible without retraining the classifier or manually managing complex routing logic.

**Related Issues**:
- Issue #312: Support Similarity-Based Custom Category Routing for Dynamic Model Selection

### Describe the solution you'd like

Implement a **Semantic Chain** routing system that lets users define custom multi-stage routing pipelines through configuration.

**Core Concept**: A `semantic_chain` array that defines the order and stages of routing execution:

```yaml
routing:
  semantic_chain: ["keyword", "similarity-category", "finetune-category"]
  combination_mode: "sequential"  # or "intersection"
```

**Key Features**:
1. **Flexible Stage Ordering**: Define any order of routing stages (keyword → similarity → category, or any other combination)
2. **Stage Selection**: Enable/disable stages by including/excluding them from the chain
3. **Combination Modes**: Choose between sequential filtering or parallel intersection
4. **Zero-Shot Custom Categories**: Add domain-specific categories without retraining (via similarity matching)
5. **Backward Compatible**: Existing category-only routing continues to work (`semantic_chain: ["finetune-category"]`)

#### Architecture Overview

**Sequential Mode Flow**:

```mermaid
flowchart TD
    A[User Query] --> B{Parse semantic_chain}
    B --> C[Stage 1: keyword]
    C --> D[Candidate Models A]
    D --> E[Stage 2: similarity-category]
    E --> F[Filter A with Similarity]
    F --> G[Candidate Models B]
    G --> H[Stage 3: finetune-category]
    H --> I[Filter B with Category]
    I --> J[Final Candidates]
    J --> K[Select Best Model by Score]
    K --> L[Return Selected Model]

    style A fill:#e1f5ff
    style L fill:#c8e6c9
    style C fill:#fff9c4
    style E fill:#ffe0b2
    style H fill:#f8bbd0
```

**Intersection Mode Flow**:

```mermaid
flowchart TD
    A[User Query] --> B{Parse semantic_chain}
    B --> C[Stage 1: keyword]
    B --> D[Stage 2: similarity-category]
    B --> E[Stage 3: finetune-category]

    C --> F[Candidate Models A]
    D --> G[Candidate Models B]
    E --> H[Candidate Models C]

    F --> I[Intersection: A ∩ B ∩ C]
    G --> I
    H --> I

    I --> J[Final Candidates]
    J --> K[Select Best Model by Score]
    K --> L[Return Selected Model]

    style A fill:#e1f5ff
    style L fill:#c8e6c9
    style C fill:#fff9c4
    style D fill:#ffe0b2
    style E fill:#f8bbd0
    style I fill:#d1c4e9
```

**Stage Details**:

```mermaid
graph LR
    subgraph "Stage 1: keyword"
        K1[Match Keywords] --> K2[AND/OR Logic]
        K2 --> K3[Case Sensitivity]
        K3 --> K4[Output: Models A]
    end

    subgraph "Stage 2: similarity-category"
        S1[Generate Embedding] --> S2[Cosine Similarity]
        S2 --> S3[Threshold Check]
        S3 --> S4[Gap Validation]
        S4 --> S5[Output: Models B]
    end

    subgraph "Stage 3: finetune-category"
        F1[ModernBERT Inference] --> F2[Category Prediction]
        F2 --> F3[Confidence Check]
        F3 --> F4[Output: Models C]
    end

    style K1 fill:#fff9c4
    style S1 fill:#ffe0b2
    style F1 fill:#f8bbd0
```

The solution should provide:

#### 1. Three-Stage Routing Configuration

```yaml
# Three-stage hybrid routing configuration
routing:
  # Semantic routing chain: defines the order and stages of routing
  # Each stage filters candidate models, and stages are executed in order
  # Available stages:
  #   - "keyword": Keyword-based matching
  #   - "similarity-category": Similarity-based custom category matching
  #   - "finetune-category": Fine-tuned ModernBERT category classification
  #
  # Examples:
  #   ["keyword", "finetune-category"]                    - Keyword then category
  #   ["similarity-category", "finetune-category"]        - Similarity then category
  #   ["keyword", "similarity-category", "finetune-category"] - All three stages
  #   ["finetune-category"]                               - Category only (existing behavior)
  #   ["keyword"]                                         - Keyword only
  #   ["similarity-category"]                             - Similarity only
  semantic_chain: ["keyword", "similarity-category", "finetune-category"]

  # Combination mode: how to combine results from multiple stages
  # Options:
  #   - "intersection": Take intersection of candidates from all stages (A ∩ B ∩ C)
  #   - "sequential": Each stage filters the output of previous stage (A → B → C)
  combination_mode: "sequential"  # or "intersection"

  # Fallback behavior when no models match after all stages
  fallback: "default_model"

# Stage 1: Keyword-based routing rules
keyword_routing:
  enabled: true
  rules:
    - name: "kubernetes-infrastructure"
      description: "Route Kubernetes-related queries to infrastructure models"
      keywords:
        operator: "OR"  # OR | AND
        case_sensitive: false
        terms:
          - "kubernetes"
          - "k8s"
          - "kubectl"
          - "helm"
          - "pod"
          - "deployment"
      # Candidate models that match these keywords
      candidate_models:
        - "infrastructure-expert-model"
        - "devops-specialist-model"
        - "cloud-native-model"
        - "k8s-expert"  # listed because the routing examples show this rule returning it
      priority: 100

    - name: "database-operations"
      keywords:
        operator: "AND"  # Must contain ALL keywords
        case_sensitive: false
        terms:
          - "database"
          - "query"
      candidate_models:
        - "database-expert-model"
        - "sql-specialist-model"
        - "data-engineer-model"
      priority: 90

    - name: "security-critical"
      keywords:
        operator: "OR"
        case_sensitive: true  # Case-sensitive for CVE IDs
        terms:
          - "CVE-"
          - "vulnerability"
          - "exploit"
          - "security"
      candidate_models:
        - "security-hardened-model"
        - "compliance-model"
      priority: 95

# Stage 2: Similarity-based custom category routing (NEW - from issue #312)
custom_categories:
  enabled: true
  similarity_threshold: 0.75  # Minimum cosine similarity for matching
  gap_threshold: 0.05         # Minimum gap between top-1 and top-2 to avoid ambiguity

  categories:
    - id: "cloud-native-security"
      name: "Cloud Native Security"
      description: "Security best practices for cloud-native applications, Kubernetes security, container security, and DevSecOps"
      examples:
        - "How to secure a Kubernetes cluster?"
        - "Best practices for container image scanning"
        - "Implementing RBAC in Kubernetes"
        - "How to prevent privilege escalation in containers?"
      candidate_models:
        - "k8s-expert"
        - "security-model"
        - "cloud-model"

    - id: "travel-planning"
      name: "Travel & Tourism"
      description: "Travel planning, destination recommendations, visa requirements, and tourism information"
      examples:
        - "Recommend a 3-day itinerary for Paris"
        - "What documents do I need for a Schengen visa?"
        - "Best time to visit Japan for cherry blossoms"
      candidate_models:
        - "travel-expert-model"
        - "general-assistant-model"

    - id: "legal-consulting"
      name: "Legal Consultation"
      description: "Questions about laws, regulations, legal procedures, and compliance"
      examples:
        - "What are the latest changes in civil law?"
        - "How to apply for legal aid?"
        - "GDPR compliance requirements for data processing"
      candidate_models:
        - "legal-expert-model"
        - "compliance-model"

# Stage 3: Existing category-based routing (unchanged)
categories:
  - name: computer science
    model_scores:
      - model: infrastructure-expert-model
        score: 0.9
      - model: devops-specialist-model
        score: 0.85
      - model: k8s-expert
        score: 0.95
      - model: general-cs-model
        score: 0.7

  - name: engineering
    model_scores:
      - model: cloud-native-model
        score: 0.9
      - model: infrastructure-expert-model
        score: 0.8

# BERT model for similarity calculation (reuse existing config)
bert_model:
  model_id: sentence-transformers/all-MiniLM-L12-v2
  threshold: 0.6
  use_cpu: true
```

#### 2. Complete End-to-End Flow Diagram

```mermaid
sequenceDiagram
    participant User
    participant Router as SemanticChainRouter
    participant KM as KeywordMatcher
    participant SM as SimilarityMatcher
    participant FC as FinetuneClassifier
    participant Selector as ModelSelector

    User->>Router: Query: "How to secure K8s with RBAC?"
    Router->>Router: Parse semantic_chain config

    Note over Router: semantic_chain: ["keyword", "similarity-category", "finetune-category"]
    Note over Router: combination_mode: "sequential"

    rect rgb(255, 249, 196)
        Note over Router,KM: Stage 1: keyword
        Router->>KM: Match keywords in query
        KM->>KM: Check "kubernetes", "k8s", "kubectl"
        KM->>KM: Match rule: "kubernetes-infrastructure"
        KM-->>Router: Candidates A: [k8s-expert, devops-model, cloud-model, infra-model]
    end

    rect rgb(255, 224, 178)
        Note over Router,SM: Stage 2: similarity-category
        Router->>SM: Generate embedding + match categories
        SM->>SM: Embedding: [0.12, -0.45, 0.78, ...]
        SM->>SM: Calculate similarity with custom categories
        SM->>SM: Best match: "cloud-native-security" (0.82)
        SM->>SM: Gap check: 0.82 - 0.67 = 0.15 > 0.05 ✓
        SM-->>Router: Candidates B: [k8s-expert, security-model, cloud-model]
        Router->>Router: Filter A with B: [k8s-expert, cloud-model]
    end

    rect rgb(248, 187, 208)
        Note over Router,FC: Stage 3: finetune-category
        Router->>FC: Classify query
        FC->>FC: ModernBERT inference
        FC->>FC: Predicted: "computer science" (0.85)
        FC-->>Router: Candidates C: [k8s-expert, devops-model, infra-model]
        Router->>Router: Filter [k8s-expert, cloud-model] with C
        Router->>Router: Final: [k8s-expert]
    end

    rect rgb(200, 230, 201)
        Note over Router,Selector: Model Selection
        Router->>Selector: Select best from [k8s-expert]
        Selector->>Selector: Check category scores
        Selector->>Selector: k8s-expert: score 0.95
        Selector-->>Router: Selected: k8s-expert
    end

    Router-->>User: Return: k8s-expert
```
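The sequential filtering in the diagram above can be sketched as a small helper that narrows the running candidate list at each stage. This is only an illustration: `filterCandidates` is a hypothetical name, and treating a `nil` input as "no stage has run yet" is an assumption of this sketch, not something the proposal specifies.

```go
package main

import "fmt"

// filterCandidates narrows input to the models a stage proposes, keeping input order.
// A nil input means no earlier stage has run, so the stage output is taken as-is.
func filterCandidates(input, stageModels []string) []string {
	if input == nil {
		out := make([]string, 0, len(stageModels))
		return append(out, stageModels...)
	}
	allowed := make(map[string]struct{}, len(stageModels))
	for _, m := range stageModels {
		allowed[m] = struct{}{}
	}
	// Always return a non-nil slice so "filtered to nothing" differs from "not filtered yet".
	out := make([]string, 0, len(input))
	for _, m := range input {
		if _, ok := allowed[m]; ok {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	// Stage outputs copied from the sequence diagram.
	stages := [][]string{
		{"k8s-expert", "devops-model", "cloud-model", "infra-model"}, // keyword
		{"k8s-expert", "security-model", "cloud-model"},              // similarity-category
		{"k8s-expert", "devops-model", "infra-model"},                // finetune-category
	}
	var candidates []string
	for i, s := range stages {
		candidates = filterCandidates(candidates, s)
		fmt.Printf("after stage %d: %v\n", i+1, candidates)
	}
}
```

With the diagram's data, the list shrinks to `k8s-expert` and `cloud-model` after stage 2 and to `k8s-expert` alone after stage 3.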

**Configuration Flexibility Examples**:

```mermaid
graph TD
    subgraph "Example 1: Full Chain Sequential"
        A1[semantic_chain: keyword, similarity, finetune] --> B1[combination_mode: sequential]
        B1 --> C1[All Models → Keyword → Similarity → Category → Final]
    end

    subgraph "Example 2: Keyword + Category"
        A2[semantic_chain: keyword, finetune] --> B2[combination_mode: sequential]
        B2 --> C2[All Models → Keyword → Category → Final]
    end

    subgraph "Example 3: Similarity Only"
        A3[semantic_chain: similarity] --> B3[combination_mode: sequential]
        B3 --> C3[All Models → Similarity → Final]
    end

    subgraph "Example 4: Intersection Mode"
        A4[semantic_chain: keyword, similarity, finetune] --> B4[combination_mode: intersection]
        B4 --> C4[Keyword ∩ Similarity ∩ Category → Final]
    end

    style C1 fill:#c8e6c9
    style C2 fill:#c8e6c9
    style C3 fill:#c8e6c9
    style C4 fill:#c8e6c9
```

#### 3. Semantic Chain Routing Examples

**Example A: Full Three-Stage Sequential Chain**
```yaml
semantic_chain: ["keyword", "similarity-category", "finetune-category"]
combination_mode: "sequential"
```

```
User Query: "How to secure a Kubernetes cluster with RBAC?"
    ↓
Stage 1: keyword
  - Match rule: "kubernetes-infrastructure"
  - Output: [infrastructure-expert-model, devops-specialist-model, cloud-native-model, k8s-expert]
    ↓
Stage 2: similarity-category (filters Stage 1 output)
  - Input candidates: [infrastructure-expert-model, devops-specialist-model, cloud-native-model, k8s-expert]
  - Best match: "cloud-native-security" (similarity: 0.82)
  - Similarity candidates: [k8s-expert, security-model, cloud-model]
  - Intersection with input: [k8s-expert]
  - Output: [k8s-expert]
    ↓
Stage 3: finetune-category (filters Stage 2 output)
  - Input candidates: [k8s-expert]
  - Predicted category: "computer science" (confidence: 0.85)
  - Category candidates: [infrastructure-expert-model, devops-specialist-model, k8s-expert, general-cs-model]
  - Intersection with input: [k8s-expert]
  - Output: [k8s-expert]
    ↓
Final Selection: k8s-expert (score: 0.95)
```

**Example B: Keyword → Category (Skip Similarity)**
```yaml
semantic_chain: ["keyword", "finetune-category"]
combination_mode: "sequential"
```

```
User Query: "How to secure a Kubernetes cluster with RBAC?"
    ↓
Stage 1: keyword
  - Match rule: "kubernetes-infrastructure"
  - Output: [infrastructure-expert-model, devops-specialist-model, cloud-native-model, k8s-expert]
    ↓
Stage 2: finetune-category (filters Stage 1 output)
  - Input candidates: [infrastructure-expert-model, devops-specialist-model, cloud-native-model, k8s-expert]
  - Predicted category: "computer science" (confidence: 0.85)
  - Category candidates: [infrastructure-expert-model, devops-specialist-model, k8s-expert, general-cs-model]
  - Intersection with input: [infrastructure-expert-model, devops-specialist-model, k8s-expert]
  - Output: [infrastructure-expert-model, devops-specialist-model, k8s-expert]
    ↓
Final Selection: k8s-expert (score: 0.95)
```

**Example C: Similarity Only (Zero-Shot Custom Categories)**
```yaml
semantic_chain: ["similarity-category"]
combination_mode: "sequential"
```

```
User Query: "Recommend a 3-day itinerary for Paris"
    ↓
Stage 1: similarity-category
  - Generate query embedding
  - Best match: "travel-planning" (similarity: 0.88, gap: 0.20)
  - Output: [travel-expert-model, general-assistant-model]
    ↓
Final Selection: travel-expert-model (score: 0.90)
```

**Example D: Intersection Mode (Parallel Execution)**
```yaml
semantic_chain: ["keyword", "similarity-category", "finetune-category"]
combination_mode: "intersection"
```

```
User Query: "How to secure a Kubernetes cluster with RBAC?"
    ↓
    ├─→ Stage 1: keyword
    │   - Output A: [infrastructure-expert-model, devops-specialist-model, cloud-native-model, k8s-expert]
    │
    ├─→ Stage 2: similarity-category
    │   - Output B: [k8s-expert, security-model, cloud-model]
    │
    └─→ Stage 3: finetune-category
        - Output C: [infrastructure-expert-model, devops-specialist-model, k8s-expert, general-cs-model]
    ↓
Intersection (A ∩ B ∩ C): [k8s-expert]
    ↓
Final Selection: k8s-expert (score: 0.95)
```
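The intersection step in Example D can be sketched as follows. `intersectCandidates` is referenced by the router code in this proposal but not defined there, so this is one possible shape for it. Keeping the order of the first stage's output is an assumption of this sketch.

```go
package main

import "fmt"

// intersectCandidates returns the models present in every set,
// in the order they appear in the first set.
func intersectCandidates(sets [][]string) []string {
	if len(sets) == 0 {
		return nil
	}
	counts := make(map[string]int)
	for _, set := range sets {
		seen := make(map[string]bool) // count each model once per set
		for _, m := range set {
			if !seen[m] {
				seen[m] = true
				counts[m]++
			}
		}
	}
	var out []string
	emitted := make(map[string]bool)
	for _, m := range sets[0] {
		if counts[m] == len(sets) && !emitted[m] {
			emitted[m] = true
			out = append(out, m)
		}
	}
	return out
}

func main() {
	a := []string{"infrastructure-expert-model", "devops-specialist-model", "cloud-native-model", "k8s-expert"}
	b := []string{"k8s-expert", "security-model", "cloud-model"}
	c := []string{"infrastructure-expert-model", "devops-specialist-model", "k8s-expert", "general-cs-model"}
	fmt.Println(intersectCandidates([][]string{a, b, c})) // prints [k8s-expert]
}
```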

**Example E: Category Only (Existing Behavior)**
```yaml
semantic_chain: ["finetune-category"]
combination_mode: "sequential"
```

```
User Query: "How to secure a Kubernetes cluster with RBAC?"
    ↓
Stage 1: finetune-category
  - Predicted category: "computer science" (confidence: 0.85)
  - Output: [infrastructure-expert-model, devops-specialist-model, k8s-expert, general-cs-model]
    ↓
Final Selection: k8s-expert (score: 0.95)
```
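Every example above ends with a "Final Selection" step that picks the highest-scoring candidate. A minimal sketch of that step, assuming scores come from the predicted category's `model_scores`, follows. Treating unscored models as 0 and breaking ties by candidate order are assumptions of this sketch.

```go
package main

import "fmt"

// selectBestModel returns the candidate with the highest score.
// Models missing from scores count as 0; on a tie the earlier candidate wins.
func selectBestModel(candidates []string, scores map[string]float64) (string, bool) {
	if len(candidates) == 0 {
		return "", false
	}
	best := candidates[0]
	for _, m := range candidates[1:] {
		if scores[m] > scores[best] {
			best = m
		}
	}
	return best, true
}

func main() {
	// Scores from the "computer science" category in the configuration above.
	scores := map[string]float64{
		"infrastructure-expert-model": 0.9,
		"devops-specialist-model":     0.85,
		"k8s-expert":                  0.95,
		"general-cs-model":            0.7,
	}
	candidates := []string{"infrastructure-expert-model", "devops-specialist-model", "k8s-expert"}
	model, ok := selectBestModel(candidates, scores)
	fmt.Println(model, ok) // prints k8s-expert true
}
```

The similarity-only chain in Example C has no category scores to consult, so the final design needs to say where scores come from when `finetune-category` is not in the chain.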

#### 4. Combination Mode Comparison

```mermaid
graph TB
    subgraph "Sequential Mode: Progressive Filtering"
        Q1[Query] --> AM1[All Models: 100 models]
        AM1 --> K1[Keyword Stage]
        K1 --> C1[Candidates: 20 models]
        C1 --> S1[Similarity Stage]
        S1 --> C2[Candidates: 8 models]
        C2 --> F1[Finetune Stage]
        F1 --> C3[Candidates: 3 models]
        C3 --> R1[Select Best: 1 model]

        style AM1 fill:#e3f2fd
        style C1 fill:#fff9c4
        style C2 fill:#ffe0b2
        style C3 fill:#f8bbd0
        style R1 fill:#c8e6c9
    end

    subgraph "Intersection Mode: Parallel Execution"
        Q2[Query] --> AM2[All Models: 100 models]

        AM2 --> K2[Keyword Stage]
        AM2 --> S2[Similarity Stage]
        AM2 --> F2[Finetune Stage]

        K2 --> CK[Candidates A: 20 models]
        S2 --> CS[Candidates B: 15 models]
        F2 --> CF[Candidates C: 25 models]

        CK --> INT[Intersection: A ∩ B ∩ C]
        CS --> INT
        CF --> INT

        INT --> C4[Candidates: 5 models]
        C4 --> R2[Select Best: 1 model]

        style AM2 fill:#e3f2fd
        style CK fill:#fff9c4
        style CS fill:#ffe0b2
        style CF fill:#f8bbd0
        style INT fill:#d1c4e9
        style R2 fill:#c8e6c9
    end
```

**Performance Characteristics**:

```mermaid
graph LR
    subgraph "Latency Breakdown"
        A[Total: ~45ms] --> B[Keyword: 2ms]
        A --> C[Similarity: 15ms]
        A --> D[Category: 25ms]
        A --> E[Selection: 3ms]
    end

    subgraph "Sequential Mode"
        S1[Stage 1: 2ms] --> S2[Stage 2: 15ms]
        S2 --> S3[Stage 3: 25ms]
        S3 --> S4[Total: 42ms]
    end

    subgraph "Intersection Mode"
        P1[Stage 1: 2ms]
        P2[Stage 2: 15ms]
        P3[Stage 3: 25ms]
        P1 --> P4[Total: 25ms parallel]
        P2 --> P4
        P3 --> P4
    end

    style S4 fill:#fff9c4
    style P4 fill:#c8e6c9
```

#### 5. Three-Stage Implementation Architecture

**Core Components:**

1. **KeywordMatcher** (`pkg/utils/keyword/matcher.go`)
```go
type KeywordMatcher struct {
    rules []KeywordRule
}

type KeywordRule struct {
    Name            string
    Keywords        KeywordSet
    CandidateModels []string
    Priority        int
}

type KeywordSet struct {
    Operator      string   // "AND" | "OR"
    CaseSensitive bool
    Terms         []string
}

// MatchQuery returns matched rules and their candidate models
func (m *KeywordMatcher) MatchQuery(query string) []KeywordMatchResult

type KeywordMatchResult struct {
    RuleName        string
    CandidateModels []string
    Priority        int
}
```
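The matching logic behind `MatchQuery` is not spelled out above. One way to implement the AND/OR and case-sensitivity options is sketched below, using plain substring containment. Substring matching, the `ruleMatches` and `matchRules` names, and treating any operator other than `"AND"` as OR are all assumptions of this sketch.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

type KeywordSet struct {
	Operator      string // "AND" | "OR"
	CaseSensitive bool
	Terms         []string
}

type KeywordRule struct {
	Name            string
	Keywords        KeywordSet
	CandidateModels []string
	Priority        int
}

// ruleMatches reports whether query satisfies the keyword set.
// Any operator other than "AND" is treated as "OR".
func ruleMatches(query string, ks KeywordSet) bool {
	if len(ks.Terms) == 0 {
		return false
	}
	if !ks.CaseSensitive {
		query = strings.ToLower(query)
	}
	for _, term := range ks.Terms {
		if !ks.CaseSensitive {
			term = strings.ToLower(term)
		}
		found := strings.Contains(query, term)
		if ks.Operator == "AND" && !found {
			return false // AND: one missing term fails the rule
		}
		if ks.Operator != "AND" && found {
			return true // OR: one hit is enough
		}
	}
	// AND: every term was found. OR: no term was found.
	return ks.Operator == "AND"
}

// matchRules returns the matching rules, highest priority first.
func matchRules(query string, rules []KeywordRule) []KeywordRule {
	var matched []KeywordRule
	for _, r := range rules {
		if ruleMatches(query, r.Keywords) {
			matched = append(matched, r)
		}
	}
	sort.SliceStable(matched, func(i, j int) bool { return matched[i].Priority > matched[j].Priority })
	return matched
}

func main() {
	rules := []KeywordRule{
		{Name: "kubernetes-infrastructure", Keywords: KeywordSet{Operator: "OR", Terms: []string{"kubernetes", "k8s"}}, Priority: 100},
		{Name: "database-operations", Keywords: KeywordSet{Operator: "AND", Terms: []string{"database", "query"}}, Priority: 90},
	}
	for _, r := range matchRules("How to secure a Kubernetes cluster with RBAC?", rules) {
		fmt.Println(r.Name) // prints kubernetes-infrastructure
	}
}
```

Substring matching is simple but over-matches short terms: `pod` would also match "podcast". Word-boundary matching may be worth considering for such terms.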

2. **SimilarityMatcher** (`pkg/utils/similarity/matcher.go` - NEW)
```go
type SimilarityMatcher struct {
    categories          []CustomCategory
    categoryEmbeddings  map[string][]float32  // Pre-computed embeddings
    similarityThreshold float32
    gapThreshold        float32
}

type CustomCategory struct {
    ID              string
    Name            string
    Description     string
    Examples        []string
    CandidateModels []string
}

// MatchQuery finds the best matching custom category by semantic similarity
func (m *SimilarityMatcher) MatchQuery(query string) (*SimilarityMatchResult, error)

type SimilarityMatchResult struct {
    CategoryID      string
    CategoryName    string
    Similarity      float32
    Gap             float32  // Difference between top-1 and top-2
    CandidateModels []string
    Confident       bool     // True if similarity >= threshold AND gap >= gapThreshold
}

// InitializeEmbeddings pre-computes and caches category embeddings
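func (m *SimilarityMatcher) InitializeEmbeddings() error {
    // Allocate the cache if the constructor did not; writing to a nil map panics
    if m.categoryEmbeddings == nil {
        m.categoryEmbeddings = make(map[string][]float32, len(m.categories))
    }
    for _, category := range m.categories {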
        // Combine description and examples for better matching
        text := category.Description
        for _, example := range category.Examples {
            text += " " + example
        }

        // Generate embedding using existing BERT model
        embedding, err := candle_binding.GetEmbedding(text, 512)
        if err != nil {
            return fmt.Errorf("failed to generate embedding for category %s: %w", category.ID, err)
        }

        // Normalize embedding for cosine similarity
        normalized := normalizeEmbedding(embedding)
        m.categoryEmbeddings[category.ID] = normalized
    }
    return nil
}

// calculateCosineSimilarity computes cosine similarity between two normalized embeddings
func calculateCosineSimilarity(a, b []float32) float32 {
    var dotProduct float32
    for i := 0; i < len(a) && i < len(b); i++ {
        dotProduct += a[i] * b[i]
    }
    return dotProduct
}
```
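The code above references `normalizeEmbedding` and the threshold and gap checks without showing them. A self-contained sketch of both follows. The helper names (`normalize`, `dot`, `bestMatch`) are illustrative, and treating a missing runner-up as similarity -1 is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"math"
)

// normalize scales v to unit length so a dot product equals cosine similarity.
func normalize(v []float32) []float32 {
	var sumSq float64
	for _, x := range v {
		sumSq += float64(x) * float64(x)
	}
	out := make([]float32, len(v))
	if sumSq == 0 {
		return out // zero vector stays zero
	}
	norm := float32(math.Sqrt(sumSq))
	for i, x := range v {
		out[i] = x / norm
	}
	return out
}

func dot(a, b []float32) float32 {
	var s float32
	for i := 0; i < len(a) && i < len(b); i++ {
		s += a[i] * b[i]
	}
	return s
}

// bestMatch returns the top category, its similarity, the gap to the runner-up,
// and whether both the similarity and gap thresholds are met.
// With a single category the runner-up is treated as -1, the minimum cosine value.
func bestMatch(query []float32, categories map[string][]float32, simThreshold, gapThreshold float32) (id string, sim, gap float32, confident bool) {
	best, second := float32(-1), float32(-1)
	found := false
	for cid, emb := range categories {
		s := dot(query, emb)
		if !found || s > best {
			if found {
				second = best
			}
			best, id, found = s, cid, true
		} else if s > second {
			second = s
		}
	}
	if !found {
		return "", 0, 0, false
	}
	gap = best - second
	return id, best, gap, best >= simThreshold && gap >= gapThreshold
}

func main() {
	// Toy 2-D embeddings stand in for real model output.
	cats := map[string][]float32{
		"cloud-native-security": normalize([]float32{1, 0}),
		"travel-planning":       normalize([]float32{0, 1}),
	}
	id, sim, gap, ok := bestMatch(normalize([]float32{3, 1}), cats, 0.75, 0.05)
	fmt.Printf("%s sim=%.2f gap=%.2f confident=%v\n", id, sim, gap, ok)
}
```

The gap check is what rejects a query that sits equally close to two categories, even when both similarities clear the threshold.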

3. **SemanticChainRouter** (`pkg/extproc/semantic_chain_router.go` - NEW)
```go
type SemanticChainRouter struct {
    config            *config.RouterConfig
    keywordMatcher    *keyword.KeywordMatcher
    similarityMatcher *similarity.SimilarityMatcher
    classifier        *classification.Classifier
}

// SelectModel performs semantic chain routing based on configured chain
func (r *SemanticChainRouter) SelectModel(query string) (string, *RoutingDecision) {
    decision := &RoutingDecision{
        SemanticChain:    r.config.Routing.SemanticChain,
        CombinationMode:  r.config.Routing.CombinationMode,
        StageResults:     make(map[string]*StageResult),
    }

    // Execute routing based on combination mode
    if r.config.Routing.IsSequentialMode() {
        return r.executeSequential(query, decision)
    } else if r.config.Routing.IsIntersectionMode() {
        return r.executeIntersection(query, decision)
    }

    // Fallback to default model
    return r.config.DefaultModel, decision
}

// executeSequential executes stages sequentially, each filtering the previous output
func (r *SemanticChainRouter) executeSequential(query string, decision *RoutingDecision) (string, *RoutingDecision) {
    var candidates []string

    for i, stage := range r.config.Routing.SemanticChain {
        stageResult := &StageResult{
            Stage:         stage,
            InputModels:   candidates,
        }

        switch stage {
        case "keyword":
            candidates = r.executeKeywordStage(query, candidates, stageResult)
        case "similarity-category":
            candidates = r.executeSimilarityStage(query, candidates, stageResult)
        case "finetune-category":
            candidates = r.executeFinetuneStage(query, candidates, stageResult)
        }

        decision.StageResults[stage] = stageResult

        // If no candidates after this stage, stop
        if len(candidates) == 0 {
            observability.Warnf("No candidates after stage %d (%s), using fallback", i+1, stage)
            return r.config.DefaultModel, decision
        }
    }

    // Select best model from final candidates
    selectedModel := r.selectBestModel(candidates, decision)
    decision.SelectedModel = selectedModel
    decision.FinalCandidates = candidates

    return selectedModel, decision
}

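// executeIntersection runs every stage independently against the full model set and intersects the results.
// The stages share no data, so they can run concurrently; this loop runs them one at a time for simplicity.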
func (r *SemanticChainRouter) executeIntersection(query string, decision *RoutingDecision) (string, *RoutingDecision) {
    allCandidates := make([][]string, 0)

    for _, stage := range r.config.Routing.SemanticChain {
        stageResult := &StageResult{
            Stage: stage,
        }

        var candidates []string
        switch stage {
        case "keyword":
            candidates = r.executeKeywordStage(query, nil, stageResult)
        case "similarity-category":
            candidates = r.executeSimilarityStage(query, nil, stageResult)
        case "finetune-category":
            candidates = r.executeFinetuneStage(query, nil, stageResult)
        }

        decision.StageResults[stage] = stageResult

        // A stage that returns no candidates is skipped rather than forcing an empty intersection
        if len(candidates) > 0 {
            allCandidates = append(allCandidates, candidates)
        }
    }

    // Take intersection of all candidate sets
    intersection := r.intersectCandidates(allCandidates)

    if len(intersection) == 0 {
        observability.Warnf("No candidates in intersection, using fallback")
        return r.config.DefaultModel, decision
    }

    // Select best model from intersection
    selectedModel := r.selectBestModel(intersection, decision)
    decision.SelectedModel = selectedModel
    decision.FinalCandidates = intersection

    return selectedModel, decision
}

type RoutingDecision struct {
    SelectedModel    string
    SemanticChain    []string
    CombinationMode  string
    StageResults     map[string]*StageResult
    FinalCandidates  []string
    DecisionPath     string
}

type StageResult struct {
    Stage            string
    InputModels      []string  // Input to this stage (for sequential mode)
    OutputModels     []string  // Output from this stage

    // Stage-specific details
    KeywordMatches   []string  // For keyword stage
    SimilarityMatch  *SimilarityMatch  // For similarity stage
    CategoryMatch    *CategoryMatch    // For finetune stage

    ExecutionTime    time.Duration
}

type SimilarityMatch struct {
    CategoryID   string
    CategoryName string
    Similarity   float32
    Gap          float32
    Confident    bool
}

type CategoryMatch struct {
    Category   string
    Confidence float64
}
```
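Because intersection-mode stages do not depend on each other's output, they could run concurrently. A minimal sketch using goroutines follows, assuming each stage can be wrapped in a hypothetical `stageFunc`. Neither `stageFunc` nor `runStagesConcurrently` is part of the proposal.

```go
package main

import (
	"fmt"
	"sync"
)

// stageFunc is a hypothetical adapter: it returns one stage's candidate models for a query.
type stageFunc func(query string) []string

// runStagesConcurrently runs each stage in its own goroutine and
// returns the outputs in chain order. Unknown stages leave a nil slot.
func runStagesConcurrently(query string, chain []string, stages map[string]stageFunc) [][]string {
	results := make([][]string, len(chain))
	var wg sync.WaitGroup
	for i, name := range chain {
		fn, ok := stages[name]
		if !ok {
			continue
		}
		wg.Add(1)
		go func(i int, fn stageFunc) {
			defer wg.Done()
			results[i] = fn(query) // each goroutine writes only its own index
		}(i, fn)
	}
	wg.Wait()
	return results
}

func main() {
	stages := map[string]stageFunc{
		"keyword":             func(string) []string { return []string{"cloud-native-model", "k8s-expert"} },
		"similarity-category": func(string) []string { return []string{"k8s-expert", "security-model"} },
	}
	chain := []string{"keyword", "similarity-category"}
	for i, out := range runStagesConcurrently("How to secure K8s with RBAC?", chain, stages) {
		fmt.Printf("%s -> %v\n", chain[i], out)
	}
}
```

Run this way, intersection mode takes roughly as long as its slowest stage rather than the sum of all stages.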

4. **Configuration Extension** (`pkg/config/config.go`)
```go
type RouterConfig struct {
    // ... existing fields ...

    Routing           RoutingConfig           `yaml:"routing"`
    KeywordRouting    KeywordRoutingConfig    `yaml:"keyword_routing"`
    CustomCategories  CustomCategoriesConfig  `yaml:"custom_categories"`  // NEW
}

type RoutingConfig struct {
    // Semantic routing chain: defines the order and stages of routing
    // Each element represents a routing stage to execute
    // Available stages:
    //   - "keyword": Keyword-based matching
    //   - "similarity-category": Similarity-based custom category matching
    //   - "finetune-category": Fine-tuned ModernBERT category classification
    //
    // Examples:
    //   ["keyword", "finetune-category"]                    - Keyword then category
    //   ["similarity-category", "finetune-category"]        - Similarity then category
    //   ["keyword", "similarity-category", "finetune-category"] - All three stages
    //   ["finetune-category"]                               - Category only (existing behavior)
    SemanticChain []string `yaml:"semantic_chain"`

    // Combination mode: how to combine results from multiple stages
    // Options:
    //   - "sequential": Each stage filters the output of previous stage (A → B → C)
    //   - "intersection": Take intersection of candidates from all stages (A ∩ B ∩ C)
    CombinationMode string `yaml:"combination_mode"`

    // Fallback behavior when no models match after all stages
    Fallback string `yaml:"fallback"` // "default_model"
}

type KeywordRoutingConfig struct {
    Enabled bool           `yaml:"enabled"`
    Rules   []KeywordRule  `yaml:"rules"`
}

type CustomCategoriesConfig struct {
    Enabled             bool             `yaml:"enabled"`
    SimilarityThreshold float32          `yaml:"similarity_threshold"`
    GapThreshold        float32          `yaml:"gap_threshold"`
    Categories          []CustomCategory `yaml:"categories"`
}

type CustomCategory struct {
    ID              string       `yaml:"id"`
    Name            string       `yaml:"name"`
    Description     string       `yaml:"description"`
    Examples        []string     `yaml:"examples"`
    CandidateModels []string     `yaml:"candidate_models"`
}

// Helper methods for semantic chain

// HasStage checks if a specific stage is in the semantic chain
func (r *RoutingConfig) HasStage(stage string) bool {
    for _, s := range r.SemanticChain {
        if s == stage {
            return true
        }
    }
    return false
}

// IsKeywordEnabled checks if keyword routing is enabled in the chain
func (r *RoutingConfig) IsKeywordEnabled() bool {
    return r.HasStage("keyword")
}

// IsSimilarityEnabled checks if similarity routing is enabled in the chain
func (r *RoutingConfig) IsSimilarityEnabled() bool {
    return r.HasStage("similarity-category")
}

// IsFinetuneEnabled checks if finetune category routing is enabled in the chain
func (r *RoutingConfig) IsFinetuneEnabled() bool {
    return r.HasStage("finetune-category")
}

// IsSequentialMode checks if combination mode is sequential
func (r *RoutingConfig) IsSequentialMode() bool {
    return r.CombinationMode == "sequential"
}

// IsIntersectionMode checks if combination mode is intersection
func (r *RoutingConfig) IsIntersectionMode() bool {
    return r.CombinationMode == "intersection"
}
```
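The helper methods above assume `semantic_chain` and `combination_mode` hold valid values. A load-time check could look like the sketch below. The `Normalize` method is hypothetical, and its defaults are assumptions chosen to match the backward-compatibility goal: an empty chain becomes `["finetune-category"]` and an empty mode becomes `"sequential"`. Duplicate stages are rejected because `StageResults` is keyed by stage name.

```go
package main

import "fmt"

type RoutingConfig struct {
	SemanticChain   []string
	CombinationMode string
}

var knownStages = map[string]bool{
	"keyword":             true,
	"similarity-category": true,
	"finetune-category":   true,
}

// Normalize applies defaults, then validates the chain and the combination mode.
func (r *RoutingConfig) Normalize() error {
	if len(r.SemanticChain) == 0 {
		r.SemanticChain = []string{"finetune-category"} // assumed default: existing behavior
	}
	if r.CombinationMode == "" {
		r.CombinationMode = "sequential" // assumed default
	}
	if r.CombinationMode != "sequential" && r.CombinationMode != "intersection" {
		return fmt.Errorf("unknown combination_mode %q", r.CombinationMode)
	}
	seen := make(map[string]bool)
	for _, s := range r.SemanticChain {
		if !knownStages[s] {
			return fmt.Errorf("unknown stage %q in semantic_chain", s)
		}
		if seen[s] {
			return fmt.Errorf("duplicate stage %q in semantic_chain", s)
		}
		seen[s] = true
	}
	return nil
}

func main() {
	cfg := RoutingConfig{SemanticChain: []string{"keyword", "regex"}}
	fmt.Println(cfg.Normalize())
}
```

Without such a check, an unrecognized `combination_mode` would make `SelectModel` fall through to the default model without any warning.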

#### 6. Integration Points

**Modify `handleModelRouting()` in `pkg/extproc/request_handler.go`:**

```go
func (r *OpenAIRouter) handleModelRouting(...) (*ext_proc.ProcessingResponse, error) {
    // ... existing code ...

    if originalModel == "auto" {
        var selectedModel string
        var routingDecision *RoutingDecision

        // Use the semantic chain router if any advanced routing is enabled
        if r.Config.KeywordRouting.Enabled || r.Config.CustomCategories.Enabled {
            selectedModel, routingDecision = r.ThreeStageRouter.SelectModel(userContent)

            // Log the decision using the RoutingDecision and StageResult fields defined above
            observability.Infof("Semantic chain routing decision: model=%s, chain=%v, mode=%s",
                selectedModel, routingDecision.SemanticChain, routingDecision.CombinationMode)

            // Collect per-stage details; a stage that did not run leaves its values empty
            var keywordMatches []string
            var similarityCategory, categoryPrediction string
            for i, stage := range routingDecision.SemanticChain {
                res := routingDecision.StageResults[stage]
                if res == nil {
                    continue
                }
                observability.Infof("  Stage %d (%s): input=%v, output=%v, duration=%s",
                    i+1, stage, res.InputModels, res.OutputModels, res.ExecutionTime)
                keywordMatches = append(keywordMatches, res.KeywordMatches...)
                if res.SimilarityMatch != nil {
                    similarityCategory = res.SimilarityMatch.CategoryID
                    observability.Infof("    Similarity: category=%s, score=%.3f, gap=%.3f",
                        res.SimilarityMatch.CategoryID, res.SimilarityMatch.Similarity, res.SimilarityMatch.Gap)
                }
                if res.CategoryMatch != nil {
                    categoryPrediction = res.CategoryMatch.Category
                    observability.Infof("    Category: prediction=%s, confidence=%.3f",
                        res.CategoryMatch.Category, res.CategoryMatch.Confidence)
                }
            }
            observability.Infof("  Final: candidates=%v, path=%s",
                routingDecision.FinalCandidates, routingDecision.DecisionPath)

            // Record metrics
            metrics.RecordThreeStageRouting(
                routingDecision.CombinationMode,
                keywordMatches,
                similarityCategory,
                categoryPrediction,
                selectedModel,
            )
        } else {
            // Fallback to existing category-only routing
            selectedModel = r.classifyAndSelectBestModel(userContent)
        }

        matchedModel = selectedModel
    }

    // ... rest of the code ...
}
```

**Router Initialization in `pkg/extproc/router.go`:**

```go
func NewOpenAIRouter(configPath string) (*OpenAIRouter, error) {
    // ... existing initialization code ...

    // Initialize keyword matcher if enabled
    var keywordMatcher *keyword.KeywordMatcher
    if cfg.KeywordRouting.Enabled {
        keywordMatcher = keyword.NewKeywordMatcher(cfg.KeywordRouting)
        observability.Infof("Initialized keyword matcher with %d rules", len(cfg.KeywordRouting.Rules))
    }

    // Initialize similarity matcher if custom categories are enabled
    var similarityMatcher *similarity.SimilarityMatcher
    if cfg.CustomCategories.Enabled {
        similarityMatcher = similarity.NewSimilarityMatcher(cfg.CustomCategories)

        // Pre-compute and cache category embeddings
        if err := similarityMatcher.InitializeEmbeddings(); err != nil {
            return nil, fmt.Errorf("failed to initialize similarity matcher: %w", err)
        }

        observability.Infof("Initialized similarity matcher with %d custom categories",
            len(cfg.CustomCategories.Categories))
    }

    // Create three-stage router
    var threeStageRouter *ThreeStageRouter
    if keywordMatcher != nil || similarityMatcher != nil {
        threeStageRouter = &ThreeStageRouter{
            config:            cfg,
            keywordMatcher:    keywordMatcher,
            similarityMatcher: similarityMatcher,
            classifier:        classifier,
        }
    }

    router := &OpenAIRouter{
        Config:           cfg,
        Classifier:       classifier,
        ThreeStageRouter: threeStageRouter,
        // ... other fields ...
    }

    return router, nil
}
```

#### 5. Observability & Metrics

**New Metrics:**
- `three_stage_routing_decisions_total{strategy, keyword_match, similarity_match, category}` - Counter of routing decisions
- `keyword_match_count{rule_name}` - Counter of keyword rule matches
- `similarity_match_count{category_id}` - Counter of similarity category matches
- `similarity_score_histogram{category_id}` - Histogram of similarity scores
- `similarity_gap_histogram{category_id}` - Histogram of similarity gaps (top-1 vs top-2)
- `routing_strategy_duration_seconds{strategy, stage}` - Histogram of routing latency by strategy and stage
- `candidate_models_count{strategy, stage}` - Histogram of candidate model counts per stage
- `stage_execution_duration_seconds{stage}` - Histogram of individual stage execution time
  - stage: "keyword", "similarity", "category"

**Enhanced Logging:**
```
[INFO] Query: "How to secure a Kubernetes cluster with RBAC?"
[INFO] Routing strategy: keyword_then_similarity_then_category
[INFO]
[INFO] Stage 1 - Keyword Matching:
[INFO]   - Matched rule: 'kubernetes-infrastructure' (keywords: kubernetes)
[INFO]   - Candidate models: [infrastructure-expert-model, devops-specialist-model, cloud-native-model, k8s-expert]
[INFO]
[INFO] Stage 2 - Similarity Matching:
[INFO]   - Query embedding generated (384 dimensions)
[INFO]   - Top-3 matches:
[INFO]     1. cloud-native-security (similarity: 0.82, gap: 0.15)
[INFO]     2. security-critical (similarity: 0.67, gap: 0.10)
[INFO]     3. kubernetes-infrastructure (similarity: 0.57)
[INFO]   - Selected: cloud-native-security (confident: true)
[INFO]   - Candidate models: [k8s-expert, security-model, cloud-model]
[INFO]
[INFO] Stage 3 - Category Classification:
[INFO]   - Predicted category: computer science (confidence: 0.85)
[INFO]   - Candidate models: [infrastructure-expert-model, devops-specialist-model, k8s-expert, general-cs-model]
[INFO]
[INFO] Final Decision:
[INFO]   - Intersection (Stage1 ∩ Stage2 ∩ Stage3): [k8s-expert]
[INFO]   - Best model by score: k8s-expert (score: 0.95)
[INFO]   - Total routing time: 45ms (keyword: 2ms, similarity: 15ms, category: 25ms, selection: 3ms)
[INFO]
[INFO] Final selection: k8s-expert
```
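
The final decision in the log above can be reproduced with a small sketch of the intersection step followed by best-score selection. `intersectStages` and `bestByScore` are hypothetical helper names, and the score map is assumed to come from existing model scoring; this is one way to implement the step, not necessarily how `ThreeStageRouter` would do it.

```go
package main

import "fmt"

// intersectStages returns the models present in every stage's candidate
// list, preserving the order of the first list. Hypothetical helper.
func intersectStages(stages ...[]string) []string {
	if len(stages) == 0 {
		return nil
	}
	result := stages[0]
	for _, stage := range stages[1:] {
		seen := make(map[string]struct{}, len(stage))
		for _, m := range stage {
			seen[m] = struct{}{}
		}
		var kept []string
		for _, m := range result {
			if _, ok := seen[m]; ok {
				kept = append(kept, m)
			}
		}
		result = kept
	}
	return result
}

// bestByScore picks the highest-scoring candidate. The boolean is false
// when there are no candidates; models missing from scores count as 0.
func bestByScore(candidates []string, scores map[string]float64) (string, bool) {
	best, bestScore, found := "", 0.0, false
	for _, m := range candidates {
		if s := scores[m]; !found || s > bestScore {
			best, bestScore, found = m, s, true
		}
	}
	return best, found
}

func main() {
	// Candidate lists copied from the log example above.
	keyword := []string{"infrastructure-expert-model", "devops-specialist-model", "cloud-native-model", "k8s-expert"}
	similarity := []string{"k8s-expert", "security-model", "cloud-model"}
	category := []string{"infrastructure-expert-model", "devops-specialist-model", "k8s-expert", "general-cs-model"}

	final := intersectStages(keyword, similarity, category)
	model, ok := bestByScore(final, map[string]float64{"k8s-expert": 0.95})
	fmt.Println(final, model, ok) // prints: [k8s-expert] k8s-expert true
}
```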

### Additional context

**Benefits:**

1. **Three-Stage Progressive Filtering**: Narrow down model candidates through keyword → similarity → category stages
2. **Custom Categories Without Training**: Define domain-specific categories using natural language descriptions and examples
3. **Flexible Combination**: Support intersection, union, and sequential strategies across all three stages
4. **Backward Compatible**: Existing category-only routing continues to work; all new features are opt-in
5. **Zero-Shot Capability**: Add new custom categories without retraining models (similarity-based)
6. **Performance Optimized**:
   - Keyword matching: O(n) string checks, where n = number of keywords (fastest stage)
   - Similarity matching: one query embedding plus O(m) cosine comparisons, where m = number of custom categories (fast, since category embeddings are pre-computed)
   - Category classification: a single classifier inference per query (slowest stage, run only when needed)
7. **Deterministic + Semantic**: Combine deterministic keyword rules with semantic understanding

**Use Cases:**

1. **Cloud-Native Security**: Route "Kubernetes security" queries through keyword (k8s) → similarity (cloud-native-security) → category (computer science)
2. **Travel Planning**: Route travel queries to specialized models using similarity matching without retraining
3. **Legal Compliance**: Filter by compliance keywords, match legal domain by similarity, select best model by category
4. **Multi-Domain Routing**: Support unlimited custom domains (travel, legal, finance, healthcare) without model retraining
5. **Performance Optimization**: Pre-filter models by keywords to reduce similarity/classification overhead

**Implementation Phases:**

**Phase 1: Core Implementation (Keyword + Category)**
- Implement KeywordMatcher with AND/OR logic
- Add configuration structures for keyword routing
- Implement basic routing strategies (keyword_only, category_only, keyword_then_category)
- Add unit tests for keyword matching
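
To make the AND/OR logic of Phase 1 concrete, here is a minimal sketch. The `KeywordRule` struct, its field names, and the use of plain substring matching are all assumptions for illustration; the actual `KeywordMatcher` may tokenize queries or use a different rule shape.

```go
package main

import (
	"fmt"
	"strings"
)

// KeywordRule is a hypothetical rule shape, not the proposal's final config.
type KeywordRule struct {
	Name          string
	Keywords      []string
	Operator      string // "AND" or "OR"; anything else is treated as "OR"
	CaseSensitive bool
	Models        []string
}

// Matches reports whether the query satisfies the rule.
// AND requires every keyword; OR requires at least one.
// A rule without keywords never matches.
func (r KeywordRule) Matches(query string) bool {
	if len(r.Keywords) == 0 {
		return false
	}
	if !r.CaseSensitive {
		query = strings.ToLower(query)
	}
	for _, kw := range r.Keywords {
		if !r.CaseSensitive {
			kw = strings.ToLower(kw)
		}
		found := strings.Contains(query, kw) // substring match is a simplification
		if r.Operator == "AND" && !found {
			return false
		}
		if r.Operator != "AND" && found {
			return true
		}
	}
	// AND: every keyword was found. OR: none was found.
	return r.Operator == "AND"
}

func main() {
	rule := KeywordRule{
		Name:     "kubernetes-infrastructure",
		Keywords: []string{"kubernetes", "k8s"},
		Operator: "OR",
		Models:   []string{"k8s-expert", "devops-model"},
	}
	fmt.Println(rule.Matches("How to secure a Kubernetes cluster with RBAC?")) // prints: true
}
```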

**Phase 2: Similarity Matching (NEW - from issue #312)**
- Implement SimilarityMatcher with embedding-based matching
- Add CustomCategoriesConfig to configuration
- Implement embedding pre-computation and caching
- Add similarity threshold and gap validation
- Implement fallback to category classification
- Add unit tests for similarity matching
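
The threshold and gap validation in Phase 2 could look roughly like the sketch below. It assumes category embeddings are already normalized and cached, uses toy 3-dimensional vectors in place of real 384-dimensional embeddings, and introduces hypothetical names (`bestMatch`, `match`); it is not the `SimilarityMatcher` API.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// normalize returns a unit-length copy of v (all zeros if v has zero norm).
func normalize(v []float64) []float64 {
	var sumSq float64
	for _, x := range v {
		sumSq += x * x
	}
	out := make([]float64, len(v))
	if sumSq == 0 {
		return out
	}
	norm := math.Sqrt(sumSq)
	for i, x := range v {
		out[i] = x / norm
	}
	return out
}

// dot is the inner product; for unit vectors it equals cosine similarity.
func dot(a, b []float64) float64 {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	var sum float64
	for i := 0; i < n; i++ {
		sum += a[i] * b[i]
	}
	return sum
}

type match struct {
	Category string
	Score    float64
}

// bestMatch scores the query against pre-normalized category embeddings and
// reports the top category, its gap to the runner-up, and whether both the
// similarity threshold and the gap threshold are met.
func bestMatch(query []float64, cached map[string][]float64, threshold, gapThreshold float64) (top match, gap float64, confident bool) {
	q := normalize(query)
	matches := make([]match, 0, len(cached))
	for name, emb := range cached {
		matches = append(matches, match{Category: name, Score: dot(q, emb)})
	}
	if len(matches) == 0 {
		return match{}, 0, false
	}
	sort.Slice(matches, func(i, j int) bool { return matches[i].Score > matches[j].Score })
	top = matches[0]
	gap = top.Score // with no runner-up, measure the gap against zero
	if len(matches) > 1 {
		gap = top.Score - matches[1].Score
	}
	return top, gap, top.Score >= threshold && gap >= gapThreshold
}

func main() {
	cached := map[string][]float64{
		"cloud-native-security":     normalize([]float64{0.9, 0.4, 0.1}),
		"security-critical":         normalize([]float64{0.3, 0.9, 0.2}),
		"kubernetes-infrastructure": normalize([]float64{0.2, 0.3, 0.9}),
	}
	top, gap, confident := bestMatch([]float64{0.8, 0.5, 0.1}, cached, 0.75, 0.05)
	fmt.Printf("category=%s score=%.3f gap=%.3f confident=%t\n", top.Category, top.Score, gap, confident)
}
```

When `confident` is false, the caller would fall back to category classification, as the phase list describes.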

**Phase 3: Three-Stage Integration**
- Implement ThreeStageRouter orchestration
- Add all routing strategies (intersection, union, sequential)
- Implement fallback logic across all stages
- Add integration tests for three-stage routing

**Phase 4: Observability & Optimization**
- Add comprehensive metrics for all three stages
- Add detailed logging with stage-by-stage breakdown
- Add routing decision headers
- Performance benchmarks and optimization
- Add E2E tests with real queries

**Technical Considerations:**

1. **Embedding Model**: Reuse existing BERT model (`sentence-transformers/all-MiniLM-L12-v2`)
   - 384-dimensional embeddings
   - ~120MB model size
   - Fast inference (~10-20ms per query)

2. **Embedding Caching**: Pre-compute and cache category embeddings at initialization
   - Combine description + examples for better matching
   - Normalize embeddings for cosine similarity
   - Store in memory for fast retrieval

3. **Similarity Calculation**: Use cosine similarity with normalized embeddings
   - Brute-force comparison is efficient for small sets (<100 custom categories)
   - For large sets (>1000 categories), consider ANN libraries (FAISS, HNSWlib)

4. **Threshold Tuning**:
   - `similarity_threshold`: 0.75 (recommended starting point)
   - `gap_threshold`: 0.05 (avoid ambiguous matches)
   - Monitor metrics to tune thresholds

5. **Fallback Strategy**: Ensure robust fallback at each stage
   - No keyword match → proceed to similarity
   - No similarity match → proceed to category
   - No category match → use default model
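
The fallback chain in item 5 can be sketched as a loop over stages that ends at a default model. This is only one reading of the list: here the first stage that yields candidates decides, whereas a sequential strategy would likely keep narrowing through later stages. The `stage` type and `routeWithFallback` function are hypothetical.

```go
package main

import "fmt"

// stage is a hypothetical wrapper around one routing stage. run returns the
// candidate models for a query, or an empty slice when the stage has no match.
type stage struct {
	name string
	run  func(query string) []string
}

// routeWithFallback tries each stage in order and returns the first candidate
// of the first stage that matches, along with that stage's name. If no stage
// matches, it returns the default model.
func routeWithFallback(query string, stages []stage, defaultModel string) (model, decidedBy string) {
	for _, s := range stages {
		if candidates := s.run(query); len(candidates) > 0 {
			return candidates[0], s.name
		}
	}
	return defaultModel, "default"
}

func main() {
	stages := []stage{
		{"keyword", func(string) []string { return nil }},    // no keyword match
		{"similarity", func(string) []string { return nil }}, // no similarity match
		{"category", func(string) []string { return []string{"general-cs-model"} }},
	}
	model, by := routeWithFallback("What is a B-tree?", stages, "default-model")
	fmt.Println(model, by) // prints: general-cs-model category
}
```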

**Testing Requirements:**

1. **Unit Tests**:
   - KeywordMatcher: AND/OR logic, case sensitivity, priority ordering
   - SimilarityMatcher: embedding generation, cosine similarity, threshold validation
   - ThreeStageRouter: all routing strategies, fallback logic

2. **Integration Tests**:
   - Three-stage routing flow with all strategies
   - Intersection/union logic across stages
   - Fallback behavior at each stage

3. **E2E Tests**:
   - Real queries with expected routing decisions
   - Performance benchmarks (latency per stage)
   - Stress tests with many custom categories

4. **Configuration Tests**:
   - Validation of keyword rules
   - Validation of custom categories
   - Invalid configuration handling
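
For the configuration tests, a validation function along the following lines would give the tests something to assert against. The rule fields and the specific checks (non-empty name, unique names, at least one keyword and model, `AND`/`OR` operator) are assumptions for illustration, not requirements stated in this proposal.

```go
package main

import "fmt"

// KeywordRule is a hypothetical rule shape used only for this sketch.
type KeywordRule struct {
	Name     string
	Keywords []string
	Operator string
	Models   []string
}

// validateRules returns the first problem found, or nil if all rules pass.
func validateRules(rules []KeywordRule) error {
	seen := make(map[string]bool, len(rules))
	for i, r := range rules {
		switch {
		case r.Name == "":
			return fmt.Errorf("rule %d: name is empty", i)
		case seen[r.Name]:
			return fmt.Errorf("rule %q: duplicate name", r.Name)
		case len(r.Keywords) == 0:
			return fmt.Errorf("rule %q: no keywords", r.Name)
		case r.Operator != "AND" && r.Operator != "OR":
			return fmt.Errorf("rule %q: operator %q is not AND or OR", r.Name, r.Operator)
		case len(r.Models) == 0:
			return fmt.Errorf("rule %q: no candidate models", r.Name)
		}
		seen[r.Name] = true
	}
	return nil
}

func main() {
	rules := []KeywordRule{
		{Name: "k8s", Keywords: []string{"kubernetes"}, Operator: "XOR", Models: []string{"k8s-expert"}},
	}
	if err := validateRules(rules); err != nil {
		fmt.Println("invalid config:", err)
	}
}
```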

**Related Files:**
- `src/semantic-router/pkg/config/config.go` - Configuration structures
- `src/semantic-router/pkg/extproc/request_handler.go` - Request routing logic
- `src/semantic-router/pkg/extproc/three_stage_router.go` - Three-stage router (NEW)
- `src/semantic-router/pkg/utils/keyword/matcher.go` - Keyword matcher (NEW)
- `src/semantic-router/pkg/utils/similarity/matcher.go` - Similarity matcher (NEW)
- `src/semantic-router/pkg/utils/classification/classifier.go` - Category classifier
- `candle-binding/semantic-router.go` - BERT embedding functions
- `config/config.yaml` - Configuration example

**Related Issues:**
- Issue #312: Support Similarity-Based Custom Category Routing for Dynamic Model Selection

**Future Enhancements:**
- Support multiple embedding models (BGE for Chinese, MPNet for higher accuracy)
- Implement category embedding versioning for model upgrades
- Add A/B testing framework for threshold tuning
- Support hierarchical categories with parent-child relationships
- Integrate with vector databases (Milvus, Qdrant) for large-scale deployments (>1000 categories)
- Add negative keywords support (e.g., "NOT contains X")
- Add keyword weighting/scoring instead of binary match

/area core
/milestone v0.1
/priority P0


