Description
Is your feature request related to a problem? Please describe.
The current semantic routing implementation only supports category-based model selection using a fine-tuned ModernBERT classifier. This single-stage approach has critical limitations:
- No Customizable Routing Pipeline: Cannot define custom routing stages or their execution order
- Limited Flexibility: Cannot combine multiple routing criteria (keywords + similarity + categories) to progressively narrow down model candidates
- No Multi-Stage Filtering: Cannot first filter models by keywords, then by semantic similarity, then select the best one using category classification
- Inflexible Combination Logic: Cannot choose between sequential filtering or parallel intersection of routing stages
- Fixed Category Set: Constrained to predefined MMLU-Pro categories from the training dataset (14 categories)
- No Custom Categories: Cannot define domain-specific categories without retraining the classifier
Real-World Scenario:
A user wants to route queries about "Kubernetes security best practices" to specialized models. The ideal workflow would be:
- Stage 1 - Keyword Filtering: Match keywords "Kubernetes" → [k8s-expert, devops-model, cloud-model, security-model]
- Stage 2 - Similarity Matching: Match custom category "cloud-native-security" by semantic similarity → [k8s-expert, security-model, cloud-model]
- Stage 3 - Category Classification: Classify as "computer science" → [k8s-expert, devops-model, cloud-model]
- Combination: Take intersection of all three stages → [k8s-expert, cloud-model]
- Final Selection: Select best model by score → k8s-expert (score: 0.95)
Currently, this is impossible without retraining the classifier or manually managing complex routing logic.
Related Issues:
- Issue #312: Support Similarity-Based Custom Category Routing for Dynamic Model Selection
Describe the solution you'd like
Implement a Semantic Chain routing system that allows users to define custom multi-stage routing pipelines through configuration. The solution should enable:
Core Concept: A semantic_chain array that defines the order and stages of routing execution:
routing:
semantic_chain: ["keyword", "similarity-category", "finetune-category"]
combination_mode: "sequential" # or "intersection"
Key Features:
- Flexible Stage Ordering: Define any order of routing stages (keyword → similarity → category, or any other combination)
- Stage Selection: Enable/disable stages by including/excluding them from the chain
- Combination Modes: Choose between sequential filtering or parallel intersection
- Zero-Shot Custom Categories: Add domain-specific categories without retraining (via similarity matching)
- Backward Compatible: Existing category-only routing continues to work (semantic_chain: ["finetune-category"])
Architecture Overview
Sequential Mode Flow:
flowchart TD
A[User Query] --> B{Parse semantic_chain}
B --> C[Stage 1: keyword]
C --> D[Candidate Models A]
D --> E[Stage 2: similarity-category]
E --> F[Filter A with Similarity]
F --> G[Candidate Models B]
G --> H[Stage 3: finetune-category]
H --> I[Filter B with Category]
I --> J[Final Candidates]
J --> K[Select Best Model by Score]
K --> L[Return Selected Model]
style A fill:#e1f5ff
style L fill:#c8e6c9
style C fill:#fff9c4
style E fill:#ffe0b2
style H fill:#f8bbd0
Intersection Mode Flow:
flowchart TD
A[User Query] --> B{Parse semantic_chain}
B --> C[Stage 1: keyword]
B --> D[Stage 2: similarity-category]
B --> E[Stage 3: finetune-category]
C --> F[Candidate Models A]
D --> G[Candidate Models B]
E --> H[Candidate Models C]
F --> I[Intersection: A ∩ B ∩ C]
G --> I
H --> I
I --> J[Final Candidates]
J --> K[Select Best Model by Score]
K --> L[Return Selected Model]
style A fill:#e1f5ff
style L fill:#c8e6c9
style C fill:#fff9c4
style D fill:#ffe0b2
style E fill:#f8bbd0
style I fill:#d1c4e9
Stage Details:
graph LR
subgraph "Stage 1: keyword"
K1[Match Keywords] --> K2[AND/OR Logic]
K2 --> K3[Case Sensitivity]
K3 --> K4[Output: Models A]
end
subgraph "Stage 2: similarity-category"
S1[Generate Embedding] --> S2[Cosine Similarity]
S2 --> S3[Threshold Check]
S3 --> S4[Gap Validation]
S4 --> S5[Output: Models B]
end
subgraph "Stage 3: finetune-category"
F1[ModernBERT Inference] --> F2[Category Prediction]
F2 --> F3[Confidence Check]
F3 --> F4[Output: Models C]
end
style K1 fill:#fff9c4
style S1 fill:#ffe0b2
style F1 fill:#f8bbd0
The solution should provide:
1. Three-Stage Routing Configuration
# Three-stage hybrid routing configuration
routing:
# Semantic routing chain: defines the order and stages of routing
# Each stage filters candidate models, and stages are executed in order
# Available stages:
# - "keyword": Keyword-based matching
# - "similarity-category": Similarity-based custom category matching
# - "finetune-category": Fine-tuned ModernBERT category classification
#
# Examples:
# ["keyword", "finetune-category"] - Keyword then category
# ["similarity-category", "finetune-category"] - Similarity then category
# ["keyword", "similarity-category", "finetune-category"] - All three stages
# ["finetune-category"] - Category only (existing behavior)
# ["keyword"] - Keyword only
# ["similarity-category"] - Similarity only
semantic_chain: ["keyword", "similarity-category", "finetune-category"]
# Combination mode: how to combine results from multiple stages
# Options:
# - "intersection": Take intersection of candidates from all stages (A ∩ B ∩ C)
# - "sequential": Each stage filters the output of previous stage (A → B → C)
combination_mode: "sequential" # or "intersection"
# Fallback behavior when no models match after all stages
fallback: "default_model"
# Stage 1: Keyword-based routing rules
keyword_routing:
enabled: true
rules:
- name: "kubernetes-infrastructure"
description: "Route Kubernetes-related queries to infrastructure models"
keywords:
operator: "OR" # OR | AND
case_sensitive: false
terms:
- "kubernetes"
- "k8s"
- "kubectl"
- "helm"
- "pod"
- "deployment"
# Candidate models that match these keywords
candidate_models:
- "infrastructure-expert-model"
- "devops-specialist-model"
- "cloud-native-model"
- "k8s-expert" # listed so this rule yields the four candidates shown in the routing examples
priority: 100
- name: "database-operations"
keywords:
operator: "AND" # Must contain ALL keywords
case_sensitive: false
terms:
- "database"
- "query"
candidate_models:
- "database-expert-model"
- "sql-specialist-model"
- "data-engineer-model"
priority: 90
- name: "security-critical"
keywords:
operator: "OR"
case_sensitive: true # Case-sensitive for CVE IDs
terms:
- "CVE-"
- "vulnerability"
- "exploit"
- "security"
candidate_models:
- "security-hardened-model"
- "compliance-model"
priority: 95
# Stage 2: Similarity-based custom category routing (NEW - from issue #312)
custom_categories:
enabled: true
similarity_threshold: 0.75 # Minimum cosine similarity for matching
gap_threshold: 0.05 # Minimum gap between top-1 and top-2 to avoid ambiguity
categories:
- id: "cloud-native-security"
name: "Cloud Native Security"
description: "Security best practices for cloud-native applications, Kubernetes security, container security, and DevSecOps"
examples:
- "How to secure a Kubernetes cluster?"
- "Best practices for container image scanning"
- "Implementing RBAC in Kubernetes"
- "How to prevent privilege escalation in containers?"
candidate_models:
- "k8s-expert"
- "security-model"
- "cloud-model"
- id: "travel-planning"
name: "Travel & Tourism"
description: "Travel planning, destination recommendations, visa requirements, and tourism information"
examples:
- "Recommend a 3-day itinerary for Paris"
- "What documents do I need for a Schengen visa?"
- "Best time to visit Japan for cherry blossoms"
candidate_models:
- "travel-expert-model"
- "general-assistant-model"
- id: "legal-consulting"
name: "Legal Consultation"
description: "Questions about laws, regulations, legal procedures, and compliance"
examples:
- "What are the latest changes in civil law?"
- "How to apply for legal aid?"
- "GDPR compliance requirements for data processing"
candidate_models:
- "legal-expert-model"
- "compliance-model"
# Stage 3: Existing category-based routing (unchanged)
categories:
- name: computer science
model_scores:
- model: infrastructure-expert-model
score: 0.9
- model: devops-specialist-model
score: 0.85
- model: k8s-expert
score: 0.95
- model: general-cs-model
score: 0.7
- name: engineering
model_scores:
- model: cloud-native-model
score: 0.9
- model: infrastructure-expert-model
score: 0.8
# BERT model for similarity calculation (reuse existing config)
bert_model:
model_id: sentence-transformers/all-MiniLM-L12-v2
threshold: 0.6
use_cpu: true
2. Complete End-to-End Flow Diagram
sequenceDiagram
participant User
participant Router as SemanticChainRouter
participant KM as KeywordMatcher
participant SM as SimilarityMatcher
participant FC as FinetuneClassifier
participant Selector as ModelSelector
User->>Router: Query: "How to secure K8s with RBAC?"
Router->>Router: Parse semantic_chain config
Note over Router: semantic_chain: ["keyword", "similarity-category", "finetune-category"]
Note over Router: combination_mode: "sequential"
rect rgb(255, 249, 196)
Note over Router,KM: Stage 1: keyword
Router->>KM: Match keywords in query
KM->>KM: Check "kubernetes", "k8s", "kubectl"
KM->>KM: Match rule: "kubernetes-infrastructure"
KM-->>Router: Candidates A: [k8s-expert, devops-model, cloud-model, infra-model]
end
rect rgb(255, 224, 178)
Note over Router,SM: Stage 2: similarity-category
Router->>SM: Generate embedding + match categories
SM->>SM: Embedding: [0.12, -0.45, 0.78, ...]
SM->>SM: Calculate similarity with custom categories
SM->>SM: Best match: "cloud-native-security" (0.82)
SM->>SM: Gap check: 0.82 - 0.67 = 0.15 > 0.05 ✓
SM-->>Router: Candidates B: [k8s-expert, security-model, cloud-model]
Router->>Router: Filter A with B: [k8s-expert, cloud-model]
end
rect rgb(248, 187, 208)
Note over Router,FC: Stage 3: finetune-category
Router->>FC: Classify query
FC->>FC: ModernBERT inference
FC->>FC: Predicted: "computer science" (0.85)
FC-->>Router: Candidates C: [k8s-expert, devops-model, infra-model]
Router->>Router: Filter [k8s-expert, cloud-model] with C
Router->>Router: Final: [k8s-expert]
end
rect rgb(200, 230, 201)
Note over Router,Selector: Model Selection
Router->>Selector: Select best from [k8s-expert]
Selector->>Selector: Check category scores
Selector->>Selector: k8s-expert: score 0.95
Selector-->>Router: Selected: k8s-expert
end
Router-->>User: Return: k8s-expert
Configuration Flexibility Examples:
graph TD
subgraph "Example 1: Full Chain Sequential"
A1[semantic_chain: keyword, similarity, finetune] --> B1[combination_mode: sequential]
B1 --> C1[All Models → Keyword → Similarity → Category → Final]
end
subgraph "Example 2: Keyword + Category"
A2[semantic_chain: keyword, finetune] --> B2[combination_mode: sequential]
B2 --> C2[All Models → Keyword → Category → Final]
end
subgraph "Example 3: Similarity Only"
A3[semantic_chain: similarity] --> B3[combination_mode: sequential]
B3 --> C3[All Models → Similarity → Final]
end
subgraph "Example 4: Intersection Mode"
A4[semantic_chain: keyword, similarity, finetune] --> B4[combination_mode: intersection]
B4 --> C4[Keyword ∩ Similarity ∩ Category → Final]
end
style C1 fill:#c8e6c9
style C2 fill:#c8e6c9
style C3 fill:#c8e6c9
style C4 fill:#c8e6c9
3. Semantic Chain Routing Examples
Example A: Full Three-Stage Sequential Chain
semantic_chain: ["keyword", "similarity-category", "finetune-category"]
combination_mode: "sequential"
User Query: "How to secure a Kubernetes cluster with RBAC?"
↓
Stage 1: keyword
- Match rule: "kubernetes-infrastructure"
- Output: [infrastructure-expert-model, devops-specialist-model, cloud-native-model, k8s-expert]
↓
Stage 2: similarity-category (filters Stage 1 output)
- Input candidates: [infrastructure-expert-model, devops-specialist-model, cloud-native-model, k8s-expert]
- Best match: "cloud-native-security" (similarity: 0.82)
- Similarity candidates: [k8s-expert, security-model, cloud-model]
- Intersection with input: [k8s-expert] (cloud-model and cloud-native-model are different model names, so cloud-native-model is dropped)
- Output: [k8s-expert]
↓
Stage 3: finetune-category (filters Stage 2 output)
- Input candidates: [k8s-expert]
- Predicted category: "computer science" (confidence: 0.85)
- Category candidates: [infrastructure-expert-model, devops-specialist-model, k8s-expert, general-cs-model]
- Intersection with input: [k8s-expert]
- Output: [k8s-expert]
↓
Final Selection: k8s-expert (score: 0.95)
Example B: Keyword → Category (Skip Similarity)
semantic_chain: ["keyword", "finetune-category"]
combination_mode: "sequential"
User Query: "How to secure a Kubernetes cluster with RBAC?"
↓
Stage 1: keyword
- Match rule: "kubernetes-infrastructure"
- Output: [infrastructure-expert-model, devops-specialist-model, cloud-native-model, k8s-expert]
↓
Stage 2: finetune-category (filters Stage 1 output)
- Input candidates: [infrastructure-expert-model, devops-specialist-model, cloud-native-model, k8s-expert]
- Predicted category: "computer science" (confidence: 0.85)
- Category candidates: [infrastructure-expert-model, devops-specialist-model, k8s-expert, general-cs-model]
- Intersection with input: [infrastructure-expert-model, devops-specialist-model, k8s-expert]
- Output: [infrastructure-expert-model, devops-specialist-model, k8s-expert]
↓
Final Selection: k8s-expert (score: 0.95)
Example C: Similarity Only (Zero-Shot Custom Categories)
semantic_chain: ["similarity-category"]
combination_mode: "sequential"
User Query: "Recommend a 3-day itinerary for Paris"
↓
Stage 1: similarity-category
- Generate query embedding
- Best match: "travel-planning" (similarity: 0.88, gap: 0.20)
- Output: [travel-expert-model, general-assistant-model]
↓
Final Selection: travel-expert-model (score: 0.90)
Example D: Intersection Mode (Parallel Execution)
semantic_chain: ["keyword", "similarity-category", "finetune-category"]
combination_mode: "intersection"
User Query: "How to secure a Kubernetes cluster with RBAC?"
↓
├─→ Stage 1: keyword
│ - Output A: [infrastructure-expert-model, devops-specialist-model, cloud-native-model, k8s-expert]
│
├─→ Stage 2: similarity-category
│ - Output B: [k8s-expert, security-model, cloud-model]
│
└─→ Stage 3: finetune-category
- Output C: [infrastructure-expert-model, devops-specialist-model, k8s-expert, general-cs-model]
↓
Intersection (A ∩ B ∩ C): [k8s-expert]
↓
Final Selection: k8s-expert (score: 0.95)
Example E: Category Only (Existing Behavior)
semantic_chain: ["finetune-category"]
combination_mode: "sequential"
User Query: "How to secure a Kubernetes cluster with RBAC?"
↓
Stage 1: finetune-category
- Predicted category: "computer science" (confidence: 0.85)
- Output: [infrastructure-expert-model, devops-specialist-model, k8s-expert, general-cs-model]
↓
Final Selection: k8s-expert (score: 0.95)
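The sequential examples above amount to repeated, order-preserving filtering. Here is a minimal Go sketch, assuming each stage's candidate list has already been computed (the proposed router would run the stages one at a time instead). The names `filter` and `runSequential` are hypothetical and not part of the existing codebase. The data is taken from Example B.

```go
package main

import "fmt"

// filter keeps the models of current that also appear in stageOutput,
// preserving the order of current.
func filter(current, stageOutput []string) []string {
	allowed := make(map[string]bool, len(stageOutput))
	for _, m := range stageOutput {
		allowed[m] = true
	}
	var kept []string
	for _, m := range current {
		if allowed[m] {
			kept = append(kept, m)
		}
	}
	return kept
}

// runSequential narrows the candidates stage by stage. The first stage's
// output seeds the set. It returns nil as soon as a stage leaves no
// candidates, so the caller can fall back to the default model.
func runSequential(stageOutputs [][]string) []string {
	var candidates []string
	for i, out := range stageOutputs {
		if i == 0 {
			candidates = out
		} else {
			candidates = filter(candidates, out)
		}
		if len(candidates) == 0 {
			return nil
		}
	}
	return candidates
}

func main() {
	keyword := []string{"infrastructure-expert-model", "devops-specialist-model", "cloud-native-model", "k8s-expert"}
	category := []string{"infrastructure-expert-model", "devops-specialist-model", "k8s-expert", "general-cs-model"}
	fmt.Println(runSequential([][]string{keyword, category}))
	// [infrastructure-expert-model devops-specialist-model k8s-expert]
}
```

A nil result maps to the `fallback: "default_model"` path in the configuration.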
4. Combination Mode Comparison
graph TB
subgraph "Sequential Mode: Progressive Filtering"
Q1[Query] --> AM1[All Models: 100 models]
AM1 --> K1[Keyword Stage]
K1 --> C1[Candidates: 20 models]
C1 --> S1[Similarity Stage]
S1 --> C2[Candidates: 8 models]
C2 --> F1[Finetune Stage]
F1 --> C3[Candidates: 3 models]
C3 --> R1[Select Best: 1 model]
style AM1 fill:#e3f2fd
style C1 fill:#fff9c4
style C2 fill:#ffe0b2
style C3 fill:#f8bbd0
style R1 fill:#c8e6c9
end
subgraph "Intersection Mode: Parallel Execution"
Q2[Query] --> AM2[All Models: 100 models]
AM2 --> K2[Keyword Stage]
AM2 --> S2[Similarity Stage]
AM2 --> F2[Finetune Stage]
K2 --> CK[Candidates A: 20 models]
S2 --> CS[Candidates B: 15 models]
F2 --> CF[Candidates C: 25 models]
CK --> INT[Intersection: A ∩ B ∩ C]
CS --> INT
CF --> INT
INT --> C4[Candidates: 5 models]
C4 --> R2[Select Best: 1 model]
style AM2 fill:#e3f2fd
style CK fill:#fff9c4
style CS fill:#ffe0b2
style CF fill:#f8bbd0
style INT fill:#d1c4e9
style R2 fill:#c8e6c9
end
Performance Characteristics:
graph LR
subgraph "Latency Breakdown"
A[Total: ~45ms] --> B[Keyword: 2ms]
A --> C[Similarity: 15ms]
A --> D[Category: 25ms]
A --> E[Selection: 3ms]
end
subgraph "Sequential Mode"
S1[Stage 1: 2ms] --> S2[Stage 2: 15ms]
S2 --> S3[Stage 3: 25ms]
S3 --> S4[Total: 42ms]
end
subgraph "Intersection Mode"
P1[Stage 1: 2ms]
P2[Stage 2: 15ms]
P3[Stage 3: 25ms]
P1 --> P4[Total: 25ms parallel]
P2 --> P4
P3 --> P4
end
style S4 fill:#fff9c4
style P4 fill:#c8e6c9
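The intersection-mode latency above only holds if the stages actually run concurrently. One way to do that is sketched below with goroutines and `sync.WaitGroup`. The `stageFunc` type and `runParallel` are hypothetical names, and the sleeps stand in for the stage latencies quoted in the diagram. Whether the classifier and embedding calls are safe to call concurrently is an assumption that would need checking.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// stageFunc is a hypothetical signature for one routing stage.
type stageFunc func(query string) []string

// runParallel runs every stage in its own goroutine and returns the
// outputs in chain order. Total latency is roughly that of the slowest stage.
func runParallel(query string, stages []stageFunc) [][]string {
	results := make([][]string, len(stages))
	var wg sync.WaitGroup
	for i, stage := range stages {
		wg.Add(1)
		go func(i int, stage stageFunc) {
			defer wg.Done()
			results[i] = stage(query) // each goroutine writes only its own slot
		}(i, stage)
	}
	wg.Wait()
	return results
}

func main() {
	// simulated returns a stage that sleeps for d and then yields out.
	simulated := func(d time.Duration, out ...string) stageFunc {
		return func(string) []string {
			time.Sleep(d)
			return out
		}
	}
	stages := []stageFunc{
		simulated(2*time.Millisecond, "k8s-expert", "devops-model"),
		simulated(15*time.Millisecond, "k8s-expert", "security-model"),
		simulated(25*time.Millisecond, "k8s-expert"),
	}
	start := time.Now()
	results := runParallel("How to secure K8s with RBAC?", stages)
	fmt.Println(results, "elapsed:", time.Since(start).Round(time.Millisecond))
}
```

Sequential mode cannot be parallelized this way if later stages are meant to skip work based on earlier results, which is the latency trade-off between the two modes.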
5. Three-Stage Implementation Architecture
Core Components:
- KeywordMatcher (pkg/utils/keyword/matcher.go)
type KeywordMatcher struct {
rules []KeywordRule
}
type KeywordRule struct {
Name string
Keywords KeywordSet
CandidateModels []string
Priority int
}
type KeywordSet struct {
Operator string // "AND" | "OR"
CaseSensitive bool
Terms []string
}
// MatchQuery returns matched rules and their candidate models
func (m *KeywordMatcher) MatchQuery(query string) []KeywordMatchResult
type KeywordMatchResult struct {
RuleName string
CandidateModels []string
Priority int
}
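The AND/OR and case-sensitivity rules for a single rule could be evaluated as sketched below. This assumes plain substring matching and treats any operator other than "AND" as "OR". The `matches` method is a hypothetical helper, not part of the proposed API.

```go
package main

import (
	"fmt"
	"strings"
)

// KeywordSet mirrors the struct in the proposal.
type KeywordSet struct {
	Operator      string // "AND" | "OR"
	CaseSensitive bool
	Terms         []string
}

// matches reports whether query satisfies the keyword set.
// Assumption: substring matching; operators other than "AND" behave as "OR".
func (k KeywordSet) matches(query string) bool {
	if len(k.Terms) == 0 {
		return false
	}
	if !k.CaseSensitive {
		query = strings.ToLower(query)
	}
	for _, term := range k.Terms {
		if !k.CaseSensitive {
			term = strings.ToLower(term)
		}
		found := strings.Contains(query, term)
		if k.Operator == "AND" && !found {
			return false // AND: every term must appear
		}
		if k.Operator != "AND" && found {
			return true // OR: one term is enough
		}
	}
	// AND reaches this point only if all terms matched; OR only if none did.
	return k.Operator == "AND"
}

func main() {
	k8s := KeywordSet{Operator: "OR", Terms: []string{"kubernetes", "k8s", "kubectl"}}
	db := KeywordSet{Operator: "AND", Terms: []string{"database", "query"}}
	q := "How to secure a Kubernetes cluster with RBAC?"
	fmt.Println(k8s.matches(q), db.matches(q)) // true false
}
```

Substring matching means a short term such as "pod" would also match "podcast". Word-boundary matching may be preferable for such terms.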
- SimilarityMatcher (pkg/utils/similarity/matcher.go - NEW)
type SimilarityMatcher struct {
categories []CustomCategory
categoryEmbeddings map[string][]float32 // Pre-computed embeddings
similarityThreshold float32
gapThreshold float32
}
type CustomCategory struct {
ID string
Name string
Description string
Examples []string
CandidateModels []string
}
// MatchQuery finds the best matching custom category by semantic similarity
func (m *SimilarityMatcher) MatchQuery(query string) (*SimilarityMatchResult, error)
type SimilarityMatchResult struct {
CategoryID string
CategoryName string
Similarity float32
Gap float32 // Difference between top-1 and top-2
CandidateModels []string
Confident bool // True if similarity >= threshold AND gap >= gapThreshold
}
// InitializeEmbeddings pre-computes and caches category embeddings
func (m *SimilarityMatcher) InitializeEmbeddings() error {
for _, category := range m.categories {
// Combine description and examples for better matching
text := category.Description
for _, example := range category.Examples {
text += " " + example
}
// Generate embedding using existing BERT model
embedding, err := candle_binding.GetEmbedding(text, 512)
if err != nil {
return fmt.Errorf("failed to generate embedding for category %s: %w", category.ID, err)
}
// Normalize embedding for cosine similarity
normalized := normalizeEmbedding(embedding)
m.categoryEmbeddings[category.ID] = normalized
}
return nil
}
// calculateCosineSimilarity computes cosine similarity between two normalized embeddings
func calculateCosineSimilarity(a, b []float32) float32 {
var dotProduct float32
for i := 0; i < len(a) && i < len(b); i++ {
dotProduct += a[i] * b[i]
}
return dotProduct
}
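The code above calls `normalizeEmbedding` and describes a threshold-plus-gap check without showing either. A possible sketch follows, using tiny 2-dimensional vectors in place of real embeddings. The `bestMatch` function and its signature are hypothetical. Treating the runner-up as -1 when only one category exists is also an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// normalizeEmbedding scales v to unit length so that a dot product equals
// cosine similarity. A zero vector is returned unchanged.
func normalizeEmbedding(v []float32) []float32 {
	var sumSq float64
	for _, x := range v {
		sumSq += float64(x) * float64(x)
	}
	if sumSq == 0 {
		return v
	}
	norm := float32(math.Sqrt(sumSq))
	out := make([]float32, len(v))
	for i, x := range v {
		out[i] = x / norm
	}
	return out
}

// dot computes the dot product over the shared prefix of a and b.
func dot(a, b []float32) float32 {
	var sum float32
	for i := 0; i < len(a) && i < len(b); i++ {
		sum += a[i] * b[i]
	}
	return sum
}

// bestMatch returns the closest category, its similarity, the gap to the
// runner-up, and whether both thresholds are met.
func bestMatch(query []float32, categories map[string][]float32, simThreshold, gapThreshold float32) (id string, sim, gap float32, confident bool) {
	best, second := float32(-1), float32(-1) // -1 is the lowest possible cosine
	found := false
	for catID, emb := range categories {
		s := dot(query, emb)
		if !found || s > best {
			if found {
				second = best // previous leader becomes the runner-up
			}
			id, best, found = catID, s, true
		} else if s > second {
			second = s
		}
	}
	if !found {
		return "", 0, 0, false
	}
	gap = best - second
	return id, best, gap, best >= simThreshold && gap >= gapThreshold
}

func main() {
	query := normalizeEmbedding([]float32{2, 0})
	categories := map[string][]float32{
		"cloud-native-security": normalizeEmbedding([]float32{0.8, 0.6}),
		"travel-planning":       normalizeEmbedding([]float32{0.6, 0.8}),
	}
	id, sim, gap, ok := bestMatch(query, categories, 0.75, 0.05)
	fmt.Printf("%s %.2f %.2f %v\n", id, sim, gap, ok) // cloud-native-security 0.80 0.20 true
}
```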
- SemanticChainRouter (pkg/extproc/semantic_chain_router.go - NEW)
type SemanticChainRouter struct {
config *config.RouterConfig
keywordMatcher *keyword.KeywordMatcher
similarityMatcher *similarity.SimilarityMatcher
classifier *classification.Classifier
}
// SelectModel performs semantic chain routing based on configured chain
func (r *SemanticChainRouter) SelectModel(query string) (string, *RoutingDecision) {
decision := &RoutingDecision{
SemanticChain: r.config.Routing.SemanticChain,
CombinationMode: r.config.Routing.CombinationMode,
StageResults: make(map[string]*StageResult),
}
// Execute routing based on combination mode
if r.config.Routing.IsSequentialMode() {
return r.executeSequential(query, decision)
} else if r.config.Routing.IsIntersectionMode() {
return r.executeIntersection(query, decision)
}
// Fallback to default model
return r.config.DefaultModel, decision
}
// executeSequential executes stages sequentially, each filtering the previous output
func (r *SemanticChainRouter) executeSequential(query string, decision *RoutingDecision) (string, *RoutingDecision) {
// candidates is nil before the first stage, meaning "no restriction yet"
var candidates []string
for i, stage := range r.config.Routing.SemanticChain {
stageResult := &StageResult{
Stage: stage,
InputModels: candidates,
}
switch stage {
case "keyword":
candidates = r.executeKeywordStage(query, candidates, stageResult)
case "similarity-category":
candidates = r.executeSimilarityStage(query, candidates, stageResult)
case "finetune-category":
candidates = r.executeFinetuneStage(query, candidates, stageResult)
}
decision.StageResults[stage] = stageResult
// If no candidates after this stage, stop
if len(candidates) == 0 {
observability.Warnf("No candidates after stage %d (%s), using fallback", i+1, stage)
return r.config.DefaultModel, decision
}
}
// Select best model from final candidates
selectedModel := r.selectBestModel(candidates, decision)
decision.SelectedModel = selectedModel
decision.FinalCandidates = candidates
return selectedModel, decision
}
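// executeIntersection runs every stage on the full model set and intersects the results.
// The stages are independent, so they can run concurrently; this version runs them in a loop for simplicity.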
func (r *SemanticChainRouter) executeIntersection(query string, decision *RoutingDecision) (string, *RoutingDecision) {
allCandidates := make([][]string, 0)
for _, stage := range r.config.Routing.SemanticChain {
stageResult := &StageResult{
Stage: stage,
}
var candidates []string
switch stage {
case "keyword":
candidates = r.executeKeywordStage(query, nil, stageResult)
case "similarity-category":
candidates = r.executeSimilarityStage(query, nil, stageResult)
case "finetune-category":
candidates = r.executeFinetuneStage(query, nil, stageResult)
}
decision.StageResults[stage] = stageResult
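// Note: a stage that returns no candidates is skipped rather than emptying the intersection,
// so the result is the intersection of the non-empty stage outputs only.
if len(candidates) > 0 {
allCandidates = append(allCandidates, candidates)
}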
}
// Take intersection of all candidate sets
intersection := r.intersectCandidates(allCandidates)
if len(intersection) == 0 {
observability.Warnf("No candidates in intersection, using fallback")
return r.config.DefaultModel, decision
}
// Select best model from intersection
selectedModel := r.selectBestModel(intersection, decision)
decision.SelectedModel = selectedModel
decision.FinalCandidates = intersection
return selectedModel, decision
}
type RoutingDecision struct {
SelectedModel string
SemanticChain []string
CombinationMode string
StageResults map[string]*StageResult
FinalCandidates []string
DecisionPath string
}
type StageResult struct {
Stage string
InputModels []string // Input to this stage (for sequential mode)
OutputModels []string // Output from this stage
// Stage-specific details
KeywordMatches []string // For keyword stage
SimilarityMatch *SimilarityMatch // For similarity stage
CategoryMatch *CategoryMatch // For finetune stage
ExecutionTime time.Duration
}
type SimilarityMatch struct {
CategoryID string
CategoryName string
Similarity float32
Gap float32
Confident bool
}
type CategoryMatch struct {
Category string
Confidence float64
}
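The router code above calls `intersectCandidates` and `selectBestModel` without defining them. One possible shape is sketched below. These are written as free functions with simplified signatures (the proposal uses router methods). The `ModelScore` type is an assumed mirror of the `model_scores` config entries. The data comes from Example D and the "computer science" category.

```go
package main

import "fmt"

// ModelScore is an assumed mirror of one model_scores entry.
type ModelScore struct {
	Model string
	Score float64
}

// intersectCandidates returns the models present in every list, in the
// order they appear in the first list, without duplicates.
func intersectCandidates(lists [][]string) []string {
	if len(lists) == 0 {
		return nil
	}
	counts := make(map[string]int) // number of lists containing each model
	for _, list := range lists {
		seen := make(map[string]bool)
		for _, m := range list {
			if !seen[m] {
				seen[m] = true
				counts[m]++
			}
		}
	}
	var result []string
	emitted := make(map[string]bool)
	for _, m := range lists[0] {
		if counts[m] == len(lists) && !emitted[m] {
			emitted[m] = true
			result = append(result, m)
		}
	}
	return result
}

// selectBestModel returns the candidate with the highest score. Candidates
// without a score entry count as 0; ties go to the earlier candidate.
// ok is false when candidates is empty.
func selectBestModel(candidates []string, scores []ModelScore) (best string, bestScore float64, ok bool) {
	byModel := make(map[string]float64, len(scores))
	for _, s := range scores {
		byModel[s.Model] = s.Score
	}
	for i, c := range candidates {
		if score := byModel[c]; i == 0 || score > bestScore {
			best, bestScore = c, score
		}
	}
	return best, bestScore, len(candidates) > 0
}

func main() {
	a := []string{"infrastructure-expert-model", "devops-specialist-model", "cloud-native-model", "k8s-expert"}
	b := []string{"k8s-expert", "security-model", "cloud-model"}
	c := []string{"infrastructure-expert-model", "devops-specialist-model", "k8s-expert", "general-cs-model"}
	fmt.Println(intersectCandidates([][]string{a, b, c})) // [k8s-expert]

	scores := []ModelScore{
		{"infrastructure-expert-model", 0.9},
		{"devops-specialist-model", 0.85},
		{"k8s-expert", 0.95},
		{"general-cs-model", 0.7},
	}
	best, score, _ := selectBestModel(
		[]string{"infrastructure-expert-model", "devops-specialist-model", "k8s-expert"}, scores)
	fmt.Println(best, score) // k8s-expert 0.95
}
```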
- Configuration Extension (pkg/config/config.go)
type RouterConfig struct {
// ... existing fields ...
Routing RoutingConfig `yaml:"routing"`
KeywordRouting KeywordRoutingConfig `yaml:"keyword_routing"`
CustomCategories CustomCategoriesConfig `yaml:"custom_categories"` // NEW
}
type RoutingConfig struct {
// Semantic routing chain: defines the order and stages of routing
// Each element represents a routing stage to execute
// Available stages:
// - "keyword": Keyword-based matching
// - "similarity-category": Similarity-based custom category matching
// - "finetune-category": Fine-tuned ModernBERT category classification
//
// Examples:
// ["keyword", "finetune-category"] - Keyword then category
// ["similarity-category", "finetune-category"] - Similarity then category
// ["keyword", "similarity-category", "finetune-category"] - All three stages
// ["finetune-category"] - Category only (existing behavior)
SemanticChain []string `yaml:"semantic_chain"`
// Combination mode: how to combine results from multiple stages
// Options:
// - "sequential": Each stage filters the output of previous stage (A → B → C)
// - "intersection": Take intersection of candidates from all stages (A ∩ B ∩ C)
CombinationMode string `yaml:"combination_mode"`
// Fallback behavior when no models match after all stages
Fallback string `yaml:"fallback"` // "default_model"
}
type KeywordRoutingConfig struct {
Enabled bool `yaml:"enabled"`
Rules []KeywordRule `yaml:"rules"`
}
type CustomCategoriesConfig struct {
Enabled bool `yaml:"enabled"`
SimilarityThreshold float32 `yaml:"similarity_threshold"`
GapThreshold float32 `yaml:"gap_threshold"`
Categories []CustomCategory `yaml:"categories"`
}
type CustomCategory struct {
ID string `yaml:"id"`
Name string `yaml:"name"`
Description string `yaml:"description"`
Examples []string `yaml:"examples"`
CandidateModels []string `yaml:"candidate_models"`
}
// Helper methods for semantic chain
// HasStage checks if a specific stage is in the semantic chain
func (r *RoutingConfig) HasStage(stage string) bool {
for _, s := range r.SemanticChain {
if s == stage {
return true
}
}
return false
}
// IsKeywordEnabled checks if keyword routing is enabled in the chain
func (r *RoutingConfig) IsKeywordEnabled() bool {
return r.HasStage("keyword")
}
// IsSimilarityEnabled checks if similarity routing is enabled in the chain
func (r *RoutingConfig) IsSimilarityEnabled() bool {
return r.HasStage("similarity-category")
}
// IsFinetuneEnabled checks if finetune category routing is enabled in the chain
func (r *RoutingConfig) IsFinetuneEnabled() bool {
return r.HasStage("finetune-category")
}
// IsSequentialMode checks if combination mode is sequential
func (r *RoutingConfig) IsSequentialMode() bool {
return r.CombinationMode == "sequential"
}
// IsIntersectionMode checks if combination mode is intersection
func (r *RoutingConfig) IsIntersectionMode() bool {
return r.CombinationMode == "intersection"
}
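Two failure modes follow from the code above. `SelectModel` silently falls back to the default model when `combination_mode` is misspelled. A stage listed twice would overwrite its own entry in `StageResults`, because that map is keyed by stage name. A load-time check could catch both. The `Validate` method below is a hypothetical addition, and rejecting an empty `combination_mode` (rather than defaulting it) is an assumption.

```go
package main

import "fmt"

// RoutingConfig is trimmed to the two fields the check needs.
type RoutingConfig struct {
	SemanticChain   []string
	CombinationMode string
}

// knownStages lists the stage names defined in the proposal.
var knownStages = map[string]bool{
	"keyword":             true,
	"similarity-category": true,
	"finetune-category":   true,
}

// Validate rejects empty chains, unknown or repeated stages, and
// unsupported combination modes.
func (r *RoutingConfig) Validate() error {
	if len(r.SemanticChain) == 0 {
		return fmt.Errorf("semantic_chain must list at least one stage")
	}
	seen := make(map[string]bool, len(r.SemanticChain))
	for _, stage := range r.SemanticChain {
		if !knownStages[stage] {
			return fmt.Errorf("unknown stage %q in semantic_chain", stage)
		}
		if seen[stage] {
			return fmt.Errorf("stage %q appears more than once in semantic_chain", stage)
		}
		seen[stage] = true
	}
	switch r.CombinationMode {
	case "sequential", "intersection":
		return nil
	default:
		return fmt.Errorf("unsupported combination_mode %q", r.CombinationMode)
	}
}

func main() {
	cfg := &RoutingConfig{
		SemanticChain:   []string{"keyword", "similarity-category", "finetune-category"},
		CombinationMode: "sequential",
	}
	fmt.Println(cfg.Validate()) // <nil>

	cfg.CombinationMode = "union"
	fmt.Println(cfg.Validate()) // unsupported combination_mode "union"
}
```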
6. Integration Points
Modify handleModelRouting() in pkg/extproc/request_handler.go:
func (r *OpenAIRouter) handleModelRouting(...) (*ext_proc.ProcessingResponse, error) {
// ... existing code ...
if originalModel == "auto" {
var selectedModel string
var routingDecision *RoutingDecision
// Use the semantic chain router if any advanced routing is enabled
if r.Config.KeywordRouting.Enabled || r.Config.CustomCategories.Enabled {
selectedModel, routingDecision = r.ThreeStageRouter.SelectModel(userContent)
// Log the decision using only fields defined on RoutingDecision and StageResult
observability.Infof("Semantic chain routing decision: model=%s, chain=%v, mode=%s",
selectedModel, routingDecision.SemanticChain, routingDecision.CombinationMode)
for _, stage := range routingDecision.SemanticChain {
res, ok := routingDecision.StageResults[stage]
if !ok {
continue // stage did not run (an earlier stage left no candidates)
}
observability.Infof(" Stage %s: input=%v, output=%v, duration=%s",
stage, res.InputModels, res.OutputModels, res.ExecutionTime)
}
observability.Infof(" Final: candidates=%v, path=%s",
routingDecision.FinalCandidates, routingDecision.DecisionPath)
// Record metrics (proposed helper; it derives its labels from the decision)
metrics.RecordThreeStageRouting(routingDecision, selectedModel)
} else {
// Fallback to existing category-only routing
selectedModel = r.classifyAndSelectBestModel(userContent)
}
matchedModel = selectedModel
}
// ... rest of the code ...
}
Router Initialization in pkg/extproc/router.go:
func NewOpenAIRouter(configPath string) (*OpenAIRouter, error) {
// ... existing initialization code ...
// Initialize keyword matcher if enabled
var keywordMatcher *keyword.KeywordMatcher
if cfg.KeywordRouting.Enabled {
keywordMatcher = keyword.NewKeywordMatcher(cfg.KeywordRouting)
observability.Infof("Initialized keyword matcher with %d rules", len(cfg.KeywordRouting.Rules))
}
// Initialize similarity matcher if custom categories are enabled
var similarityMatcher *similarity.SimilarityMatcher
if cfg.CustomCategories.Enabled {
similarityMatcher = similarity.NewSimilarityMatcher(cfg.CustomCategories)
// Pre-compute and cache category embeddings
if err := similarityMatcher.InitializeEmbeddings(); err != nil {
return nil, fmt.Errorf("failed to initialize similarity matcher: %w", err)
}
observability.Infof("Initialized similarity matcher with %d custom categories",
len(cfg.CustomCategories.Categories))
}
// Create the router (uses the SemanticChainRouter type defined above)
var threeStageRouter *SemanticChainRouter
if keywordMatcher != nil || similarityMatcher != nil {
threeStageRouter = &SemanticChainRouter{
config: cfg,
keywordMatcher: keywordMatcher,
similarityMatcher: similarityMatcher,
classifier: classifier,
}
}
router := &OpenAIRouter{
Config: cfg,
Classifier: classifier,
ThreeStageRouter: threeStageRouter,
// ... other fields ...
}
return router, nil
}
7. Observability & Metrics
New Metrics:
- three_stage_routing_decisions_total{strategy, keyword_match, similarity_match, category}: Counter of routing decisions
- keyword_match_count{rule_name}: Counter of keyword rule matches
- similarity_match_count{category_id}: Counter of similarity category matches
- similarity_score_histogram{category_id}: Histogram of similarity scores
- similarity_gap_histogram{category_id}: Histogram of similarity gaps (top-1 vs top-2)
- routing_strategy_duration_seconds{strategy, stage}: Histogram of routing latency by strategy and stage
- candidate_models_count{strategy, stage}: Histogram of candidate model counts per stage
- stage_execution_duration_seconds{stage}: Histogram of individual stage execution time
  - stage: "keyword", "similarity", "category"
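The per-stage duration metrics and the `ExecutionTime` field on `StageResult` both need a wall-clock measurement around each stage. The sketch below uses only the standard library. `timeStage` is a hypothetical helper, and the metrics backend that would consume the value is left out.

```go
package main

import (
	"fmt"
	"time"
)

// timeStage runs one stage and returns its candidates together with the
// elapsed wall-clock time.
func timeStage(run func() []string) ([]string, time.Duration) {
	start := time.Now()
	candidates := run()
	return candidates, time.Since(start)
}

func main() {
	// The closure stands in for a real stage such as keyword matching.
	candidates, elapsed := timeStage(func() []string {
		time.Sleep(2 * time.Millisecond)
		return []string{"k8s-expert", "cloud-model"}
	})
	// elapsed would feed StageResult.ExecutionTime and the
	// stage_execution_duration_seconds{stage="keyword"} histogram.
	fmt.Printf("stage=keyword candidates=%v duration=%s\n", candidates, elapsed.Round(time.Millisecond))
}
```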
Enhanced Logging:
[INFO] Query: "How to secure a Kubernetes cluster with RBAC?"
[INFO] Routing strategy: keyword_then_similarity_then_category
[INFO]
[INFO] Stage 1 - Keyword Matching:
[INFO] - Matched rule: 'kubernetes-infrastructure' (keywords: kubernetes)
[INFO] - Candidate models: [infrastructure-expert-model, devops-specialist-model, cloud-native-model, k8s-expert]
[INFO]
[INFO] Stage 2 - Similarity Matching:
[INFO] - Query embedding generated (384 dimensions)
[INFO] - Top-3 matches:
[INFO] 1. cloud-native-security (similarity: 0.82, gap: 0.15)
[INFO] 2. legal-consulting (similarity: 0.67, gap: 0.10)
[INFO] 3. travel-planning (similarity: 0.57)
[INFO] - Selected: cloud-native-security (confident: true)
[INFO] - Candidate models: [k8s-expert, security-model, cloud-model]
[INFO]
[INFO] Stage 3 - Category Classification:
[INFO] - Predicted category: computer science (confidence: 0.85)
[INFO] - Candidate models: [infrastructure-expert-model, devops-specialist-model, k8s-expert, general-cs-model]
[INFO]
[INFO] Final Decision:
[INFO] - Intersection (Stage1 ∩ Stage2 ∩ Stage3): [k8s-expert]
[INFO] - Best model by score: k8s-expert (score: 0.95)
[INFO] - Total routing time: 45ms (keyword: 2ms, similarity: 15ms, category: 25ms, selection: 3ms)
[INFO]
[INFO] Final selection: k8s-expert
Additional context
Benefits:
- Three-Stage Progressive Filtering: Narrow down model candidates through keyword → similarity → category stages
- Custom Categories Without Training: Define domain-specific categories using natural language descriptions and examples
- Flexible Combination: Support sequential and intersection combination modes across all three stages
- Backward Compatible: Existing category-only routing continues to work; all new features are opt-in
- Zero-Shot Capability: Add new custom categories without retraining models (similarity-based)
- Performance Optimized:
- Keyword matching: O(n) where n = number of keywords (fastest)
- Similarity matching: O(m) where m = number of custom categories (fast, pre-computed embeddings)
- Category classification: O(1) model inference (slowest, only when needed)
- Deterministic + Semantic: Combine deterministic keyword rules with semantic understanding
Use Cases:
- Cloud-Native Security: Route "Kubernetes security" queries through keyword (k8s) → similarity (cloud-native-security) → category (computer science)
- Travel Planning: Route travel queries to specialized models using similarity matching without retraining
- Legal Compliance: Filter by compliance keywords, match legal domain by similarity, select best model by category
- Multi-Domain Routing: Support unlimited custom domains (travel, legal, finance, healthcare) without model retraining
- Performance Optimization: Pre-filter models by keywords to reduce similarity/classification overhead
Implementation Phases:
Phase 1: Core Implementation (Keyword + Category)
- Implement KeywordMatcher with AND/OR logic
- Add configuration structures for keyword routing
- Implement basic routing strategies (keyword_only, category_only, keyword_then_category)
- Add unit tests for keyword matching
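As a rough sketch of what Phase 1 could look like, the block below shows keyword rules with AND/OR logic. The `KeywordRule` fields, the `Operator` constants, and `MatchKeywords` are hypothetical names chosen for illustration. Substring matching and "first matching rule wins" are simplifying assumptions, not decisions made in this issue.

```go
package main

import (
	"fmt"
	"strings"
)

// Operator selects how a rule's keywords are combined.
type Operator string

const (
	OpAND Operator = "AND" // every keyword must appear in the query
	OpOR  Operator = "OR"  // at least one keyword must appear
)

// KeywordRule is a hypothetical rule shape; field names are assumptions.
type KeywordRule struct {
	Name          string
	Keywords      []string
	Operator      Operator
	CaseSensitive bool
	Models        []string // candidate models when the rule matches
}

// Matches reports whether the query satisfies the rule, using plain
// substring matching (a simplification; no tokenization or word boundaries).
func (r KeywordRule) Matches(query string) bool {
	if len(r.Keywords) == 0 {
		return false
	}
	if !r.CaseSensitive {
		query = strings.ToLower(query)
	}
	for _, kw := range r.Keywords {
		if !r.CaseSensitive {
			kw = strings.ToLower(kw)
		}
		found := strings.Contains(query, kw)
		if r.Operator == OpOR && found {
			return true
		}
		if r.Operator == OpAND && !found {
			return false
		}
	}
	// AND: all keywords were found. OR (or unknown operator): none matched.
	return r.Operator == OpAND
}

// MatchKeywords returns the models of the first matching rule. Rules are
// assumed to be listed in priority order.
func MatchKeywords(rules []KeywordRule, query string) ([]string, bool) {
	for _, r := range rules {
		if r.Matches(query) {
			return r.Models, true
		}
	}
	return nil, false
}

func main() {
	rules := []KeywordRule{{
		Name:     "kubernetes",
		Keywords: []string{"kubernetes", "k8s"},
		Operator: OpOR,
		Models:   []string{"k8s-expert", "devops-model", "cloud-model", "security-model"},
	}}
	models, ok := MatchKeywords(rules, "Kubernetes security best practices")
	fmt.Println(models, ok)
}
```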
Phase 2: Similarity Matching (NEW - from issue #312)
- Implement SimilarityMatcher with embedding-based matching
- Add CustomCategoriesConfig to configuration
- Implement embedding pre-computation and caching
- Add similarity threshold and gap validation
- Implement fallback to category classification
- Add unit tests for similarity matching
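The Phase 2 pieces (pre-computed normalized embeddings, cosine similarity, threshold and gap validation) might fit together along these lines. This is a sketch under assumptions: the struct fields and constructor are hypothetical, and the toy 2-dimensional vectors stand in for embeddings that would really come from the existing BERT binding.

```go
package main

import (
	"fmt"
	"math"
)

// Category pairs a custom category name with its embedding.
type Category struct {
	Name      string
	Embedding []float32
}

// Match describes the best-scoring category for a query.
type Match struct {
	Category   string
	Similarity float64
	Gap        float64 // distance to the runner-up; 0 when there is none
	Confident  bool
}

// SimilarityMatcher holds normalized category embeddings in memory.
type SimilarityMatcher struct {
	Threshold    float64
	GapThreshold float64
	categories   []Category
}

// normalize returns a unit-length copy of v (zero vectors are copied as-is).
func normalize(v []float32) []float32 {
	var sum float64
	for _, x := range v {
		sum += float64(x) * float64(x)
	}
	out := make([]float32, len(v))
	copy(out, v)
	if sum == 0 {
		return out
	}
	norm := math.Sqrt(sum)
	for i := range out {
		out[i] = float32(float64(out[i]) / norm)
	}
	return out
}

// dot equals cosine similarity when both inputs are unit vectors.
// Assumes a and b have the same length.
func dot(a, b []float32) float64 {
	var s float64
	for i := range a {
		s += float64(a[i]) * float64(b[i])
	}
	return s
}

// NewSimilarityMatcher normalizes and caches the category embeddings once.
func NewSimilarityMatcher(threshold, gapThreshold float64, cats []Category) *SimilarityMatcher {
	m := &SimilarityMatcher{Threshold: threshold, GapThreshold: gapThreshold}
	for _, c := range cats {
		m.categories = append(m.categories, Category{Name: c.Name, Embedding: normalize(c.Embedding)})
	}
	return m
}

// Match scores the query against every category (brute force) and marks the
// result confident only if it clears both the similarity and gap thresholds.
func (m *SimilarityMatcher) Match(query []float32) Match {
	if len(m.categories) == 0 {
		return Match{}
	}
	q := normalize(query)
	sims := make([]float64, len(m.categories))
	best, second := -1, -1
	for i, c := range m.categories {
		sims[i] = dot(q, c.Embedding)
		switch {
		case best == -1 || sims[i] > sims[best]:
			second, best = best, i
		case second == -1 || sims[i] > sims[second]:
			second = i
		}
	}
	res := Match{Category: m.categories[best].Name, Similarity: sims[best]}
	gapOK := true // a lone category has no runner-up to be confused with
	if second != -1 {
		res.Gap = sims[best] - sims[second]
		gapOK = res.Gap >= m.GapThreshold
	}
	res.Confident = res.Similarity >= m.Threshold && gapOK
	return res
}

func main() {
	m := NewSimilarityMatcher(0.75, 0.05, []Category{
		{Name: "cloud-native-security", Embedding: []float32{1, 0}},
		{Name: "travel-planning", Embedding: []float32{0, 1}},
	})
	res := m.Match([]float32{1, 0.1})
	fmt.Printf("%s confident=%v\n", res.Category, res.Confident)
}
```

When `Confident` is false, the caller would fall back to category classification, as listed in the phase above.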
Phase 3: Three-Stage Integration
- Implement ThreeStageRouter orchestration
- Add all routing strategies (intersection, union, sequential)
- Implement fallback logic across all stages
- Add integration tests for three-stage routing
Phase 4: Observability & Optimization
- Add comprehensive metrics for all three stages
- Add detailed logging with stage-by-stage breakdown
- Add routing decision headers
- Performance benchmarks and optimization
- Add E2E tests with real queries
Technical Considerations:
- Embedding Model: Reuse existing BERT model (sentence-transformers/all-MiniLM-L12-v2)
  - 384-dimensional embeddings
  - ~120MB model size
  - Fast inference (~10-20ms per query)
- Embedding Caching: Pre-compute and cache category embeddings at initialization
  - Combine description + examples for better matching
  - Normalize embeddings for cosine similarity
  - Store in memory for fast retrieval
- Similarity Calculation: Use cosine similarity with normalized embeddings
  - Efficient for <100 custom categories (brute-force acceptable)
  - For >1000 categories, consider ANN libraries (FAISS, HNSWlib)
- Threshold Tuning:
  - similarity_threshold: 0.75 (recommended starting point)
  - gap_threshold: 0.05 (avoid ambiguous matches)
  - Monitor metrics to tune thresholds
- Fallback Strategy: Ensure robust fallback at each stage
  - No keyword match → proceed to similarity
  - No similarity match → proceed to category
  - No category match → use default model
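One possible reading of the fallback chain above, sketched in Go: a stage that finds no match is skipped, the stages that do match are intersected, and an empty result falls back to the default model. The `Stage` type, `RouteWithFallback`, and the "skip and intersect" policy are assumptions made for illustration; the issue does not fix these details.

```go
package main

import "fmt"

// Stage is a hypothetical routing stage. Match returns the stage's candidate
// models and whether the stage found a match for the query.
type Stage struct {
	Name  string
	Match func(query string) ([]string, bool)
}

// keepCommon returns the elements of a that also appear in b, in a's order.
func keepCommon(a, b []string) []string {
	inB := make(map[string]bool, len(b))
	for _, m := range b {
		inB[m] = true
	}
	var out []string
	for _, m := range a {
		if inB[m] {
			out = append(out, m)
		}
	}
	return out
}

// RouteWithFallback runs the stages in order. A stage with no match is
// skipped (the router "proceeds" to the next one); matching stages narrow
// the candidate set. If nothing survives, the default model is returned.
func RouteWithFallback(stages []Stage, query, defaultModel string) []string {
	var candidates []string
	matchedAny := false
	for _, s := range stages {
		matched, ok := s.Match(query)
		if !ok || len(matched) == 0 {
			continue // no match at this stage: proceed to the next
		}
		if !matchedAny {
			candidates, matchedAny = matched, true
			continue
		}
		candidates = keepCommon(candidates, matched)
	}
	if len(candidates) == 0 {
		return []string{defaultModel}
	}
	return candidates
}

func main() {
	stages := []Stage{
		{Name: "keyword", Match: func(string) ([]string, bool) { return nil, false }},
		{Name: "similarity", Match: func(string) ([]string, bool) {
			return []string{"k8s-expert", "security-model", "cloud-model"}, true
		}},
		{Name: "category", Match: func(string) ([]string, bool) {
			return []string{"k8s-expert", "general-cs-model"}, true
		}},
	}
	fmt.Println(RouteWithFallback(stages, "Kubernetes security best practices", "default-model"))
}
```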
Testing Requirements:
- Unit Tests:
  - KeywordMatcher: AND/OR logic, case sensitivity, priority ordering
  - SimilarityMatcher: embedding generation, cosine similarity, threshold validation
  - ThreeStageRouter: all routing strategies, fallback logic
- Integration Tests:
  - Three-stage routing flow with all strategies
  - Intersection/union logic across stages
  - Fallback behavior at each stage
- E2E Tests:
  - Real queries with expected routing decisions
  - Performance benchmarks (latency per stage)
  - Stress tests with many custom categories
- Configuration Tests:
  - Validation of keyword rules
  - Validation of custom categories
  - Invalid configuration handling
Related Files:
- src/semantic-router/pkg/config/config.go - Configuration structures
- src/semantic-router/pkg/extproc/request_handler.go - Request routing logic
- src/semantic-router/pkg/extproc/three_stage_router.go - Three-stage router (NEW)
- src/semantic-router/pkg/utils/keyword/matcher.go - Keyword matcher (NEW)
- src/semantic-router/pkg/utils/similarity/matcher.go - Similarity matcher (NEW)
- src/semantic-router/pkg/utils/classification/classifier.go - Category classifier
- candle-binding/semantic-router.go - BERT embedding functions
- config/config.yaml - Configuration example
Related Issues:
- Issue #312: Support Similarity-Based Custom Category Routing for Dynamic Model Selection
Future Enhancements:
- Support multiple embedding models (BGE for Chinese, MPNet for higher accuracy)
- Implement category embedding versioning for model upgrades
- Add A/B testing framework for threshold tuning
- Support hierarchical categories with parent-child relationships
- Integrate with vector databases (Milvus, Qdrant) for large-scale deployments (>1000 categories)
- Add negative keywords support (e.g., "NOT contains X")
- Add keyword weighting/scoring instead of binary match
/area core
/milestone v0.1
/priority P0