Description
Is your feature request related to a problem? Please describe.
I have no visibility into, or control over, the routing decisions the semantic router makes. The current system uses a black-box ModernBERT classification model that:
- Makes opaque decisions: I have no visibility into why a request was routed to a specific model
- Offers no threshold control: I cannot adjust classification confidence or fine-tune routing sensitivity
- Lacks customizability: I cannot define custom routing rules beyond the predefined categories
- Is dataset-dependent: Routing behavior is entirely determined by training data (MMLU-Pro, Presidio) with no business logic integration
- Provides no debugging: When routing goes wrong, there's no way to understand or fix the decision logic
- Cannot scale: As my routing requirements grow exponentially, the static model approach becomes limiting
This black-box approach makes it impossible to implement business-specific routing logic, debug routing issues, or have confidence in the system's decision-making process.
Describe the solution you'd like
I want a configurable, interpretable routing rules system that extends the current semantic router to support both model-based and rule-based routing approaches:
Core Requirements
- Hybrid routing approach: Support both model-based classification AND user-defined rules
- Transparent decision-making: Every routing decision should provide a clear explanation of which rules fired and why
- User-defined rules: Ability to create custom routing logic with multiple condition types (semantic, request-based, performance-based, custom)
- Configurable thresholds: Full control over classification sensitivity and decision boundaries for both models and rules
- Real-time updates: Rules can be modified without service restart
- Scalable architecture: Support for exponentially growing rule sets (10,000+ rules)
- Rule precedence: Ability to define when rules take precedence over model classification
Rule Types Needed
- Semantic conditions: Category classification with custom thresholds, intent detection, content complexity
- Request-based conditions: Headers, metadata, user identity, time-based rules
- Performance-based conditions: Model availability, load metrics, cost optimization
- Custom conditions: External API calls, database lookups, business logic integration
Interpretability Features
- Decision explanation API: Returns reasoning for each routing choice (both model and rule-based)
- Rule execution traces: Shows which rules fired with confidence scores
- Model decision transparency: When model-based routing is used, explain which categories matched and why
- Visual decision trees: For complex routing logic
- Audit logs: Detailed decision explanations for debugging
- A/B testing framework: Compare rule effectiveness vs model-based routing
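A minimal sketch of what a decision explanation with a rule execution trace could look like. The names `ConditionTrace`, `DecisionExplanation`, and `explain`, the field layout, and the "all conditions must pass" semantics are assumptions for illustration, not an existing API of the router.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class ConditionTrace:
    """One evaluated condition: what was observed and whether it passed."""
    condition: str
    observed: float
    threshold: float
    passed: bool


@dataclass
class DecisionExplanation:
    """Hypothetical payload for a decision explanation API."""
    request_id: str
    source: str  # "rules" or "model"
    selected_model: str
    rule_name: Optional[str] = None
    traces: List[ConditionTrace] = field(default_factory=list)


def explain(request_id: str, rule_name: str, target_model: str,
            thresholds: Dict[str, float], scores: Dict[str, float],
            fallback_model: str) -> DecisionExplanation:
    # Record every condition, including the ones that failed, so the
    # trace shows why a rule did or did not fire.
    traces = []
    for name, limit in thresholds.items():
        observed = scores.get(name, 0.0)
        traces.append(ConditionTrace(name, observed, limit, observed >= limit))
    fired = all(t.passed for t in traces)
    return DecisionExplanation(
        request_id=request_id,
        source="rules" if fired else "model",
        selected_model=target_model if fired else fallback_model,
        rule_name=rule_name if fired else None,
        traces=traces,
    )


explanation = explain(
    "req-1", "enterprise-math-routing", "math-specialized-model",
    thresholds={"math": 0.8}, scores={"math": 0.91},
    fallback_model="general-model",
)
audit_line = json.dumps(asdict(explanation))  # suitable for an audit log
```

Keeping failed conditions in the trace is a deliberate choice here: it is what makes a "why was this rule skipped?" question answerable from the audit log alone.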
Describe alternatives you've considered
- Fine-tuning the existing ModernBERT model:
  - Pros: Leverages existing architecture
  - Cons: Still black-box, requires ML expertise, dataset-dependent, no real-time control
- Adding threshold configuration to current system:
  - Pros: Minimal changes to existing code
  - Cons: Still non-interpretable, limited to predefined categories, no custom logic
- Hybrid approach (ML + Rules):
  - Pros: Combines ML benefits with rule transparency
  - Cons: Adds complexity, still has black-box components
- External rule engine integration:
  - Pros: Leverages existing rule engines
  - Cons: Additional infrastructure, integration complexity, performance overhead
Chosen approach: Extend the existing system with a hybrid approach that supports both model-based classification and user-defined rules. This provides the best of both worlds - leveraging the existing ML capabilities while adding the transparency, configurability, and scalability needed for production use cases.
Additional context
Current Architecture Limitations
The existing semantic router uses a ModernBERT model trained on MMLU-Pro, Presidio, and jailbreak datasets. While this works well for general classification, it creates several production challenges:
- No business logic integration: Cannot incorporate company-specific routing requirements
- Debugging difficulties: When routing fails, there's no way to trace the decision process
- Limited customization: Users are constrained to predefined categories and cannot add domain-specific rules
- Scalability concerns: As routing requirements grow, the static model approach becomes a bottleneck
Hybrid Approach Benefits
By extending the current system rather than replacing it, we can:
- Preserve existing functionality: Keep the proven model-based classification for general use cases
- Add flexibility: Enable custom rules for specific business requirements
- Maintain performance: Leverage the existing high-performance ML pipeline
- Provide choice: Users can choose between model-based, rule-based, or hybrid routing strategies
Use Case Examples
- Enterprise routing: Route based on user permissions, department, or project type (rules) + general content classification (model)
- Cost optimization: Route simple queries to cheaper models, complex ones to premium models (hybrid approach)
- A/B testing: Test different routing strategies for different user segments (rules vs model comparison)
- Compliance: Route sensitive data to specific models based on regulatory requirements (rules)
- Performance tuning: Adjust routing based on real-time model performance metrics (hybrid approach)
- Fallback scenarios: Use model classification when rules don't match, or rules when model confidence is low
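The fallback scenarios in the last bullet can be sketched as follows. This is an illustration only: `route_hybrid`, the `rules_first` switch, the `default-model` name, and the callback signatures are assumptions, and the 0.7 default simply mirrors the `confidence_threshold` value used in the example configuration.

```python
from typing import Callable, Optional, Tuple

# Hypothetical callback types: a rule matcher returns a model name or None;
# a classifier returns (model name, confidence in [0, 1]).
RuleMatcher = Callable[[dict], Optional[str]]
Classifier = Callable[[dict], Tuple[str, float]]


def route_hybrid(request: dict, match_rule: RuleMatcher, classify: Classifier,
                 confidence_threshold: float = 0.7, rules_first: bool = True,
                 default_model: str = "default-model") -> Tuple[str, str]:
    """Return (selected model, reason) for a hybrid routing decision."""
    if rules_first:
        # Rules take precedence; the model is used when no rule matches.
        rule_model = match_rule(request)
        if rule_model is not None:
            return rule_model, "rule"
        model, confidence = classify(request)
        if confidence >= confidence_threshold:
            return model, "model"
        return default_model, "default"

    # Model takes precedence; rules are consulted when confidence is low.
    model, confidence = classify(request)
    if confidence >= confidence_threshold:
        return model, "model"
    rule_model = match_rule(request)
    if rule_model is not None:
        return rule_model, "rule"
    return default_model, "default"


no_rule = lambda req: None
math_rule = lambda req: "math-specialized-model"
unsure = lambda req: ("general-model", 0.4)
confident = lambda req: ("general-model", 0.9)

decision = route_hybrid({}, no_rule, confident)
```

Returning the reason alongside the model keeps the decision explainable even in this small form.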
Technical Requirements
- Performance: Rule evaluation must complete within 50ms for 95% of requests
- Scalability: Support at least 10,000 active rules without performance degradation
- Reliability: Circuit breaker patterns for rule failure handling
- Monitoring: Comprehensive metrics and alerting for rule performance
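For the reliability requirement, a circuit breaker around rule evaluation could look roughly like this. It is a sketch under assumptions: the class name `RuleCircuitBreaker`, the failure limit, and the reset interval are hypothetical, and timeouts are not handled here.

```python
import time
from typing import Callable, Optional


class RuleCircuitBreaker:
    """Skips rule evaluation after repeated failures, then retries later."""

    def __init__(self, max_failures: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        if self._clock() - self._opened_at >= self.reset_after_s:
            # Half-open: allow one trial call; a single failure reopens.
            self._opened_at = None
            self._failures = self.max_failures - 1
            return False
        return True

    def call(self, evaluate: Callable[[dict], str], request: dict,
             fallback: Callable[[dict], str]) -> str:
        if self.is_open():
            return fallback(request)
        try:
            result = evaluate(request)
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            return fallback(request)
        self._failures = 0  # a success closes the breaker fully
        return result
```

While the breaker is open, requests go straight to the fallback (for example, model classification), so a misbehaving custom condition cannot stall every request.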
Implementation Priority
This feature should be prioritized as P1 because it addresses fundamental limitations that prevent the semantic router from being used in production environments where interpretability, customizability, and debugging capabilities are essential.
Example Hybrid Configuration
apiVersion: vllm.ai/v1alpha1
kind: SemanticRoute
metadata:
  name: hybrid-routing-example
spec:
  # Global routing strategy
  routing_strategy: "hybrid" # Options: "model", "rules", "hybrid"
  # Model-based routing (existing functionality)
  model_routing:
    enabled: true
    fallback_to_rules: true
    confidence_threshold: 0.7
  # Rule-based routing (new functionality)
  rules:
    - name: "enterprise-math-routing"
      priority: 100
      enabled: true
      description: "Route complex math problems to specialized model"
      conditions:
        - type: "category_classification"
          category: "math"
          threshold: 0.8
          operator: "gte"
        - type: "content_complexity"
          metric: "token_count"
          threshold: 100
          operator: "gt"
        - type: "user_permission"
          permission: "advanced_math_access"
          operator: "equals"
          value: true
      actions:
        - type: "route_to_model"
          model: "math-specialized-model"
          weight: 100
        - type: "enable_reasoning"
          effort: "high"
          max_steps: 20
      evaluation:
        timeout: 100ms
        fallback_action: "use_model_classification"
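To illustrate how a rule like `enterprise-math-routing` might be evaluated, here is a minimal sketch. The operator names come from the example configuration; everything else is an assumption: the `Condition` and `Rule` classes, the flat request context with keys such as `math_score`, AND semantics across conditions, and "higher `priority` wins".

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

# Operator names as used in the example configuration.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "gte": operator.ge,
    "gt": operator.gt,
    "equals": operator.eq,
}


@dataclass
class Condition:
    key: str       # hypothetical: name of the value in the request context
    op: str        # one of the OPERATORS keys
    expected: Any  # threshold or expected value


@dataclass
class Rule:
    name: str
    priority: int
    conditions: List[Condition]
    model: str
    enabled: bool = True


def matches(rule: Rule, ctx: Dict[str, Any]) -> bool:
    # Assumption: every condition must hold; a missing key counts as a miss.
    return all(
        c.key in ctx and OPERATORS[c.op](ctx[c.key], c.expected)
        for c in rule.conditions
    )


def select_rule(rules: List[Rule], ctx: Dict[str, Any]) -> Optional[Rule]:
    # Assumption: rules are tried from highest to lowest priority.
    for rule in sorted(rules, key=lambda r: r.priority, reverse=True):
        if rule.enabled and matches(rule, ctx):
            return rule
    return None  # caller falls back, e.g. to model classification


math_rule = Rule(
    name="enterprise-math-routing",
    priority=100,
    conditions=[
        Condition("math_score", "gte", 0.8),
        Condition("token_count", "gt", 100),
        Condition("advanced_math_access", "equals", True),
    ],
    model="math-specialized-model",
)
ctx = {"math_score": 0.91, "token_count": 240, "advanced_math_access": True}
selected = select_rule([math_rule], ctx)
```

A `None` result corresponds to the `fallback_action` case in the configuration, where the router would use model classification instead.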
API Endpoints
// Rule management
POST /api/v1/rules // Create new rule
GET /api/v1/rules // List all rules
PUT /api/v1/rules/{id} // Update rule
DELETE /api/v1/rules/{id} // Delete rule
// Rule evaluation and debugging
POST /api/v1/rules/evaluate // Evaluate rules for request
GET /api/v1/rules/explain/{id} // Get decision explanation
POST /api/v1/rules/test // Test rule with sample data
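A client for these endpoints could build its requests along these lines. This is only a sketch: the base URL, the JSON payload fields, and the `example-id` placeholder are hypothetical, since the proposal does not define request or response schemas. The requests are constructed but not sent.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8080"  # hypothetical router address


def build_request(method: str, path: str,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request for one of the proposed endpoints."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


# Create a rule; the payload shape is an assumption modeled on the YAML example.
create_req = build_request(
    "POST", "/api/v1/rules",
    {"name": "enterprise-math-routing", "priority": 100, "enabled": True},
)

# Fetch a decision explanation for a placeholder id.
explain_req = build_request("GET", "/api/v1/rules/explain/example-id")
```

Passing either object to `urllib.request.urlopen` would send it once a server implementing the proposed API exists.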
Labels: enhancement, routing, configuration, scalability, interpretability
Priority: P1
Milestone: v0.2