Description
Is your feature request related to a problem? Please describe.
Currently, the semantic router has comprehensive Mixture of Models (MoM) functionality implemented, including semantic classification, model selection algorithms, and reasoning-aware routing. However, the intelligent route tutorials (website/docs/tutorials/intelligent-route/) lack practical use case guides that demonstrate how to implement and optimize MoM strategies for real-world scenarios.
Users struggle with:
- Understanding practical MoM applications - While the overview exists in website/docs/overview/mixture-of-models.md, there's no hands-on tutorial showing step-by-step implementation
- Configuring optimal model pools - No guidance on selecting and scoring models for different use cases
- Cost-performance optimization - Missing practical examples of balancing cost, latency, and quality
- Real-world deployment patterns - Lack of concrete examples for common business scenarios
This creates a gap between the powerful MoM capabilities and user adoption, especially for teams wanting to implement intelligent routing for production workloads.
Describe the solution you'd like
Create a comprehensive Mixture of Models Use Case Guide in website/docs/tutorials/intelligent-route/mixture-of-models.md that includes:
1. Practical Use Case Scenarios
- E-commerce Customer Support: Route simple order queries to lightweight models, complex issues to premium models
- Code Assistant Platform: Direct syntax questions to fast models, architecture reviews to specialized models
- Educational Content Platform: Route basic questions to efficient models, complex explanations to reasoning-capable models
- Financial Analysis Service: Balance cost-sensitive queries with high-accuracy requirements
2. Step-by-Step Implementation Guide
- Model pool selection and configuration
- Category definition and scoring strategies
- Confidence threshold optimization
- Fallback mechanism setup
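As a rough illustration of how these four steps fit together, here is a minimal Python sketch. It assumes a hypothetical model pool keyed by category, per-model scores, a single confidence threshold, and a default fallback model. The names `MODEL_POOL`, `DEFAULT_MODEL`, and `select_model`, along with all scores, are invented for illustration and do not reflect the router's actual configuration schema or the logic in model_selector.go.

```python
from dataclasses import dataclass


@dataclass
class ModelScore:
    model: str
    score: float  # higher means better suited to the category (hypothetical scale)


# Hypothetical model pool: category -> candidate models with scores.
MODEL_POOL = {
    "order_status": [ModelScore("small-model", 0.90), ModelScore("large-model", 0.70)],
    "dispute": [ModelScore("small-model", 0.40), ModelScore("large-model", 0.95)],
}

# Hypothetical fallback used when classification is uncertain or unknown.
DEFAULT_MODEL = "large-model"


def select_model(category: str, confidence: float, threshold: float = 0.6) -> str:
    """Pick the best-scored model for a category, or fall back.

    Falls back to DEFAULT_MODEL when the classifier confidence is below
    the threshold or when the category has no configured candidates.
    """
    candidates = MODEL_POOL.get(category)
    if confidence < threshold or not candidates:
        return DEFAULT_MODEL
    return max(candidates, key=lambda c: c.score).model


if __name__ == "__main__":
    print(select_model("order_status", 0.92))
    print(select_model("order_status", 0.30))
    print(select_model("unknown_category", 0.99))
```

A tutorial could use a sketch like this to show how raising or lowering the threshold shifts traffic between the lightweight and premium models.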
3. Configuration Examples
- Complete config.yaml examples for each use case
- Model scoring matrices with performance data
- Reasoning family configurations
- Cost-performance trade-off examples
4. Performance Optimization Strategies
- Model selection algorithms in practice
- Confidence-based routing decisions
- Dynamic scaling patterns
- A/B testing configurations for model comparison
5. Monitoring and Analytics
- Key metrics for MoM performance evaluation
- Cost tracking and optimization
- Quality assurance patterns
- Routing decision analysis
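To make the monitoring items more concrete, the sketch below aggregates a list of routing decision records into per-model request counts, total cost, and mean latency. The record fields (`model`, `cost_usd`, `latency_ms`) and the `summarize` helper are assumptions made for this example, not the router's actual metrics or log format.

```python
from collections import defaultdict


def summarize(decisions):
    """Aggregate routing decisions into per-model totals.

    Each decision is a dict with hypothetical keys:
    "model", "cost_usd", and "latency_ms".
    """
    totals = defaultdict(lambda: {"requests": 0, "cost_usd": 0.0, "latency_ms": 0.0})
    for d in decisions:
        t = totals[d["model"]]
        t["requests"] += 1
        t["cost_usd"] += d["cost_usd"]
        t["latency_ms"] += d["latency_ms"]
    # Convert summed latency into a mean per model.
    return {
        model: {
            "requests": t["requests"],
            "cost_usd": t["cost_usd"],
            "mean_latency_ms": t["latency_ms"] / t["requests"],
        }
        for model, t in totals.items()
    }


if __name__ == "__main__":
    sample = [
        {"model": "small-model", "cost_usd": 0.001, "latency_ms": 80.0},
        {"model": "small-model", "cost_usd": 0.002, "latency_ms": 120.0},
        {"model": "large-model", "cost_usd": 0.030, "latency_ms": 900.0},
    ]
    for model, stats in summarize(sample).items():
        print(model, stats)
```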
6. Advanced Patterns
- Weighted routing configurations
- Multi-stage classification
- Provider diversity strategies
- Hybrid cloud/on-premise deployments
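For the weighted routing pattern, a tutorial could start from something like the following sketch, which splits traffic between models in proportion to configured weights. The `pick_weighted` helper, the model names, and the 80/20 split are hypothetical and are not taken from the router's implementation.

```python
import random


def pick_weighted(weights: dict, rng: random.Random) -> str:
    """Choose a model name with probability proportional to its weight."""
    if not weights or any(w < 0 for w in weights.values()) or sum(weights.values()) <= 0:
        raise ValueError("weights must be non-negative with a positive total")
    models = list(weights)
    return rng.choices(models, weights=[weights[m] for m in models], k=1)[0]


if __name__ == "__main__":
    # Hypothetical 80/20 traffic split, e.g. for an A/B comparison.
    split = {"model-a": 80, "model-b": 20}
    rng = random.Random(0)  # seeded so runs are repeatable
    counts = {"model-a": 0, "model-b": 0}
    for _ in range(10_000):
        counts[pick_weighted(split, rng)] += 1
    print(counts)
```

Seeding the generator keeps the example repeatable. A production router would more likely hash a request or session ID so that the same user is consistently routed to the same model.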
7. Troubleshooting and Best Practices
- Common routing issues and solutions
- Performance tuning guidelines
- Cost optimization techniques
- Quality assurance strategies
Additional context
Current Implementation Assets:
- MoM overview documentation (website/docs/overview/mixture-of-models.md)
- Model selection algorithms (src/semantic-router/pkg/extproc/model_selector.go)
- Classification system (src/semantic-router/pkg/utils/classification/classifier.go)
- Reasoning mode selection (src/semantic-router/pkg/extproc/reason_mode_selector.go)
- Example configurations (examples/semanticroute/)
Existing Tutorial Structure:
- website/docs/tutorials/intelligent-route/overview.md - Basic concepts
- website/docs/tutorials/intelligent-route/reasoning.md - Reasoning routing guide
Technical Features to Showcase:
- 14+ query categories with semantic classification
- Model scoring and selection algorithms
- Reasoning-aware routing with confidence thresholds
- Weighted routing and traffic splitting
- Fallback mechanisms and graceful degradation
- Integration with semantic cache and content safety
Real-World Examples to Include:
- Cost savings calculations (like the $165K/year example in overview)
- Performance benchmarks across different model combinations
- Latency vs. quality trade-offs
- Provider diversity and reliability patterns
Integration Points:
- Reference existing semantic cache tutorials for performance optimization
- Link to content safety tutorials for secure routing
- Connect with observability guides for monitoring MoM performance
- Cross-reference with Kubernetes deployment for production scaling
This tutorial will bridge the gap between the powerful MoM capabilities and practical implementation, enabling users to leverage intelligent routing for cost-effective, high-performance AI applications.
/area core user-experience docs
/milestone v0.1
/priority P1