
Add Mixture of Models (MoM) Use Case Guide to Intelligent Route Tutorials #281

@Xunzhuo


Is your feature request related to a problem? Please describe.

Currently, the semantic router implements comprehensive Mixture of Models (MoM) functionality, including semantic classification, model selection algorithms, and reasoning-aware routing. However, the intelligent route tutorials (website/docs/tutorials/intelligent-route/) lack practical use case guides showing how to implement and optimize MoM strategies for real-world scenarios.

Users struggle with:

  1. Understanding practical MoM applications: an overview exists in website/docs/overview/mixture-of-models.md, but there is no hands-on, step-by-step implementation tutorial
  2. Configuring optimal model pools: there is no guidance on selecting and scoring models for different use cases
  3. Cost-performance optimization: there are no practical examples of balancing cost, latency, and quality
  4. Real-world deployment patterns: there are no concrete examples for common business scenarios

This leaves a gap between MoM's capabilities and their adoption, especially for teams that want to use intelligent routing for production workloads.

Describe the solution you'd like

Create a comprehensive Mixture of Models Use Case Guide in website/docs/tutorials/intelligent-route/mixture-of-models.md that includes:

1. Practical Use Case Scenarios

  • E-commerce Customer Support: Route simple order queries to lightweight models, complex issues to premium models
  • Code Assistant Platform: Direct syntax questions to fast models, architecture reviews to specialized models
  • Educational Content Platform: Route basic questions to efficient models, complex explanations to reasoning-capable models
  • Financial Analysis Service: Route cost-sensitive routine queries to cheaper models while reserving high-accuracy models for queries that require them

2. Step-by-Step Implementation Guide

  • Model pool selection and configuration
  • Category definition and scoring strategies
  • Confidence threshold optimization
  • Fallback mechanism setup
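For the step-by-step section, the guide could pair the config walkthrough with a small Go sketch of the decision being configured. This is an illustration under stated assumptions, not the router's code: RouterConfig, ModelScore, SelectModel, the category names, and the scores are all hypothetical. It picks the highest-scoring model for the classified category and falls back to a default model when classifier confidence is below a threshold or the category has no model pool.

```go
package main

import "fmt"

// ModelScore pairs a model with its quality score for one category.
// All names and scores in this sketch are hypothetical.
type ModelScore struct {
	Model string
	Score float64
}

// RouterConfig is a simplified, hypothetical stand-in for a MoM routing config:
// a scored model pool per category plus a default (fallback) model.
type RouterConfig struct {
	Categories          map[string][]ModelScore
	DefaultModel        string
	ConfidenceThreshold float64
}

// SelectModel returns the highest-scoring model for category, or DefaultModel
// when the classifier's confidence is below the threshold or the category
// has no configured pool.
func (c RouterConfig) SelectModel(category string, confidence float64) string {
	if confidence < c.ConfidenceThreshold {
		return c.DefaultModel
	}
	pool := c.Categories[category]
	if len(pool) == 0 {
		return c.DefaultModel
	}
	best := pool[0]
	for _, m := range pool[1:] {
		if m.Score > best.Score {
			best = m
		}
	}
	return best.Model
}

func main() {
	cfg := RouterConfig{
		Categories: map[string][]ModelScore{
			"order_status":    {{"small-model", 0.9}, {"large-model", 0.7}},
			"billing_dispute": {{"small-model", 0.4}, {"large-model", 0.95}},
		},
		DefaultModel:        "large-model",
		ConfidenceThreshold: 0.6,
	}
	fmt.Println(cfg.SelectModel("order_status", 0.85))    // small-model
	fmt.Println(cfg.SelectModel("billing_dispute", 0.85)) // large-model
	fmt.Println(cfg.SelectModel("order_status", 0.30))    // large-model (low-confidence fallback)
}
```

Falling back to the stronger model trades cost for safety when the classifier is unsure; a cost-first deployment might fall back to the cheapest model instead.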

3. Configuration Examples

  • Complete config.yaml examples for each use case
  • Model scoring matrices with performance data
  • Reasoning family configurations
  • Cost-performance trade-off examples
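One simple way to frame the cost-performance trade-off examples is "the cheapest model that clears a quality floor". The Go sketch below assumes a hypothetical scoring matrix: Candidate, CheapestAboveFloor, and all scores and prices are illustrative rather than measured data, and this is not necessarily how model_selector.go scores models.

```go
package main

import (
	"fmt"
	"math"
)

// Candidate is one entry in a hypothetical scoring matrix: a model's quality
// score for a category and its price. Values are illustrative only.
type Candidate struct {
	Model     string
	Quality   float64 // 0..1, e.g. from an offline evaluation set
	CostPer1K float64 // USD per 1K tokens
}

// CheapestAboveFloor returns the cheapest model whose quality meets minQuality.
// ok is false when no candidate meets the floor.
func CheapestAboveFloor(cands []Candidate, minQuality float64) (model string, ok bool) {
	bestCost := math.Inf(1)
	for _, c := range cands {
		if c.Quality >= minQuality && c.CostPer1K < bestCost {
			model, bestCost, ok = c.Model, c.CostPer1K, true
		}
	}
	return model, ok
}

func main() {
	matrix := []Candidate{
		{"small-model", 0.72, 0.0005},
		{"mid-model", 0.85, 0.003},
		{"large-model", 0.93, 0.015},
	}
	// Raising the quality floor moves traffic to progressively pricier models.
	for _, floor := range []float64{0.70, 0.80, 0.90, 0.95} {
		m, ok := CheapestAboveFloor(matrix, floor)
		fmt.Printf("floor %.2f -> %q (ok=%v)\n", floor, m, ok)
	}
}
```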

4. Performance Optimization Strategies

  • Model selection algorithms in practice
  • Confidence-based routing decisions
  • Dynamic scaling patterns
  • A/B testing configurations for model comparison
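For the weighted routing and A/B testing items, a common technique is a sticky weighted split: hash a stable request key and map it onto cumulative weights. The Go sketch uses the standard library's FNV-1a hash; WeightedModel, PickWeighted, and the model names are hypothetical and are not part of the router's configuration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// WeightedModel is a routing arm with a relative traffic weight (hypothetical).
type WeightedModel struct {
	Model  string
	Weight uint32
}

// PickWeighted maps a request key (e.g. a user or session ID) to one arm in
// proportion to the weights. Because the key is hashed, the same key always
// lands on the same arm. Returns "" if there are no arms or all weights are 0.
func PickWeighted(key string, arms []WeightedModel) string {
	var total uint32
	for _, a := range arms {
		total += a.Weight
	}
	if total == 0 {
		return ""
	}
	h := fnv.New32a()
	h.Write([]byte(key)) // hash.Hash writes never return an error
	bucket := h.Sum32() % total
	// Walk the cumulative weights until the bucket falls inside an arm.
	for _, a := range arms {
		if bucket < a.Weight {
			return a.Model
		}
		bucket -= a.Weight
	}
	return "" // not reached when total > 0
}

func main() {
	arms := []WeightedModel{{"model-a", 90}, {"model-b", 10}}
	counts := map[string]int{}
	for i := 0; i < 10000; i++ {
		counts[PickWeighted(fmt.Sprintf("user-%d", i), arms)]++
	}
	fmt.Println(counts) // roughly 90/10; exact counts depend on the hash
}
```

Hashing the key instead of drawing a random number keeps each user on the same arm across requests, which makes quality comparisons between arms cleaner.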

5. Monitoring and Analytics

  • Key metrics for MoM performance evaluation
  • Cost tracking and optimization
  • Quality assurance patterns
  • Routing decision analysis
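The metrics list could be made concrete by aggregating logged routing decisions. The Go sketch below assumes a hypothetical Decision record (its field names are not the router's actual log or metrics schema) and computes per-model traffic, total cost, and the fallback rate.

```go
package main

import "fmt"

// Decision is one logged routing decision. The fields are assumptions about
// what a deployment might record, not the router's actual schema.
type Decision struct {
	Category string
	Model    string
	Fallback bool    // true if the default model was used
	CostUSD  float64 // cost attributed to this request
}

// Summary holds a few aggregate MoM metrics.
type Summary struct {
	Requests     int
	PerModel     map[string]int
	TotalCostUSD float64
	FallbackRate float64
}

// Summarize aggregates routing decisions into per-model traffic counts,
// total cost, and the share of requests that hit the fallback path.
func Summarize(ds []Decision) Summary {
	s := Summary{PerModel: map[string]int{}}
	fallbacks := 0
	for _, d := range ds {
		s.Requests++
		s.PerModel[d.Model]++
		s.TotalCostUSD += d.CostUSD
		if d.Fallback {
			fallbacks++
		}
	}
	if s.Requests > 0 {
		s.FallbackRate = float64(fallbacks) / float64(s.Requests)
	}
	return s
}

func main() {
	decisions := []Decision{
		{"order_status", "small-model", false, 0.002},
		{"billing_dispute", "large-model", false, 0.03},
		{"unknown", "large-model", true, 0.03},
	}
	s := Summarize(decisions)
	fmt.Printf("requests=%d perModel=%v cost=$%.3f fallbackRate=%.2f\n",
		s.Requests, s.PerModel, s.TotalCostUSD, s.FallbackRate)
}
```

A rising fallback rate is a useful early signal that classifier confidence thresholds or category definitions need revisiting.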

6. Advanced Patterns

  • Weighted routing configurations
  • Multi-stage classification
  • Provider diversity strategies
  • Hybrid cloud/on-premise deployments

7. Troubleshooting and Best Practices

  • Common routing issues and solutions
  • Performance tuning guidelines
  • Cost optimization techniques
  • Quality assurance strategies

Additional context

Current Implementation Assets:

  • MoM overview documentation (website/docs/overview/mixture-of-models.md)
  • Model selection algorithms (src/semantic-router/pkg/extproc/model_selector.go)
  • Classification system (src/semantic-router/pkg/utils/classification/classifier.go)
  • Reasoning mode selection (src/semantic-router/pkg/extproc/reason_mode_selector.go)
  • Example configurations (examples/semanticroute/)

Existing Tutorial Structure:

  • website/docs/tutorials/intelligent-route/overview.md - Basic concepts
  • website/docs/tutorials/intelligent-route/reasoning.md - Reasoning routing guide

Technical Features to Showcase:

  • 14+ query categories with semantic classification
  • Model scoring and selection algorithms
  • Reasoning-aware routing with confidence thresholds
  • Weighted routing and traffic splitting
  • Fallback mechanisms and graceful degradation
  • Integration with semantic cache and content safety
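Reasoning-aware routing with confidence thresholds could be illustrated with a sketch that enables reasoning only when the category calls for it and the classifier is confident. CategoryPolicy, ShouldReason, ApplyReasoning, and the categories are hypothetical. The chat_template_kwargs / enable_thinking request field is one convention used by some model families (for example Qwen3 served by vLLM); other families use different switches, so the parameter name is passed in rather than hard-coded.

```go
package main

import "fmt"

// CategoryPolicy says whether a category benefits from reasoning mode.
// Category names and flags are hypothetical.
type CategoryPolicy struct {
	UseReasoning bool
}

// ShouldReason enables reasoning only when the category calls for it and the
// classifier is confident enough; otherwise the cheaper non-reasoning path is used.
func ShouldReason(policies map[string]CategoryPolicy, category string, confidence, threshold float64) bool {
	p, ok := policies[category]
	return ok && p.UseReasoning && confidence >= threshold
}

// ApplyReasoning sets param inside "chat_template_kwargs" on an OpenAI-style
// request body, merging with any kwargs already present.
func ApplyReasoning(body map[string]any, param string, enable bool) {
	kw, ok := body["chat_template_kwargs"].(map[string]any)
	if !ok {
		kw = map[string]any{}
		body["chat_template_kwargs"] = kw
	}
	kw[param] = enable
}

func main() {
	policies := map[string]CategoryPolicy{
		"math":     {UseReasoning: true},
		"chitchat": {UseReasoning: false},
	}
	body := map[string]any{"model": "reasoning-capable-model"}
	enable := ShouldReason(policies, "math", 0.82, 0.7)
	ApplyReasoning(body, "enable_thinking", enable)
	fmt.Println(body["chat_template_kwargs"]) // map[enable_thinking:true]
}
```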

Real-World Examples to Include:

  • Cost savings calculations (such as the $165K/year example in the overview)
  • Performance benchmarks across different model combinations
  • Latency vs. quality trade-offs
  • Provider diversity and reliability patterns

Integration Points:

  • Reference existing semantic cache tutorials for performance optimization
  • Link to content safety tutorials for secure routing
  • Connect with observability guides for monitoring MoM performance
  • Cross-reference with Kubernetes deployment for production scaling

This tutorial will bridge the gap between MoM's capabilities and practical implementation, helping users apply intelligent routing to build cost-effective, high-performance AI applications.

/area core user-experience docs
/milestone v0.1
/priority P1
