Commit ea0c6ea

feat: add out-of-tree and mcp based classification support (#375)
* feat: add out-of-tree and mcp based classification support
* fix unit tests
* update unit test
* review feedback
* review feedback
* review feedback
* review feedback
* review comments
* add example regex based classification mcp server
* review feedback
* review feedback: auto discover mcp tools
* verify fixes are working
* verify fixes are working
* add missing file
* fix lint

Signed-off-by: Huamin Chen <[email protected]>
1 parent a71ba5c commit ea0c6ea

File tree: 19 files changed, +4401 −7 lines changed
Lines changed: 168 additions & 0 deletions
```yaml
# Example Configuration for MCP-Based Category Classifier (HTTP Transport)
#
# This configuration demonstrates how to use an external MCP (Model Context Protocol)
# service via HTTP for category classification instead of the built-in Candle/ModernBERT models.
#
# Use cases:
# - Offload classification to a remote HTTP service
# - Use custom classification models not supported in-tree
# - Scale classification independently from the router
# - Integrate with existing ML infrastructure via REST API
#
# Note: This example uses HTTP transport. The MCP server should expose an HTTP endpoint
# that implements the MCP protocol (e.g., http://localhost:8090/mcp)

# BERT model for semantic caching and tool selection
bert_model:
  model_id: "sentence-transformers/all-MiniLM-L6-v2"
  threshold: 0.85
  use_cpu: true

# Classifier configuration
classifier:
  # Disable in-tree category classifier (leave model_id empty)
  category_model:
    model_id: "" # Empty = disabled

  # Enable MCP-based category classifier (HTTP transport only)
  mcp_category_model:
    enabled: true # Enable MCP classifier
    transport_type: "http" # HTTP transport
    url: "http://localhost:8090/mcp" # MCP server endpoint

    # tool_name: Optional - auto-discovers classification tool if not specified
    # Will search for tools like: classify_text, classify, categorize, etc.
    # Uncomment to explicitly specify:
    # tool_name: "classify_text"

    threshold: 0.6 # Confidence threshold
    timeout_seconds: 30 # Request timeout

# Categories for routing queries
#
# Categories are automatically loaded from the MCP server via the 'list_categories' tool.
# The MCP server controls BOTH classification AND routing decisions.
#
# How it works:
# 1. Router connects to MCP server at startup
# 2. Calls 'list_categories' tool: MCP returns {"categories": ["business", "law", ...]}
# 3. For each request, calls 'classify_text' tool which returns:
#    {
#      "class": 3,
#      "confidence": 0.85,
#      "model": "openai/gpt-oss-20b",  # MCP decides which model to use
#      "use_reasoning": true           # MCP decides whether to use reasoning
#    }
# 4. Router uses the model and reasoning settings from the MCP response
#
# BENEFITS:
# - MCP server makes intelligent routing decisions per query
# - No hardcoded routing rules needed in config
# - MCP can adapt routing based on query complexity, content, etc.
# - Centralized routing logic in the MCP server
#
# FALLBACK:
# - If MCP doesn't return model/use_reasoning, uses default_model below
# - Can also add category-specific overrides here if needed
categories: []

# Default model to use when the category can't be determined
default_model: openai/gpt-oss-20b

# vLLM endpoints configuration
vllm_endpoints:
  - name: endpoint1
    address: 127.0.0.1
    port: 8000
    models:
      - openai/gpt-oss-20b
    weight: 1
    health_check_path: /health

# Model-specific configuration
model_config:
  openai/gpt-oss-20b:
    reasoning_family: gpt-oss
    preferred_endpoints:
      - endpoint1
    pii_policy:
      allow_by_default: true

# Reasoning family configurations
reasoning_families:
  deepseek:
    type: chat_template_kwargs
    parameter: thinking
  qwen3:
    type: chat_template_kwargs
    parameter: enable_thinking
  gpt-oss:
    type: reasoning_effort
    parameter: reasoning_effort
  gpt:
    type: reasoning_effort
    parameter: reasoning_effort

# Default reasoning effort level
default_reasoning_effort: high

# Tools configuration (optional)
tools:
  enabled: false
  top_k: 5
  similarity_threshold: 0.7
  tools_db_path: "config/tools_db.json"
  fallback_to_empty: true

# API configuration
api:
  batch_classification:
    max_batch_size: 100
    concurrency_threshold: 5
    max_concurrency: 8
    metrics:
      enabled: true
      detailed_goroutine_tracking: true
      high_resolution_timing: false
      sample_rate: 1.0
      duration_buckets:
        - 0.001
        - 0.005
        - 0.01
        - 0.025
        - 0.05
        - 0.1
        - 0.25
        - 0.5
        - 1
        - 2.5
        - 5
        - 10
        - 30
      size_buckets:
        - 1
        - 2
        - 5
        - 10
        - 20
        - 50
        - 100
        - 200

# Observability configuration
observability:
  tracing:
    enabled: false
    provider: "opentelemetry"
    exporter:
      type: "otlp"
      endpoint: "localhost:4317"
      insecure: true
    sampling:
      type: "always_on"
    resource:
      service_name: "semantic-router"
      service_version: "1.0.0"
      deployment_environment: "production"
```
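The FALLBACK behavior described in the config comments can be sketched as a small helper. This is a hypothetical illustration, not the router's actual implementation; the `use_reasoning: true` default when the field is absent is an assumption.

```python
import json

DEFAULT_MODEL = "openai/gpt-oss-20b"  # mirrors default_model in this config

def resolve_routing(mcp_response: str, default_model: str = DEFAULT_MODEL):
    """Extract routing decisions from an MCP ClassifyResponse JSON string,
    falling back to the configured default model when fields are missing."""
    resp = json.loads(mcp_response)
    model = resp.get("model") or default_model       # FALLBACK rule
    use_reasoning = resp.get("use_reasoning", True)  # assumed default
    return resp["class"], resp.get("confidence", 0.0), model, use_reasoning

# Full response: the router follows the MCP server's decision
print(resolve_routing('{"class": 3, "confidence": 0.85, '
                      '"model": "openai/gpt-oss-20b", "use_reasoning": true}'))
# Minimal response: the router falls back to default_model
print(resolve_routing('{"class": 3, "confidence": 0.85}'))
```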
Lines changed: 134 additions & 0 deletions
# MCP Classification Server

Example MCP server that provides text classification with intelligent routing for the semantic router.

## Features

- **Dynamic Categories**: Loaded from the MCP server at runtime via `list_categories`
- **Intelligent Routing**: Returns `model` and `use_reasoning` in the classification response
- **Regex-Based**: Simple pattern matching (replace with ML models for production)
- **Dual Transport**: Supports both HTTP and stdio

## Categories

| Index | Category | Example Keywords |
|-------|----------|------------------|
| 0 | math | calculate, equation, formula, integral |
| 1 | science | physics, chemistry, biology, atom, DNA |
| 2 | technology | computer, programming, AI, cloud |
| 3 | history | ancient, war, empire, civilization |
| 4 | general | Catch-all for other queries |
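The regex-based approach can be sketched as follows. The patterns below are illustrative, drawn from the keyword table above; the actual `server.py` patterns and scoring may differ.

```python
import re

# Illustrative subset of the keyword table above (hypothetical patterns)
CATEGORIES = {
    "math": [r"\b(calculate|equation|formula|integral)\b"],
    "science": [r"\b(physics|chemistry|biology|atom|dna)\b"],
    "technology": [r"\b(computer|programming|ai|cloud)\b"],
    "history": [r"\b(ancient|war|empire|civilization)\b"],
}
CATEGORY_ORDER = ["math", "science", "technology", "history", "general"]

def classify(text: str):
    """Return (index, category) of the best-matching category,
    falling back to 'general' when no pattern matches."""
    text = text.lower()
    scores = {
        name: sum(len(re.findall(p, text)) for p in patterns)
        for name, patterns in CATEGORIES.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        best = "general"  # catch-all
    return CATEGORY_ORDER.index(best), best

print(classify("Solve this equation and calculate the integral"))  # → (0, 'math')
print(classify("Tell me a joke"))                                  # → (4, 'general')
```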

## Quick Start

```bash
# Install dependencies
pip install -r requirements.txt

# HTTP mode (for the semantic router)
python server.py --http --port 8090

# Stdio mode (for MCP clients)
python server.py
```

**Test the server:**

```bash
curl http://localhost:8090/health
# → {"status": "ok", "categories": ["math", "science", "technology", "history", "general"]}
```

## Configuration

**Router config (`config-mcp-classifier-example.yaml`):**

```yaml
classifier:
  category_model:
    model_id: "" # Empty = use MCP

  mcp_category_model:
    enabled: true
    transport_type: "http"
    url: "http://localhost:8090/mcp"
    # tool_name: optional - auto-discovers classification tool if not specified
    threshold: 0.6
    timeout_seconds: 30

categories: [] # Loaded dynamically from MCP
default_model: openai/gpt-oss-20b
```

**Tool Auto-Discovery:**

The router automatically discovers classification tools from the MCP server by:

1. Listing available tools on connection
2. Looking for common names: `classify_text`, `classify`, `categorize`, `categorize_text`
3. Pattern matching for tools containing "classif" in the name/description
4. Optionally, specify `tool_name` to use a specific tool
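The discovery steps above can be sketched as a small helper. This is a hypothetical illustration; the real logic lives in the router's Go MCP package and may differ in details such as tie-breaking.

```python
# Common tool names checked first (from the auto-discovery list above)
PREFERRED_NAMES = ["classify_text", "classify", "categorize", "categorize_text"]

def discover_classify_tool(tools, tool_name=None):
    """tools: list of {"name": ..., "description": ...} dicts,
    as returned by the MCP server's tools/list. An explicitly
    configured tool_name overrides discovery."""
    if tool_name:
        return tool_name
    names = [t["name"] for t in tools]
    # Well-known names first
    for name in PREFERRED_NAMES:
        if name in names:
            return name
    # Fall back to substring matching on name/description
    for t in tools:
        if "classif" in (t["name"] + " " + t.get("description", "")).lower():
            return t["name"]
    return None  # no classification tool found

tools = [
    {"name": "list_categories", "description": "List categories"},
    {"name": "label_query", "description": "Classify a query into a category"},
]
print(discover_classify_tool(tools))  # → 'label_query' (matched via description)
```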

## Protocol API

This server implements the MCP classification protocol defined in:

```
github.com/vllm-project/semantic-router/src/semantic-router/pkg/connectivity/mcp/api
```

**Required Tools:**

1. **`list_categories`** - Returns `ListCategoriesResponse`:

   ```json
   {"categories": ["math", "science", "technology", ...]}
   ```

2. **`classify_text`** - Returns `ClassifyResponse`:

   ```json
   {
     "class": 1,
     "confidence": 0.85,
     "model": "openai/gpt-oss-20b",
     "use_reasoning": true
   }
   ```

See the `api` package for full type definitions and documentation.

## How It Works

**Intelligent Routing Rules:**

- Long query (>20 words) + complex words (`why`, `how`, `explain`) → `use_reasoning: true`
- Math + short query → `use_reasoning: false`
- High confidence (>0.9) → `use_reasoning: false`
- Low confidence (<0.6) → `use_reasoning: true`
- Default → `use_reasoning: true`
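These rules can be sketched as an illustrative reimplementation. The thresholds and word list come from the rules above; treating "short" as 20 words or fewer, and the rule ordering, are assumptions about how `server.py` applies them.

```python
COMPLEX_WORDS = {"why", "how", "explain"}

def decide_reasoning(text: str, category: str, confidence: float) -> bool:
    """Apply the routing rules listed above, in order."""
    words = text.lower().split()
    if len(words) > 20 and COMPLEX_WORDS & set(words):
        return True                 # long + complex → reasoning on
    if category == "math" and len(words) <= 20:
        return False                # short math query → reasoning off
    if confidence > 0.9:
        return False                # high confidence → skip reasoning
    if confidence < 0.6:
        return True                 # low confidence → reason it out
    return True                     # default

print(decide_reasoning("What is 2 + 2", "math", 0.95))       # → False
print(decide_reasoning("Explain entropy", "science", 0.5))   # → True
```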

## Customization

Edit `CATEGORIES` to add categories:

```python
CATEGORIES = {
    "your_category": {
        "patterns": [r"\b(keyword1|keyword2)\b"],
        "description": "Your description"
    }
}
```

Edit `decide_routing()` for custom routing logic:

```python
def decide_routing(text, category, confidence):
    if category == "math":
        return "deepseek/deepseek-math", False
    return "openai/gpt-oss-20b", True
```

## License

MIT
Lines changed: 2 additions & 0 deletions
```
mcp>=1.0.0
aiohttp>=3.9.0
```
