diff --git a/examples/mcp-classifier-server/README.md b/examples/mcp-classifier-server/README.md index cc639073..4fef7f18 100644 --- a/examples/mcp-classifier-server/README.md +++ b/examples/mcp-classifier-server/README.md @@ -1,13 +1,40 @@ # MCP Classification Server -Example MCP server that provides text classification with intelligent routing for the semantic router. +Example MCP servers that provide text classification with intelligent routing for the semantic router. -## Features +## 📦 Two Implementations + +This directory contains **two MCP classification servers**: + +### 1. **Regex-Based Server** (`server.py`) + +- ✅ **Simple & Fast** - Pattern matching with regex +- ✅ **Lightweight** - ~10MB memory, <5ms per query +- ✅ **No Dependencies** - Just MCP SDK +- 📝 **Best For**: Prototyping, simple rules, low-latency requirements + +### 2. **Embedding-Based Server** (`server_embedding.py`) 🆕 + +- ✅ **High Accuracy** - Semantic understanding with Qwen3-Embedding-0.6B +- ✅ **RAG-Style** - FAISS vector database with similarity search +- ✅ **Flexible** - Handles paraphrases, synonyms, variations +- 📝 **Best For**: Production use, high-accuracy requirements + +**Choose based on your needs:** + +- **Quick start / Testing?** → Use `server.py` (regex-based) +- **Production / Accuracy?** → Use `server_embedding.py` (embedding-based) + +--- + +## Regex-Based Server (`server.py`) + +### Features - **Dynamic Categories**: Loaded from MCP server at runtime via `list_categories` - **Per-Category System Prompts**: Each category has its own specialized system prompt for LLM context - **Intelligent Routing**: Returns `model` and `use_reasoning` in classification response -- **Regex-Based**: Simple pattern matching (replace with ML models for production) +- **Regex-Based**: Simple pattern matching (fast but limited) - **Dual Transport**: Supports both HTTP and stdio ## Categories @@ -164,6 +191,36 @@ if systemPrompt, ok := classifier.GetCategorySystemPrompt(category); ok { } ``` 
-## License +--- + +## Embedding-Based Server (`server_embedding.py`) + +For **production use with high accuracy**, see the embedding-based server: + +### Quick Start + +```bash +# Install dependencies +pip install -r requirements_embedding.txt + +# Start server (HTTP mode on port 8090) +python3 server_embedding.py --http --port 8090 +``` + +### Features + +- **Qwen3-Embedding-0.6B** model with 1024-dimensional embeddings +- **FAISS vector database** for fast similarity search +- **RAG-style classification** using 95 training examples +- **Same MCP protocol** as regex server (drop-in replacement) +- **Higher accuracy** - Understands semantic meaning, not just patterns + +### Comparison -MIT +| Feature | Regex (`server.py`) | Embedding (`server_embedding.py`) | +|---------|---------------------|-----------------------------------| +| **Accuracy** | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | +| **Speed** | ~1-5ms | ~50-100ms | +| **Memory** | ~10MB | ~600MB | +| **Setup** | Simple | Requires model | +| **Best For** | Prototyping | Production | diff --git a/examples/mcp-classifier-server/requirements_embedding.txt b/examples/mcp-classifier-server/requirements_embedding.txt new file mode 100644 index 00000000..c401dfe8 --- /dev/null +++ b/examples/mcp-classifier-server/requirements_embedding.txt @@ -0,0 +1,16 @@ +# MCP Embedding-Based Classification Server Requirements + +# Core MCP SDK +mcp>=1.0.0 + +# Embedding and Vector Search +transformers>=4.30.0 +torch>=2.0.0 +faiss-cpu>=1.7.4 # Use faiss-gpu if you have GPU support + +# HTTP server support (optional, for HTTP mode) +aiohttp>=3.9.0 + +# Utilities +numpy>=1.24.0 + diff --git a/examples/mcp-classifier-server/server_embedding.py b/examples/mcp-classifier-server/server_embedding.py new file mode 100644 index 00000000..2090a4c8 --- /dev/null +++ b/examples/mcp-classifier-server/server_embedding.py @@ -0,0 +1,745 @@ +#!/usr/bin/env python3 +""" +Embedding-Based MCP Classification Server with Intelligent Routing + +This is an example MCP server 
that demonstrates: +1. Text classification using semantic embeddings (RAG-style) +2. Dynamic category discovery via list_categories +3. Intelligent routing decisions (model selection and reasoning control) +4. FAISS vector database for similarity search + +The server implements two MCP tools: +- 'list_categories': Returns available categories with per-category system prompts and descriptions +- 'classify_text': Classifies text using semantic similarity and returns routing recommendations + +Protocol: +- list_categories returns: { + "categories": ["math", "science", "technology", ...], + "category_system_prompts": { + "math": "You are a mathematics expert...", + ... + }, + "category_descriptions": { + "math": "Mathematical and computational queries", + ... + } + } +- classify_text returns: { + "class": 0, + "confidence": 0.85, + "model": "openai/gpt-oss-20b", + "use_reasoning": true, + "probabilities": [...] + } + +Usage: + # Stdio mode (for testing with MCP clients) + python server_embedding.py + + # HTTP mode (for semantic router) + python server_embedding.py --http --port 8090 +""" + +import argparse +import csv +import json +import logging +import math +import os +from pathlib import Path +from typing import Any + +import faiss +import numpy as np +import torch +from mcp.server import Server +from mcp.server.stdio import stdio_server +from mcp.types import TextContent, Tool +from transformers import AutoModel, AutoTokenizer + +# Configure logging +logging.basicConfig( + level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s" +) +logger = logging.getLogger(__name__) + +# Category definitions with system prompts +CATEGORY_CONFIG = { + "math": { + "description": "Mathematical and computational queries", + "system_prompt": """You are a mathematics expert. 
When answering math questions: +- Show step-by-step solutions with clear explanations +- Use proper mathematical notation and terminology +- Verify calculations and provide intermediate steps +- Explain the underlying concepts and principles +- Offer alternative approaches when applicable""", + }, + "science": { + "description": "Scientific concepts and queries", + "system_prompt": """You are a science expert. When answering science questions: +- Provide evidence-based answers grounded in scientific research +- Explain relevant scientific concepts and principles +- Use appropriate scientific terminology +- Cite the scientific method and experimental evidence when relevant +- Distinguish between established facts and current theories""", + }, + "technology": { + "description": "Technology and computing topics", + "system_prompt": """You are a technology expert. When answering tech questions: +- Include practical examples and code snippets when relevant +- Follow best practices and industry standards +- Explain both high-level concepts and implementation details +- Consider security, performance, and maintainability +- Recommend appropriate tools and technologies for the use case""", + }, + "history": { + "description": "Historical events and topics", + "system_prompt": """You are a history expert. When answering historical questions: +- Provide accurate dates, names, and historical context +- Cite time periods and geographical locations +- Explain the causes, events, and consequences +- Consider multiple perspectives and historical interpretations +- Connect historical events to their broader significance""", + }, + "general": { + "description": "General questions and topics", + "system_prompt": """You are a knowledgeable assistant. 
When answering general questions: +- Provide balanced, well-rounded responses +- Draw from multiple domains of knowledge when relevant +- Be clear, concise, and accurate +- Adapt your explanation to the complexity of the question +- Acknowledge limitations and uncertainties when appropriate""", + }, +} + + +class EmbeddingClassifier: + """Embedding-based text classifier using FAISS vector search.""" + + def __init__( + self, + model_name: str = "Qwen/Qwen3-Embedding-0.6B", + csv_path: str = "training_data.csv", + index_path: str = "faiss_index.bin", + device: str = "auto", + ): + """ + Initialize the embedding classifier. + + Args: + model_name: Name of the embedding model to use + csv_path: Path to the CSV training data file + index_path: Path to save/load FAISS index + device: Device to use ("cuda", "cpu", or "auto" for auto-detection) + """ + self.model_name = model_name + self.csv_path = csv_path + self.index_path = index_path + + logger.info(f"Initializing embedding model: {model_name}") + self.tokenizer = AutoTokenizer.from_pretrained( + model_name, trust_remote_code=True + ) + self.model = AutoModel.from_pretrained(model_name, trust_remote_code=True) + + # Set device based on user preference + if device == "auto": + self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu") + elif device == "cuda": + if not torch.cuda.is_available(): + logger.warning("CUDA requested but not available, falling back to CPU") + self.device = torch.device("cpu") + else: + self.device = torch.device("cuda") + else: + self.device = torch.device("cpu") + + logger.info(f"Using device: {self.device}") + self.model.to(self.device) + self.model.eval() + + # Qwen3-Embedding-0.6B has embedding dimension of 1024 + self.embedding_dim = 1024 + + self.index = None + self.category_names = list(CATEGORY_CONFIG.keys()) + self.category_to_index = { + name: idx for idx, name in enumerate(self.category_names) + } + self.num_categories = len(self.category_names) + + 
logger.info(f"Loading training data from {csv_path}") + self.texts, self.categories = self._load_csv_data() + logger.info(f"Loaded {len(self.texts)} training examples") + + # Load or build FAISS index + if os.path.exists(index_path): + logger.info(f"Loading existing FAISS index from {index_path}") + self.index = faiss.read_index(index_path) + logger.info(f"Index loaded with {self.index.ntotal} vectors") + + # Verify index size matches CSV + if self.index.ntotal != len(self.texts): + logger.warning( + f"Index size ({self.index.ntotal}) doesn't match CSV ({len(self.texts)}). Rebuilding..." + ) + self._build_index() + else: + logger.info("No existing index found, building new one...") + self._build_index() + + def _encode_texts(self, texts: list[str], batch_size: int = 8) -> np.ndarray: + """ + Encode texts into embeddings using Qwen3 model. + + Args: + texts: List of texts to encode + batch_size: Batch size for encoding + + Returns: + numpy array of embeddings + """ + embeddings = [] + + for i in range(0, len(texts), batch_size): + batch_texts = texts[i : i + batch_size] + + # Tokenize + inputs = self.tokenizer( + batch_texts, + padding=True, + truncation=True, + max_length=512, + return_tensors="pt", + ) + inputs = {k: v.to(self.device) for k, v in inputs.items()} + + # Generate embeddings + with torch.no_grad(): + outputs = self.model(**inputs) + # Masked mean pooling: average real tokens only, so padding does not skew the embedding + mask = inputs["attention_mask"].unsqueeze(-1).to(outputs.last_hidden_state.dtype) + embeddings.append(((outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)).cpu().numpy()) + + return np.vstack(embeddings) + + def _load_csv_data(self) -> tuple[list[str], list[str]]: + """ + Load training data from CSV file. 
+ + Returns: + Tuple of (texts, categories) + """ + texts = [] + categories = [] + + logger.info(f"Loading training data from {self.csv_path}") + + with open(self.csv_path, "r", encoding="utf-8", newline="") as f: + reader = csv.DictReader(f) + for row in reader: + texts.append(row["text"]) + categories.append(row["category"]) + + logger.info(f"Loaded {len(texts)} training examples") + return texts, categories + + def _build_index(self): + """Build FAISS index from loaded CSV data.""" + logger.info("Building FAISS index from training data...") + + # Generate embeddings + logger.info(f"Generating embeddings for {len(self.texts)} examples...") + embeddings = np.ascontiguousarray(self._encode_texts(self.texts), dtype="float32") + + # Normalize in place for cosine similarity (normalizing an astype() copy would be discarded) + faiss.normalize_L2(embeddings) + + # Build FAISS index + logger.info(f"Creating FAISS index with dimension {self.embedding_dim}...") + self.index = faiss.IndexFlatIP(self.embedding_dim) # Inner product for cosine + self.index.add(embeddings) + + # Save index + logger.info(f"Saving index to {self.index_path}") + faiss.write_index(self.index, self.index_path) + + logger.info(f"Index built successfully with {self.index.ntotal} vectors") + + def classify( + self, text: str, k: int = 20, with_probabilities: bool = False + ) -> dict[str, Any]: + """ + Classify text using semantic similarity search. 
+ + Args: + text: Input text to classify + k: Number of nearest neighbors to retrieve + with_probabilities: Whether to return full probability distribution + + Returns: + Dictionary with classification results + """ + # Generate a float32 query embedding and normalize it in place + query_embedding = np.ascontiguousarray(self._encode_texts([text]), dtype="float32") + faiss.normalize_L2(query_embedding) + + # Search for k nearest neighbors + similarities, indices = self.index.search(query_embedding, k) + + # Get categories of nearest neighbors + neighbor_categories = [self.categories[idx] for idx in indices[0]] + neighbor_similarities = similarities[0] + + # Calculate confidence scores for each category + category_scores = {cat: 0.0 for cat in self.category_names} + + for category, similarity in zip(neighbor_categories, neighbor_similarities): + # Weight by similarity score (cosine similarity lies in [-1, 1], so clamp negatives to 0) + category_scores[category] += max(float(similarity), 0.0) + + # Normalize scores + total_score = sum(category_scores.values()) + if total_score > 0: + category_scores = { + cat: score / total_score for cat, score in category_scores.items() + } + + # Find best category + best_category = max(category_scores.items(), key=lambda x: x[1]) + best_category_name = best_category[0] + best_confidence = best_category[1] + + # Get class index + class_index = self.category_to_index[best_category_name] + + # Decide routing + model, use_reasoning = self._decide_routing( + text, best_category_name, best_confidence + ) + + result = { + "class": int(class_index), + "confidence": float(best_confidence), + "model": model, + "use_reasoning": use_reasoning, + } + + if with_probabilities: + # Create probability distribution (convert to native Python types) + probabilities = [float(category_scores[cat]) for cat in self.category_names] + result["probabilities"] = probabilities + + # Calculate entropy + entropy_value = self._calculate_entropy(probabilities) + result["entropy"] = float(entropy_value) + + logger.info( + 
f"Classification result: class={class_index} ({best_category_name}), " + f"confidence={best_confidence:.3f}, entropy={entropy_value:.3f}, " + f"model={model}, use_reasoning={use_reasoning}" + ) + else: + logger.info( + f"Classification result: class={class_index} ({best_category_name}), " + f"confidence={best_confidence:.3f}, model={model}, use_reasoning={use_reasoning}" + ) + + return result + + def _calculate_entropy(self, probabilities: list[float]) -> float: + """ + Calculate Shannon entropy of the probability distribution. + + Args: + probabilities: List of probability values + + Returns: + Entropy value + """ + entropy = 0.0 + for p in probabilities: + if p > 0: + entropy -= p * math.log2(p) + return entropy + + def _decide_routing( + self, text: str, category_name: str, confidence: float + ) -> tuple[str, bool]: + """ + Decide which model to use and whether to enable reasoning. + + Args: + text: Input text being classified + category_name: Predicted category + confidence: Classification confidence + + Returns: + Tuple of (model_name, use_reasoning) + """ + text_lower = text.lower() + word_count = len(text.split()) + + # Check for complexity indicators + complex_words = [ + "why", + "how", + "explain", + "analyze", + "compare", + "evaluate", + "describe", + ] + has_complex_words = any(word in text_lower for word in complex_words) + + # Long queries with complex words → use reasoning + if word_count > 20 and has_complex_words: + return "openai/gpt-oss-20b", True + + # Math category with simple queries → no reasoning needed + if category_name == "math" and word_count < 15: + return "openai/gpt-oss-20b", False + + # High confidence → can use simpler model + if confidence > 0.9: + return "openai/gpt-oss-20b", False + + # Low confidence → use reasoning to be safe + if confidence < 0.6: + return "openai/gpt-oss-20b", True + + # Default: use reasoning for better quality + return "openai/gpt-oss-20b", True + + +# Initialize classifier globally +# Note: This is safe 
for aiohttp as it uses a single-threaded event loop. +# For multi-process deployments, each process gets its own instance. +classifier = None +classifier_device = "auto" # Default device setting + + +def get_classifier(): + """Get or create the global classifier instance.""" + global classifier + if classifier is None: + # Get script directory + script_dir = Path(__file__).parent + csv_path = script_dir / "training_data.csv" + index_path = script_dir / "faiss_index.bin" + + classifier = EmbeddingClassifier( + model_name="Qwen/Qwen3-Embedding-0.6B", + csv_path=str(csv_path), + index_path=str(index_path), + device=classifier_device, + ) + return classifier + + +# Initialize MCP server +app = Server("embedding-classifier") + + +@app.list_tools() +async def list_tools() -> list[Tool]: + """List available tools.""" + clf = get_classifier() + return [ + Tool( + name="classify_text", + description=( + "Classify text into categories using semantic embeddings and provide intelligent routing recommendations. " + f"Categories: {', '.join(clf.category_names)}. " + "Returns: class index, confidence, recommended model, and reasoning flag. " + "Optionally returns full probability distribution for entropy analysis." + ), + inputSchema={ + "type": "object", + "properties": { + "text": {"type": "string", "description": "The text to classify"}, + "with_probabilities": { + "type": "boolean", + "description": "Whether to return full probability distribution for entropy analysis", + "default": False, + }, + }, + "required": ["text"], + }, + ), + Tool( + name="list_categories", + description=( + "List all available classification categories with per-category system prompts and descriptions. " + "Returns: categories (array), category_system_prompts (object), category_descriptions (object). " + "Each category can have its own system prompt that the router injects for category-specific LLM context." 
+ ), + inputSchema={"type": "object", "properties": {}}, + ), + ] + + +@app.call_tool() +async def call_tool(name: str, arguments: Any) -> list[TextContent]: + """Handle tool calls.""" + clf = get_classifier() + + if name == "classify_text": + text = arguments.get("text", "") + with_probabilities = arguments.get("with_probabilities", False) + + if not text: + return [ + TextContent(type="text", text=json.dumps({"error": "No text provided"})) + ] + + try: + result = clf.classify(text, with_probabilities=with_probabilities) + return [TextContent(type="text", text=json.dumps(result))] + except Exception as e: + logger.error(f"Error classifying text: {e}", exc_info=True) + return [TextContent(type="text", text=json.dumps({"error": str(e)}))] + + elif name == "list_categories": + # Return category information + category_descriptions = { + name: CATEGORY_CONFIG[name]["description"] for name in clf.category_names + } + + category_system_prompts = { + name: CATEGORY_CONFIG[name]["system_prompt"] for name in clf.category_names + } + + categories_response = { + "categories": clf.category_names, + "category_system_prompts": category_system_prompts, + "category_descriptions": category_descriptions, + } + + logger.info( + f"Returning {len(clf.category_names)} categories with {len(category_system_prompts)} system prompts: {clf.category_names}" + ) + return [TextContent(type="text", text=json.dumps(categories_response))] + + else: + return [ + TextContent( + type="text", text=json.dumps({"error": f"Unknown tool: {name}"}) + ) + ] + + +async def main_stdio(device: str = "auto"): + """Run the MCP server in stdio mode.""" + global classifier_device + classifier_device = device + + logger.info("Starting Embedding-Based MCP Classification Server (stdio mode)") + clf = get_classifier() + logger.info(f"Available categories: {', '.join(clf.category_names)}") + logger.info(f"Model: {clf.model_name}") + logger.info(f"Device: {clf.device}") + logger.info(f"Index size: {clf.index.ntotal} 
vectors") + + async with stdio_server() as (read_stream, write_stream): + await app.run(read_stream, write_stream, app.create_initialization_options()) + + +async def main_http(port: int = 8091, device: str = "auto"): + """Run the MCP server in HTTP mode.""" + global classifier_device + classifier_device = device + + try: + from aiohttp import web + except ImportError: + logger.error( + "aiohttp is required for HTTP mode. Install it with: pip install aiohttp" + ) + return + + logger.info(f"Starting Embedding-Based MCP Classification Server (HTTP mode)") + clf = get_classifier() + logger.info(f"Available categories: {', '.join(clf.category_names)}") + logger.info(f"Model: {clf.model_name}") + logger.info(f"Device: {clf.device}") + logger.info(f"Index size: {clf.index.ntotal} vectors") + logger.info(f"Listening on http://0.0.0.0:{port}/mcp") + + async def handle_mcp_request(request): + """Handle MCP requests over HTTP.""" + try: + data = await request.json() + method = data.get("method", "") + + # Extract method from URL path if not in JSON + if not method: + path = request.path + if path.startswith("/mcp/"): + method = path[5:] + elif path == "/mcp": + method = "" + + params = data.get("params", data if not data.get("method") else {}) + request_id = data.get("id", 1) + + logger.debug( + f"Received MCP request: method={method}, path={request.path}, id={request_id}" + ) + + # Handle initialize + if method == "initialize": + init_result = { + "protocolVersion": "2024-11-05", + "capabilities": { + "tools": {}, + }, + "serverInfo": {"name": "embedding-classifier", "version": "1.0.0"}, + } + + if request.path.startswith("/mcp/") and request.path != "/mcp": + return web.json_response(init_result) + else: + result = {"jsonrpc": "2.0", "id": request_id, "result": init_result} + return web.json_response(result) + + # Handle tools/list + elif method == "tools/list": + tools_list = await list_tools() + tools_data = [ + { + "name": tool.name, + "description": tool.description, + 
"inputSchema": tool.inputSchema, + } + for tool in tools_list + ] + + if request.path.startswith("/mcp/") and request.path != "/mcp": + return web.json_response({"tools": tools_data}) + else: + result = { + "jsonrpc": "2.0", + "id": request_id, + "result": {"tools": tools_data}, + } + return web.json_response(result) + + # Handle tools/call + elif method == "tools/call": + tool_name = params.get("name", "") + arguments = params.get("arguments", {}) + + tool_result = await call_tool(tool_name, arguments) + + # Convert TextContent to dict + content = [{"type": tc.type, "text": tc.text} for tc in tool_result] + + result_data = {"content": content, "isError": False} + + if request.path.startswith("/mcp/") and request.path != "/mcp": + return web.json_response(result_data) + else: + result = {"jsonrpc": "2.0", "id": request_id, "result": result_data} + return web.json_response(result) + + # Handle ping + elif method == "ping": + result = {"jsonrpc": "2.0", "id": request_id, "result": {}} + return web.json_response(result) + + else: + error = { + "jsonrpc": "2.0", + "id": request_id, + "error": {"code": -32601, "message": f"Method not found: {method}"}, + } + return web.json_response(error, status=404) + + except Exception as e: + logger.error(f"Error handling request: {e}", exc_info=True) + error = { + "jsonrpc": "2.0", + "id": ( + data.get("id") + if "data" in locals() and isinstance(data, dict) + else None + ), + "error": {"code": -32603, "message": f"Internal error: {str(e)}"}, + } + return web.json_response(error, status=500) + + async def health_check(request): + """Health check endpoint.""" + clf = get_classifier() + return web.json_response( + { + "status": "ok", + "categories": clf.category_names, + "model": clf.model_name, + "index_size": clf.index.ntotal, + } + ) + + # Create web application + http_app = web.Application() + + # Main JSON-RPC endpoint + http_app.router.add_post("/mcp", handle_mcp_request) + + # REST-style endpoints + 
http_app.router.add_post("/mcp/initialize", handle_mcp_request) + http_app.router.add_post("/mcp/tools/list", handle_mcp_request) + http_app.router.add_post("/mcp/tools/call", handle_mcp_request) + http_app.router.add_post("/mcp/resources/list", handle_mcp_request) + http_app.router.add_post("/mcp/resources/read", handle_mcp_request) + http_app.router.add_post("/mcp/prompts/list", handle_mcp_request) + http_app.router.add_post("/mcp/prompts/get", handle_mcp_request) + http_app.router.add_post("/mcp/ping", handle_mcp_request) + + # Health check + http_app.router.add_get("/health", health_check) + + # Run the server + runner = web.AppRunner(http_app) + await runner.setup() + site = web.TCPSite(runner, "0.0.0.0", port) + await site.start() + + logger.info(f"Server is ready at http://0.0.0.0:{port}/mcp") + logger.info(f"Health check available at http://0.0.0.0:{port}/health") + + # Keep the server running + try: + while True: + await asyncio.sleep(3600) + except KeyboardInterrupt: + logger.info("Shutting down server...") + finally: + await runner.cleanup() + + +if __name__ == "__main__": + import asyncio + + # Parse command line arguments + parser = argparse.ArgumentParser( + description="MCP Embedding-Based Classification Server" + ) + parser.add_argument( + "--http", action="store_true", help="Run in HTTP mode instead of stdio" + ) + parser.add_argument("--port", type=int, default=8091, help="HTTP port to listen on") + parser.add_argument( + "--device", + type=str, + default="auto", + choices=["auto", "cuda", "cpu"], + help="Device to use for inference (auto=auto-detect, cuda=force GPU, cpu=force CPU)", + ) + args = parser.parse_args() + + if args.http: + asyncio.run(main_http(args.port, args.device)) + else: + asyncio.run(main_stdio(args.device)) diff --git a/examples/mcp-classifier-server/training_data.csv b/examples/mcp-classifier-server/training_data.csv new file mode 100644 index 00000000..fab6e4e1 --- /dev/null +++ 
b/examples/mcp-classifier-server/training_data.csv @@ -0,0 +1,97 @@ +category,text,description +math,What is the derivative of x squared?,Calculus question about derivatives +math,Calculate 15 multiplied by 23,Basic arithmetic calculation +math,Solve the equation 2x + 5 = 15 for x,Linear equation solving +math,What is the integral of sin(x)?,Integration problem +math,Find the area of a circle with radius 5,Geometry problem +math,What is the probability of rolling a 6 on a die?,Probability question +math,Calculate the mean of 10 20 30 40 50,Statistics calculation +math,What is the value of log base 10 of 100?,Logarithm question +math,Solve this matrix multiplication problem,Linear algebra question +math,What is the sum of angles in a triangle?,Geometry fundamentals +math,Find the roots of the quadratic equation x^2 - 5x + 6 = 0,Quadratic equations +math,What is the Pythagorean theorem?,Mathematical theorem +math,Calculate the standard deviation of this dataset,Statistical analysis +math,What is a complex number?,Mathematical concepts +math,Explain the concept of limits in calculus,Calculus fundamentals +science,What is photosynthesis?,Biology question +science,Explain Newton's laws of motion,Physics fundamentals +science,What is the structure of an atom?,Chemistry basics +science,How does DNA replication work?,Molecular biology +science,What causes gravity?,Physics question +science,Explain the water cycle,Earth science +science,What is the periodic table?,Chemistry fundamentals +science,How do cells divide?,Biology process +science,What is evolution by natural selection?,Evolutionary biology +science,Explain the concept of energy conservation,Physics principle +science,What are the properties of acids and bases?,Chemistry concepts +science,How does the human immune system work?,Biology and health +science,What is plate tectonics?,Geology question +science,Explain the Big Bang theory,Cosmology and astronomy +science,What is an ecosystem?,Ecology question 
+science,How do vaccines work?,Immunology and medicine +science,What is entropy in thermodynamics?,Physics concept +science,Explain photosynthesis in plants,Botany and biology +science,What are chemical bonds?,Chemistry fundamentals +science,How does the brain process information?,Neuroscience question +technology,What is machine learning?,AI and ML concepts +technology,How does the internet work?,Networking basics +technology,Explain what an API is,Software development +technology,What is cloud computing?,Modern technology infrastructure +technology,How do neural networks work?,Deep learning concepts +technology,What is the difference between Python and Java?,Programming languages +technology,Explain database normalization,Database design +technology,What is version control with Git?,Software development tools +technology,How does encryption work?,Cybersecurity basics +technology,What is a REST API?,Web development concepts +technology,Explain object-oriented programming,Programming paradigms +technology,What is Docker and containerization?,DevOps technology +technology,How do web servers work?,Internet infrastructure +technology,What is blockchain technology?,Distributed systems +technology,Explain the concept of algorithms,Computer science fundamentals +technology,What is responsive web design?,Frontend development +technology,How does a compiler work?,Computer science concepts +technology,What is agile software development?,Software engineering methodology +technology,Explain microservices architecture,Software architecture +technology,What is continuous integration and deployment?,DevOps practices +history,What caused World War I?,20th century history +history,Who was Julius Caesar?,Ancient Rome history +history,Explain the Renaissance period,European history +history,What was the Industrial Revolution?,Economic and social history +history,When did the American Revolution occur?,American history +history,What was the significance of the Silk Road?,Ancient 
trade history +history,Who were the pharaohs of ancient Egypt?,Ancient civilization +history,Explain the French Revolution,18th century European history +history,What was the Cold War?,20th century geopolitical history +history,Who was Genghis Khan?,Medieval Asian history +history,What happened during the Bronze Age?,Prehistoric history +history,Explain the fall of the Roman Empire,Ancient history +history,What was the significance of the Magna Carta?,Medieval English history +history,Who was Alexander the Great?,Ancient Greek history +history,What caused the Great Depression?,Economic history +history,Explain the Civil Rights Movement,20th century American history +history,What was the Ottoman Empire?,Middle Eastern history +history,Who discovered America?,Age of Exploration +history,What happened during the Viking Age?,Medieval Scandinavian history +history,Explain the Cultural Revolution in China,20th century Asian history +general,What is the capital of France?,Geography question +general,How do I make a good cup of coffee?,Everyday skills +general,What are some tips for public speaking?,Communication skills +general,How can I improve my time management?,Personal development +general,What is mindfulness meditation?,Mental health and wellness +general,How do I write a good resume?,Career advice +general,What are some healthy eating habits?,Nutrition and health +general,How can I learn a new language effectively?,Education and learning +general,What is the meaning of life?,Philosophical question +general,How do I take care of a houseplant?,Home and gardening +general,What are the benefits of exercise?,Health and fitness +general,How do I make new friends as an adult?,Social relationships +general,What is emotional intelligence?,Psychology and personal development +general,How can I reduce stress?,Mental health +general,What are some good books to read?,Literature and entertainment +general,How do I start investing?,Personal finance +general,What is work-life 
balance?,Career and lifestyle +general,How do I improve my sleep quality?,Health and wellness +general,What are some creative hobbies to try?,Arts and crafts +general,How can I be more environmentally friendly?,Sustainability and lifestyle + diff --git a/website/docs/tutorials/mcp-classification/overview.md b/website/docs/tutorials/mcp-classification/overview.md new file mode 100644 index 00000000..a0f30f09 --- /dev/null +++ b/website/docs/tutorials/mcp-classification/overview.md @@ -0,0 +1,213 @@ +# MCP Classification Overview + +The Model Context Protocol (MCP) classification server feature enables semantic router to dynamically discover categories and classification logic from external MCP servers, providing a flexible and extensible classification system. + +## What is MCP Classification? + +MCP classification allows you to: + +- **Externalize classification logic** - Move category definitions and classification models outside the router +- **Dynamic category discovery** - Categories are discovered at runtime via the `list_categories` tool +- **Hot-reload capabilities** - Update categories without restarting the router +- **Custom implementations** - Use any classification approach (regex, ML models, embeddings, etc.) +- **Per-category system prompts** - Each category can have specialized LLM instructions + +## Architecture + +```mermaid +graph TB + A[User Query] --> B[Semantic Router] + B --> C[MCP Classifier] + + C --> D[MCP Server HTTP/Stdio] + D --> E[classify_text tool] + D --> F[list_categories tool] + + E --> G[Classification Result] + G --> H[class, confidence, model, use_reasoning] + + F --> I[Category Discovery] + I --> J[categories + system_prompts] + + H --> K[Intelligent Routing] + J --> K + + K --> L[LLM with Category Prompt] +``` + +## Core Concepts + +### MCP Tools + +MCP classification servers must implement two tools: + +#### 1. 
`list_categories` + +Returns available categories with metadata: + +```json +{ + "categories": ["math", "science", "technology", ...], + "category_system_prompts": { + "math": "You are a mathematics expert...", + "science": "You are a science expert..." + }, + "category_descriptions": { + "math": "Mathematical and computational queries", + "science": "Scientific concepts and queries" + } +} +``` + +#### 2. `classify_text` + +Classifies input text and returns routing information: + +```json +{ + "class": 0, + "confidence": 0.85, + "model": "openai/gpt-oss-20b", + "use_reasoning": true, + "probabilities": [0.85, 0.10, 0.03, 0.02], + "entropy": 0.65 +} +``` + +### Routing Intelligence + +MCP classifiers can return intelligent routing decisions: + +- **`model`** - Recommended model for the query type +- **`use_reasoning`** - Whether to enable reasoning/chain-of-thought +- **`confidence`** - Classification certainty (for fallback decisions) +- **`entropy`** - Distribution uncertainty (for monitoring) + +### Per-Category System Prompts + +Each category can have a specialized system prompt that the router injects into LLM requests, providing domain-specific instructions: + +``` +Category: math → "You are a mathematics expert. Show step-by-step solutions..." +Category: code → "You are a coding expert. Follow best practices..." 
+``` + +## Benefits + +### Flexibility + +- **Multiple implementations** - Regex, ML models, embeddings, hybrid approaches +- **Language agnostic** - MCP servers can be written in any language +- **Easy experimentation** - Swap classification logic without router changes + +### Scalability + +- **Distributed classification** - Run classifiers on separate servers +- **Load balancing** - Multiple classifier instances +- **Caching** - Classification results can be cached + +### Maintainability + +- **Separation of concerns** - Classification logic separate from routing logic +- **Independent updates** - Update categories without router downtime +- **Testing** - Test classification independently + +## Implementation Options + +### 1. Regex-Based (Simple) + +**Best for:** Prototyping, simple rules, low-latency requirements + +- Pattern matching with regular expressions +- ~1-5ms classification time +- Minimal resource usage (~10MB memory) +- Easy to understand and modify + +### 2. Embedding-Based (Recommended) + +**Best for:** Production use, high accuracy requirements + +- Semantic understanding with embedding models +- ~50-100ms classification time (CPU) +- Higher accuracy, handles variations +- RAG-style with vector database + +### 3. ML Model-Based + +**Best for:** Custom classification needs + +- Fine-tuned classification models (BERT, etc.) +- Domain-specific training +- Balanced accuracy and speed + +### 4. 
Hybrid Approaches + +Combine multiple methods for optimal results: + +- Fast regex for obvious cases +- ML/embeddings for ambiguous queries +- Fallback chains for robustness + +## Configuration + +Enable MCP classification in your `config.yaml`: + +```yaml +# Classifier configuration +classifier: + # Disable in-tree category classifier (leave model_id empty) + category_model: + model_id: "" # Empty = disabled + + # Enable MCP-based category classifier (HTTP transport only) + mcp_category_model: + enabled: true # Enable MCP classifier + transport_type: "http" # HTTP transport + url: "http://localhost:8090/mcp" # MCP server endpoint + threshold: 0.6 # Confidence threshold + timeout_seconds: 30 # Request timeout +``` + +## When to Use MCP Classification + +### ✅ Use MCP Classification When: + +- You need flexible, updateable category definitions +- You want to experiment with different classification approaches +- You need per-category system prompts +- You have custom classification logic +- You want distributed classification +- You need hot-reload capabilities + +### ⚠️ Consider Alternatives When: + +- You need absolute minimum latency (under 5ms) +- You have static categories that never change +- You want the simplest possible setup +- You don't need external classification logic + +## Next Steps + +- [Protocol Specification](./protocol.md) - Detailed MCP protocol for classification +- [Example Servers](https://github.com/vllm-project/semantic-router/tree/main/examples/mcp-classifier-server) - Reference implementations + +## Example Servers + +The repository includes two reference implementations in `examples/mcp-classifier-server/`: + +### 1. Regex-Based (`server.py`) + +- Simple pattern matching +- Fast prototyping (less than 5ms classification) +- Easy to understand and modify +- No ML dependencies required + +### 2. 
Embedding-Based (`server_embedding.py`) + +- Qwen3-Embedding-0.6B model +- FAISS vector search for semantic similarity +- High accuracy semantic classification +- Production-ready with device selection (CPU/GPU) +- Includes 95 training examples + +Both servers implement the same MCP protocol and can be used interchangeably. diff --git a/website/docs/tutorials/mcp-classification/protocol.md b/website/docs/tutorials/mcp-classification/protocol.md new file mode 100644 index 00000000..d9ca7b2d --- /dev/null +++ b/website/docs/tutorials/mcp-classification/protocol.md @@ -0,0 +1,454 @@ +# MCP Classification Protocol Specification + +This document defines the protocol specification for MCP classification servers compatible with semantic router. + +## Protocol Overview + +MCP classification servers communicate using the Model Context Protocol (MCP) and must implement two required tools: `list_categories` and `classify_text`. + +## Transport Modes + +MCP servers can operate in two modes: + +### 1. HTTP Mode + +RESTful HTTP/JSON-RPC endpoint: + +```bash +POST http://localhost:8090/mcp/tools/call +Content-Type: application/json + +{ + "name": "classify_text", + "arguments": { + "text": "What is 2 + 2?" + } +} +``` + +**Best for:** Production deployments, distributed systems, multiple router instances + +### 2. Stdio Mode + +Standard input/output communication: + +```bash +python server.py # Reads from stdin, writes to stdout +``` + +**Best for:** Local development, MCP Inspector testing, embedded scenarios + +## Required Tools + +### Tool 1: `list_categories` + +Discovers available categories and their metadata. + +#### Request + +```json +{ + "name": "list_categories", + "arguments": {} +} +``` + +#### Response + +```json +{ + "content": [{ + "type": "text", + "text": "{ + \"categories\": [\"math\", \"science\", \"technology\", \"history\", \"general\"], + \"category_system_prompts\": { + \"math\": \"You are a mathematics expert. 
When answering math questions:\\n- Show step-by-step solutions with clear explanations\\n- Use proper mathematical notation and terminology\\n- Verify calculations and provide intermediate steps\\n- Explain the underlying concepts and principles\\n- Offer alternative approaches when applicable\", + \"science\": \"You are a science expert. When answering science questions:\\n- Provide evidence-based answers grounded in scientific research\\n- Explain relevant scientific concepts and principles\\n- Use appropriate scientific terminology\\n- Cite the scientific method and experimental evidence when relevant\\n- Distinguish between established facts and current theories\" + }, + \"category_descriptions\": { + \"math\": \"Mathematical and computational queries\", + \"science\": \"Scientific concepts and queries\", + \"technology\": \"Technology and computing topics\", + \"history\": \"Historical events and topics\", + \"general\": \"General questions and topics\" + } + }" + }], + "isError": false +} +``` + +#### Field Specifications + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `categories` | array[string] | **Yes** | List of category names (order determines indices) | +| `category_system_prompts` | object | Optional | Per-category system prompts for LLM context | +| `category_descriptions` | object | Optional | Human-readable category descriptions | + +**Notes:** + +- Category order defines the class indices (first = 0, second = 1, etc.) +- `category_system_prompts` should be provided for best routing results +- System prompts are injected by the router when making LLM requests + +### Tool 2: `classify_text` + +Classifies input text and returns routing recommendations. 
+ +#### Request + +```json +{ + "name": "classify_text", + "arguments": { + "text": "What is the derivative of x squared?", + "with_probabilities": true + } +} +``` + +#### Request Parameters + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `text` | string | **Yes** | The text to classify | +| `with_probabilities` | boolean | Optional | Whether to return full probability distribution (default: false) | + +#### Response + +```json +{ + "content": [{ + "type": "text", + "text": "{ + \"class\": 0, + \"confidence\": 0.92, + \"model\": \"openai/gpt-oss-20b\", + \"use_reasoning\": false, + \"probabilities\": [0.92, 0.03, 0.02, 0.02, 0.01], + \"entropy\": 0.45 + }" + }], + "isError": false +} +``` + +#### Response Fields + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `class` | integer | **Yes** | Category index (0-based, matches `list_categories` order) | +| `confidence` | float | **Yes** | Classification confidence score (0.0 to 1.0) | +| `model` | string | **Yes** | Recommended model for this query type | +| `use_reasoning` | boolean | **Yes** | Whether to enable reasoning/chain-of-thought | +| `probabilities` | array[float] | Optional | Probability distribution across all categories | +| `entropy` | float | Optional | Shannon entropy of the distribution | + +**Notes:** + +- `class` must be a valid index into the `categories` array from `list_categories` +- `confidence` should represent the certainty of the classification +- `model` should use the format expected by your LLM backend (e.g., "openai/gpt-4") +- `use_reasoning` guides the router's reasoning parameter selection +- `probabilities` length must match the number of categories +- `entropy` can be used for uncertainty monitoring + +## Routing Intelligence + +### Model Selection + +The `model` field allows classifiers to recommend different models based on query characteristics: + +```python +# Examples of routing logic 
+if category == "math" and is_simple_calculation: + return "openai/gpt-oss-20b", False # Fast model, no reasoning + +elif category == "code" and is_complex_task: + return "deepseek/deepseek-coder", True # Specialized model with reasoning + +elif confidence < 0.6: + return "openai/gpt-4", True # High-quality model for uncertain cases +``` + +### Reasoning Control + +The `use_reasoning` field enables/disables chain-of-thought reasoning: + +```python +# Reasoning decision logic +if word_count > 20 and has_complex_words: + use_reasoning = True # Long complex queries benefit from reasoning + +elif category == "math" and is_simple: + use_reasoning = False # Simple math doesn't need reasoning overhead + +elif confidence < 0.6: + use_reasoning = True # Low confidence → use reasoning for safety +``` + +## HTTP API Details + +### Endpoint Structure + +MCP servers should support both styles: + +#### Style 1: Single Endpoint (JSON-RPC) + +``` +POST /mcp +Content-Type: application/json + +{ + "method": "tools/call", + "params": { + "name": "classify_text", + "arguments": { "text": "..." } + }, + "id": 1 +} +``` + +#### Style 2: REST-Style (Recommended) + +``` +POST /mcp/tools/call +Content-Type: application/json + +{ + "name": "classify_text", + "arguments": { "text": "..." } +} +``` + +### Health Check Endpoint + +Optional but recommended: + +``` +GET /health + +Response: +{ + "status": "ok", + "categories": ["math", "science", ...], + "model": "Qwen/Qwen3-Embedding-0.6B", + "index_size": 95 +} +``` + +### Error Handling + +Return errors in the MCP format: + +```json +{ + "content": [{ + "type": "text", + "text": "{\"error\": \"Error message here\"}" + }], + "isError": true +} +``` + +Or as JSON-RPC error: + +```json +{ + "jsonrpc": "2.0", + "id": 1, + "error": { + "code": -32603, + "message": "Internal error: ..." 
+ } +} +``` + +## Initialization Sequence + +When the semantic router starts: + +```mermaid +sequenceDiagram + participant Router + participant MCP Server + + Router->>MCP Server: POST /mcp/initialize + MCP Server-->>Router: Server capabilities + + Router->>MCP Server: POST /mcp/tools/list + MCP Server-->>Router: Available tools + + Router->>MCP Server: POST /mcp/tools/call → list_categories + MCP Server-->>Router: Categories + prompts + + Note over Router: Router is ready + + Router->>MCP Server: POST /mcp/tools/call → classify_text + MCP Server-->>Router: Classification result +``` + +## Configuration in Semantic Router + +### Basic Configuration + +```yaml +classification: + type: mcp + mcp_server: + url: "http://localhost:8090/mcp" + tools: + classify: "classify_text" + list_categories: "list_categories" +``` + +### Advanced Configuration + +```yaml +classification: + type: mcp + mcp_server: + url: "http://localhost:8090/mcp" + tools: + classify: "classify_text" + list_categories: "list_categories" + timeout: 5s # Request timeout + max_retries: 3 # Retry failed requests + cache_categories: true # Cache list_categories result + refresh_interval: 5m # Re-fetch categories periodically + num_categories: 5 # Expected number (validated against server) + confidence_threshold: 0.6 # Minimum confidence for classification + fallback_category: "general" # Category when confidence too low +``` + +## Performance Considerations + +### Latency + +- Target classification latency: under 100ms for production +- Use caching for `list_categories` (changes infrequently) +- Consider async/concurrent classification for batch requests + +### Caching + +The router may cache: + +- `list_categories` response (categories rarely change) +- Recent `classify_text` results (identical queries) + +Implement cache-friendly behavior: + +- Deterministic results for same input +- Reasonable confidence scores +- Stable category definitions + +### Load Balancing + +For high-volume deployments: + +- Run multiple MCP 
server instances +- Use load balancer in front of servers +- Consider stateless design (no server-side sessions) + +## Validation + +### Category Index Validation + +```python +# Server must ensure: +assert 0 <= class_index < len(categories) +assert class_index == category_to_index[category_name] +``` + +### Probability Validation + +```python +# If returning probabilities: +assert len(probabilities) == len(categories) +assert 0.95 <= sum(probabilities) <= 1.05 # Allow rounding error +assert all(0 <= p <= 1 for p in probabilities) +``` + +### Confidence Validation + +```python +# Confidence should be meaningful: +assert 0 <= confidence <= 1 +assert confidence >= max(probabilities) * 0.9 # Roughly consistent +``` + +## Testing Your Implementation + +### Using cURL + +```bash +# Test list_categories +curl -X POST http://localhost:8090/mcp/tools/call \ + -H "Content-Type: application/json" \ + -d '{"name": "list_categories", "arguments": {}}' + +# Test classify_text +curl -X POST http://localhost:8090/mcp/tools/call \ + -H "Content-Type: application/json" \ + -d '{"name": "classify_text", "arguments": {"text": "What is 2+2?"}}' +``` + +### Using MCP Inspector + +```bash +npm install -g @modelcontextprotocol/inspector +mcp-inspector python server.py +``` + +### Integration Test + +```bash +# Start your MCP server +python server.py --http --port 8090 + +# Configure semantic router to use it +# Send test queries through the router +# Verify correct classification and routing +``` + +## Security Considerations + +### Input Validation + +- Sanitize all text inputs +- Enforce maximum text length +- Rate limit classification requests + +### Authentication + +Consider adding authentication for production: + +```yaml +classification: + type: mcp + mcp_server: + url: "http://localhost:8090/mcp" + headers: + Authorization: "Bearer ${MCP_API_KEY}" +``` + +### Network Security + +- Use HTTPS in production +- Implement proper TLS certificate validation +- Restrict network access to 
MCP server + +## Versioning + +Include version information in server responses: + +```json +{ + "protocolVersion": "2024-11-05", + "serverInfo": { + "name": "embedding-classifier", + "version": "1.0.0" + } +} +```
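
As a closing illustration, the response envelope, field tables, and validation rules in this specification could be combined on the client side roughly as follows. This is a minimal sketch, not the router's actual implementation: the helper names (`decode_tool_result`, `shannon_entropy`, `validate_classification`) are hypothetical, the sample envelope is built in place rather than fetched from a server, and the natural-log base for entropy is an assumption, since the specification does not state which base to use.

```python
import json
import math


def decode_tool_result(envelope: dict) -> dict:
    """Unwrap the MCP tool envelope and parse the JSON string in content[0].text."""
    text = envelope["content"][0]["text"]
    if envelope.get("isError"):
        raise ValueError(f"tool call failed: {text}")
    return json.loads(text)


def shannon_entropy(probabilities: list[float]) -> float:
    """Shannon entropy in nats (log base is an assumption; the spec leaves it open)."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def validate_classification(result: dict, categories: list[str]) -> str:
    """Apply the index, confidence, and probability checks; return the category name."""
    index = result["class"]
    if not 0 <= index < len(categories):
        raise ValueError(f"class index {index} is out of range")
    if not 0.0 <= result["confidence"] <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    probabilities = result.get("probabilities")  # optional field
    if probabilities is not None:
        if len(probabilities) != len(categories):
            raise ValueError("probabilities length must match the category count")
        if not 0.95 <= sum(probabilities) <= 1.05:
            raise ValueError("probabilities should sum to roughly 1")
    return categories[index]


# Sample data mirroring the examples in this document (not fetched from a server).
categories = ["math", "science", "technology", "history", "general"]
envelope = {
    "content": [{"type": "text", "text": json.dumps({
        "class": 0,
        "confidence": 0.92,
        "model": "openai/gpt-oss-20b",
        "use_reasoning": False,
        "probabilities": [0.92, 0.03, 0.02, 0.02, 0.01],
    })}],
    "isError": False,
}

result = decode_tool_result(envelope)
category = validate_classification(result, categories)
entropy = shannon_entropy(result["probabilities"])
print(category, result["model"], result["use_reasoning"], round(entropy, 2))
```

Raising on `isError` before parsing keeps error payloads from being mistaken for classification results, and treating `probabilities` as optional matches the field table above.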