107 changes: 95 additions & 12 deletions examples/mcp-classifier-server/README.md
@@ -2,9 +2,9 @@

Example MCP servers that provide text classification with intelligent routing for the semantic router.

## 📦 Two Implementations
## 📦 Three Implementations

This directory contains **two MCP classification servers**:
This directory contains **three MCP classification servers**:

### 1. **Regex-Based Server** (`server.py`)

@@ -13,17 +13,27 @@ This directory contains **three MCP classification servers**:
- ✅ **No Dependencies** - Just MCP SDK
- 📝 **Best For**: Prototyping, simple rules, low-latency requirements

### 2. **Embedding-Based Server** (`server_embedding.py`) 🆕
### 2. **Embedding-Based Server** (`server_embedding.py`)

- ✅ **High Accuracy** - Semantic understanding with Qwen3-Embedding-0.6B
- ✅ **RAG-Style** - FAISS vector database with similarity search
- ✅ **Flexible** - Handles paraphrases, synonyms, variations
- 📝 **Best For**: Production use, high-accuracy requirements
- 📝 **Best For**: Production use when you have good training examples

### 3. **Generative Model Server** (`server_generative.py`) 🆕

- ✅ **Highest Accuracy** - Fine-tuned Qwen3 generative model
- ✅ **True Probabilities** - Softmax-based probability distributions
- ✅ **Better Generalization** - Learns category patterns, not just examples
- ✅ **Entropy Calculation** - Shannon entropy for uncertainty quantification
- ✅ **HuggingFace Support** - Load models from HuggingFace Hub or local paths
- 📝 **Best For**: Production use with fine-tuned models (70-85% accuracy)

**Choose based on your needs:**

- **Quick start / Testing?** → Use `server.py` (regex-based)
- **Production / Accuracy?** → Use `server_embedding.py` (embedding-based)
- **Production with training examples?** → Use `server_embedding.py` (embedding-based)
- **Production with fine-tuned model?** → Use `server_generative.py` (generative model)

---

@@ -217,10 +227,83 @@ python3 server_embedding.py --http --port 8090

### Comparison

| Feature | Regex (`server.py`) | Embedding (`server_embedding.py`) |
|---------|---------------------|-----------------------------------|
| **Accuracy** | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| **Speed** | ~1-5ms | ~50-100ms |
| **Memory** | ~10MB | ~600MB |
| **Setup** | Simple | Requires model |
| **Best For** | Prototyping | Production |
| Feature | Regex (`server.py`) | Embedding (`server_embedding.py`) | Generative (`server_generative.py`) |
|---------|---------------------|-----------------------------------|-------------------------------------|
| **Accuracy** | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| **Speed** | ~1-5ms | ~50-100ms | ~100-200ms (GPU) |
| **Memory** | ~10MB | ~600MB | ~2GB (GPU) / ~4GB (CPU) |
| **Setup** | Simple | CSV + embeddings | Fine-tuned model required |
| **Probabilities** | Rule-based | Similarity scores | Softmax (true) |
| **Entropy** | No | Manual calculation | Built-in (Shannon) |
| **Best For** | Prototyping | Examples-based production | Model-based production |

---

## Generative Model Server (`server_generative.py`)

For **production use with a fine-tuned model and the highest accuracy**, use the generative model server.

### Quick Start

**Option 1: Use Pre-trained HuggingFace Model** (Easiest)

```bash
# Server automatically downloads from HuggingFace Hub
python server_generative.py --http --port 8092 --model-path llm-semantic-router/qwen3_generative_classifier_r16
```

**Option 2: Train Your Own Model**

Step 1: Train the model

```bash
cd ../../../src/training/training_lora/classifier_model_fine_tuning_lora/
python ft_qwen3_generative_lora.py --mode train --epochs 8 --lora-rank 16
# Creates: qwen3_generative_classifier_r16/
```

Step 2: Start the server

```bash
cd - # Back to examples/mcp-classifier-server/
python server_generative.py --http --port 8092 --model-path ../../../src/training/training_lora/classifier_model_fine_tuning_lora/qwen3_generative_classifier_r16
```
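Once the server is running, clients speak plain JSON-RPC 2.0 over HTTP. The following is a minimal sketch of the request envelope a client would POST to the server; the tool name `classify_text` and its `text` argument are assumptions for illustration — check the server's `tools/list` response for the actual tool name and argument schema.

```python
import json


def make_classify_request(text, request_id=1):
    """Build a JSON-RPC 2.0 tools/call envelope for the MCP server.

    The tool name "classify_text" is a hypothetical placeholder; the
    real tool name comes from the server's tools/list response.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "classify_text",
            "arguments": {"text": text},
        },
    }


# Serialize and POST this payload to http://localhost:8092/mcp
# (e.g. with aiohttp or curl); the classification result comes back
# in the JSON-RPC "result" field.
payload = json.dumps(make_classify_request("What is the derivative of x^2?"))
```

The same envelope works against any of the three servers, since they all implement the same MCP protocol surface.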

### Features

- **Fine-tuned Qwen3-0.6B** generative model with LoRA
- **Softmax probabilities** from model logits (true probability distribution)
- **Shannon entropy** for uncertainty quantification
- **14 MMLU-Pro categories** (biology, business, chemistry, CS, economics, engineering, health, history, law, math, other, philosophy, physics, psychology)
- **Same MCP protocol** as other servers (drop-in replacement)
- **Highest accuracy** - 70-85% on validation set
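The softmax probabilities and Shannon entropy listed above can be sketched as follows. This is a minimal illustration of the math, not the server's actual implementation; the example logits are made up, and a real classifier would produce one logit per category from the model head.

```python
import math


def softmax(logits):
    """Turn raw logits into a true probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def shannon_entropy(probs):
    """Shannon entropy in bits; higher means more uncertain prediction."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical per-category logits from a classifier head
logits = [2.1, 0.3, -1.0]
probs = softmax(logits)
uncertainty = shannon_entropy(probs)
```

A confident prediction concentrates probability mass on one category and yields entropy near zero; a uniform distribution over N categories yields the maximum, log2(N) bits.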

### Why Use Generative Server?

**Advantages over Embedding Server:**

- ✅ True probability distributions (softmax-based, not similarity-based)
- ✅ Better generalization beyond training examples
- ✅ More accurate classification (70-85% vs ~60-70%)
- ✅ Built-in entropy calculation for uncertainty
- ✅ Fine-tuned on task-specific data

**When to Use:**

- You have training data to fine-tune a model
- Need highest accuracy for production
- Want true probability distributions
- Need uncertainty quantification (entropy)
- Can afford 2-4GB memory footprint

### Testing

Test the generative server with sample queries:

```bash
python test_generative.py --model-path qwen3_generative_classifier_r16
```

### Documentation

For detailed documentation, see [README_GENERATIVE.md](README_GENERATIVE.md).
15 changes: 15 additions & 0 deletions examples/mcp-classifier-server/requirements_generative.txt
@@ -0,0 +1,15 @@
# Requirements for Generative Model-Based MCP Classification Server
# server_generative.py

# Core dependencies
torch>=2.0.0
transformers>=4.30.0
peft>=0.4.0
huggingface_hub>=0.16.0

# MCP SDK
mcp>=0.1.0

# HTTP mode (optional)
aiohttp>=3.8.0

39 changes: 35 additions & 4 deletions examples/mcp-classifier-server/server_embedding.py
@@ -592,9 +592,14 @@ async def handle_mcp_request(request):
init_result = {
"protocolVersion": "2024-11-05",
"capabilities": {
"tools": {},
"tools": {}, # We support tools
# Note: We don't support resources or prompts
},
"serverInfo": {
"name": "embedding-classifier",
"version": "1.0.0",
"description": "Embedding-based text classification with semantic similarity",
},
"serverInfo": {"name": "embedding-classifier", "version": "1.0.0"},
}

if request.path.startswith("/mcp/") and request.path != "/mcp":
@@ -648,13 +653,38 @@ async def handle_mcp_request(request):
result = {"jsonrpc": "2.0", "id": request_id, "result": {}}
return web.json_response(result)

# Handle unsupported but valid MCP methods gracefully
elif method in [
"resources/list",
"resources/read",
"prompts/list",
"prompts/get",
]:
# These are valid MCP methods but not implemented in this server
# Return empty results instead of error for better compatibility
logger.debug(
f"Unsupported method called: {method} (returning empty result)"
)

if method == "resources/list":
result_data = {"resources": []}
elif method == "prompts/list":
result_data = {"prompts": []}
else:
result_data = {}

result = {"jsonrpc": "2.0", "id": request_id, "result": result_data}
return web.json_response(result)

else:
# Unknown method - return error with HTTP 200 (per JSON-RPC spec)
logger.warning(f"Unknown method called: {method}")
error = {
"jsonrpc": "2.0",
"id": request_id,
"error": {"code": -32601, "message": f"Method not found: {method}"},
}
return web.json_response(error, status=404)
return web.json_response(error)

except Exception as e:
logger.error(f"Error handling request: {e}", exc_info=True)
@@ -667,7 +697,8 @@ async def handle_mcp_request(request):
),
"error": {"code": -32603, "message": f"Internal error: {str(e)}"},
}
return web.json_response(error, status=500)
# Per JSON-RPC 2.0 spec, return HTTP 200 even for errors
return web.json_response(error)

async def health_check(request):
"""Health check endpoint."""