
Commit 0c9002f

add the decoder based classification mcp server
Signed-off-by: Huamin Chen <[email protected]>
1 parent 06f717a commit 0c9002f

File tree

3 files changed: +982 -12 lines changed

examples/mcp-classifier-server/README.md

Lines changed: 95 additions & 12 deletions
@@ -2,9 +2,9 @@

 Example MCP servers that provide text classification with intelligent routing for the semantic router.

-## 📦 Two Implementations
+## 📦 Three Implementations

-This directory contains **two MCP classification servers**:
+This directory contains **three MCP classification servers**:

 ### 1. **Regex-Based Server** (`server.py`)

@@ -13,17 +13,27 @@ This directory contains **two MCP classification servers**:
 - **No Dependencies** - Just MCP SDK
 - 📝 **Best For**: Prototyping, simple rules, low-latency requirements

-### 2. **Embedding-Based Server** (`server_embedding.py`) 🆕
+### 2. **Embedding-Based Server** (`server_embedding.py`)

 - **High Accuracy** - Semantic understanding with Qwen3-Embedding-0.6B
 - **RAG-Style** - FAISS vector database with similarity search
 - **Flexible** - Handles paraphrases, synonyms, variations
-- 📝 **Best For**: Production use, high-accuracy requirements
+- 📝 **Best For**: Production use when you have good training examples
+
+### 3. **Generative Model Server** (`server_generative.py`) 🆕
+
+- **Highest Accuracy** - Fine-tuned Qwen3 generative model
+- **True Probabilities** - Softmax-based probability distributions
+- **Better Generalization** - Learns category patterns, not just examples
+- **Entropy Calculation** - Shannon entropy for uncertainty quantification
+- **HuggingFace Support** - Load models from HuggingFace Hub or local paths
+- 📝 **Best For**: Production use with fine-tuned models (70-85% accuracy)

 **Choose based on your needs:**

 - **Quick start / Testing?** → Use `server.py` (regex-based)
-- **Production / Accuracy?** → Use `server_embedding.py` (embedding-based)
+- **Production with training examples?** → Use `server_embedding.py` (embedding-based)
+- **Production with fine-tuned model?** → Use `server_generative.py` (generative model)

 ---
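All three servers expose classification over the same MCP protocol, so a routing client drives them identically and only the accuracy/latency trade-offs above differ. A minimal client sketch using the MCP Python SDK over stdio, assuming the current SDK client API; the tool name `classify_text` and its argument schema are illustrative placeholders rather than the servers' actual tool definitions:

```python
# Sketch: call one of the classification servers over stdio via the MCP SDK.
# The tool name and argument keys are placeholders; list_tools() shows the real ones.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def classify(text: str) -> None:
    # Works the same for server.py, server_embedding.py, or server_generative.py.
    params = StdioServerParameters(command="python", args=["server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print("available tools:", [t.name for t in tools.tools])
            result = await session.call_tool("classify_text", {"text": text})  # placeholder name
            print(result.content)


asyncio.run(classify("What is the time complexity of quicksort?"))
```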

@@ -217,10 +227,83 @@ python3 server_embedding.py --http --port 8090

 ### Comparison

-| Feature | Regex (`server.py`) | Embedding (`server_embedding.py`) |
-|---------|---------------------|-----------------------------------|
-| **Accuracy** | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
-| **Speed** | ~1-5ms | ~50-100ms |
-| **Memory** | ~10MB | ~600MB |
-| **Setup** | Simple | Requires model |
-| **Best For** | Prototyping | Production |
+| Feature | Regex (`server.py`) | Embedding (`server_embedding.py`) | Generative (`server_generative.py`) |
+|---------|---------------------|-----------------------------------|-------------------------------------|
+| **Accuracy** | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
+| **Speed** | ~1-5ms | ~50-100ms | ~100-200ms (GPU) |
+| **Memory** | ~10MB | ~600MB | ~2GB (GPU) / ~4GB (CPU) |
+| **Setup** | Simple | CSV + embeddings | Fine-tuned model required |
+| **Probabilities** | Rule-based | Similarity scores | Softmax (true) |
+| **Entropy** | No | Manual calculation | Built-in (Shannon) |
+| **Best For** | Prototyping | Examples-based production | Model-based production |
+
+---
+
+## Generative Model Server (`server_generative.py`)
+
+For **production use with a fine-tuned model and highest accuracy**, see the generative model server.
+
+### Quick Start
+
+**Option 1: Use Pre-trained HuggingFace Model** (Easiest)
+
+```bash
+# Server automatically downloads from HuggingFace Hub
+python server_generative.py --http --port 8092 --model-path llm-semantic-router/qwen3_generative_classifier_r16
+```
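Whether `--model-path` is the Hub id above or a local directory produced by training, the server presumably resolves it to a LoRA adapter sitting on top of the Qwen3 base model. A rough loading sketch with `transformers` and `peft`; the base checkpoint name `Qwen/Qwen3-0.6B` is an assumption, and the real loading code lives in `server_generative.py`:

```python
# Sketch: attach the LoRA classifier adapter to the Qwen3 base model.
# "Qwen/Qwen3-0.6B" is an assumed base checkpoint; the adapter id matches the command above.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "Qwen/Qwen3-0.6B"  # assumption: 0.6B base matches the fine-tuned adapter
ADAPTER = "llm-semantic-router/qwen3_generative_classifier_r16"  # Hub id or local path

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = PeftModel.from_pretrained(AutoModelForCausalLM.from_pretrained(BASE), ADAPTER)
model.eval()  # inference only; the adapter weights are already trained
```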
+
+**Option 2: Train Your Own Model**
+
+Step 1: Train the model
+
+```bash
+cd ../../../src/training/training_lora/classifier_model_fine_tuning_lora/
+python ft_qwen3_generative_lora.py --mode train --epochs 8 --lora-rank 16
+# Creates: qwen3_generative_classifier_r16/
+```
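For orientation, `--lora-rank 16` in the command above sets the rank `r` of the low-rank adapter matrices. A sketch of the kind of `peft` configuration such a flag typically maps to; the alpha, dropout, and target modules are illustrative guesses, not values read from `ft_qwen3_generative_lora.py`:

```python
# Sketch: a rank-16 LoRA setup with peft; only r=16 comes from the command above.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")  # assumed base model
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                 # --lora-rank 16
    lora_alpha=32,        # illustrative
    lora_dropout=0.05,    # illustrative
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # typical attention projections
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the low-rank adapter weights train
```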
+
+Step 2: Start the server
+
+```bash
+cd - # Back to examples/mcp-classifier-server/
+python server_generative.py --http --port 8092 --model-path ../../../src/training/training_lora/classifier_model_fine_tuning_lora/qwen3_generative_classifier_r16
+```
+
+### Features
+
+- **Fine-tuned Qwen3-0.6B** generative model with LoRA
+- **Softmax probabilities** from model logits (true probability distribution)
+- **Shannon entropy** for uncertainty quantification
+- **14 MMLU-Pro categories** (biology, business, chemistry, CS, economics, engineering, health, history, law, math, other, philosophy, physics, psychology)
+- **Same MCP protocol** as other servers (drop-in replacement)
+- **Highest accuracy** - 70-85% on validation set
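The softmax and entropy features listed above reduce to a few lines of arithmetic. A self-contained sketch of turning per-category logits into probabilities and Shannon entropy; the logit values are made up for illustration:

```python
# Sketch: softmax probabilities and Shannon entropy over category logits.
# The logits are invented numbers; a real run would take them from the model head.
import math

categories = ["biology", "math", "physics"]  # subset of the 14 MMLU-Pro categories
logits = [1.2, 3.4, 0.7]                     # hypothetical scores for one query

exps = [math.exp(x - max(logits)) for x in logits]   # shift by max for numerical stability
probs = [e / sum(exps) for e in exps]                # softmax: a true probability distribution

entropy = -sum(p * math.log2(p) for p in probs if p > 0)  # Shannon entropy in bits

category, confidence = max(zip(categories, probs), key=lambda cp: cp[1])
print(f"{category}: p={confidence:.3f}, entropy={entropy:.3f} bits")
# A router can treat high entropy as "uncertain" and fall back to a default route.
```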
+
+### Why Use Generative Server?
+
+**Advantages over Embedding Server:**
+
+- ✅ True probability distributions (softmax-based, not similarity-based)
+- ✅ Better generalization beyond training examples
+- ✅ More accurate classification (70-85% vs ~60-70%)
+- ✅ Built-in entropy calculation for uncertainty
+- ✅ Fine-tuned on task-specific data
+
+**When to Use:**
+
+- You have training data to fine-tune a model
+- Need highest accuracy for production
+- Want true probability distributions
+- Need uncertainty quantification (entropy)
+- Can afford 2-4GB memory footprint
+
+### Testing
+
+Test the generative server with sample queries:
+
+```bash
+python test_generative.py --model-path qwen3_generative_classifier_r16
+```
+
+### Documentation
+
+For detailed documentation, see [README_GENERATIVE.md](README_GENERATIVE.md).
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+# Requirements for Generative Model-Based MCP Classification Server
+# server_generative.py
+
+# Core dependencies
+torch>=2.0.0
+transformers>=4.30.0
+peft>=0.4.0
+huggingface_hub>=0.16.0
+
+# MCP SDK
+mcp>=0.1.0
+
+# HTTP mode (optional)
+aiohttp>=3.8.0
+
