Example MCP servers that provide text classification with intelligent routing for the semantic router.

## 📦 Three Implementations

This directory contains **three MCP classification servers**:

### 1. **Regex-Based Server** (`server.py`)

- ✅ **No Dependencies** - Just MCP SDK
- 📝 **Best For**: Prototyping, simple rules, low-latency requirements
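The rule-based approach can be sketched in a few lines. This is a minimal illustration of the idea, not the actual `server.py` implementation; the category patterns here are made-up placeholders:

```python
import re

# Hypothetical category patterns -- server.py defines its own rules.
CATEGORY_PATTERNS = {
    "math": [r"\bintegral\b", r"\bequation\b", r"\bsolve for\b"],
    "biology": [r"\bcell\b", r"\bDNA\b", r"\bprotein\b"],
}

def classify(text: str, default: str = "other") -> str:
    """Return the first category whose patterns match the text."""
    for category, patterns in CATEGORY_PATTERNS.items():
        if any(re.search(p, text, re.IGNORECASE) for p in patterns):
            return category
    return default
```

This is why the regex server is fast and dependency-free, but only as accurate as its hand-written rules.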

### 2. **Embedding-Based Server** (`server_embedding.py`)

- ✅ **High Accuracy** - Semantic understanding with Qwen3-Embedding-0.6B
- ✅ **RAG-Style** - FAISS vector database with similarity search
- ✅ **Flexible** - Handles paraphrases, synonyms, variations
- 📝 **Best For**: Production use when you have good training examples

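At its core, the RAG-style lookup is a nearest-neighbor search over labeled example embeddings. A minimal sketch of that idea follows; the toy 3-dimensional vectors stand in for real Qwen3 embeddings, and `server_embedding.py` uses a FAISS index rather than this linear scan:

```python
import math

# Toy (embedding, label) pairs. The real server embeds its training
# examples with Qwen3-Embedding-0.6B and indexes them with FAISS.
EXAMPLES = [
    ([1.0, 0.0, 0.1], "math"),
    ([0.9, 0.1, 0.0], "math"),
    ([0.0, 1.0, 0.2], "biology"),
]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def classify(query_embedding):
    """Return the label of the most similar stored example and its score."""
    score, label = max(
        (cosine(query_embedding, emb), label) for emb, label in EXAMPLES
    )
    return label, score
```

Because matching is semantic rather than literal, paraphrases of a stored example land near it in embedding space and inherit its label.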
### 3. **Generative Model Server** (`server_generative.py`) 🆕

- ✅ **Highest Accuracy** - Fine-tuned Qwen3 generative model
- ✅ **True Probabilities** - Softmax-based probability distributions
- ✅ **Better Generalization** - Learns category patterns, not just examples
- ✅ **Entropy Calculation** - Shannon entropy for uncertainty quantification
- ✅ **HuggingFace Support** - Load models from the HuggingFace Hub or local paths
- 📝 **Best For**: Production use with fine-tuned models (70-85% accuracy)

**Choose based on your needs:**

- **Quick start / Testing?** → Use `server.py` (regex-based)
- **Production with training examples?** → Use `server_embedding.py` (embedding-based)
- **Production with a fine-tuned model?** → Use `server_generative.py` (generative model)

---

### Comparison

| Feature | Regex (`server.py`) | Embedding (`server_embedding.py`) | Generative (`server_generative.py`) |
|---------|---------------------|-----------------------------------|-------------------------------------|
| **Accuracy** | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| **Speed** | ~1-5ms | ~50-100ms | ~100-200ms (GPU) |
| **Memory** | ~10MB | ~600MB | ~2GB (GPU) / ~4GB (CPU) |
| **Setup** | Simple | CSV + embeddings | Fine-tuned model required |
| **Probabilities** | Rule-based | Similarity scores | Softmax (true) |
| **Entropy** | No | Manual calculation | Built-in (Shannon) |
| **Best For** | Prototyping | Examples-based production | Model-based production |

---

## Generative Model Server (`server_generative.py`)

For **production use with a fine-tuned model and the highest accuracy**, use the generative model server.

### Quick Start

**Option 1: Use a Pre-trained HuggingFace Model** (Easiest)

```bash
# The server automatically downloads the model from the HuggingFace Hub
python server_generative.py --http --port 8092 --model-path llm-semantic-router/qwen3_generative_classifier_r16
```

**Option 2: Train Your Own Model**

Step 1: Train the model

```bash
cd ../../../src/training/training_lora/classifier_model_fine_tuning_lora/
python ft_qwen3_generative_lora.py --mode train --epochs 8 --lora-rank 16
# Creates: qwen3_generative_classifier_r16/
```

Step 2: Start the server

```bash
cd -  # Back to examples/mcp-classifier-server/
python server_generative.py --http --port 8092 --model-path ../../../src/training/training_lora/classifier_model_fine_tuning_lora/qwen3_generative_classifier_r16
```

### Features

- **Fine-tuned Qwen3-0.6B** generative model with LoRA
- **Softmax probabilities** from model logits (a true probability distribution)
- **Shannon entropy** for uncertainty quantification
- **14 MMLU-Pro categories** (biology, business, chemistry, CS, economics, engineering, health, history, law, math, other, philosophy, physics, psychology)
- **Same MCP protocol** as the other servers (drop-in replacement)
- **Highest accuracy** - 70-85% on the validation set
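The softmax and entropy features above boil down to two short formulas. A generic sketch of the math (not code from `server_generative.py`):

```python
import math

def softmax(logits):
    """Convert raw per-category logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def shannon_entropy(probs):
    """Shannon entropy in bits: 0 = fully confident, log2(n) = maximally uncertain."""
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

A confident prediction (one dominant logit) yields entropy near 0, while a flat distribution over the 14 categories approaches log2(14) ≈ 3.81 bits, so the router can treat high entropy as a signal to abstain or fall back.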

### Why Use the Generative Server?

**Advantages over the Embedding Server:**

- ✅ True probability distributions (softmax-based, not similarity-based)
- ✅ Better generalization beyond the training examples
- ✅ More accurate classification (70-85% vs ~60-70%)
- ✅ Built-in entropy calculation for uncertainty
- ✅ Fine-tuned on task-specific data

**When to Use:**

- You have training data to fine-tune a model
- You need the highest accuracy in production
- You want true probability distributions
- You need uncertainty quantification (entropy)
- You can afford a 2-4GB memory footprint

### Testing

Test the generative server with sample queries:

```bash
python test_generative.py --model-path qwen3_generative_classifier_r16
```

### Documentation

For detailed documentation, see [README_GENERATIVE.md](README_GENERATIVE.md).