Example MCP servers that provide text classification with intelligent routing for the semantic router.

## 📦 Three Implementations

This directory contains **three MCP classification servers**:

### 1. **Regex-Based Server** (`server.py`)

- ✅ **No Dependencies** - Just MCP SDK
- 📝 **Best For**: Prototyping, simple rules, low-latency requirements
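The rule-based approach can be sketched in a few lines. This is a minimal illustration of the idea, not the actual `server.py` implementation; the category patterns here are made-up placeholders:

```python
import re

# Hypothetical category patterns -- server.py defines its own rules.
CATEGORY_PATTERNS = {
    "math": [r"\bintegral\b", r"\bequation\b", r"\bsolve for\b"],
    "biology": [r"\bcell\b", r"\bDNA\b", r"\bprotein\b"],
}

def classify(text: str, default: str = "other") -> str:
    """Return the first category whose patterns match the text."""
    for category, patterns in CATEGORY_PATTERNS.items():
        if any(re.search(p, text, re.IGNORECASE) for p in patterns):
            return category
    return default
```

This is why the regex server is fast and dependency-free, but only as accurate as its hand-written rules.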

### 2. **Embedding-Based Server** (`server_embedding.py`)

- ✅ **High Accuracy** - Semantic understanding with Qwen3-Embedding-0.6B
- ✅ **RAG-Style** - FAISS vector database with similarity search
- ✅ **Flexible** - Handles paraphrases, synonyms, variations
- 📝 **Best For**: Production use when you have good training examples

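At its core, the RAG-style lookup is a nearest-neighbor search over labeled example embeddings. A minimal sketch of that idea follows; the toy 3-dimensional vectors stand in for real Qwen3 embeddings, and `server_embedding.py` uses a FAISS index rather than this linear scan:

```python
import math

# Toy (embedding, label) pairs. The real server embeds its training
# examples with Qwen3-Embedding-0.6B and indexes them with FAISS.
EXAMPLES = [
    ([1.0, 0.0, 0.1], "math"),
    ([0.9, 0.1, 0.0], "math"),
    ([0.0, 1.0, 0.2], "biology"),
]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def classify(query_embedding):
    """Return the label of the most similar stored example and its score."""
    score, label = max(
        (cosine(query_embedding, emb), label) for emb, label in EXAMPLES
    )
    return label, score
```

Because matching is semantic rather than literal, paraphrases of a stored example land near it in embedding space and inherit its label.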
### 3. **Generative Model Server** (`server_generative.py`) 🆕

- ✅ **Highest Accuracy** - Fine-tuned Qwen3 generative model
- ✅ **True Probabilities** - Softmax-based probability distributions
- ✅ **Better Generalization** - Learns category patterns, not just examples
- ✅ **Entropy Calculation** - Shannon entropy for uncertainty quantification
- ✅ **HuggingFace Support** - Load models from the HuggingFace Hub or local paths
- 📝 **Best For**: Production use with fine-tuned models (70-85% accuracy)

**Choose based on your needs:**

- **Quick start / Testing?** → Use `server.py` (regex-based)
- **Production with training examples?** → Use `server_embedding.py` (embedding-based)
- **Production with a fine-tuned model?** → Use `server_generative.py` (generative model)

---

### Comparison

| Feature | Regex (`server.py`) | Embedding (`server_embedding.py`) | Generative (`server_generative.py`) |
|---------|---------------------|-----------------------------------|-------------------------------------|
| **Accuracy** | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| **Speed** | ~1-5ms | ~50-100ms | ~100-200ms (GPU) |
| **Memory** | ~10MB | ~600MB | ~2GB (GPU) / ~4GB (CPU) |
| **Setup** | Simple | CSV + embeddings | Fine-tuned model required |
| **Probabilities** | Rule-based | Similarity scores | Softmax (true) |
| **Entropy** | No | Manual calculation | Built-in (Shannon) |
| **Best For** | Prototyping | Examples-based production | Model-based production |

---

## Generative Model Server (`server_generative.py`)

For **production use with a fine-tuned model and the highest accuracy**, use the generative model server.

### Quick Start

**Option 1: Use a Pre-trained HuggingFace Model** (Easiest)

```bash
# The server automatically downloads the model from the HuggingFace Hub
python server_generative.py --http --port 8092 --model-path llm-semantic-router/qwen3_generative_classifier_r16
```

**Option 2: Train Your Own Model**

Step 1: Train the model

```bash
cd ../../../src/training/training_lora/classifier_model_fine_tuning_lora/
python ft_qwen3_generative_lora.py --mode train --epochs 8 --lora-rank 16
# Creates: qwen3_generative_classifier_r16/
```

Step 2: Start the server

```bash
cd -  # Back to examples/mcp-classifier-server/
python server_generative.py --http --port 8092 --model-path ../../../src/training/training_lora/classifier_model_fine_tuning_lora/qwen3_generative_classifier_r16
```

### Features

- **Fine-tuned Qwen3-0.6B** generative model with LoRA
- **Softmax probabilities** from model logits (a true probability distribution)
- **Shannon entropy** for uncertainty quantification
- **14 MMLU-Pro categories** (biology, business, chemistry, CS, economics, engineering, health, history, law, math, other, philosophy, physics, psychology)
- **Same MCP protocol** as the other servers (drop-in replacement)
- **Highest accuracy** - 70-85% on the validation set
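The softmax and entropy features above boil down to two short formulas. A generic sketch of the math (not code from `server_generative.py`):

```python
import math

def softmax(logits):
    """Convert raw per-category logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def shannon_entropy(probs):
    """Shannon entropy in bits: 0 = fully confident, log2(n) = maximally uncertain."""
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

A confident prediction (one dominant logit) yields entropy near 0, while a flat distribution over the 14 categories approaches log2(14) ≈ 3.81 bits, so the router can treat high entropy as a signal to abstain or fall back.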

### Why Use the Generative Server?

**Advantages over the Embedding Server:**

- ✅ True probability distributions (softmax-based, not similarity-based)
- ✅ Better generalization beyond the training examples
- ✅ More accurate classification (70-85% vs ~60-70%)
- ✅ Built-in entropy calculation for uncertainty
- ✅ Fine-tuned on task-specific data

**When to Use:**

- You have training data to fine-tune a model
- You need the highest accuracy in production
- You want true probability distributions
- You need uncertainty quantification (entropy)
- You can afford a 2-4GB memory footprint

### Testing

Test the generative server with sample queries:

```bash
python test_generative.py --model-path qwen3_generative_classifier_r16
```

### Documentation

For detailed documentation, see [README_GENERATIVE.md](README_GENERATIVE.md).