Skip to content

Conversation

rootfs
Copy link
Collaborator

@rootfs rootfs commented Oct 13, 2025

What type of PR is this?

Follow up #368 with example embedding based mcp classification server and documentation of how to extend the in-tree classification with mcp based approach.

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Release Notes: Yes/No

@rootfs rootfs requested a review from Xunzhuo as a code owner October 13, 2025 22:06
Copy link

netlify bot commented Oct 13, 2025

Deploy Preview for vllm-semantic-router ready!

Name Link
🔨 Latest commit c4cdf30
🔍 Latest deploy log https://app.netlify.com/projects/vllm-semantic-router/deploys/68ed98efa0f92000080b1184
😎 Deploy Preview https://deploy-preview-417--vllm-semantic-router.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@rootfs rootfs requested review from Copilot and wangchen615 October 13, 2025 22:06
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements MCP classification server support for semantic router by adding comprehensive documentation and a production-ready embedding-based classifier example. This enables externalized, dynamic classification logic that can be updated without router restarts.

Key changes:

  • Complete MCP classification protocol specification with HTTP/stdio transport modes
  • Overview documentation explaining architecture, benefits, and implementation options
  • Production-ready embedding classifier using Qwen3-Embedding-0.6B model with FAISS vector search
  • Training dataset with 95 labeled examples across 5 categories

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
website/docs/tutorials/mcp-classification/protocol.md Comprehensive protocol specification for MCP classification servers
website/docs/tutorials/mcp-classification/overview.md Architecture overview and implementation guidance for MCP classification
examples/mcp-classifier-server/training_data.csv Training dataset with categorized examples for embedding classifier
examples/mcp-classifier-server/server_embedding.py Production embedding-based MCP classifier implementation
examples/mcp-classifier-server/requirements_embedding.txt Dependencies for embedding-based classifier
examples/mcp-classifier-server/README.md Updated documentation comparing regex and embedding implementations

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

}
```

#### Response
Copy link

Copilot AI Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The JSON structure is embedded as a string within JSON, making it difficult to parse and validate. Consider showing the parsed structure directly or using a cleaner format for better readability.

Copilot uses AI. Check for mistakes.

Comment on lines +419 to +423
return classifier


# Initialize MCP server
app = Server("embedding-classifier")
Copy link

Copilot AI Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The global classifier variable and initialization pattern could lead to issues in multi-threaded environments. Consider using a thread-safe singleton pattern or dependency injection instead of global state.

Copilot uses AI. Check for mistakes.

model_name, trust_remote_code=True
)
self.model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
self.device = torch.device("cpu")
Copy link

Copilot AI Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The device is hardcoded to CPU, which will significantly impact performance if GPU is available. Consider auto-detecting GPU availability with fallback to CPU.

Suggested change
self.device = torch.device("cpu")
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Copilot uses AI. Check for mistakes.

Comment on lines 684 to 687
import asyncio

while True:
await asyncio.sleep(3600)
Copy link

Copilot AI Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The asyncio import inside the function creates unnecessary overhead. Move the import to the top of the file with other imports.

Copilot uses AI. Check for mistakes.

logger.error(f"Error handling request: {e}", exc_info=True)
error = {
"jsonrpc": "2.0",
"id": request.get("id") if isinstance(request, dict) else None,
Copy link

Copilot AI Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable request shadows the aiohttp request object. This should reference the JSON data instead of the HTTP request object.

Copilot uses AI. Check for mistakes.

Copy link

github-actions bot commented Oct 13, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 Root Directory

Owners: @rootfs, @Xunzhuo
Files changed:

  • examples/mcp-classifier-server/requirements_embedding.txt
  • examples/mcp-classifier-server/server_embedding.py
  • examples/mcp-classifier-server/training_data.csv
  • examples/mcp-classifier-server/README.md

📁 website

Owners: @Xunzhuo, @rootfs, @yuluo-yx
Files changed:

  • website/docs/tutorials/mcp-classification/overview.md
  • website/docs/tutorials/mcp-classification/protocol.md

vLLM

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

Signed-off-by: Huamin Chen <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
@rootfs rootfs requested a review from Copilot October 14, 2025 00:33
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

## Initialization Sequence

When the semantic router starts:

Copy link

Copilot AI Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The mermaid diagram may not render correctly if the website doesn't support mermaid syntax. Consider adding a note about mermaid support or providing an alternative representation.

Suggested change
> **Note:** The following diagram uses [Mermaid](https://mermaid-js.github.io/mermaid/) syntax. It will only render if your documentation viewer supports Mermaid diagrams.
>
> **Text alternative:**
> - Router sends `POST /mcp/initialize` to MCP Server
> - MCP Server responds with server capabilities
> - Router sends `POST /mcp/tools/list` to MCP Server
> - MCP Server responds with available tools
> - Router calls `classify_text → list_categories` on MCP Server
> - MCP Server responds with categories and prompts
> - Router is ready
> - Router calls `classify_text → classify_text` on MCP Server
> - MCP Server responds with classification result

Copilot uses AI. Check for mistakes.

Comment on lines +415 to +419
# Initialize classifier globally
# Note: This is safe for aiohttp as it uses a single-threaded event loop.
# For multi-process deployments, each process gets its own instance.
classifier = None
classifier_device = "auto" # Default device setting
Copy link

Copilot AI Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Global state with lazy initialization can cause race conditions if multiple threads access get_classifier() simultaneously. Consider using threading.Lock or asyncio.Lock for thread-safe initialization.

Copilot uses AI. Check for mistakes.

Comment on lines +661 to +667
error = {
"jsonrpc": "2.0",
"id": (
data.get("id")
if "data" in locals() and isinstance(data, dict)
else None
),
Copy link

Copilot AI Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error handling logic checks for 'data' in locals() which may not work as expected in exception handlers where 'data' might not be defined. This could cause a NameError. Consider using a try-except block around data access instead.

Suggested change
error = {
"jsonrpc": "2.0",
"id": (
data.get("id")
if "data" in locals() and isinstance(data, dict)
else None
),
try:
request_id = data.get("id") if isinstance(data, dict) else None
except Exception:
request_id = None
error = {
"jsonrpc": "2.0",
"id": request_id,

Copilot uses AI. Check for mistakes.

Comment on lines +713 to +717
try:
while True:
await asyncio.sleep(3600)
except KeyboardInterrupt:
logger.info("Shutting down server...")
Copy link

Copilot AI Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The infinite loop with 1-hour sleep intervals is unnecessarily resource-intensive. Consider using asyncio.Event or a more efficient waiting mechanism for graceful shutdown.

Suggested change
try:
while True:
await asyncio.sleep(3600)
except KeyboardInterrupt:
logger.info("Shutting down server...")
shutdown_event = asyncio.Event()
try:
await shutdown_event.wait()
except KeyboardInterrupt:
logger.info("Shutting down server...")
shutdown_event.set()

Copilot uses AI. Check for mistakes.

general,How do I improve my sleep quality?,Health and wellness
general,What are some creative hobbies to try?,Arts and crafts
general,How can I be more environmentally friendly?,Sustainability and lifestyle

Copy link

Copilot AI Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CSV file has an empty trailing line which could cause parsing issues in some CSV readers. Consider removing the empty line at the end of the file.

Suggested change

Copilot uses AI. Check for mistakes.

@rootfs rootfs merged commit 7d3a9f8 into vllm-project:main Oct 14, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants