Evaluate and migrate away from fastmcp: introduce official MCP SDK server engine with phased rollout #485

@jolestar

Context

We recently hit issues caused by fastmcp lagging behind the latest official MCP SDK/protocol changes. Because fastmcp sits in our critical path (via @nuwa-ai/payment-kit -> FastMcpStarter and services like nuwa-services/mcp-server-proxy), SDK/protocol drift can quickly turn into hard-to-debug runtime incompatibilities.

Today we depend on:

  • fastmcp for server/session/tool plumbing (FastMCP, FastMCPSession)
  • mcp-proxy for the HTTP/SSE endpoint (startHTTPServer)
  • @modelcontextprotocol/sdk for types/compat, but not as the server runtime

This issue proposes a path to reduce risk by using the official @modelcontextprotocol/sdk as the server implementation, while keeping our Nuwa-specific payment/auth/tool abstractions intact.

Goal

  • Make the official @modelcontextprotocol/sdk the primary MCP server runtime for production services.
  • Keep McpPaymentKit (billing/auth/settlement) and our tool registration API stable.
  • Provide a low-risk dual-engine period (fastmcp + official SDK) with an easy rollback switch.

Non-goals

  • Rewriting business tools.
  • Changing Nuwa payment/auth protocol semantics.
  • Broad refactors outside MCP server runtime.

Proposed Design

1) Dual-engine server entrypoint

Add a unified server factory in @nuwa-ai/payment-kit:

  • createMcpServer({ engine: 'sdk' | 'fastmcp', ...opts })
    • Default remains fastmcp initially.
    • engine is configurable via env for services (e.g., MCP_ENGINE=sdk|fastmcp).

Keep existing createFastMcpServer* APIs for backward compatibility during rollout.
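A minimal sketch of how the engine switch could be resolved, assuming an explicit option wins over MCP_ENGINE and fastmcp stays the fallback. The option shape, McpServerHandle, EngineStarter, and resolveEngine are hypothetical names for illustration, not existing payment-kit APIs:

```typescript
// Hypothetical types: the real option shape in @nuwa-ai/payment-kit may differ.
type McpEngine = 'sdk' | 'fastmcp';

interface McpServerHandle {
  engine: McpEngine;
  stop(): Promise<void>;
}

interface CreateMcpServerOptions {
  engine?: McpEngine;
  port: number;
}

type EngineStarter = (opts: CreateMcpServerOptions) => Promise<McpServerHandle>;

// Resolve the engine: explicit option first, then MCP_ENGINE, then the fastmcp default.
function resolveEngine(
  explicit: McpEngine | undefined,
  env: Record<string, string | undefined>,
): McpEngine {
  const raw: string = explicit ?? env.MCP_ENGINE ?? 'fastmcp';
  if (raw === 'sdk' || raw === 'fastmcp') {
    return raw;
  }
  // Fail fast on typos so a bad config never silently picks an engine.
  throw new Error(`Unsupported MCP_ENGINE: ${raw}`);
}

// Dispatch to whichever starter is registered for the resolved engine.
async function createMcpServer(
  opts: CreateMcpServerOptions,
  starters: Record<McpEngine, EngineStarter>,
  env: Record<string, string | undefined> = {},
): Promise<McpServerHandle> {
  const engine = resolveEngine(opts.engine, env);
  return starters[engine]({ ...opts, engine });
}
```

In a service, env would typically be process.env, and starters would map fastmcp to the existing FastMcpStarter path and sdk to the new starter. Throwing on an unknown value is a design choice in this sketch; falling back to fastmcp with a warning would be an alternative.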

2) Official SDK engine (Streamable HTTP)

Implement a new starter (e.g., SdkMcpStarter) using:

  • @modelcontextprotocol/sdk/server/mcp (McpServer)
  • @modelcontextprotocol/sdk/server/streamableHttp (StreamableHTTPServerTransport)

Because a StreamableHTTPServerTransport instance is bound to a single session, we implement a lightweight session router:

  • POST /mcp (initialize): create a new transport with sessionIdGenerator, create a new McpServer, register tools/prompts/resources, connect transport, then delegate request handling to transport.handleRequest(...).
  • Subsequent GET/POST/DELETE /mcp: route by mcp-session-id header to the correct transport and call handleRequest(...).
  • Maintain Map<sessionId, sessionState> with timestamps for /ready and session GC.
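The routing rules above can be sketched as a pure decision function, kept separate from the SDK so it is easy to test. This does not call the official SDK; SessionState, RouteDecision, routeMcpRequest, and the chosen status codes (400 for a missing session id, 404 for an unknown one) are assumptions for illustration:

```typescript
// Hypothetical names throughout; the real router would hold a transport per session.
interface SessionState {
  createdAt: number;
  lastSeenAt: number;
}

type RouteDecision =
  | { kind: 'create-session' }
  | { kind: 'use-session'; sessionId: string }
  | { kind: 'reject'; status: number; reason: string };

function routeMcpRequest(
  method: string,
  sessionId: string | undefined, // value of the mcp-session-id header, if any
  isInitialize: boolean, // true when the POST body is an initialize request
  sessions: Map<string, SessionState>,
  now: number,
): RouteDecision {
  if (!sessionId) {
    // Only an initialize POST may arrive without a session id.
    if (method === 'POST' && isInitialize) {
      return { kind: 'create-session' };
    }
    return { kind: 'reject', status: 400, reason: 'missing mcp-session-id' };
  }

  const state = sessions.get(sessionId);
  if (!state) {
    return { kind: 'reject', status: 404, reason: 'unknown session' };
  }
  if (method !== 'GET' && method !== 'POST' && method !== 'DELETE') {
    return { kind: 'reject', status: 405, reason: 'method not allowed' };
  }

  // Refresh the timestamp so /ready and session GC see recent activity.
  state.lastSeenAt = now;
  return { kind: 'use-session', sessionId };
}
```

On create-session the caller would build the transport and McpServer as described above; on use-session it would look up the stored transport and call handleRequest(...).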

3) Keep Nuwa reserved params and return format

  • Preserve support for __nuwa_auth and __nuwa_payment in tool args.
  • Continue returning MCP-compatible CallToolResult shape: { content: [...] }.
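As a rough illustration of the two points above, the reserved fields can be split off before the business handler runs, and the handler output wrapped back into the { content: [...] } shape. splitReservedArgs and toCallToolResult are hypothetical helpers, and serializing non-string output as JSON text is an assumption, not a description of the current payment-kit behavior:

```typescript
interface SplitArgs {
  reserved: { __nuwa_auth?: unknown; __nuwa_payment?: unknown };
  businessArgs: Record<string, unknown>;
}

// Separate Nuwa reserved params from the arguments the tool itself should see.
function splitReservedArgs(args: Record<string, unknown>): SplitArgs {
  const reserved: SplitArgs['reserved'] = {};
  const businessArgs: Record<string, unknown> = {};
  Object.keys(args).forEach((key) => {
    if (key === '__nuwa_auth' || key === '__nuwa_payment') {
      reserved[key] = args[key];
    } else {
      businessArgs[key] = args[key];
    }
  });
  return { reserved, businessArgs };
}

// Wrap handler output in a CallToolResult-compatible shape with one text item.
// Assumption: non-string data is serialized as JSON; undefined becomes an empty string.
function toCallToolResult(data: unknown): { content: Array<{ type: 'text'; text: string }> } {
  let text: string;
  if (typeof data === 'string') {
    text = data;
  } else if (data === undefined) {
    text = '';
  } else {
    text = JSON.stringify(data);
  }
  return { content: [{ type: 'text', text }] };
}
```

Keeping both helpers engine-agnostic would let the fastmcp and sdk paths share them, which is one way to keep the observable behavior identical during the dual-engine period.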

4) Parity endpoints

Keep current extra endpoints/behavior:

  • GET /health
  • GET /ready (based on initialized sessions)
  • GET /.well-known/nuwa-payment/info
  • OPTIONS preflight
  • customRouteHandler hook (before MCP handling)
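One possible dispatch order for these endpoints, sketched as a pure classifier. The source only states that customRouteHandler runs before MCP handling; placing it after preflight and before the built-in GET routes is an assumption, as are the names RouteTarget, CustomRouteHandler, and dispatchRoute:

```typescript
type RouteTarget =
  | 'preflight'
  | 'custom'
  | 'health'
  | 'ready'
  | 'payment-info'
  | 'mcp'
  | 'not-found';

// Hypothetical hook signature: returns true when the custom handler took the request.
type CustomRouteHandler = (method: string, path: string) => boolean;

function dispatchRoute(
  method: string,
  path: string,
  customRouteHandler?: CustomRouteHandler,
): RouteTarget {
  if (method === 'OPTIONS') return 'preflight';

  // Assumption: the custom hook gets the first chance at every non-preflight request.
  if (customRouteHandler && customRouteHandler(method, path)) return 'custom';

  if (method === 'GET' && path === '/health') return 'health';
  if (method === 'GET' && path === '/ready') return 'ready';
  if (method === 'GET' && path === '/.well-known/nuwa-payment/info') return 'payment-info';

  // GET/POST/DELETE on /mcp are handed to the session router.
  if (path === '/mcp') return 'mcp';
  return 'not-found';
}
```

Running the same table of (method, path) cases against both engines would be a cheap way to confirm endpoint parity.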

Phased Rollout Plan

Phase 0: Baseline

  • Document current dependencies and pain points.
  • Define parity checklist (below).

Phase 1: Introduce dual-engine switch (no behavior change)

  • Add createMcpServer({ engine }) and wire engine=fastmcp to existing FastMcpStarter.
  • Add minimal contract tests that can be reused by both engines.
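A contract test that both engines can share could start from engine-agnostic assertions such as the one below, which only inspects the result shape. callToolResultProblems is a hypothetical helper, and checking only for a string "type" on each content item is a deliberately loose assumption rather than full MCP validation:

```typescript
// Returns a list of human-readable problems; an empty list means the shape looks fine.
function callToolResultProblems(result: unknown): string[] {
  if (typeof result !== 'object' || result === null) {
    return ['result is not an object'];
  }
  const content = (result as { content?: unknown }).content;
  if (!Array.isArray(content)) {
    return ['content is not an array'];
  }
  const problems: string[] = [];
  content.forEach((item: unknown, index: number) => {
    const hasStringType =
      typeof item === 'object' &&
      item !== null &&
      typeof (item as { type?: unknown }).type === 'string';
    if (!hasStringType) {
      problems.push(`content[${index}] has no string "type"`);
    }
  });
  return problems;
}
```

A test suite parameterized over engine: 'sdk' | 'fastmcp' could call the same tool through each engine and assert that this function returns an empty list for both.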

Phase 2: Implement official SDK engine

  • Implement SdkMcpStarter with session router + Streamable HTTP.
  • Reuse existing tool registration and billing integration.
  • Run existing MCP E2E tests against both engines (fastmcp remains default in CI; sdk can be optional/nightly initially).

Phase 3: Service-level canary

  • Add MCP_ENGINE to nuwa-services/mcp-server-proxy (and other MCP services if needed).
  • Canary sdk engine in staging / a small production slice.
  • Measure: init success rate, tool error rate, latency, session counts, memory.

Phase 4: Flip default and deprecate fastmcp

  • Default MCP_ENGINE=sdk.
  • Keep rollback to fastmcp for 1–2 releases.
  • Remove fastmcp dependency once stable (including jest transform ignores, docs, etc.).

Acceptance Criteria (Parity Checklist)

  • Tool registration works (free + paid) and billing settlement behavior matches current.
  • __nuwa_auth and __nuwa_payment are accepted and processed.
  • /mcp supports Streamable HTTP (including SSE-streamed responses) and session management works correctly.
  • /health, /ready, and /.well-known/nuwa-payment/info behave as before.
  • Existing payment-kit MCP E2E tests pass with both engines (at least locally; CI strategy TBD).
  • Rollback is a config change only (MCP_ENGINE=fastmcp).

Risks & Mitigations

  • Session lifecycle/memory leaks: implement session TTL + GC + server shutdown cleanup.
  • Behavior differences in schema validation: keep permissive schema defaults and extend each tool's Zod schema with the Nuwa reserved fields.
  • Breaking changes for consumers: keep old APIs and make the switch opt-in until canary proves stability.
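For the session lifecycle mitigation, a minimal sketch of TTL-based GC plus shutdown cleanup might look like this. TrackedSession, collectExpiredSessions, and closeAllSessions are hypothetical names; in the real starter, close() would presumably close the per-session transport and server:

```typescript
interface TrackedSession {
  lastSeenAt: number; // epoch milliseconds of the last request on this session
  close(): void; // releases the transport/server held by the session
}

// Close and remove every session idle for longer than ttlMs; returns the removed ids.
function collectExpiredSessions(
  sessions: Map<string, TrackedSession>,
  now: number,
  ttlMs: number,
): string[] {
  const removed: string[] = [];
  // Deleting the current entry while iterating a Map with forEach is safe.
  sessions.forEach((session, id) => {
    if (now - session.lastSeenAt > ttlMs) {
      session.close();
      sessions.delete(id);
      removed.push(id);
    }
  });
  return removed;
}

// Shutdown path: close everything regardless of age, then empty the map.
function closeAllSessions(sessions: Map<string, TrackedSession>): void {
  sessions.forEach((session) => session.close());
  sessions.clear();
}
```

The GC pass could run from a setInterval timer that is cleared on shutdown, with closeAllSessions called from the server's stop path. The TTL value itself is left open here and would need to be chosen from the canary measurements.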

Open Questions

  • Do we want the sdk engine to become the default for all services at once, or start with mcp-server-proxy only?
  • What is the required support window for fastmcp as fallback (1 release vs 2)?
  • CI strategy: run sdk engine tests always, or nightly/optional until stable?

References (repo locations)

  • nuwa-kit/typescript/packages/payment-kit/src/transport/mcp/FastMcpStarter.ts
  • nuwa-services/mcp-server-proxy/src/server.ts

If we agree on this direction, I can follow up with a PR implementing Phase 1 + Phase 2 skeleton (dual-engine switch + initial SdkMcpStarter) and a minimal parity test harness.
