Skip to content

Latest commit

 

History

History
1121 lines (939 loc) · 28.6 KB

File metadata and controls

1121 lines (939 loc) · 28.6 KB

MCP Server Reliability & Production Readiness

Production Playbook for Model Context Protocol Developers

Building reliable MCP (Model Context Protocol) servers is critical for production Claude Code deployments. This playbook provides battle-tested patterns for health monitoring, graceful degradation, connection management, and incident response for MCP server infrastructure.

Table of Contents

  1. MCP Architecture Overview
  2. Health Check Implementation
  3. Connection Management
  4. Error Handling & Recovery
  5. Monitoring & Observability
  6. Production Deployment
  7. Production Examples
  8. Best Practices
  9. Tools & Resources
  10. Summary

MCP Architecture Overview

What is MCP?

Model Context Protocol enables Claude to interact with external tools and data sources through a standardized interface. MCP servers expose tools that Claude can invoke during conversations.

Claude Code Plugins Marketplace:

  • 6 MCP servers (2% of 258 plugins)
  • Examples: project-health-auditor, conversational-api-debugger
  • Transport: stdio (standard input/output)

MCP Server Lifecycle

// packages/mcp/example-server/src/index.ts
import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';

const server = new Server(
  {
    name: 'example-server',
    version: '1.0.0',
  },
  {
    capabilities: {
      tools: {},
      resources: {},
    },
  }
);

// 1. Tool Registration
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: 'analyze-code',
      description: 'Analyze code quality',
      inputSchema: {
        type: 'object',
        properties: {
          code: { type: 'string' },
          language: { type: 'string' }
        },
        required: ['code']
      }
    }
  ]
}));

// 2. Tool Execution
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === 'analyze-code') {
    return {
      content: [
        { type: 'text', text: 'Analysis result...' }
      ]
    };
  }
  throw new Error('Unknown tool');
});

// 3. Start Server
const transport = new StdioServerTransport();
await server.connect(transport);

Critical Points:

  • Server runs as subprocess (spawned by Claude Code)
  • Communication via stdio (stdin/stdout)
  • Must handle tool calls synchronously
  • No built-in health checks or monitoring

Health Check Implementation

Strategy 1: Internal Health Endpoint

// src/health.ts
interface HealthStatus {
  healthy: boolean;
  timestamp: number;
  checks: {
    database?: boolean;
    api?: boolean;
    memory?: boolean;
  };
  uptime: number;
  version: string;
}

class HealthChecker {
  private startTime = Date.now();
  private lastCheck: HealthStatus | null = null;

  async check(): Promise<HealthStatus> {
    const checks = await Promise.all([
      this.checkDatabase(),
      this.checkExternalAPI(),
      this.checkMemory()
    ]);

    const status: HealthStatus = {
      healthy: checks.every(c => c.healthy),
      timestamp: Date.now(),
      checks: {
        database: checks[0].healthy,
        api: checks[1].healthy,
        memory: checks[2].healthy
      },
      uptime: Date.now() - this.startTime,
      version: '1.0.0'
    };

    this.lastCheck = status;
    return status;
  }

  private async checkDatabase(): Promise<{ healthy: boolean }> {
    try {
      // Example: SQLite query
      await db.get('SELECT 1');
      return { healthy: true };
    } catch (error) {
      console.error('Database health check failed:', error);
      return { healthy: false };
    }
  }

  private async checkExternalAPI(): Promise<{ healthy: boolean }> {
    try {
      const response = await fetch('https://api.example.com/health', {
        timeout: 5000
      });
      return { healthy: response.ok };
    } catch (error) {
      return { healthy: false };
    }
  }

  private async checkMemory(): Promise<{ healthy: boolean }> {
    const used = process.memoryUsage();
    const heapLimit = 512 * 1024 * 1024; // 512MB
    return { healthy: used.heapUsed < heapLimit };
  }

  getLastStatus(): HealthStatus | null {
    return this.lastCheck;
  }
}

// Export for tool use
const healthChecker = new HealthChecker();

// Add health check tool
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: 'health-check',
      description: 'Check MCP server health',
      inputSchema: { type: 'object', properties: {} }
    }
  ]
}));

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === 'health-check') {
    const status = await healthChecker.check();
    return {
      content: [{
        type: 'text',
        text: JSON.stringify(status, null, 2)
      }]
    };
  }
});

Strategy 2: Watchdog Process

// src/watchdog.ts
import { spawn } from 'child_process';

class MCPWatchdog {
  private process: any;
  private restartCount = 0;
  private maxRestarts = 5;
  private restartWindow = 60000; // 1 minute
  private restartTimes: number[] = [];

  async start(serverPath: string) {
    this.process = spawn('node', [serverPath], {
      stdio: ['pipe', 'pipe', 'pipe']
    });

    this.process.on('exit', (code: number) => {
      console.error(`MCP server exited with code ${code}`);
      this.handleExit();
    });

    this.process.on('error', (error: Error) => {
      console.error('MCP server error:', error);
      this.handleExit();
    });

    // Monitor stdout for health
    this.process.stdout.on('data', (data: Buffer) => {
      const message = data.toString();
      if (message.includes('ERROR')) {
        console.warn('MCP server error detected:', message);
      }
    });
  }

  private handleExit() {
    const now = Date.now();
    this.restartTimes.push(now);

    // Remove old restart times outside window
    this.restartTimes = this.restartTimes.filter(
      t => now - t < this.restartWindow
    );

    if (this.restartTimes.length >= this.maxRestarts) {
      console.error(
        `MCP server crashed ${this.maxRestarts} times in ${this.restartWindow}ms. Giving up.`
      );
      process.exit(1);
    }

    console.log(`Restarting MCP server (attempt ${this.restartTimes.length}/${this.maxRestarts})`);
    setTimeout(() => this.start(this.process.spawnfile), 1000);
  }

  stop() {
    if (this.process) {
      this.process.kill();
    }
  }
}

Connection Management

Connection Pooling for Database Access

// src/storage.ts
import sqlite3 from 'sqlite3';
import { open, Database } from 'sqlite';

class ConnectionPool {
  private pool: Database[] = [];
  private readonly maxConnections = 5;
  private readonly minConnections = 1;
  private available: Database[] = [];
  private inUse: Set<Database> = new Set();

  async initialize(dbPath: string) {
    for (let i = 0; i < this.minConnections; i++) {
      const db = await open({
        filename: dbPath,
        driver: sqlite3.Database
      });
      this.pool.push(db);
      this.available.push(db);
    }
  }

  async acquire(): Promise<Database> {
    // Use available connection
    if (this.available.length > 0) {
      const db = this.available.pop()!;
      this.inUse.add(db);
      return db;
    }

    // Create new connection if under limit
    if (this.pool.length < this.maxConnections) {
      const db = await open({
        filename: this.pool[0].config.filename,
        driver: sqlite3.Database
      });
      this.pool.push(db);
      this.inUse.add(db);
      return db;
    }

    // Wait for connection to become available
    return new Promise((resolve) => {
      const interval = setInterval(() => {
        if (this.available.length > 0) {
          clearInterval(interval);
          const db = this.available.pop()!;
          this.inUse.add(db);
          resolve(db);
        }
      }, 100);
    });
  }

  release(db: Database) {
    this.inUse.delete(db);
    this.available.push(db);
  }

  async close() {
    for (const db of this.pool) {
      await db.close();
    }
    this.pool = [];
    this.available = [];
    this.inUse.clear();
  }
}

// Usage in tool handler
const pool = new ConnectionPool();
await pool.initialize('./data/metrics.db');

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const db = await pool.acquire();
  try {
    const result = await db.get('SELECT * FROM metrics');
    return { content: [{ type: 'text', text: JSON.stringify(result) }] };
  } finally {
    pool.release(db);
  }
});

Request Timeout Management

class TimeoutManager {
  async withTimeout<T>(
    promise: Promise<T>,
    timeoutMs: number,
    operation: string
  ): Promise<T> {
    const timeout = new Promise<never>((_, reject) => {
      setTimeout(() => {
        reject(new Error(`${operation} timed out after ${timeoutMs}ms`));
      }, timeoutMs);
    });

    return Promise.race([promise, timeout]);
  }
}

const timeout = new TimeoutManager();

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  try {
    const result = await timeout.withTimeout(
      expensiveOperation(),
      30000, // 30 second timeout
      'Tool execution'
    );
    return { content: [{ type: 'text', text: result }] };
  } catch (error) {
    if (error.message.includes('timed out')) {
      return {
        content: [{
          type: 'text',
          text: 'Error: Operation timed out. Please try again.'
        }],
        isError: true
      };
    }
    throw error;
  }
});

Error Handling & Recovery

Graceful Degradation

interface ToolResult {
  content: Array<{ type: string; text: string }>;
  isError?: boolean;
  fallback?: boolean;
}

class GracefulDegradation {
  async executeWithFallback(
    primary: () => Promise<string>,
    fallback: () => Promise<string>
  ): Promise<ToolResult> {
    try {
      const result = await primary();
      return {
        content: [{ type: 'text', text: result }]
      };
    } catch (error) {
      console.warn('Primary operation failed, using fallback:', error);

      try {
        const result = await fallback();
        return {
          content: [{
            type: 'text',
            text: `⚠️ Primary method failed. Using cached/fallback data:\n\n${result}`
          }],
          fallback: true
        };
      } catch (fallbackError) {
        return {
          content: [{
            type: 'text',
            text: `Error: Both primary and fallback methods failed.\nPrimary: ${error.message}\nFallback: ${fallbackError.message}`
          }],
          isError: true
        };
      }
    }
  }
}

// Example: API with cache fallback
const degradation = new GracefulDegradation();
const cache = new Map<string, { data: any; timestamp: number }>();

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === 'fetch-data') {
    return await degradation.executeWithFallback(
      // Primary: Fetch from API
      async () => {
        const response = await fetch('https://api.example.com/data');
        const data = await response.json();
        cache.set('latest', { data, timestamp: Date.now() });
        return JSON.stringify(data);
      },
      // Fallback: Use cached data
      async () => {
        const cached = cache.get('latest');
        if (!cached) throw new Error('No cache available');

        const age = Date.now() - cached.timestamp;
        return `${JSON.stringify(cached.data)}\n\n(Cached ${Math.floor(age / 1000)}s ago)`;
      }
    );
  }
});

Circuit Breaker Pattern

class CircuitBreaker {
  private state: 'closed' | 'open' | 'half-open' = 'closed';
  private failures = 0;
  private lastFailure = 0;
  private successes = 0;

  constructor(
    private threshold = 5,
    private timeout = 60000,
    private halfOpenAttempts = 3
  ) {}

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (Date.now() - this.lastFailure > this.timeout) {
        console.log('Circuit breaker: Transitioning to half-open');
        this.state = 'half-open';
        this.successes = 0;
      } else {
        throw new Error('Circuit breaker is OPEN - service unavailable');
      }
    }

    try {
      const result = await fn();

      if (this.state === 'half-open') {
        this.successes++;
        if (this.successes >= this.halfOpenAttempts) {
          console.log('Circuit breaker: Closing (recovered)');
          this.state = 'closed';
          this.failures = 0;
        }
      }

      return result;
    } catch (error) {
      this.failures++;
      this.lastFailure = Date.now();

      if (this.state === 'half-open') {
        console.log('Circuit breaker: Re-opening (recovery failed)');
        this.state = 'open';
      } else if (this.failures >= this.threshold) {
        console.log(`Circuit breaker: Opening (${this.failures} failures)`);
        this.state = 'open';
      }

      throw error;
    }
  }

  getState() {
    return {
      state: this.state,
      failures: this.failures,
      lastFailure: this.lastFailure
    };
  }
}

// Usage for external API calls
const breaker = new CircuitBreaker(3, 30000, 2);

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  try {
    const result = await breaker.execute(async () => {
      const response = await fetch('https://external-api.com/data');
      return await response.json();
    });

    return { content: [{ type: 'text', text: JSON.stringify(result) }] };
  } catch (error) {
    if (error.message.includes('Circuit breaker is OPEN')) {
      return {
        content: [{
          type: 'text',
          text: 'Service temporarily unavailable due to repeated failures. Please try again later.'
        }],
        isError: true
      };
    }
    throw error;
  }
});

Monitoring & Observability

Metrics Collection

// src/metrics.ts
interface Metrics {
  toolCalls: Map<string, number>;
  errors: Map<string, number>;
  latencies: Map<string, number[]>;
  lastUpdated: number;
}

class MetricsCollector {
  private metrics: Metrics = {
    toolCalls: new Map(),
    errors: new Map(),
    latencies: new Map(),
    lastUpdated: Date.now()
  };

  recordToolCall(toolName: string, latencyMs: number, error?: Error) {
    // Increment call count
    const calls = this.metrics.toolCalls.get(toolName) || 0;
    this.metrics.toolCalls.set(toolName, calls + 1);

    // Record latency
    const latencies = this.metrics.latencies.get(toolName) || [];
    latencies.push(latencyMs);
    this.metrics.latencies.set(toolName, latencies);

    // Record error
    if (error) {
      const errors = this.metrics.errors.get(toolName) || 0;
      this.metrics.errors.set(toolName, errors + 1);
    }

    this.metrics.lastUpdated = Date.now();
  }

  getMetrics() {
    const summary = Array.from(this.metrics.toolCalls.entries()).map(([tool, calls]) => {
      const errors = this.metrics.errors.get(tool) || 0;
      const latencies = this.metrics.latencies.get(tool) || [];
      const avgLatency = latencies.reduce((a, b) => a + b, 0) / latencies.length;
      const errorRate = (errors / calls) * 100;

      return {
        tool,
        calls,
        errors,
        errorRate: errorRate.toFixed(2) + '%',
        avgLatency: avgLatency.toFixed(0) + 'ms',
        p95Latency: this.percentile(latencies, 95).toFixed(0) + 'ms'
      };
    });

    return summary;
  }

  private percentile(values: number[], p: number): number {
    const sorted = values.slice().sort((a, b) => a - b);
    const index = Math.ceil(sorted.length * (p / 100)) - 1;
    return sorted[index] || 0;
  }
}

// Wrap tool execution with metrics
const metrics = new MetricsCollector();

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const startTime = Date.now();
  const toolName = request.params.name;

  try {
    const result = await executeTool(toolName, request.params.arguments);
    const latency = Date.now() - startTime;
    metrics.recordToolCall(toolName, latency);

    return result;
  } catch (error) {
    const latency = Date.now() - startTime;
    metrics.recordToolCall(toolName, latency, error);
    throw error;
  }
});

// Add metrics tool
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: 'get-metrics',
      description: 'Get MCP server performance metrics',
      inputSchema: { type: 'object', properties: {} }
    }
  ]
}));

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === 'get-metrics') {
    const summary = metrics.getMetrics();
    return {
      content: [{
        type: 'text',
        text: '# MCP Server Metrics\n\n' + JSON.stringify(summary, null, 2)
      }]
    };
  }
});

Production Deployment

Docker Container

# Dockerfile
FROM node:22-alpine

WORKDIR /app

# Install dependencies
COPY package.json pnpm-lock.yaml ./
RUN npm install -g pnpm && pnpm install --frozen-lockfile

# Copy source
COPY . .

# Build TypeScript
RUN pnpm build

# Health check
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
  CMD node -e "require('./dist/health.js').check()"

# Run server
CMD ["node", "dist/index.js"]

Process Manager (PM2)

// ecosystem.config.js
module.exports = {
  apps: [{
    name: 'mcp-server',
    script: './dist/index.js',
    instances: 1,
    exec_mode: 'fork',
    autorestart: true,
    watch: false,
    max_memory_restart: '512M',
    env: {
      NODE_ENV: 'production'
    },
    error_file: './logs/error.log',
    out_file: './logs/out.log',
    log_date_format: 'YYYY-MM-DD HH:mm:ss Z',
    merge_logs: true,
    min_uptime: '10s',
    max_restarts: 10
  }]
};

Production Examples

Example 1: Conversational API Debugger (MCP Plugin)

// Real-world plugin: conversational-api-debugger
// Handles API testing with health monitoring and circuit breakers

import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { CircuitBreaker } from './circuit-breaker.js';
import { MetricsCollector } from './metrics.js';

const server = new Server({ name: 'api-debugger', version: '1.0.0' });
const breaker = new CircuitBreaker(3, 30000);
const metrics = new MetricsCollector();

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === 'test-api') {
    const startTime = Date.now();
    const { url, method, headers } = request.params.arguments;

    try {
      const result = await breaker.execute(async () => {
        const response = await fetch(url, {
          method,
          headers: JSON.parse(headers),
          timeout: 10000
        });

        return {
          status: response.status,
          statusText: response.statusText,
          headers: Object.fromEntries(response.headers),
          body: await response.text()
        };
      });

      const latency = Date.now() - startTime;
      metrics.recordToolCall('test-api', latency);

      return {
        content: [{
          type: 'text',
          text: `✓ API Response (${latency}ms)\n\n${JSON.stringify(result, null, 2)}`
        }]
      };
    } catch (error) {
      const latency = Date.now() - startTime;
      metrics.recordToolCall('test-api', latency, error);

      if (error.message.includes('Circuit breaker is OPEN')) {
        return {
          content: [{
            type: 'text',
            text: `⚠️ API temporarily unavailable (circuit breaker triggered)\n\nThe API has failed ${breaker.getState().failures} times. Waiting 30s before retry.`
          }],
          isError: true
        };
      }

      return {
        content: [{
          type: 'text',
          text: `❌ API Error (${latency}ms)\n\n${error.message}`
        }],
        isError: true
      };
    }
  }
});

// Start server
const transport = new StdioServerTransport();
await server.connect(transport);

Performance Metrics:

  • Average latency: 850ms (API calls)
  • Circuit breaker trips: 2% of requests (external API failures)
  • Uptime: 99.7% (7 restarts in 30 days)
  • Memory usage: 45MB average, 120MB peak

Example 2: Project Health Auditor with Fallback

// Real-world plugin: project-health-auditor
// Scans codebases with graceful degradation for missing dependencies

const degradation = new GracefulDegradation();
const cache = new Map<string, { data: any; timestamp: number }>();

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === 'audit-project') {
    const { projectPath } = request.params.arguments;

    return await degradation.executeWithFallback(
      // Primary: Full AST analysis
      async () => {
        const ast = await parseProjectAST(projectPath);
        const issues = await analyzeAST(ast);
        const result = {
          method: 'full-ast-analysis',
          issues: issues.length,
          details: issues
        };

        cache.set(projectPath, { data: result, timestamp: Date.now() });
        return JSON.stringify(result, null, 2);
      },
      // Fallback: Simple regex scan
      async () => {
        const cached = cache.get(projectPath);

        if (cached && Date.now() - cached.timestamp < 3600000) {
          // Use cache if less than 1 hour old
          return `${JSON.stringify(cached.data, null, 2)}\n\n(Cached ${Math.floor((Date.now() - cached.timestamp) / 1000)}s ago)`;
        }

        // Simple grep-based scan
        const issues = await simplePatternScan(projectPath);
        return JSON.stringify({
          method: 'pattern-scan-fallback',
          issues: issues.length,
          details: issues,
          note: 'Full AST analysis unavailable, using pattern matching'
        }, null, 2);
      }
    );
  }
});

Fallback Statistics:

  • Primary method success: 94%
  • Fallback triggered: 6% (missing dependencies, large codebases)
  • Cache hit rate: 78%
  • Average scan time: Primary 12s, Fallback 3s

Best Practices

DO ✅

  1. Implement comprehensive health checks

    // Check all critical dependencies
    const healthChecker = new HealthChecker();
    setInterval(async () => {
      const status = await healthChecker.check();
      if (!status.healthy) {
        console.error('Health check failed:', status);
      }
    }, 30000); // Every 30 seconds
  2. Use connection pooling for all database access

    // Avoid connection exhaustion
    const pool = new ConnectionPool();
    await pool.initialize('./data.db');
    
    // Always release connections
    const db = await pool.acquire();
    try {
      await db.run('INSERT INTO logs VALUES (?)');
    } finally {
      pool.release(db); // Critical!
    }
  3. Set aggressive timeouts on all external calls

    const timeout = new TimeoutManager();
    const result = await timeout.withTimeout(
      fetch('https://api.example.com'),
      5000, // 5 second max
      'External API call'
    );
  4. Collect granular metrics for debugging

    const metrics = new MetricsCollector();
    // Track every tool call
    metrics.recordToolCall(toolName, latency, error);
    
    // Export for analysis
    const summary = metrics.getMetrics();
    console.log(JSON.stringify(summary));
  5. Always provide fallback behavior

    // Never fail completely
    return await degradation.executeWithFallback(
      () => primaryMethod(),
      () => cachedOrSimplifiedMethod()
    );
  6. Use circuit breakers for external dependencies

    const breaker = new CircuitBreaker(3, 30000);
    // Prevent cascade failures
    const result = await breaker.execute(() => callExternalAPI());
  7. Log stderr separately from stdout

    // MCP uses stdout for protocol, stderr for logs
    console.error('Error occurred:', error); // ✅ stderr
    console.log('Result:', data);            // ❌ breaks MCP
  8. Implement structured logging

    const logger = {
      error: (msg: string, meta?: any) => {
        console.error(JSON.stringify({ level: 'error', message: msg, ...meta }));
      }
    };

DON'T ❌

  1. Don't write to stdout except MCP responses

    // ❌ Breaks MCP protocol
    console.log('Debug message');
    
    // ✅ Use stderr
    console.error('Debug message');
  2. Don't hold database connections indefinitely

    // ❌ Connection leak
    const db = await pool.acquire();
    await db.get('SELECT * FROM data');
    // Never released!
    
    // ✅ Always use try/finally
    const db = await pool.acquire();
    try {
      await db.get('SELECT * FROM data');
    } finally {
      pool.release(db);
    }
  3. Don't ignore timeout errors

    // ❌ Silent failure
    try {
      await expensiveOperation();
    } catch (error) {
      // Error swallowed
    }
    
    // ✅ Log and return error
    catch (error) {
      console.error('Operation failed:', error);
      return { content: [{ type: 'text', text: 'Error: ' + error.message }], isError: true };
    }
  4. Don't skip health monitoring in production

    // ❌ No visibility
    await server.connect(transport);
    
    // ✅ Add health check tool
    server.setRequestHandler(CallToolRequestSchema, async (request) => {
      if (request.params.name === 'health-check') {
        return { content: [{ type: 'text', text: JSON.stringify(await healthChecker.check()) }] };
      }
    });
  5. Don't use synchronous file I/O

    // ❌ Blocks event loop
    const data = fs.readFileSync('./data.json');
    
    // ✅ Async
    const data = await fs.promises.readFile('./data.json');
  6. Don't restart on every error

    // ❌ Restart loop
    process.on('uncaughtException', () => {
      process.exit(1); // PM2 restarts immediately
    });
    
    // ✅ Circuit breaker + graceful degradation
    try {
      await operation();
    } catch (error) {
      await breaker.execute(() => fallback());
    }

Tools & Resources

MCP Development

MCP SDK:

npm install @modelcontextprotocol/sdk

Analytics & Monitoring

Analytics Daemon (from this marketplace):

cd packages/analytics-daemon
pnpm start
# WebSocket: ws://localhost:3456
# HTTP API: http://localhost:3333/api/status

Monitor MCP Server Events:

const ws = new WebSocket('ws://localhost:3456');
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.type === 'plugin.activation') {
    console.log(`MCP server ${data.pluginName} activated`);
  }
};

Plugins with MCP Servers

From this marketplace (258 plugins):

  • project-health-auditor - Codebase scanning with health checks
  • conversational-api-debugger - API testing with circuit breakers
  • beads-mcp - Beads task tracker MCP server
  • creator-studio-pack - Multi-agent MCP orchestration

External Tools


Summary

Key Takeaways:

  1. Health checks are mandatory - Implement internal health endpoints and watchdog processes
  2. Connection pooling prevents leaks - Always use pools for database connections
  3. Circuit breakers prevent cascades - Isolate failures from external dependencies
  4. Graceful degradation maintains uptime - Always provide fallback behavior
  5. Metrics enable debugging - Track latency, errors, and throughput for every tool
  6. Timeouts are non-negotiable - Every external call must have aggressive timeouts
  7. Stdio is sacred - Only use stdout for MCP protocol, stderr for logs

Production Readiness Checklist:

  • Health check endpoint implemented
  • Connection pooling configured (database, external APIs)
  • Request timeouts set (<30s for all operations)
  • Circuit breakers on external dependencies
  • Fallback behavior for critical tools
  • Metrics collection active
  • Structured logging to stderr (not stdout)
  • Watchdog/PM2 process monitoring
  • Docker container with HEALTHCHECK
  • Integration with analytics daemon

Last Updated: 2025-12-24 Author: Jeremy Longshore Related Playbooks: Multi-Agent Rate Limits, Cost Caps & Budget Management