@andrewginns andrewginns commented May 22, 2025

Problem:

The existing MCP usage examples relied on an external Node.js-based mermaid validation server and provided no evaluation cases of graded difficulty for testing agents on mermaid diagram validation tasks.

Solution:

This PR addresses these issues by:

  1. Replacing external mermaid validator with custom Python implementation: Implemented a native Python MCP server (mcp_servers/mermaid_validator.py) using mermaid-cli for diagram validation, providing better integration and debugging capabilities.

  2. Adding structured evaluation difficulty levels: Created three distinct levels of invalid mermaid diagrams (easy, medium, hard) with progressively complex syntax errors to thoroughly test agent capabilities.

  3. Improving observability and cleanup: Enhanced tracing with Logfire graceful shutdown and better logging throughout the MCP interaction flows.

  4. Reorganising server structure: Moved MCP servers to dedicated mcp_servers/ directory for better organisation.
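The validation core of the new Python server can be sketched as below. This is a minimal, hypothetical outline (the function names and the `(is_valid, message)` return shape are illustrative, not copied from `mcp_servers/mermaid_validator.py`); it assumes the mermaid-cli binary `mmdc` is on PATH and that rendering a diagram to SVG doubles as syntax validation:

```python
import subprocess
import tempfile
from pathlib import Path


def build_mmdc_command(input_path: str, output_path: str) -> list[str]:
    """Build the mermaid-cli invocation used to render (and thereby validate) a diagram."""
    return ["mmdc", "--input", input_path, "--output", output_path]


def validate_mermaid(diagram: str) -> tuple[bool, str]:
    """Return (is_valid, message) by asking mermaid-cli to render the diagram."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "diagram.mmd"
        out = Path(tmp) / "diagram.svg"
        src.write_text(diagram)
        result = subprocess.run(
            build_mmdc_command(str(src), str(out)),
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            return True, "Diagram is valid"
        return False, result.stderr.strip()
```

In the real server this function would be exposed as an MCP tool; running it in-process is what gives the detailed logging and error reporting mentioned below.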

Unlocks:

  • Enhanced evaluation coverage: Agents can now be tested against mermaid validation tasks of varying complexity levels
  • Better debugging and monitoring: Custom Python implementation provides detailed logging and error reporting
  • Improved local development experience: Self-contained Python servers eliminate external dependencies
  • Structured evaluation framework: Multiple difficulty levels enable comprehensive assessment of agent capabilities

Detailed breakdown of changes:

Core Infrastructure Changes:

  • mcp_servers/mermaid_validator.py - New Python MCP server for mermaid diagram validation using mermaid-cli with comprehensive logging and error handling
  • README.md - Updated documentation to reflect new Python server structure and mermaid validator
  • example_server.py → mcp_servers/example_server.py - Moved example server to dedicated directory

Multi-Level Evaluation System:

  • agents_mcp_usage/multi_mcp/mermaid_diagrams.py - Added three difficulty levels of invalid mermaid diagrams:
    • Easy: Simple syntax errors (e.g., undefined node references like MCPs)
    • Medium: Structural issues with subgraph organisation
    • Hard: Complex parsing errors and circular reference issues
  • agents_mcp_usage/multi_mcp/eval_multi_mcp/evals_pydantic_mcp.py - Enhanced evaluation dataset with three test cases covering all difficulty levels
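The three levels might look like the following. These diagram strings are illustrative stand-ins, not the actual contents of `mermaid_diagrams.py`; each one embeds the kind of error the corresponding level is described as testing:

```python
# Illustrative (not the PR's actual strings): one invalid mermaid diagram per level.
INVALID_DIAGRAMS = {
    # Easy: an edge points at MCPs, a node that is never defined.
    "easy": """\
flowchart TD
    Agent[Agent] --> MCPs
""",
    # Medium: a subgraph is opened but never closed with `end`.
    "medium": """\
flowchart TD
    subgraph Servers
        A[Validator] --> B[Runner]
    C[Agent] --> A
""",
    # Hard: a circular self-reference combined with a malformed arrow token.
    "hard": """\
flowchart TD
    A --> A
    A -->> B
""",
}
```

An evaluation case per key then exercises the agent's ability to diagnose and repair each class of error.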

Agent Implementation Updates:

  • agents_mcp_usage/multi_mcp/multi_mcp_use/adk_mcp.py - Updated to use the new Python mermaid validator and added graceful Logfire shutdown
  • agents_mcp_usage/multi_mcp/multi_mcp_use/pydantic_mcp.py - Similar updates for Pydantic-AI implementation
  • Updated all basic MCP examples to reference new server locations
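The graceful-shutdown pattern above can be sketched generically. The helper below is a hypothetical illustration (the actual scripts may structure this differently), and the idea is simply that the trace flush runs even when the agent entrypoint raises; in the real code the `flush_fn` would be a Logfire flush/shutdown call:

```python
from typing import Any, Callable


def run_with_graceful_shutdown(
    main_fn: Callable[[], Any], flush_fn: Callable[[], None]
) -> Any:
    """Run the agent entrypoint, always flushing pending traces afterwards."""
    try:
        return main_fn()
    finally:
        # Flush even when main_fn raises, so traces are not lost on errors.
        flush_fn()
```

This keeps spans from the MCP interaction flows from being dropped when a run crashes mid-evaluation.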

Documentation and Examples:

  • Updated README files across all modules to reflect new Python server structure
  • Enhanced documentation with clearer server organisation and usage examples

This enhancement provides a robust foundation for testing and evaluating agent performance across different complexity levels while maintaining better control over the validation infrastructure.

@andrewginns andrewginns merged commit 6a48db8 into main May 22, 2025
1 check passed