@andrewginns andrewginns commented May 22, 2025

Problem:

The existing MCP usage examples relied on an external Node.js-based mermaid validation server and provided no evaluation cases of graded difficulty for testing agents on mermaid diagram validation tasks.

Solution:

This PR addresses these issues by:

  1. Replacing external mermaid validator with custom Python implementation: Implemented a native Python MCP server (mcp_servers/mermaid_validator.py) using mermaid-cli for diagram validation, providing better integration and debugging capabilities.

  2. Adding structured evaluation difficulty levels: Created three distinct levels of invalid mermaid diagrams (easy, medium, hard) with progressively complex syntax errors to thoroughly test agent capabilities.

  3. Improving observability and cleanup: Enhanced tracing with Logfire graceful shutdown and better logging throughout the MCP interaction flows.

  4. Reorganising server structure: Moved MCP servers to dedicated mcp_servers/ directory for better organisation.
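The validation core of the new Python server can be sketched as below. This is a minimal, hypothetical outline (the function names and the `(is_valid, message)` return shape are illustrative, not copied from `mcp_servers/mermaid_validator.py`); it assumes the mermaid-cli binary `mmdc` is on PATH and that rendering a diagram to SVG doubles as syntax validation:

```python
import subprocess
import tempfile
from pathlib import Path


def build_mmdc_command(input_path: str, output_path: str) -> list[str]:
    """Build the mermaid-cli invocation used to render (and thereby validate) a diagram."""
    return ["mmdc", "--input", input_path, "--output", output_path]


def validate_mermaid(diagram: str) -> tuple[bool, str]:
    """Return (is_valid, message) by asking mermaid-cli to render the diagram."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "diagram.mmd"
        out = Path(tmp) / "diagram.svg"
        src.write_text(diagram)
        result = subprocess.run(
            build_mmdc_command(str(src), str(out)),
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            return True, "Diagram is valid"
        return False, result.stderr.strip()
```

In the real server this function would be exposed as an MCP tool; running it in-process is what gives the detailed logging and error reporting mentioned below.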

Unlocks:

  • Enhanced evaluation coverage: Agents can now be tested against mermaid validation tasks of varying complexity levels
  • Better debugging and monitoring: Custom Python implementation provides detailed logging and error reporting
  • Improved local development experience: Self-contained Python servers eliminate external dependencies
  • Structured evaluation framework: Multiple difficulty levels enable comprehensive assessment of agent capabilities

Detailed breakdown of changes:

Core Infrastructure Changes:

  • mcp_servers/mermaid_validator.py - New Python MCP server for mermaid diagram validation using mermaid-cli with comprehensive logging and error handling
  • README.md - Updated documentation to reflect new Python server structure and mermaid validator
  • example_server.py → mcp_servers/example_server.py - Moved example server to dedicated directory

Multi-Level Evaluation System:

  • agents_mcp_usage/multi_mcp/mermaid_diagrams.py - Added three difficulty levels of invalid mermaid diagrams:
    • Easy: Simple syntax errors (e.g., undefined node references like MCPs)
    • Medium: Structural issues with subgraph organisation
    • Hard: Complex parsing errors and circular reference issues
  • agents_mcp_usage/multi_mcp/eval_multi_mcp/evals_pydantic_mcp.py - Enhanced evaluation dataset with three test cases covering all difficulty levels
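The three levels might look like the following. These diagram strings are illustrative stand-ins, not the actual contents of `mermaid_diagrams.py`; each one embeds the kind of error the corresponding level is described as testing:

```python
# Illustrative (not the PR's actual strings): one invalid mermaid diagram per level.
INVALID_DIAGRAMS = {
    # Easy: an edge points at MCPs, a node that is never defined.
    "easy": """\
flowchart TD
    Agent[Agent] --> MCPs
""",
    # Medium: a subgraph is opened but never closed with `end`.
    "medium": """\
flowchart TD
    subgraph Servers
        A[Validator] --> B[Runner]
    C[Agent] --> A
""",
    # Hard: a circular self-reference combined with a malformed arrow token.
    "hard": """\
flowchart TD
    A --> A
    A -->> B
""",
}
```

An evaluation case per key then exercises the agent's ability to diagnose and repair each class of error.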

Agent Implementation Updates:

  • agents_mcp_usage/multi_mcp/multi_mcp_use/adk_mcp.py - Updated to use the new Python mermaid validator and added graceful Logfire shutdown
  • agents_mcp_usage/multi_mcp/multi_mcp_use/pydantic_mcp.py - Similar updates for Pydantic-AI implementation
  • Updated all basic MCP examples to reference new server locations
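The graceful-shutdown pattern above can be sketched generically. The helper below is a hypothetical illustration (the actual scripts may structure this differently), and the idea is simply that the trace flush runs even when the agent entrypoint raises; in the real code the `flush_fn` would be a Logfire flush/shutdown call:

```python
from typing import Any, Callable


def run_with_graceful_shutdown(
    main_fn: Callable[[], Any], flush_fn: Callable[[], None]
) -> Any:
    """Run the agent entrypoint, always flushing pending traces afterwards."""
    try:
        return main_fn()
    finally:
        # Flush even when main_fn raises, so traces are not lost on errors.
        flush_fn()
```

This keeps spans from the MCP interaction flows from being dropped when a run crashes mid-evaluation.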

Documentation and Examples:

  • Updated README files across all modules to reflect new Python server structure
  • Enhanced documentation with clearer server organisation and usage examples

This enhancement provides a robust foundation for testing and evaluating agent performance across different complexity levels while maintaining better control over the validation infrastructure.

@andrewginns andrewginns merged commit 6a48db8 into main May 22, 2025
1 check passed