Swap MCP server to python mermaid validator and add levels of difficulty #4
Problem:
The existing MCP usage examples relied on external Node.js-based mermaid validation servers and had no graded difficulty levels for testing how well agents handle mermaid diagram validation tasks.
Solution:
This PR addresses these issues by:
- Replacing external mermaid validator with custom Python implementation: Implemented a native Python MCP server (mcp_servers/mermaid_validator.py) using mermaid-cli for diagram validation, providing better integration and debugging capabilities.
- Adding structured evaluation difficulty levels: Created three distinct levels of invalid mermaid diagrams (easy, medium, hard) with progressively complex syntax errors to thoroughly test agent capabilities.
- Improving observability and cleanup: Enhanced tracing with Logfire graceful shutdown and better logging throughout the MCP interaction flows.
- Reorganising server structure: Moved MCP servers to a dedicated mcp_servers/ directory for better organisation.
Unlocks:
Detailed breakdown of changes:
Core Infrastructure Changes:
- 12:257:mcp_servers/mermaid_validator.py - New Python MCP server for mermaid diagram validation using mermaid-cli with comprehensive logging and error handling
- 20:53:README.md - Updated documentation to reflect new Python server structure and mermaid validator
- example_server.py → mcp_servers/example_server.py - Moved example server to dedicated directory
Multi-Level Evaluation System:
- 1:257:agents_mcp_usage/multi_mcp/mermaid_diagrams.py - Added three difficulty levels of invalid mermaid diagrams (easy, medium, hard)
- 158:199:agents_mcp_usage/multi_mcp/eval_multi_mcp/evals_pydantic_mcp.py - Enhanced evaluation dataset with three test cases covering all difficulty levels
Agent Implementation Updates:
- 36:132:agents_mcp_usage/multi_mcp/multi_mcp_use/adk_mcp.py - Updated to use new Python mermaid validator and added graceful Logfire shutdown
- agents_mcp_usage/multi_mcp/multi_mcp_use/pydantic_mcp.py - Similar updates for Pydantic-AI implementation
Documentation and Examples:
This enhancement provides a robust foundation for testing and evaluating agent performance across different complexity levels while maintaining better control over the validation infrastructure.
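For reviewers who want a feel for the core idea, validating a diagram by handing it to mermaid-cli from Python could look roughly like the sketch below. This is not the code in mcp_servers/mermaid_validator.py. The names `ValidationResult` and `validate_mermaid` are hypothetical, and the sketch assumes the mermaid-cli executable is installed as `mmdc` and that a non-zero exit code signals an invalid diagram.

```python
import shutil
import subprocess
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ValidationResult:
    """Hypothetical result type: validity flag plus any error text."""
    is_valid: bool
    error: Optional[str] = None


def validate_mermaid(diagram: str, mmdc: str = "mmdc",
                     timeout: float = 60.0) -> ValidationResult:
    """Try to render the diagram with mermaid-cli.

    Assumption: a non-zero exit code from the CLI means the diagram
    is invalid, and stderr carries the parser's error message.
    """
    # Fail early with a readable message if the CLI is not installed.
    if shutil.which(mmdc) is None:
        return ValidationResult(False, f"{mmdc} not found on PATH")

    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "diagram.mmd"
        out = Path(tmp) / "diagram.svg"
        src.write_text(diagram, encoding="utf-8")
        try:
            proc = subprocess.run(
                [mmdc, "-i", str(src), "-o", str(out)],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return ValidationResult(False, "mermaid-cli timed out")

    if proc.returncode != 0:
        return ValidationResult(False, proc.stderr.strip() or "mermaid-cli failed")
    return ValidationResult(True)


if __name__ == "__main__":
    # A deliberately broken diagram (unclosed bracket), made up for illustration.
    broken = "graph TD\n    A[Start --> B[End]"
    print(validate_mermaid(broken))
```

A function like this could then be registered as a tool on an MCP server so that an agent can call it and read back the error text. Depending on the environment, mermaid-cli may also need a Puppeteer configuration before it can launch its headless browser.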