@andrewginns commented May 22, 2025

This PR introduces explicit difficulty levels for the mermaid diagram evaluation cases, refactors the evaluation logic and rubric, and adds a standalone MCP server tool for mermaid diagram validation.

Problem:

Previously, all mermaid diagram evaluation cases shared a single, implicit difficulty level, and the validation logic was embedded directly in the evaluation code. This made it hard to probe more nuanced LLM capabilities, extend evaluation coverage, or debug diagram validation failures. Mermaid validation also depended on inline subprocess calls and had no dedicated, reusable server tool.

Solution:

  • Added explicit “easy”, “medium”, and “hard” levels for invalid mermaid diagram cases.
  • Incorporated a new mcp_servers/mermaid_validator.py MCP server tool, leveraging mermaid-cli via npx and structured logging with loguru.
  • Refactored evaluation code to use the new server tool, support more robust string cleaning, and improve logging.
  • Updated dependencies in pyproject.toml and uv.lock to include loguru for structured logging.
  • Improved test case metadata and expanded the evaluation rubric.
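A minimal sketch of the kind of string cleaning mentioned above, assuming LLM responses sometimes wrap the repaired diagram in a fenced mermaid code block or add stray whitespace and Windows line endings. The helper name clean_mermaid_text and the regex are illustrative assumptions, not taken from the PR.

```python
import re

# Three backticks built programmatically so this snippet can itself live
# inside a Markdown code fence without closing it early.
FENCE = "`" * 3

# Hypothetical pattern: an opening fence with an optional "mermaid" tag,
# the diagram body (captured), then a closing fence.
_FENCED = re.compile(
    re.escape(FENCE) + r"(?:mermaid)?[ \t]*\n(.*?)\n?" + re.escape(FENCE),
    re.DOTALL | re.IGNORECASE,
)


def clean_mermaid_text(raw: str) -> str:
    """Return bare Mermaid source from an LLM response (hypothetical helper)."""
    text = raw.strip()
    match = _FENCED.search(text)
    if match:
        # Keep only the diagram body inside the first fenced block.
        text = match.group(1)
    # Normalise Windows line endings and drop trailing whitespace per line.
    lines = [line.rstrip() for line in text.replace("\r\n", "\n").split("\n")]
    return "\n".join(lines).strip()
```

Stripping the fences before validation keeps a structurally correct diagram from being scored as invalid only because of how the model formatted its answer.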

Unlocks:

  • Enables targeted and incremental evaluation of LLMs’ ability to repair mermaid diagrams of varying complexity.
  • Facilitates debugging and extension of mermaid diagram validation as a reusable, standalone service.
  • Allows for richer analysis and troubleshooting with enhanced logging and separation of concerns.

Detailed breakdown of changes:


  • agents_mcp_usage/multi_mcp/eval_multi_mcp/evals_pydantic_mcp.py

    • Refactored to add “easy”, “medium”, and “hard” difficulty levels for invalid diagram test cases.
    • Switched from subprocess-based validation to using the new MCP server’s validation tool.
    • Improved diagram string cleaning and validation logic.
    • Enhanced logging for evaluation steps.
  • agents_mcp_usage/multi_mcp/mermaid_diagrams.py

    • Added new invalid_mermaid_diagram_medium definition.
    • Clarified and reordered diagram definitions.
  • mcp_servers/mermaid_validator.py (new)

    • Implements an MCP server exposing mermaid diagram validation as a tool.
    • Uses mermaid-cli via npx, with robust cleanup, error handling, and loguru-based logging.
    • Includes both tool and prompt endpoints for validation, as well as a resource for an example diagram.
  • pyproject.toml and uv.lock

    • Added loguru as a dependency for structured logging.
    • Updated lockfile to reflect new and updated dependencies.
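One way the validation in mcp_servers/mermaid_validator.py might work, as a hedged sketch: the function name validate_mermaid, the returned dict keys, and the exact npx arguments are assumptions, not taken from the PR. It writes the diagram to a temporary .mmd file, asks the mmdc binary from the @mermaid-js/mermaid-cli package to render it, and treats a zero exit code as valid. The run parameter exists only so the subprocess call can be replaced in tests.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable

# Assumed command: run mmdc from the @mermaid-js/mermaid-cli package via npx
# (-y skips npx's install prompt). The PR's exact arguments are not stated.
MMDC_CMD = ["npx", "-y", "-p", "@mermaid-js/mermaid-cli", "mmdc"]

Runner = Callable[..., subprocess.CompletedProcess]


def validate_mermaid(diagram: str, run: Runner = subprocess.run,
                     timeout: float = 60.0) -> dict:
    """Render the diagram with mermaid-cli; a successful render counts as valid."""
    # TemporaryDirectory removes the .mmd input and .svg output on exit,
    # even when rendering fails.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "diagram.mmd"
        out = Path(tmp) / "diagram.svg"
        src.write_text(diagram, encoding="utf-8")
        try:
            result = run(
                [*MMDC_CMD, "-i", str(src), "-o", str(out)],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except (OSError, subprocess.TimeoutExpired) as exc:
            # npx not installed, or the render hung: report as a failure.
            return {"is_valid": False, "error": f"mermaid-cli did not run: {exc}"}
        if result.returncode == 0:
            return {"is_valid": True, "error": None}
        # mermaid-cli reports parse errors on stderr; fall back to stdout.
        return {"is_valid": False, "error": (result.stderr or result.stdout).strip()}
```

The PR exposes this check as an MCP tool. If the server is built on the MCP Python SDK's FastMCP class (an assumption, since the description does not name the SDK), a thin wrapper decorated with @mcp.tool() could call a function like this and return its result.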

…on for consistent invocation and custom MCP decorated functions. Add logging
@andrewginns merged commit 3eb1d03 into main on May 22, 2025
1 check passed