mcp-data-check

Evaluate MCP server accuracy against known questions and answers.

Installation

pip install mcp-data-check

Or install from source:

pip install -e .

Usage

Python API

from mcp_data_check import run_evaluation

results = run_evaluation(
    questions_filepath="questions.csv",
    api_key="sk-ant-...",
    server_url="https://mcp.example.com/sse"
)

print(f"Pass rate: {results['summary']['pass_rate']:.1%}")
print(f"Passed: {results['summary']['passed']}/{results['summary']['total']}")

Command Line

mcp-data-check https://mcp.example.com/sse -q questions.csv -k YOUR_API_KEY

Options:

-q, --questions: Path to questions CSV file (required)
-k, --api-key: Anthropic API key (defaults to ANTHROPIC_API_KEY env var)
-o, --output: Output directory for results (default: ./results)
-m, --model: Claude model to use (default: claude-sonnet-4-20250514)
-n, --server-name: Name for the MCP server (default: mcp-server)
-v, --verbose: Print detailed progress

Questions CSV Format

The questions CSV file must have three columns:

| Column | Description |
| --- | --- |
| `question` | The question to ask the MCP server |
| `expected_answer` | The expected answer to compare against |
| `eval_type` | Evaluation method: `numeric`, `string`, or `llm_judge` |

Example:

question,expected_answer,eval_type
How many grants were awarded in 2023?,1234,numeric
What organization received the most funding?,NIH,string
Explain the grant distribution,Most grants went to research institutions...,llm_judge
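
Before running an evaluation, it can help to check that a questions file has the three required columns and only the documented `eval_type` values. The following is a minimal sketch using only the standard library; `validate_questions` is a hypothetical helper, not part of mcp-data-check, and the package may perform its own validation differently.

```python
import csv
import io

REQUIRED_COLUMNS = ("question", "expected_answer", "eval_type")
ALLOWED_EVAL_TYPES = {"numeric", "string", "llm_judge"}


def validate_questions(csv_text: str) -> list[str]:
    """Return a list of human-readable problems found in the CSV text.

    Hypothetical helper for illustration; not part of mcp-data-check.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    # Count rows from 2 because row 1 of the file is the header.
    for row_no, row in enumerate(reader, start=2):
        eval_type = (row["eval_type"] or "").strip()
        if eval_type not in ALLOWED_EVAL_TYPES:
            problems.append(f"row {row_no}: unknown eval_type {eval_type!r}")
        if not (row["question"] or "").strip():
            problems.append(f"row {row_no}: empty question")
    return problems


sample = """question,expected_answer,eval_type
How many grants were awarded in 2023?,1234,numeric
What organization received the most funding?,NIH,string
Which year had the most grants?,2021,exact
"""
problems = validate_questions(sample)
print(problems)
```

Questions or answers that contain commas need to be wrapped in double quotes so that the CSV parser keeps them in a single column.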

Evaluation Types

numeric: Extracts numbers from the response and compares them with the expected answer, allowing a 5% tolerance
string: Checks whether the expected string appears in the response (case-insensitive)
llm_judge: Uses Claude to judge semantically whether the response is correct
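
To make the `numeric` and `string` checks concrete, here is a rough sketch of how such comparisons could work. It is not the package's actual implementation: the function names, the number-matching regex, and the choice to treat the 5% tolerance as relative to the expected value and to pass when any extracted number matches are all assumptions made for illustration.

```python
import re

# Assumed pattern: optional sign, digits with optional thousands separators,
# optional decimal part. The real extraction rules may differ.
NUMBER_PATTERN = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_numbers(text: str) -> list[float]:
    """Pull numeric tokens out of free text, dropping thousands separators."""
    return [float(m.replace(",", "")) for m in NUMBER_PATTERN.findall(text)]


def numeric_match(response: str, expected: float, tolerance: float = 0.05) -> bool:
    """True if any number in the response is within a relative tolerance of expected."""
    for value in extract_numbers(response):
        if expected == 0:
            # A relative tolerance is undefined at zero, so require an exact zero.
            if value == 0:
                return True
        elif abs(value - expected) <= tolerance * abs(expected):
            return True
    return False


def string_match(response: str, expected: str) -> bool:
    """True if the expected string occurs anywhere in the response, ignoring case."""
    return expected.casefold() in response.casefold()


print(numeric_match("There were 1,250 grants awarded in 2023.", 1234))
print(string_match("The top recipient was the nih.", "NIH"))
```

Under this sketch, any number in the response can satisfy the check, so a response that merely repeats a year from the question could pass when the expected answer is close to that year. Questions with such answers may be better suited to `llm_judge`.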

Return Value

The run_evaluation function returns a dictionary:

{
    "summary": {
        "total": 10,
        "passed": 8,
        "failed": 2,
        "pass_rate": 0.8,
        "by_eval_type": {
            "numeric": {"total": 5, "passed": 4},
            "string": {"total": 3, "passed": 3},
            "llm_judge": {"total": 2, "passed": 1}
        }
    },
    "results": [
        {
            "question": "...",
            "expected_answer": "...",
            "eval_type": "numeric",
            "model_response": "...",
            "passed": True,
            "details": {...},
            "error": None,
            "time_to_answer": 2.35,
            "tools_called": [
                {
                    "tool_name": "get_grants",
                    "server_name": "mcp-server",
                    "input": {"year": 2023}
                }
            ]
        },
        ...
    ],
    "metadata": {
        "server_url": "https://mcp.example.com/sse",
        "model": "claude-sonnet-4-20250514",
        "timestamp": "20250127_143022"
    }
}
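
The `by_eval_type` counts can be turned into per-type pass rates with a few lines of Python. The sketch below works on a plain dictionary shaped like the `summary` shown above; `pass_rates_by_type` is a hypothetical helper, not a function exported by the package.

```python
def pass_rates_by_type(summary: dict) -> dict[str, float]:
    """Compute passed/total for each evaluation type in a summary dictionary."""
    rates = {}
    for eval_type, counts in summary["by_eval_type"].items():
        total = counts["total"]
        # Guard against a type with no questions to avoid division by zero.
        rates[eval_type] = counts["passed"] / total if total else 0.0
    return rates


# Same numbers as the example return value above.
summary = {
    "total": 10,
    "passed": 8,
    "failed": 2,
    "pass_rate": 0.8,
    "by_eval_type": {
        "numeric": {"total": 5, "passed": 4},
        "string": {"total": 3, "passed": 3},
        "llm_judge": {"total": 2, "passed": 1},
    },
}

for eval_type, rate in pass_rates_by_type(summary).items():
    print(f"{eval_type}: {rate:.0%}")
```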

Result Fields

Each result in the results array contains:

| Field | Description |
| --- | --- |
| `question` | The original question asked |
| `expected_answer` | The expected answer from the CSV |
| `eval_type` | Evaluation method used |
| `model_response` | The model's full response text |
| `passed` | Whether the evaluation passed |
| `details` | Additional evaluation details |
| `error` | Error message if the evaluation failed |
| `time_to_answer` | Response time in seconds for the MCP server call |
| `tools_called` | List of MCP tools invoked during the response |

The tools_called array contains objects with:

tool_name: Name of the MCP tool called
server_name: Name of the MCP server that provided the tool
input: Parameters passed to the tool
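
These fields make it straightforward to inspect a run after the fact, for example to list the questions that did not pass or to see which tools the model relied on. The following sketch assumes a `results` list with the fields described above; the helper names and the sample data (including the `search_orgs` tool) are made up for illustration.

```python
from collections import Counter


def failed_questions(results: list[dict]) -> list[str]:
    """Return the questions whose evaluation did not pass."""
    return [r["question"] for r in results if not r["passed"]]


def tool_usage(results: list[dict]) -> Counter:
    """Count how many times each MCP tool was called across all results."""
    counts = Counter()
    for result in results:
        # Treat a missing or empty tools_called field as "no tools called".
        for call in result.get("tools_called") or []:
            counts[call["tool_name"]] += 1
    return counts


# Hypothetical sample data in the documented shape (other fields omitted).
results = [
    {
        "question": "How many grants were awarded in 2023?",
        "passed": True,
        "tools_called": [
            {"tool_name": "get_grants", "server_name": "mcp-server", "input": {"year": 2023}},
        ],
    },
    {
        "question": "What organization received the most funding?",
        "passed": False,
        "tools_called": [
            {"tool_name": "get_grants", "server_name": "mcp-server", "input": {}},
            {"tool_name": "search_orgs", "server_name": "mcp-server", "input": {"sort": "funding"}},
        ],
    },
]

print(failed_questions(results))
print(tool_usage(results).most_common())
```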

Requirements

Python 3.10+
Anthropic API key with MCP beta access
