Adding MCP Evals with Opik by czajkub · Pull Request #44 · the-momentum/apple-health-mcp-server

czajkub · 2025-09-30T13:58:41Z

This pull request adds evaluations and tests for this server using Opik. The tests run on a prepared dataset and include:

- unit tests for each tool's database calls, using DuckDB (in query_tests.py)
- a (redundant) end-to-end test template using LLM-as-a-judge and tool-call checks (in e2e_tests.py)
- Opik experiments (in opik/tool_calls.py): the answers to a set of questions are evaluated in Opik and judged on metrics such as hallucination and answer relevancy
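A minimal sketch of the query unit-test pattern listed above: seed a small, known dataset, run a tool's query against it, and assert on the result. It substitutes Python's built-in `sqlite3` for DuckDB so it runs without extra dependencies (DuckDB's Python client follows a similar connect/execute/fetch flow). The `records` table, its columns, and `count_records_by_type` are hypothetical and are not taken from query_tests.py.

```python
import sqlite3
import unittest


def count_records_by_type(conn: sqlite3.Connection, record_type: str) -> int:
    """Hypothetical tool query: count health records of a given type."""
    row = conn.execute(
        "SELECT COUNT(*) FROM records WHERE type = ?", (record_type,)
    ).fetchone()
    return row[0]


class QueryTests(unittest.TestCase):
    def setUp(self) -> None:
        # In-memory database seeded with a tiny, known dataset (stand-in for DuckDB).
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE records (type TEXT, value REAL)")
        self.conn.executemany(
            "INSERT INTO records VALUES (?, ?)",
            [("StepCount", 1200.0), ("StepCount", 800.0), ("HeartRate", 72.0)],
        )

    def tearDown(self) -> None:
        self.conn.close()

    def test_count_step_records(self) -> None:
        self.assertEqual(count_records_by_type(self.conn, "StepCount"), 2)

    def test_unknown_type_returns_zero(self) -> None:
        self.assertEqual(count_records_by_type(self.conn, "Unknown"), 0)
```

pytest collects `unittest.TestCase` subclasses, so a file like this runs under `pytest` as well as `python -m unittest`.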

Additionally, the unit tests and Opik tests are added to GitHub Actions; however, an Opik API key and workspace name must be set in the repository secrets.
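Because the Opik jobs depend on the secrets mentioned above, a test run needs a way to tell whether they are available. A minimal sketch, assuming the secrets are exposed to the job as the `OPIK_API_KEY` and `OPIK_WORKSPACE` environment variables (names the Opik SDK documents for configuration; the exact mapping this workflow uses is not shown here). The helper names are hypothetical.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Assumed names: the workflow would map the repository secrets to these variables.
REQUIRED_OPIK_VARS = ("OPIK_API_KEY", "OPIK_WORKSPACE")


def missing_opik_vars(env: Mapping[str, str] | None = None) -> list[str]:
    """Return the required Opik variables that are unset or empty."""
    source = os.environ if env is None else env
    return [name for name in REQUIRED_OPIK_VARS if not source.get(name)]


def opik_tests_enabled(env: Mapping[str, str] | None = None) -> bool:
    """True only when every required Opik variable has a non-empty value."""
    return not missing_opik_vars(env)
```

In pytest, the Opik experiments could then be guarded with `pytest.mark.skipif(not opik_tests_enabled(), reason="Opik secrets not set")`, so runs without the secrets (for example, from forks) still execute the unit tests.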

Opik experiment result example:

Actions results:

For Opik, the pipeline can also show the results of individual tests instead of averages.
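The difference between the averaged and per-test views can be sketched as two reports over the same list of scores. This assumes each experiment result has been flattened into one score per (question, metric) pair; the `Score` shape and both helper functions are hypothetical and do not use the Opik SDK, and the scores in the usage are illustrative inputs only.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean
from typing import NamedTuple


class Score(NamedTuple):
    test: str      # hypothetical test/question identifier
    metric: str    # e.g. "hallucination" or "answer_relevance"
    value: float


def average_by_metric(scores: list[Score]) -> dict[str, float]:
    """Collapse per-test scores into one mean per metric (the summary view)."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for s in scores:
        grouped[s.metric].append(s.value)
    return {metric: mean(values) for metric, values in grouped.items()}


def per_test_report(scores: list[Score]) -> list[str]:
    """List every individual score, sorted by test then metric (the detailed view)."""
    return [f"{s.test} | {s.metric}: {s.value:.2f}" for s in sorted(scores)]
```

The averaged view keeps the pipeline summary short, while the per-test view makes it possible to see which specific question pulled a metric down.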

KaliszS

We need instructions for running these tests locally, without Actions. To achieve that, .env.example and the README need to be updated.
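For local runs, the values from a filled-in copy of .env.example have to reach the test process's environment. A minimal stdlib sketch of that step; python-dotenv's `load_dotenv` is the usual library for this, and this version only handles plain `KEY=VALUE` lines. The helper names and the variable names in the usage are assumptions, not the repository's actual setup.

```python
from __future__ import annotations

import os
from collections.abc import MutableMapping
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks, # comments and malformed lines."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quote characters: KEY="value" -> value
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_env_file(path: str | Path) -> dict[str, str]:
    """Read and parse a .env file (e.g. a copy of .env.example with real values)."""
    return parse_env(Path(path).read_text(encoding="utf-8"))


def apply_env(values: dict[str, str], environ: MutableMapping[str, str] | None = None) -> None:
    """Copy values into the environment without overriding variables that are already set."""
    target = os.environ if environ is None else environ
    for key, value in values.items():
        target.setdefault(key, value)
```

Using `setdefault` means variables already exported in the shell (or set by CI) take precedence over the file, so the same tests behave identically locally and in Actions.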

czajkub · 2025-10-08T13:11:06Z

pyproject.toml

 dev = [
    "fastapi>=0.116.2",
+    "opik>=1.8.56",
+    "pydantic-ai>=1.0.10",


pydantic-ai is used for the agent in the Opik tests.
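The tool-call check that the agent-based tests perform can be sketched without pydantic-ai itself: compare the tool names the agent invoked against the ones a question is expected to trigger. This assumes the invoked tool names have already been extracted from the agent run (not shown, since that depends on pydantic-ai's message types); `missing_tool_calls` and the tool names in the usage are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterable


def missing_tool_calls(called: Iterable[str], expected: Iterable[str]) -> list[str]:
    """Return expected tool names the agent did not call, ignoring order.

    Counts matter: if a tool is expected twice but called once, it is reported once.
    """
    # Counter subtraction keeps only positive counts, i.e. the shortfall.
    shortfall = Counter(expected) - Counter(called)
    return sorted(shortfall.elements())
```

A test would then assert `missing_tool_calls(called, expected) == []`. Extra calls are tolerated on purpose, since an agent may legitimately call a tool more often than the minimum needed to answer.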

czajkub · 2025-10-08T13:11:23Z

tests/agent.py

generated from (old) template

czajkub · 2025-10-08T13:11:54Z

tests/duckdb.example

generated from gist

czajkub added 30 commits September 12, 2025 10:51

added sum to trend data

8a7b919

added device grouping to duckdb for test

7795bf7

added device as well to query

a70831c

ch and duck device/interval grouping

74c5428

docstring tweak

954654c

docstring improving

1afd7fe

remove debug code

0e42919

standardise errors and change trend docstrings

c3cbcb6

add localhost support for parquet

18e16d0

also change parquetpath to path and add .parquet suffix to the path in config

Merge branch 'main' of https://github.com/czajkub/apple-health-mcp-se…

3beeb1f

…rver

Merge branch 'main' of https://github.com/czajkub/apple-health-mcp-se…

32e6dda

…rver

remove debug from client

fab22eb

unterminated string

b619143

remove debug and add fileserver example

1c1678f

Update README.md

bd0bb50

add fastapi to dev group

664bbad

Merge branch 'main' of https://github.com/czajkub/apple-health-mcp-se…

f0a90b1

…rver

Merge branch 'main' of https://github.com/czajkub/apple-health-mcp-se…

d6b01f8

…rver

Merge branch 'main' of https://github.com/czajkub/apple-health-mcp-se…

84b6428

…rver

Merge branch 'tableschemas' of https://github.com/czajkub/apple-healt…

4204fc6

…h-mcp-server into tableschemas

workouts and stats added as pq files

60e64bb

concat check

e73004b

asfas

4fa826d

import fix

d60551b

is nto noene

fabea70

tests

554d9a5

order by sourcename + add unit tests for all queries from duckdb

5b92857

linting i think + change textvalue case + all unit tests added

5fa274e

name fix

504576b

stupid linter

73c40ca

czajkub added 2 commits October 1, 2025 15:50

Merge branch 'the-momentum:main' into opiktests

003cb32

Create tests.md

c5a5924

czajkub commented Oct 2, 2025

scripts/duckdb_importer.py

czajkub added 8 commits October 2, 2025 10:44

rollback unstable changes

f8afe7e

Merge branch 'opiktests' of https://github.com/czajkub/apple-health-m…

b3afd7c

…cp-server into opiktests

remove redundant tests

674d75d

lint

920a617

Update tests.md

2f38793

Update tests.md

a83535d

test improvement

4402014

Merge branch 'opiktests' of https://github.com/czajkub/apple-health-m…

08c1286

…cp-server into opiktests

KaliszS reviewed Oct 7, 2025

KaliszS assigned czajkub Oct 7, 2025

czajkub added 12 commits October 7, 2025 21:35

add config for opik in .env

a8de0be

Merge branch 'the-momentum:main' into opiktests

1116778

Update README.md

47452da

Update tests.md

b0674e7

lint

2b8185e

Merge branch 'opiktests' of https://github.com/czajkub/apple-health-m…

883b51b

…cp-server into opiktests

Merge branch 'main' into opiktests

082bab2

add test for workouts and tweak current tests

06e5f92

changed example file

c1b72a8

changed workflow example

fee3925

changed pytest path

08b3c4e

linter

afe214d

czajkub added 2 commits October 8, 2025 15:16

change composite action default value

39e9eee

Update tests.md

2cacf9f

czajkub wants to merge 89 commits into the-momentum:main from czajkub:opiktests