feat: wire MCP server into CLI as evalhub mcp subcommand #99

Open
tarilabs wants to merge 2 commits into eval-hub:main from tarilabs:tarilabs-20260327-mcpwithcli

Conversation

@tarilabs
Member

@tarilabs tarilabs commented Mar 27, 2026

What and why

  • Add evalhub mcp subcommand reusing CLI profile/config for base-url, token, tenant
  • Remove standalone evalhub-mcp entry point and src/evalhub/mcp/__main__.py
  • MCP extra now depends on CLI extra (eval-hub-sdk[cli])
  • Lazy-import mcp package; show install hint when missing
  • Add tests for subcommand registration, config resolution, and CLI flag overrides
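
The lazy-import bullet above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the PR's code: `load_optional` and `MissingExtraError` are hypothetical names, the hint wording is assumed, and the real subcommand reports the problem through Click rather than a plain exception.

```python
import importlib
from types import ModuleType


class MissingExtraError(RuntimeError):
    """Hypothetical error type for a missing optional dependency."""


def load_optional(module_name: str, extra: str) -> ModuleType:
    """Import `module_name` only when needed, or fail with an install hint.

    `extra` is the name of the optional extra to suggest (assumed wording).
    """
    try:
        # Deferred import: nothing is loaded until the command actually runs.
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise MissingExtraError(
            f"The '{module_name}' package is required for this command. "
            f"Install it with: pip install 'eval-hub-sdk[{extra}]'"
        ) from exc
```

Deferring the import this way keeps the rest of the CLI usable when the optional package is absent; only the command that needs it fails, and it fails with guidance rather than a traceback.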

Type

  • feat
  • fix
  • docs
  • refactor / chore
  • test / ci

Testing

  • Tests added or updated
  • Tested manually

See the thread below for manual tests.

Breaking changes

Summary by CodeRabbit

  • New Features

    • Added an mcp subcommand to the main CLI to start the MCP server using centralized configuration.
  • Bug Fixes & Improvements

    • Consolidated MCP installation under the eval-hub-sdk optional extra and removed the separate console script.
  • Tests

    • Added unit tests covering the MCP subcommand’s help, install guidance, and configuration precedence.

- Add `evalhub mcp` subcommand reusing CLI profile/config for base-url, token, tenant
- Remove standalone `evalhub-mcp` entry point and `src/evalhub/mcp/__main__.py`
- MCP extra now depends on CLI extra (`eval-hub-sdk[cli]`)
- Lazy-import `mcp` package; show install hint when missing
- Add tests for subcommand registration, config resolution, and CLI flag overrides

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: tarilabs <matteo.mortari@gmail.com>
@coderabbitai
Contributor

coderabbitai bot commented Mar 27, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9f801452-7fb7-4ace-ba94-a8cad66943ca

📥 Commits

Reviewing files that changed from the base of the PR and between 21c74ad and dd7af69.

📒 Files selected for processing (1)
  • tests/unit/test_cli_mcp.py
✅ Files skipped from review due to trivial changes (1)
  • tests/unit/test_cli_mcp.py

📝 Walkthrough

Walkthrough

Consolidates the EvalHub MCP server entry from a standalone evalhub-mcp module/console script into a new mcp subcommand on the main evalhub CLI, updates the mcp optional dependency to eval-hub-sdk[cli], removes the legacy __main__.py, and adds unit tests for the new CLI behavior.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Package Configuration: `pyproject.toml` | Replaced `eval-hub-sdk[core]` and explicit `click>=8.1.0` in the mcp optional deps with `eval-hub-sdk[cli]`. Removed the `evalhub-mcp` console script entry. |
| CLI Implementation: `src/evalhub/cli/main.py` | Added mcp subcommand to start MCP server over stdio, resolves config/profile values (base_url, token, tenant, insecure, timeout), dynamically imports mcp package and surfaces install guidance on missing dependency, constructs AsyncEvalHubClient, registers client, and runs server. |
| Legacy Entry Point Removal: `src/evalhub/mcp/__main__.py` | Removed the previous Click-based `__main__` CLI entry (tenant/base-url/token handling and `oc whoami -t` fallback) and its helper for auth token resolution. |
| Tests: `tests/unit/test_cli_mcp.py` | Added unit tests for `evalhub mcp`: help text presence, missing mcp package error message with pip guidance, profile-based config resolution, and CLI-flag override behavior (patching asyncio.run, client, and server). |

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI as evalhub CLI
    participant Config as Config/Profile
    participant MCP as mcp package
    participant Client as AsyncEvalHubClient
    participant Server as MCP Server

    User->>CLI: evalhub mcp [--base-url] [--token] [--tenant]
    CLI->>Config: Load active profile (EVALHUB_CONFIG/EVALHUB_TENANT)
    Config-->>CLI: Profile values (base_url, token, tenant, insecure, timeout)
    CLI->>CLI: Resolve final params (CLI flags override profile)
    CLI->>MCP: import mcp
    alt mcp missing
        MCP-->>CLI: ImportError
        CLI-->>User: ClickException with pip install guidance
    else mcp present
        MCP-->>CLI: mcp loaded
        CLI->>Client: AsyncEvalHubClient(base_url, token, tenant, insecure, timeout)
        Client-->>CLI: client instance
        CLI->>Server: set_client(client)
        CLI->>Server: asyncio.run(mcp_server.run_stdio_async())
        Server-->>User: MCP server running on stdio
    end
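
The "CLI flags override profile" step in the diagram above could look something like this sketch. It is an assumption-laden illustration, not the code in `src/evalhub/cli/main.py`: `Settings` and `resolve_settings` are hypothetical names, and the defaults for `insecure` and `timeout` are made up for the example.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass
class Settings:
    """Hypothetical container for the values named in the diagram."""
    base_url: Optional[str] = None
    token: Optional[str] = None
    tenant: Optional[str] = None
    insecure: bool = False   # assumed default
    timeout: float = 30.0    # assumed default


def resolve_settings(profile: Mapping[str, Any], **cli_flags: Any) -> Settings:
    """Start from the profile, then let any explicitly given flag win."""
    known = {f.name for f in fields(Settings)}
    # Profile values form the base layer; unknown keys are ignored.
    merged = {k: v for k, v in profile.items() if k in known}
    # A flag left unset arrives as None and must not erase the profile value.
    merged.update(
        {k: v for k, v in cli_flags.items() if k in known and v is not None}
    )
    return Settings(**merged)
```

Treating `None` as "flag not given" is what makes the override selective: passing only `--base-url` replaces that one value and leaves the profile's token and tenant in place.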

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • mariusdanciu
  • ruivieira

Poem

🐰 I hopped from script to subcommand bright,
Config carrots shine in profile light,
I nibble flags that override the rest,
Now one neat CLI runs the MCP quest.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: wiring the MCP server into the CLI as a subcommand named 'mcp', which aligns with the changeset's primary objective. |
| Description check | ✅ Passed | The description covers all required template sections: what/why with concrete details, type checkboxes appropriately marked (feat and refactor/chore), testing confirmation, and breaking changes addressed (none). |


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/unit/test_cli_mcp.py (1)

16-22: Environment variable cleanup is not isolated.

The fixture directly modifies os.environ, which could cause issues if tests are run in parallel or if an exception occurs before cleanup. Consider using CliRunner(env=...) or monkeypatch for safer isolation.

♻️ Safer alternative using monkeypatch
 @pytest.fixture()
-def config_file(tmp_path: Path) -> Iterator[Path]:
+def config_file(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> Iterator[Path]:
     """Provide a temporary config file path and set EVALHUB_CONFIG."""
     path = tmp_path / "config.yaml"
-    os.environ["EVALHUB_CONFIG"] = str(path)
+    monkeypatch.setenv("EVALHUB_CONFIG", str(path))
     yield path
-    os.environ.pop("EVALHUB_CONFIG", None)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/test_cli_mcp.py` around lines 16 - 22, The fixture config_file
currently mutates os.environ directly which isn't isolated; update it to use
pytest's monkeypatch (add monkeypatch to the fixture signature) and call
monkeypatch.setenv("EVALHUB_CONFIG", str(path)) before yield so the environment
is automatically restored after each test, or alternatively when invoking the
CLI use Click's CliRunner(env={...}) for test-specific envs; keep the fixture
name config_file and the tmp_path usage but remove direct os.environ
modification and the manual os.environ.pop cleanup.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/evalhub/cli/main.py`:
- Around line 902-909: The created AsyncEvalHubClient is stored via
set_client(client) but never closed; either construct it with an async context
manager (use "async with AsyncEvalHubClient(...)" and call set_client inside
that block so the client is auto-closed when mcp_server.run_stdio_async()
returns) or ensure explicit shutdown by awaiting its async close method after
the server run (call await client.aclose() or asyncio.run(client.aclose()) in a
finally/shutdown hook). Update the code around AsyncEvalHubClient creation and
the call to mcp_server.run_stdio_async() to use one of these approaches
(referencing AsyncEvalHubClient, set_client, mcp_server.run_stdio_async, and the
client.aclose()/__aenter__/__aexit__ methods) so the underlying httpx client is
always cleaned up on exit.

In `@tests/unit/test_cli_mcp.py`:
- Around line 99-105: Add the missing assertion that verifies set_client was
invoked in the test_mcp_cli_flags_override_profile test: after the existing
mocks/assertions (including mock_client_cls.assert_called_once_with(...)) add a
call to assert mock_set_client.assert_called_once() to mirror
test_mcp_resolves_from_profile; locate the mock named mock_set_client in the
test function and assert it was called exactly once to ensure the client was set
during the CLI flags override scenario.

---

Nitpick comments:
In `@tests/unit/test_cli_mcp.py`:
- Around line 16-22: The fixture config_file currently mutates os.environ
directly which isn't isolated; update it to use pytest's monkeypatch (add
monkeypatch to the fixture signature) and call
monkeypatch.setenv("EVALHUB_CONFIG", str(path)) before yield so the environment
is automatically restored after each test, or alternatively when invoking the
CLI use Click's CliRunner(env={...}) for test-specific envs; keep the fixture
name config_file and the tmp_path usage but remove direct os.environ
modification and the manual os.environ.pop cleanup.
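
The inline comment on `src/evalhub/cli/main.py` above asks for the client to be closed once the server returns. A rough sketch of the `async with` variant is shown below. Everything in it is a stand-in: `FakeAsyncClient` only mimics the `__aenter__`/`__aexit__`/`aclose()` methods the review refers to (whether `AsyncEvalHubClient` exposes exactly these is not confirmed here), `fake_run_stdio` replaces `mcp_server.run_stdio_async()`, and the `registry` dict replaces `set_client`.

```python
import asyncio


class FakeAsyncClient:
    """Stand-in for AsyncEvalHubClient; it only tracks open/closed state."""

    def __init__(self) -> None:
        self.closed = False

    async def __aenter__(self) -> "FakeAsyncClient":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Runs on normal return and on exceptions alike.
        await self.aclose()

    async def aclose(self) -> None:
        self.closed = True


async def fake_run_stdio() -> None:
    """Stand-in for mcp_server.run_stdio_async(); returns immediately."""
    await asyncio.sleep(0)


async def serve(client: FakeAsyncClient, registry: dict) -> None:
    async with client:
        registry["client"] = client  # stands in for set_client(client)
        await fake_run_stdio()
    # Leaving the block has already closed the client at this point.


client = FakeAsyncClient()
registry: dict = {}
asyncio.run(serve(client, registry))
```

Wrapping both the registration and the server run in one coroutine means a single `asyncio.run` call owns the client's whole lifetime, so no separate shutdown hook is needed.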

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6304f8a8-dca9-40d7-9547-2e62fdd6aaf7

📥 Commits

Reviewing files that changed from the base of the PR and between 6539586 and 21c74ad.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • pyproject.toml
  • src/evalhub/cli/main.py
  • src/evalhub/mcp/__main__.py
  • tests/unit/test_cli_mcp.py
💤 Files with no reviewable changes (1)
  • src/evalhub/mcp/__main__.py

@tarilabs
Member Author

Live demo showing that the refactor (wiring MCP into the CLI) works as intended

ensure authenticated with

oc whoami

for completeness:

# IFF error: You must be logged in to the server (Unauthorized)
oc login ...

ensure evalhub command is available:

evalhub  
Usage: evalhub [OPTIONS] COMMAND [ARGS]...

  EvalHub CLI - manage evaluations, providers, collections, and configuration.

Options:
  --version        Show the version and exit.
  --profile TEXT   Configuration profile to use (overrides active profile).
  --base-url TEXT  EvalHub server URL (overrides profile config).
  --token TEXT     Authentication token (overrides profile config).
  -v, --verbose    Enable verbose output (show SDK logs).
  -h, --help       Show this message and exit.

Commands:
  collections  Browse and manage benchmark collections.
  config       View and update CLI configuration.
  eval         Submit and manage evaluation jobs.
  health       Check health of the EvalHub service.
  mcp          Start the EvalHub MCP server (stdio transport).
  providers    List and inspect evaluation providers.
  version      Print version and build info.

Note that mcp is now listed as a subcommand.

export EVALHUB_TOKEN=$(oc whoami -t)
npx @modelcontextprotocol/inspector

using:

command:
evalhub

arguments:
--base-url https://evalhub-opendatahub.apps.rosa.(REDACTED).openshiftapps.com mcp --tenant team-a

works as intended


If the auth token is missing, the MCP Inspector gets a 401 and the server logs:

{"level":"error","ts":"2026-03-27T21:11:21.465Z","caller":"server/authentication.go:40","msg":"Request not authenticated","path":"/api/v1/evaluations/providers","method":"GET","stacktrace":"github.com/eval-hub/eval-hub/internal/eval_hub/server.WithAuthentication.func1\n\t/build/internal/eval_hub/server/authentication.go:40\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/lib/golang/src/net/http/server.go:2322\nnet/http.serverHandler.ServeHTTP\n\t/usr/lib/golang/src/net/http/server.go:3340\nnet/http.(*conn).serve\n\t/usr/lib/golang/src/net/http/server.go:2109"}

Otherwise it works as intended:

{"level":"info","ts":"2026-03-27T21:11:57.947Z","caller":"server/authentication.go:23","msg":"Authenticating request","path":"/api/v1/evaluations/providers","method":"GET"}
{"level":"info","ts":"2026-03-27T21:11:57.962Z","caller":"auth/authorization.go:60","msg":"Authorizing","record":{"User":{"Name":"mmortari","UID":"e375f99f-9620-4cf1-9c18-d6054f0ffa3f","Groups":["cluster-admins","system:authenticated:oauth","system:authenticated"],"Extra":{"scopes.authorization.openshift.io":["user:full"]}},"Verb":"get","Namespace":"team-a","APIGroup":"trustyai.opendatahub.io","APIVersion":"","Resource":"providers","Subresource":"","Name":"","ResourceRequest":true,"Path":"","FieldSelectorRequirements":null,"FieldSelectorParsingErr":null,"LabelSelectorRequirements":null,"LabelSelectorParsingErr":null}}
{"level":"info","ts":"2026-03-27T21:11:57.966Z","caller":"server/authorization.go:60","msg":"Request authorized","path":"/api/v1/evaluations/providers","method":"GET","user":"mmortari"}
{"level":"info","ts":"2026-03-27T21:11:57.966Z","caller":"handlers/providers.go:117","msg":"Request started","request_id":"e03ec4f9-bdae-463e-9412-b1f2f3603075","method":"GET","uri":"/api/v1/evaluations/providers","user_agent":"python-httpx/0.28.1","remote_addr":"10.128.6.31:47414","tenant":"team-a","user":"mmortari","filter":"{\"limit\":50,\"offset\":0,\"params\":map[name: owner: tags:]}"}
{"level":"info","ts":"2026-03-27T21:11:57.968Z","caller":"server/execution_context.go:158","msg":"Request successful","request_id":"e03ec4f9-bdae-463e-9412-b1f2f3603075","method":"GET","uri":"/api/v1/evaluations/providers","user_agent":"python-httpx/0.28.1","remote_addr":"10.128.6.31:47414","tenant":"team-a","user":"mmortari","code":200,"duration":0.002043061}

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: tarilabs <matteo.mortari@gmail.com>

@mariusdanciu mariusdanciu left a comment


lgtm

@ruivieira
Member

One thought (don't feel strongly about it, and I can't recall if this was discussed previously): right now the mcp extra depends on eval-hub-sdk[cli], which means installing MCP pulls in Click, Rich, and all the CLI machinery. In practice the MCP server only needs the client SDK + mcp library.

Would it make sense to flip this?

  • mcp extra stays standalone: just eval-hub-sdk[core] + mcp>=1.0.0
  • cli extra gains a dependency on eval-hub-sdk[mcp]

That way:

  • pip install eval-hub-sdk[mcp] gives you a lightweight MCP server usable as a library or via python -m evalhub.mcp (no CLI deps)
  • pip install eval-hub-sdk[cli] gives you everything including evalhub mcp

The evalhub mcp subcommand would still work exactly as-is; it just means the MCP server code doesn't drag in CLI dependencies when used standalone. A thin __main__.py could handle the python -m case with minimal config (env vars or a small arg parser).
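
As a rough idea of what such a thin `__main__.py` might do for configuration, here is a sketch using only the standard library. It is hypothetical throughout: `parse_config` is an invented name, `EVALHUB_BASE_URL` is an assumed variable (only `EVALHUB_TOKEN` and `EVALHUB_TENANT` appear elsewhere in this thread), and the step that would actually start the server is left out.

```python
import argparse
from typing import Mapping, Sequence


def parse_config(argv: Sequence[str], env: Mapping[str, str]) -> argparse.Namespace:
    """Resolve base URL, token and tenant from flags, falling back to env vars."""
    parser = argparse.ArgumentParser(prog="python -m evalhub.mcp")
    # Each default comes from the environment, so an explicit flag always wins.
    parser.add_argument("--base-url", default=env.get("EVALHUB_BASE_URL"))  # assumed name
    parser.add_argument("--token", default=env.get("EVALHUB_TOKEN"))
    parser.add_argument("--tenant", default=env.get("EVALHUB_TENANT"))
    return parser.parse_args(argv)


# Example: the tenant flag overrides the environment, the token does not.
cfg = parse_config(
    ["--tenant", "team-b"],
    {"EVALHUB_TOKEN": "example-token", "EVALHUB_TENANT": "team-a"},
)
```

Passing `argv` and `env` in explicitly, rather than reading `sys.argv` and `os.environ` inside the function, keeps the parser free of CLI dependencies and easy to unit-test.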


4 participants