feat(mcp): Add AIRBYTE_MCP_DOMAINS and AIRBYTE_MCP_DOMAINS_DISABLED env vars#890
Conversation
…nv vars Add environment variable options to control which MCP tool domains are advertised: - AIRBYTE_MCP_DOMAINS: CSV of enabled domains (e.g., 'registry,cloud') - AIRBYTE_MCP_DOMAINS_DISABLED: CSV of disabled domains (e.g., 'registry') Domain filtering logic: - If neither is set: all domains enabled - If only AIRBYTE_MCP_DOMAINS is set: only those domains enabled - If only AIRBYTE_MCP_DOMAINS_DISABLED is set: all except those enabled - If both are set: disabled list subtracts from enabled list Added comprehensive unit tests for all scenarios. Co-Authored-By: AJ Steers <aj@airbyte.io>
Original prompt from AJ Steers |
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This PyAirbyte VersionYou can test this version of PyAirbyte using the following: # Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1764726550-mcp-domain-filtering' pyairbyte --help
# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1764726550-mcp-domain-filtering'Helpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
Community SupportQuestions? Join the #pyairbyte channel in our Slack workspace. |
📝 WalkthroughWalkthroughAdds environment-configurable MCP domains and domain-aware gating for MCP tool registration, including cached resolution, validation/warnings for unknown domains, a public is_domain_enabled(domain) helper, and unit tests covering parsing and registration behavior. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20–30 minutes
Possibly related PRs
Suggested reviewers
Would you like additional reviewers added for platform or testing coverage, wdyt? Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (5)
airbyte/mcp/_tool_utils.py (2)
31-45: Nice, robust env parsing for domain listsThe CSV → lowercased
set[str]parsing with whitespace trimming and empty filtering is clean and matches the tests. Since bothAIRBYTE_MCP_DOMAINSandAIRBYTE_MCP_DOMAINS_DISABLEDshare the same parsing pattern, would it be worth extracting a tiny helper like_parse_domain_env(var_name: str) -> set[str]to reduce duplication and keep future tweaks (e.g., logging invalid entries) in one place, wdyt?
85-116: is_domain_enabled logic looks sound and well-coveredThe four-way logic (no envs → allow all; only enabled; only disabled; both with disabled subtracting) reads clearly and matches the docstring and tests. The case-insensitive handling via
domain.lower()and lowercased sets is especially nice for operator UX. Since this function is now part of the public surface, would adding a short note in the docstring about case-insensitivity help future readers understand that behavior at a glance, wdyt?tests/unit_tests/test_mcp_tool_utils.py (3)
11-155: Parametrized coverage foris_domain_enabledlooks greatThe matrix here nicely exercises all combinations of enabled/disabled sets plus a couple of case-insensitivity scenarios, which aligns well with the implementation. To fully mirror the implementation behavior, would you consider adding a case where both lists are set and the domain appears in neither (e.g., enabled
{"cloud"}, disabled{"registry"}, domain"other", expectingFalse) just to make that branch explicit in tests, wdyt?
158-295: Add a case for uppercasetool_domainunder readonly mode?These tests do a nice job validating the interaction between domain filters and
AIRBYTE_CLOUD_MCP_READONLY_MODE. Given thatis_domain_enabledis case-insensitive butshould_register_toolcurrently checksdomain == "cloud"for readonly, would it be worth adding a case like:
enabled_domains={"cloud"},tool_domain="CLOUD",readonly_mode=True,is_readonly=False,expected=Falseto codify the intended behavior and back up the suggested change in
should_register_tool, wdyt?
346-380: Consider parity tests forAIRBYTE_MCP_DOMAINS_DISABLEDparsingThe disabled-domains parsing tests validate basic cases (empty, single, multiple), which is solid. Since the implementation shares the same parsing logic as
AIRBYTE_MCP_DOMAINS, would you be up for adding one or two extra cases mirroring the main env var tests (e.g., uppercase normalization and empty entries) to ensure behavior stays in sync if parsing ever evolves, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte/mcp/_tool_utils.py(3 hunks)tests/unit_tests/test_mcp_tool_utils.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unit_tests/test_mcp_tool_utils.py (1)
airbyte/mcp/_tool_utils.py (2)
is_domain_enabled(85-115)should_register_tool(118-139)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (No Creds)
🔇 Additional comments (1)
tests/unit_tests/test_mcp_tool_utils.py (1)
298-343: Env parsing tests forAIRBYTE_MCP_DOMAINSare thoroughNice use of
patch.dict+importlib.reloadto validate parsing behavior without polluting global state. The cases cover empties, spacing, multiple values, uppercase normalization, and empty entries, which lines up with the comprehension in_tool_utils. Since you already reload the module after exiting the patched env, this should also keep the rest of the suite clean. Nothing blocking here from my side.
…-insensitive handling - Move AIRBYTE_MCP_DOMAINS and AIRBYTE_MCP_DOMAINS_DISABLED to airbyte/mcp/constants.py for better documentation in pdoc docs - Fix case-insensitive readonly mode check in should_register_tool - Standardize test imports to use module import style consistently - Update tests to reload constants module for env var parsing tests Co-Authored-By: AJ Steers <aj@airbyte.io>
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/unit_tests/test_mcp_tool_utils.py (2)
333-344: Consider wrapping the cleanup reload in atry/finallyfor test isolation, wdyt?If the assertion on line 341 fails, the cleanup
importlib.reload(constants)on line 343 won't execute, potentially leaving the constants module in a modified state for subsequent tests.def test_domain_env_var_parsing(env_value: str, expected_set: set[str]) -> None: """Test that AIRBYTE_MCP_DOMAINS env var is parsed correctly.""" import importlib import airbyte.mcp.constants as constants - with patch.dict("os.environ", {"AIRBYTE_MCP_DOMAINS": env_value}, clear=False): - importlib.reload(constants) - assert constants.AIRBYTE_MCP_DOMAINS == expected_set - - importlib.reload(constants) + try: + with patch.dict("os.environ", {"AIRBYTE_MCP_DOMAINS": env_value}, clear=False): + importlib.reload(constants) + assert constants.AIRBYTE_MCP_DOMAINS == expected_set + finally: + importlib.reload(constants)
366-380: Sametry/finallypattern could apply here for consistency.Same cleanup concern as the other env var parsing test—if you decide to apply the
try/finallypattern above, might as well keep this one consistent, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
airbyte/mcp/_tool_utils.py(4 hunks)airbyte/mcp/constants.py(1 hunks)tests/unit_tests/test_mcp_tool_utils.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte/mcp/_tool_utils.py
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: yohannj
Repo: airbytehq/PyAirbyte PR: 716
File: airbyte/logs.py:384-402
Timestamp: 2025-07-11T19:53:44.427Z
Learning: In the PyAirbyte project, when reviewing PRs, maintain clear separation of concerns. Don't suggest changes that are outside the scope of the PR's main objective, even if they would improve consistency or fix other issues. This helps with reviewing changes and potential reverts.
🧬 Code graph analysis (1)
tests/unit_tests/test_mcp_tool_utils.py (1)
airbyte/mcp/_tool_utils.py (2)
is_domain_enabled(74-104)should_register_tool(107-129)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (No Creds)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (4)
tests/unit_tests/test_mcp_tool_utils.py (2)
11-156: Nice comprehensive test coverage foris_domain_enabled!The parametrized test cases cover all the documented scenarios well—empty sets, enabled-only, disabled-only, both set, and case insensitivity. The patching approach is correct, targeting the constants in the module where they're used.
158-296: Good coverage of the domain + readonly mode interaction!I particularly like the test case at line 251-259 (
domain_filter_blocks_even_readonly_cloud_tool) which validates that domain filtering takes precedence—this matches the implementation logic inshould_register_toolwhere domain check comes first.airbyte/mcp/constants.py (2)
14-21: Clean parsing logic with good handling of edge cases!The set comprehension correctly handles comma-separated values with whitespace trimming and lowercase normalization. The empty value filtering with
if d.strip()ensures spurious empty strings from"a,,b"are excluded.
22-51: Excellent documentation!The docstrings clearly explain the behavior, precedence rules, and include practical examples. The note about disabled list taking precedence (subtracting from enabled) aligns with the implementation in
is_domain_enabled.
- Add MCPToolDomain(str, Enum) as single source of truth for valid domains - Derive MCP_TOOL_DOMAINS set from enum values - Add warning when unknown domains are specified in env vars - Add tests for unknown domain warning functionality Co-Authored-By: AJ Steers <aj@airbyte.io>
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
tests/unit_tests/test_mcp_tool_utils.py (2)
346-381: *Consider mirroring the richer parsing cases for _DISABLEDRight now
test_domain_disabled_env_var_parsingonly covers empty/single/multi, whereas the enabled env var tests also cover spaces, uppercase normalization, and empty entries. Would it be worth adding the same extra cases here for symmetry and future‑proofing, so both env vars are validated against the same edge cases, wdyt?
383-425: Unknown-domain warning tests are clear; maybe add a combined caseThe warning assertions are nicely targeted (specific substring + “Known MCP domains are”) and resilient to extra warnings. If you ever want to tighten coverage further, would you consider adding a parametrized case where both enabled and disabled env vars contain unknown domains, to exercise the branch that concatenates both messages into
_parts, wdyt?airbyte/mcp/constants.py (1)
15-89: Domain constants and env parsing match the documented behaviorThe enum + derived
MCP_TOOL_DOMAINSset, lowercased/trimmed parsing, and unknown-domain validation all look consistent with the described semantics. One small ergonomics thought: would bumpingstacklevelinwarnings.warn(e.g., to 2) so that the warning points at the import site (or higher-level caller) instead of insideconstants.pymake debugging misconfigured env vars a bit friendlier, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte/mcp/constants.py(1 hunks)tests/unit_tests/test_mcp_tool_utils.py(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (No Creds)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (3)
tests/unit_tests/test_mcp_tool_utils.py (3)
11-155: is_domain_enabled test matrix looks solidThe parametrization here does a nice job of encoding the spec (allowlist/denylist semantics + case-insensitivity) in a readable way, and the direct patching of the imported constants keeps things focused on behavior instead of env parsing. I don’t see any gaps relative to the described rules.
158-295: Good coverage of domain filtering vs readonly modeThe matrix for
test_should_register_toolnicely exercises the interaction between domain filters andAIRBYTE_CLOUD_MCP_READONLY_MODE, including the “domain filter wins over readonly” case. This should protect against regressions in the ordering of checks insideshould_register_tool.
298-343: Env parsing tests for AIRBYTE_MCP_DOMAINS are thoroughThe parsing cases (empty, single, multi, spaces, uppercase, double commas) map cleanly to the implementation and confirm the normalization behavior; reloading the constants module in and out of the patched env block also avoids leaking state across tests. Looks good to me.
AIRBYTE_MCP_DOMAINS and AIRBYTE_MCP_DOMAINS_DISABLED env vars
…u_cache helper - Move MCP domain constants from airbyte/mcp/constants.py to airbyte/constants.py - Store env var values as list[str] | None instead of sets - Add _resolve_mcp_domain_filters() helper with @lru_cache to avoid warning multiple times - Delete airbyte/mcp/constants.py (no bifurcation) - Update tests to work with new structure Addresses PR feedback from AJ. Co-Authored-By: AJ Steers <aj@airbyte.io>
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
airbyte/constants.py (2)
241-249: MCP_TOOL_DOMAINS is clear; consider deriving from the Enum to avoid drift?The literal list of domains looks correct and matches the documented values. Since the PR also introduces an
MCPToolDomainenum as the single source of truth, would it make sense to deriveMCP_TOOL_DOMAINSfrom that enum (e.g.,[d.value for d in MCPToolDomain]) if module layering allows, so the set of valid domains can’t silently diverge over time, wdyt?
251-281: Env parsing logic is solid; consider a tiny CSV parsing helper to DRY this up?The normalization and
or Nonebehavior for bothAIRBYTE_MCP_DOMAINSandAIRBYTE_MCP_DOMAINS_DISABLEDlook correct and handle empty/whitespace values gracefully. Since the parsing pattern is duplicated and might be reused for future MCP-related env vars, would extracting a small helper improve readability and reduce chances of subtle divergence, wdyt?For example, you could introduce something like:
def _parse_csv_env(name: str) -> list[str] | None: raw = os.getenv(name, "").strip() if not raw: return None values = [d.strip().lower() for d in raw.split(",") if d.strip()] return values or Noneand then:
AIRBYTE_MCP_DOMAINS: list[str] | None = _parse_csv_env("AIRBYTE_MCP_DOMAINS") AIRBYTE_MCP_DOMAINS_DISABLED: list[str] | None = _parse_csv_env("AIRBYTE_MCP_DOMAINS_DISABLED")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
airbyte/constants.py(1 hunks)airbyte/mcp/_tool_utils.py(4 hunks)tests/unit_tests/test_mcp_tool_utils.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- airbyte/mcp/_tool_utils.py
- tests/unit_tests/test_mcp_tool_utils.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (No Creds)
- GitHub Check: Pytest (Fast)
Combine intermediate variables into single expressions while preserving whitespace trimming and empty value filtering. Co-Authored-By: AJ Steers <aj@airbyte.io>
- Reduce from 450 lines to 104 lines (77% reduction) - Move all imports to top of file (no inline imports) - Combine domain logic tests into single parametrized test - Use compact tuple tables instead of verbose pytest.param blocks Co-Authored-By: AJ Steers <aj@airbyte.io>
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/unit_tests/test_mcp_tool_utils.py (1)
69-121: Consider DRY‑ing the env‑parsing patch/reload pattern
_ENV_PARSE_CASESandtest_env_parsingcorrectly validate normalization, splitting, and empty‑value handling by reloadingairbyte.constantsunder differentos.environvalues. To keep this test lean and easier to tweak later, would it be worth extracting thepatch.dict(...) + importlib.reload(constants)/ reset‑reload pattern into a small helper or pytest fixture so the body of the test focuses purely on the cases table and assertions, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/unit_tests/test_mcp_tool_utils.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unit_tests/test_mcp_tool_utils.py (1)
airbyte/mcp/_tool_utils.py (3)
_resolve_mcp_domain_filters(78-110)is_domain_enabled(113-144)should_register_tool(147-169)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (No Creds)
🔇 Additional comments (2)
tests/unit_tests/test_mcp_tool_utils.py (2)
16-67: Domain/readonly matrix covers behavior thoroughlyThe
_DOMAIN_CASESmatrix plustest_domain_logicdo a nice job exercising all the key combinations of enabled/disabled domains and readonly mode, and clearing the_resolve_mcp_domain_filterscache inside the patched context keeps each parametrized case isolated. I don’t see any functional gaps here.
123-153: Warning tests correctly exercise unknown‑domain messagingThe
_WARNING_CASESdata plustest_unknown_domain_warningvalidate both the specific “unknown domain(s)” fragment and the “Known MCP domains are:” suffix, and the combination ofimportlib.reload(constants),importlib.reload(tool_utils), and_resolve_mcp_domain_filters.cache_clear()ensures the LRU cache and module‑level constants reflect the patched environment for each case. This looks solid to me.
Summary
Adds two new environment variables to control which MCP tool domains are advertised by the server:
AIRBYTE_MCP_DOMAINS: CSV of enabled domains (e.g.,registry,cloud)AIRBYTE_MCP_DOMAINS_DISABLED: CSV of disabled domains (e.g.,registry)Domain filtering logic:
registry,cloudregistryregistry,cloudregistryValues are case-insensitive and whitespace is trimmed.
Updates since last revision
MCPToolDomain(str, Enum)as single source of truth for valid domains (cloud,local,registry)airbyte/mcp/constants.pymodule for better pdoc documentationReview & Testing Checklist for Human
AIRBYTE_MCP_DOMAINS=cloudand confirm only cloud tools are advertised (not registry or local tools)AIRBYTE_MCP_DOMAINS_DISABLED=registryand confirm registry tools are hidden but cloud and local are availableshould_register_toolfunction was refactored - confirm thatAIRBYTE_CLOUD_MCP_READONLY_MODEstill works correctly for cloud toolsAIRBYTE_MCP_DOMAINS=cloud,typo_domainand confirm a warning is printed listing valid domainsRecommended test plan:
Notes
AIRBYTE_CLOUD_MCP_READONLY_MODE, etc.)MCPToolDomainenum follows existing pattern fromairbyte/strategies.py(WriteStrategy,WriteMethod)Literal["cloud", "local", "registry"]type hints in_tool_utils.pywere intentionally left unchanged to avoid large refactoring scopeLink to Devin run: https://app.devin.ai/sessions/1dd27d375eba46dbaee62820a6d9e0da
Requested by: AJ Steers (Aaron ("AJ") Steers (@aaronsteers))
Summary by CodeRabbit
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.
Important
Auto-merge enabled.
This PR is set to merge automatically when all requirements are met.