feat: Add MCP tool annotations for read-only, destructive, and idempotent hints#839
Conversation
…tent hints Add comprehensive MCP annotations to all tools in the PyAirbyte MCP server: - readOnlyHint: Indicates if tool only reads without making changes - destructiveHint: Signals if changes are destructive (updates/deletes) - idempotentHint: Indicates if repeated calls have the same effect - openWorldHint: Specifies if tool interacts with external systems This classification enables future safe mode implementations and helps AI assistants understand tool behavior patterns. Tool classifications: - Local ops: All read-only except sync_source_to_cache (non-destructive write) - Cloud ops: Two destructive tools (update/delete custom definitions) - Connector registry: All read-only Annotations follow FastMCP 2.2.7+ specification. Co-Authored-By: AJ Steers <aj@airbyte.io>
Original prompt from AJ Steers |
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This PyAirbyte VersionYou can test this version of PyAirbyte using the following: # Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1761253837-add-mcp-tool-annotations' pyairbyte --help
# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1761253837-add-mcp-tool-annotations'Helpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
Community SupportQuestions? Join the #pyairbyte channel in our Slack workspace. |
📝 WalkthroughWalkthroughAdds a new annotations module and updates MCP tool registrations in cloud_ops, connector_registry, and local_ops to supply per-tool annotation metadata ( Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Would you like me to re-evaluate any specific tool annotations (e.g., DESTRUCTIVE vs. IDEMPOTENT choices) before you merge? Wdyt? Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
…dHint usage - Create _annotations.py with documented constants for all MCP hints - Add FastMCP default values to each constant's docstring - Replace string literals with constants across all MCP tool registrations - Clarify openWorldHint usage with inline comments: - True for tools requiring Cloud auth or connecting to source connectors - False for local-only operations (cache, .env files, registry cache) - Link to FastMCP docs in annotations module header Co-Authored-By: AJ Steers <aj@airbyte.io>
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (9)
airbyte/mcp/cloud_ops.py (4)
756-763: Make time-varying reads non-idempotent.These two tools return status/logs that can legitimately change between identical invocations (even with same params), which contradicts your docstring definition of idempotency (“same result and side effects”). Shall we mark them non-idempotent, wdyt?
Apply:
app.tool( get_cloud_sync_status, annotations={ READ_ONLY_HINT: True, DESTRUCTIVE_HINT: False, - IDEMPOTENT_HINT: True, + IDEMPOTENT_HINT: False, OPEN_WORLD_HINT: True, # Requires Cloud auth }, ) app.tool( get_cloud_sync_logs, annotations={ READ_ONLY_HINT: True, DESTRUCTIVE_HINT: False, - IDEMPOTENT_HINT: True, + IDEMPOTENT_HINT: False, OPEN_WORLD_HINT: True, # Requires Cloud auth }, )Also applies to: 766-773
746-753: Re-evaluate destructiveHint for run_cloud_sync.Connections configured for full-refresh/overwrite or resets can drop/replace tables, which is “destructive” per your hint docs. Should we set DESTRUCTIVE_HINT=True here, or at least document that it depends on the connection’s sync mode so safe mode can treat it cautiously, wdyt?
Example change:
app.tool( run_cloud_sync, annotations={ READ_ONLY_HINT: False, - DESTRUCTIVE_HINT: False, + DESTRUCTIVE_HINT: True, # May overwrite/replace depending on sync mode IDEMPOTENT_HINT: False, OPEN_WORLD_HINT: True, # Requires Cloud auth }, )
776-783: Optional: consider non-idempotent for list calls in open world.*list_deployed_* and list_custom_source_definitions can change between calls due to external drift. It’s reasonable to keep them idempotent=True if you define idempotency as “no side effects,” but your docs say “same result and side effects.” Do you want to flip these to False for consistency with that definition, wdyt?
Also applies to: 786-793, 796-803, 816-824
837-844: Delete idempotency semantics: True or False?permanently_delete_custom_source_definition is marked idempotent, but repeated calls will likely error on the second run (not “same result”). Two options:
- Keep True and swallow “not found” with a stable “already deleted or not found” message to maintain idempotent behavior; or
- Set IDEMPOTENT_HINT=False.
Which approach do you prefer, wdyt?
Example if keeping True (pseudocode only; not required in this PR):
- definition.permanently_delete( + try: + definition.permanently_delete( safe_mode=True, - ) + ) + except NotFoundError: + passairbyte/mcp/_annotations.py (1)
14-52: Polish: mark as constants and export explicitly?Would you like to add typing.Final and an all to make the export surface explicit and help linters/types, wdyt?
Suggested tweak:
+from typing import Final + -READ_ONLY_HINT = "readOnlyHint" +READ_ONLY_HINT: Final = "readOnlyHint" -DESTRUCTIVE_HINT = "destructiveHint" +DESTRUCTIVE_HINT: Final = "destructiveHint" -IDEMPOTENT_HINT = "idempotentHint" +IDEMPOTENT_HINT: Final = "idempotentHint" -OPEN_WORLD_HINT = "openWorldHint" +OPEN_WORLD_HINT: Final = "openWorldHint" + +__all__ = [ + "READ_ONLY_HINT", + "DESTRUCTIVE_HINT", + "IDEMPOTENT_HINT", + "OPEN_WORLD_HINT", +]airbyte/mcp/local_ops.py (4)
787-795: Trim _CONFIG_HELP from tools that don’t take config inputs.describe_default_cache, list_cached_streams, list_dotenv_secrets, and run_sql_query don’t accept config-related params. Removing _CONFIG_HELP from their descriptions could reduce noise. Shall we drop it for these four tools, wdyt?
- description=(describe_default_cache.__doc__ or "").rstrip() + "\n" + _CONFIG_HELP, + description=(describe_default_cache.__doc__ or "").rstrip(), @@ - description=(list_cached_streams.__doc__ or "").rstrip() + "\n" + _CONFIG_HELP, + description=(list_cached_streams.__doc__ or "").rstrip(), @@ - description=(list_dotenv_secrets.__doc__ or "").rstrip() + "\n" + _CONFIG_HELP, + description=(list_dotenv_secrets.__doc__ or "").rstrip(), @@ - description=(run_sql_query.__doc__ or "").rstrip() + "\n" + _CONFIG_HELP, + description=(run_sql_query.__doc__ or "").rstrip(),Also applies to: 821-829, 833-840, 865-872
776-784: Consistency: add explicit description for list_connector_config_secrets?Most tools here set a description explicitly. Do you want to set description=(list_connector_config_secrets.doc or "").rstrip() for consistency in tool metadata, wdyt?
427-429: Prefer structured logging over stderr print.Printing to stderr inside MCP tools can be noisy for clients. Would you switch to a logger (or app.log) at INFO/DEBUG, wdyt?
- print(f"Retrieved {len(records)} records from stream '{stream_name}'", sys.stderr) + # logger.debug(...) # or use FastMCP's logging if available
878-883: Classification check for sync_source_to_cache (non-destructive).You marked destructiveHint=False with a comment “additive/merge operations”. If any cache implementation performs overwrite/replace, should we flip to True or keep False but clarify in the description that behavior is cache-dependent, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
airbyte/mcp/_annotations.py(1 hunks)airbyte/mcp/cloud_ops.py(2 hunks)airbyte/mcp/connector_registry.py(2 hunks)airbyte/mcp/local_ops.py(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte/mcp/connector_registry.py
🧰 Additional context used
🧬 Code graph analysis (2)
airbyte/mcp/local_ops.py (1)
airbyte/caches/base.py (1)
run_sql_query(195-241)
airbyte/mcp/cloud_ops.py (1)
airbyte/cloud/workspaces.py (2)
publish_custom_source_definition(461-552)list_custom_source_definitions(554-582)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (No Creds)
🔇 Additional comments (2)
airbyte/mcp/_annotations.py (1)
20-21: Docstring defaults verified against FastMCP 2.2.7 specification.All four tool annotation defaults in the docstrings match the FastMCP 2.2.7 spec:
readOnlyHint: False ✓destructiveHint: True ✓idempotentHint: False ✓openWorldHint: True ✓No drift detected. The code is good to go!
airbyte/mcp/local_ops.py (1)
771-894: Verification complete—all app.tool registrations are properly configured.The scan across
airbyte/mcp/local_ops.py,airbyte/mcp/cloud_ops.py, andairbyte/mcp/connector_registry.pyconfirms that everyapp.tool()registration includes all four required hint keys (READ_ONLY_HINT,DESTRUCTIVE_HINT,IDEMPOTENT_HINT,OPEN_WORLD_HINT) with no typos detected. The code snippet you provided (lines 771–894) accurately reflects this consistency.
- Remove hints that match FastMCP documented defaults: - READ_ONLY_HINT: False (default) - DESTRUCTIVE_HINT: True (default) - IDEMPOTENT_HINT: False (default) - OPEN_WORLD_HINT: True (default) - Only specify non-default values to reduce noise - Keep inline comments for OPEN_WORLD_HINT: False to clarify local-only operations Co-Authored-By: AJ Steers <aj@airbyte.io>
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
airbyte/mcp/local_ops.py (1)
657-667: Avoid NameError from runtime-evaluated local type annotations.DuckDBCache is only imported under TYPE_CHECKING, but it’s referenced in local variable annotations inside functions. In Python without
from __future__ import annotations, those annotations are evaluated at runtime when the line executes, leading toNameError: name 'DuckDBCache' is not defined. Two lightweight fixes:
- Quote the local annotations, or
- Enable postponed evaluation via a module-level future import.
Here’s an in-place, minimal diff using quoted annotations:
@@ - cache: DuckDBCache = get_default_cache() + cache: "DuckDBCache" = get_default_cache() @@ - cache: DuckDBCache = get_default_cache() + cache: "DuckDBCache" = get_default_cache()Alternatively, add the future import near the top (after the module docstring, before other imports):
@@ -"""Local MCP operations.""" +"""Local MCP operations.""" +from __future__ import annotationsEither approach prevents runtime evaluation of the forward reference. Prefer the future import if this pattern appears elsewhere in the module, wdyt?
Also applies to: 752-769, 30-32
🧹 Nitpick comments (3)
airbyte/mcp/local_ops.py (3)
682-715: Permit common read-only SQL forms (WITH CTEs, PRAGMA) and trim leading comments?currently
_is_safe_sqlblocks:
- CTEs starting with
WITH(valid read-only SELECTs).- DuckDB
PRAGMAreads (e.g.,PRAGMA show_tables).- Queries prefixed by comments or whitespace lines.
Would you consider expanding the allowlist and normalizing the first token before checks, e.g.?
@@ - normalized_query = sql_query.strip().upper() + # Strip leading comments and whitespace before checking the first keyword + q = sql_query.lstrip() + while q.startswith("--") or q.startswith("/*"): + # drop single-line or block comments + if q.startswith("--"): + q = q.splitlines()[1:] # remove first line + q = ("\n".join(q)).lstrip() + else: + end = q.find("*/") + q = q[end+2:].lstrip() if end != -1 else "" + normalized_query = q.upper() @@ - allowed_prefixes = ( + allowed_prefixes = ( "SELECT", "DESCRIBE", "DESC", "SHOW", "EXPLAIN", + "WITH", # Allow read-only CTE queries + "PRAGMA", # DuckDB pragma reads )Keeps the safety posture but avoids false negatives for common read-only queries. Optional, wdyt?
785-787: Suggestion verified and accurate—remove _CONFIG_HELP from the 4 tools without config parameters.I've confirmed your observation. The four tools you identified—
describe_default_cache,list_cached_streams,list_dotenv_secrets, andrun_sql_query—genuinely lackconfig,config_file, orconfig_secret_nameparameters.In contrast, the other six tools using
_CONFIG_HELP(get_source_stream_json_schema, get_stream_previews, list_source_streams, read_source_stream_records, sync_source_to_cache, validate_connector_config) do accept config parameters, so keeping_CONFIG_HELPfor them makes sense.Removing it from the non-config tools will indeed improve clarity for users, since they won't see configuration guidance for tools that don't actually need it. The suggested diffs accurately capture all four cases. Wdyt—should we apply these changes?
776-783: Add explicit OPEN_WORLD_HINT and IDEMPOTENT_HINT annotations to tools with external dependencies.Your suggestions are spot-on. I've verified each tool against its implementation:
- list_connector_config_secrets calls
GoogleGSMSecretManager.fetch_connector_secrets()→ needsOPEN_WORLD_HINT: True- get_stream_previews calls
source.get_samples()with variable results → needsIDEMPOTENT_HINT: False, OPEN_WORLD_HINT: True- list_source_streams calls
source.get_available_streams()→ needsOPEN_WORLD_HINT: True- read_source_stream_records returns variable record slices → needs
IDEMPOTENT_HINT: False, OPEN_WORLD_HINT: True- sync_source_to_cache reads from external source + writes to cache → needs
READ_ONLY_HINT: False, IDEMPOTENT_HINT: False, OPEN_WORLD_HINT: True- validate_connector_config calls
source.check()with remote APIs → needsOPEN_WORLD_HINT: True- get_source_stream_json_schema (not fully visible but context-clear) → needs
OPEN_WORLD_HINT: TrueThe pattern you've established (local-only tools already have
OPEN_WORLD_HINT: False) makes the distinction explicit for safe-mode behavior. The diff you provided aligns perfectly with what each function actually does.Does this match your intended semantics for distinguishing local-only vs. external-dependent tools?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
airbyte/mcp/cloud_ops.py(2 hunks)airbyte/mcp/connector_registry.py(2 hunks)airbyte/mcp/local_ops.py(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- airbyte/mcp/cloud_ops.py
- airbyte/mcp/connector_registry.py
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte/mcp/local_ops.py (1)
airbyte/caches/base.py (1)
run_sql_query(195-241)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (No Creds)
🔇 Additional comments (2)
airbyte/mcp/local_ops.py (2)
16-21: Nice: centralizing annotation keys via constants.Importing DESTRUCTIVE_HINT/READ_ONLY_HINT/IDEMPOTENT_HINT/OPEN_WORLD_HINT keeps registrations consistent and avoids typos. LGTM, wdyt?
776-873: Manual verification of MCP annotations required—sandbox cannot run poetry.I've confirmed that all annotations are present in the code and structurally consistent. However, since
poetryisn't available in the sandbox, I can't executepoetry run poe mcp-inspectto validate the JSON output and runtime exposure.I did notice a few annotation patterns worth double-checking when you run the inspection:
- get_stream_previews and read_source_stream_records only have
READ_ONLY_HINT: True(missingIDEMPOTENT_HINT)—is this intentional for streaming operations?- sync_source_to_cache uses only
DESTRUCTIVE_HINT: FalsewithoutREAD_ONLY_HINT, which makes sense for a cache mutation, but worth confirming the inspection output reflects this correctly.Could you run the following when you have poetry available?
poetry run poe mcp-inspect --tools | grep -A 10 "local_ops"This will show the actual runtime hints and confirm they match your expectations. Would that work for you?
feat: Add MCP tool annotations for read-only, destructive, and idempotent hints
Summary
This PR adds comprehensive MCP annotations to all 27 tools in the PyAirbyte MCP server, classifying each tool's behavior according to the FastMCP 2.2.7+ specification. These annotations enable AI assistants and future safe mode implementations to understand tool behavior patterns.
Annotations added:
readOnlyHint: Indicates if tool only reads without making changesdestructiveHint: Signals if changes are destructive (updates/deletes)idempotentHint: Indicates if repeated calls have the same effectopenWorldHint: Specifies if tool interacts with external systemsTool breakdown:
sync_source_to_cache(non-destructive write)update_custom_source_definition,permanently_delete_custom_source_definition)This is groundwork for future safe mode implementation - no enforcement logic is included in this PR.
Review & Testing Checklist for Human
Note: This PR only adds metadata annotations and does not implement any enforcement logic. Classification accuracy is critical for future safe mode PRs.
sync_source_to_cacheclassification: Currently marked asdestructiveHint: False(non-destructive), but this depends on write strategy. REPLACE strategy could be destructive - does this need to beTrue?poetry run poe mcp-inspectto verify annotations appear in the MCP tool list and are properly formattedidempotentHint: False(likeget_stream_previews,read_source_stream_records) - these return potentially different data on each call. Confirm this is the intended behavior for idempotency classification.openWorldHint: Trueinteract with external systems (connectors, cloud APIs). Verify this makes sense for each tool.Test Plan
Notes
Requested by: AJ Steers (Aaron ("AJ") Steers (@aaronsteers))
Devin Session: https://app.devin.ai/sessions/026dbd2898474bc2a3ce21aec839533e
Summary by CodeRabbit
New Features
Documentation
Chores
Important
Auto-merge enabled.
This PR is set to merge automatically when all requirements are met.