feat(mcp): Add connection status filtering to list_deployed_cloud_connections#882
Conversation
…nections - Add with_connection_status parameter to optionally include sync job status info - Add failing_connections_only parameter to filter for failed connections - Extend CloudConnectionResult model with optional status fields: - last_job_status, last_job_id, last_job_time - currently_running_job_id, currently_running_job_start_time - Skip in-progress syncs when determining last completed job status Co-Authored-By: AJ Steers <aj@airbyte.io>
Original prompt from AJ Steers |
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This PyAirbyte VersionYou can test this version of PyAirbyte using the following: # Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1764627630-list-failed-cloud-connections' pyairbyte --help
# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1764627630-list-failed-cloud-connections'Helpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
Community SupportQuestions? Join the #pyairbyte channel in our Slack workspace. |
📝 WalkthroughWalkthroughAdds five job-metadata fields to CloudConnectionResult and extends list_deployed_cloud_connections with two flags: with_connection_status and failing_connections_only. When enabled, per-connection sync logs are inspected to populate last-completed and currently-running job details and to optionally filter to failing connections. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Would you like a short changelog entry or a unit-test checklist for the new fields/filters — wdyt? Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
airbyte/mcp/cloud_ops.py (2)
885-896: Minor edge case with multiple concurrent running jobs.If there are multiple incomplete jobs in
sync_logs, this loop overwritescurrently_running_job_ideach time until it finds a completed job. Ifsync_logsis ordered newest-first, this would end up storing an older running job rather than the most recent one.Would it make sense to break after the first incomplete job is found (or store only the first one), so we capture the most recent running job? Something like:
if not sync_result.is_job_complete(): + if currently_running_job_id is None: # Only capture first (most recent) running job currently_running_job_id = sync_result.job_id currently_running_job_start_time = sync_result.start_time.isoformat() continueThis is likely a rare edge case in practice, so feel free to defer if it's not a concern, wdyt?
898-901: Verify if "cancelled" should be included in the failing filter.The filter checks for both
"failed"and"cancelled"statuses, but the docstring (line 848) and the earlier docblock (line 857-858) only mention "failed".If including cancelled jobs is intentional (which seems reasonable since cancelled syncs didn't complete successfully), could we update the docstring to mention both? For example:
- "If True, only return connections where the most recent completed sync failed", + "If True, only return connections where the most recent completed sync failed or was cancelled",What do you think?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte/mcp/cloud_ops.py(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (No Creds)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (3)
airbyte/mcp/cloud_ops.py (3)
73-86: Well-structured optional fields with clear documentation.The new fields are appropriately typed and documented, with clear docstrings indicating when they'll be populated. The use of
Nonedefaults for optional status data is clean.
842-860: Clear parameter annotations and well-written docstring.The new parameters are properly annotated and the docstring effectively communicates the behavior, including the implicit enabling of
with_connection_statuswhenfailing_connections_onlyis set. Nice work on the documentation!
917-921: Good placement of the limit check.Applying
max_items_limitafter filtering (inside the loop with a break) is the correct approach here—it ensures we return up to the requested number of matching results rather than limiting before filtering. Efficient implementation!
PyTest Results (Full)388 tests ±0 371 ✅ - 1 24m 29s ⏱️ -16s For more details on these failures, see this check. Results for commit d210983. ± Comparison against base commit ef69ae0. ♻️ This comment has been updated with latest results. |
Aldo Gonzalez (aldogonzalez8)
left a comment
There was a problem hiding this comment.
APPROVED
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
airbyte/mcp/cloud_ops.py (1)
842-851: Fix syntax error: default values must be outside Annotated.The pipeline failures correctly identified a syntax error here. In Python function signatures, default values must be assigned using
=after the type annotation, not within theAnnotatedbrackets usingdefault=.Apply this diff to fix the syntax:
with_connection_status: Annotated[ bool | None, "If True, include status info for each connection's most recent sync job", - default=False, - ], + ] = False, failing_connections_only: Annotated[ bool | None, "If True, only return connections where the most recent completed sync failed", - default=False, - ], + ] = False,
🧹 Nitpick comments (1)
airbyte/mcp/cloud_ops.py (1)
885-885: Consider: is limit=5 always sufficient?With
limit=5, if a connection has 5 or more consecutive in-progress syncs, the code might not find the last completed job. While this scenario seems unlikely, it could happen in workspaces with frequent manual sync triggers or aggressive scheduling. Would it be safer to increase this limit (e.g., to 10 or 20), or is 5 a reasonable trade-off between API efficiency and completeness, wdyt?If you'd like to increase the limit:
- sync_logs = connection.get_previous_sync_logs(limit=5) + sync_logs = connection.get_previous_sync_logs(limit=10)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte/mcp/cloud_ops.py(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte/mcp/cloud_ops.py (2)
airbyte/cloud/connections.py (2)
get_previous_sync_logs(208-230)name(86-94)airbyte/cloud/sync_results.py (3)
get_job_status(276-278)is_job_complete(272-274)start_time(304-320)
🪛 GitHub Actions: Generate Docs
airbyte/mcp/cloud_ops.py
[error] 845-845: SyntaxError: invalid syntax. Maybe you meant '==' or ':=' instead of '='?
🪛 GitHub Actions: Run Linters
airbyte/mcp/cloud_ops.py
[error] 845-850: Ruff check failed due to SyntaxError: Expected ':' but found '=' in default parameter values. Likely caused by incorrect default values in function annotations at lines 845 and 850. Command: poetry run ruff check .
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (All, Python 3.10, Windows)
🔇 Additional comments (6)
airbyte/mcp/cloud_ops.py (6)
73-86: LGTM! Well-structured job metadata fields.The new fields for tracking last completed and currently running job information are well-typed and clearly documented. The separation between completed vs. running job data is a good design choice.
853-862: Clear and comprehensive documentation.The updated docstring does a great job explaining the new parameters and their behavior, including the implicit relationship between
failing_connections_onlyandwith_connection_status. Well done!
871-873: Good design: implicit enabling makes sense.Automatically enabling
with_connection_statuswhenfailing_connections_onlyis true is a sensible design choice, since you need the status info to filter anyway.
920-922: Good: max_items_limit now respects filtering.The behavior change where
max_items_limitis applied after filtering (rather than limiting the initial connection list) is the correct approach for this feature. This ensures the limit applies to the final filtered results, which is what users would expect.
877-918: Well-structured implementation with proper filtering logic.The implementation correctly:
- Initializes job tracking variables for each connection
- Conditionally fetches sync logs only when needed
- Skips in-progress syncs to find the last completed job
- Filters failing connections appropriately
- Builds result objects with the new fields
The logic flow is clear and maintainable.
895-895: The implementation on line 895 is correct—no changes needed.
job_status.valuealready produces the expected lowercase strings ("succeeded", "failed", "cancelled"). TheJobStatusEnumfromairbyte_api.modelshas these exact values, as confirmed by the code on line 901 which filters for{"failed", "cancelled"}and the comparisons inapi_util.pyagainst"succeeded"and"failed".
Fix syntax error by using Field() wrapper for default values in with_connection_status and failing_connections_only parameters. Co-Authored-By: AJ Steers <aj@airbyte.io>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
airbyte/mcp/cloud_ops.py (2)
842-855: Simplify the new flags’ types frombool | Noneto plainbool?Since
with_connection_statusandfailing_connections_onlydon’t appear to useNoneas a distinct state (you only branch on their truthiness, andField(default=False)provides the default), would it be simpler and more consistent with the other flags in this file to declare them asboolinstead ofbool | None?Something like:
- with_connection_status: Annotated[ - bool | None, + with_connection_status: Annotated[ + bool, Field( description="If True, include status info for each connection's most recent sync job", default=False, ), ], - failing_connections_only: Annotated[ - bool | None, + failing_connections_only: Annotated[ + bool, Field( description="If True, only return connections with failed/cancelled last sync", default=False, ), ],This would avoid
Noneever being a runtime value for these flags while preserving the same MCP defaults. Wdyt?
875-925: Small loop tweak to avoid overwriting the latest running job infoThe overall control flow around
failing_connections_onlyand the status computation looks good, and the “implicit enable” ofwith_connection_statuswhen filtering for failures is clear.One minor edge case: inside the
sync_logsloop, a second in‑progress job would overwritecurrently_running_job_id/_start_time, even though the first one you saw is presumably the most recent (assumingget_previous_sync_logsreturns latest-first, as inairbyte/cloud/connections.py). That’s probably rare, but you could cheaply make it more robust by only setting the “currently running” fields once:- for sync_result in sync_logs: - job_status = sync_result.get_job_status() - - if not sync_result.is_job_complete(): - currently_running_job_id = sync_result.job_id - currently_running_job_start_time = sync_result.start_time.isoformat() - continue + for sync_result in sync_logs: + job_status = sync_result.get_job_status() + + if not sync_result.is_job_complete(): + if currently_running_job_id is None: + currently_running_job_id = sync_result.job_id + currently_running_job_start_time = sync_result.start_time.isoformat() + continueThis keeps the semantics the same in the common case and avoids accidentally reporting an older running job if multiple show up in the window. Wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte/mcp/cloud_ops.py(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte/mcp/cloud_ops.py (2)
airbyte/cloud/connections.py (1)
get_previous_sync_logs(208-230)airbyte/cloud/sync_results.py (3)
get_job_status(276-278)is_job_complete(272-274)start_time(304-320)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (No Creds)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (1)
airbyte/mcp/cloud_ops.py (1)
73-86: New job metadata fields onCloudConnectionResultlook solidThe new optional job/status fields line up well with how
list_deployed_cloud_connectionspopulates them and stay backward compatible thanks to= Nonedefaults. I don’t see any issues here; this shape should be easy for MCP tools to consume.
Co-Authored-By: AJ Steers <aj@airbyte.io>
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
airbyte/mcp/cloud_ops.py (3)
74-87: Clarify semantics for the new job metadata fields?These additions look good and fit the existing patterns. A couple of tiny clarity tweaks you might consider:
last_job_time/currently_running_job_start_timeare populated fromstart_time.isoformat(), but the docstrings say “timestamp of the … sync” without saying whether that’s the start or completion time; would explicitly calling out “start time” (or switching to end/completion time if that’s more useful) make this less ambiguous?- Even when
with_connection_status=True, these fields can remainNone(no prior syncs, or only in‑progress jobs); would you want to mention that explicitly so MCP clients don’t assume non‑null whenever the flag is set?Wdyt?
843-856: Tighten flag typing and keep docs/schema hints fully aligned with behavior?Using
Annotated[bool | None, Field(default=False, …)]works fine with the MCP tooling, but since there isn’t a meaningful third state here, would switching these to plainbool(as withinclude_attemptsand other flags above) simplify type-checking and avoid a tri-state that callers don’t really need?Also, the docstring nicely explains that
failing_connections_onlyimplicitly enableswith_connection_status. Would you consider mirroring that in thewith_connection_statusFielddescription (e.g., “If True, include status info …; this is implicitly enabled whenfailing_connections_onlyis True”) so the generated schema is self-explanatory even without reading the docstring?Wdyt?
Also applies to: 858-867
19-19: Import is valid; None-guard concern is unnecessary, but tests would strengthen this new feature.The import of
FAILED_STATUSESat line 19 is correct, and the filtering logic matches the PR intent: you fetch logs only whenwith_connection_statusis enabled, skip in-progress jobs, derive the last completed status/time, and filter withFAILED_STATUSESwhen needed.Regarding the suggested hardening: the
start_time.isoformat()calls at lines 898 and 904 are safe—thestart_timeproperty (defined inSyncResult) always returns adatetimeobject or raises an exception; it never returnsNone. No guard is needed.The test suggestions remain valuable for locking in behavior:
- A connection with a successful last completed job excluded when
failing_connections_only=True- A connection with a running job and a failed last completed job returns both
currently_running_*andlast_job_*populated- Interaction of
failing_connections_only=Truewithmax_items_limit(e.g., 10 failing among many returns exactly 10)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte/mcp/cloud_ops.py(4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (No Creds)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
5b9f2fa
into
main
feat(mcp): Add connection status filtering to list_deployed_cloud_connections
Summary
Extends the existing
list_deployed_cloud_connectionsMCP tool with two new optional parameters to support finding connections in a failed state:with_connection_status: bool = False- When True, includes sync job status info for each connectionfailing_connections_only: bool = False- When True, filters to only return connections where the most recent completed sync failed or was cancelledThe
CloudConnectionResultmodel is extended with optional fields:last_job_status- Status of most recent completed sync (e.g., 'succeeded', 'failed', 'cancelled')last_job_id- Job ID of most recent completed synclast_job_time- ISO 8601 timestamp of most recent completed synccurrently_running_job_id- Job ID of currently running sync (if any)currently_running_job_start_time- Start time of currently running syncThe implementation skips over in-progress syncs to find the last completed job status, which is the intended behavior for identifying failed connections.
Updates since last revision
Annotated[]syntax to wrapdefault=inField()per FastMCP best practicesFAILED_STATUSESconstant fromairbyte.cloud.constantsinstead of bare strings for type-safe enum comparisonReview & Testing Checklist for Human
max_items_limitbehavior change: The limit is now applied AFTER filtering (on results) rather than BEFORE (on connections list). This meansfailing_connections_only=Truewithmax_items_limit=10will return up to 10 failed connections, not filter from the first 10 connections. Confirm this is the desired behavior.FAILED_STATUSES(a set ofJobStatusEnumvalues) - confirmlast_completed_job_status not in FAILED_STATUSESbehaves as expectedRecommended Test Plan
list_deployed_cloud_connections()with default params - should return connections without status fields populatedwith_connection_status=True- should return connections with status fields populatedfailing_connections_only=True- should only return connections with failed/cancelled last synccurrently_running_job_idwhile still reporting the last completed job statusNotes
failing_connections_onlyparameter implicitly enableswith_connection_statussince status info is required for filteringpoe fix-and-check) passRequested by: AJ Steers (Aaron ("AJ") Steers (@aaronsteers))
Link to Devin run: https://app.devin.ai/sessions/d3b2767c1ead41c3acf28550efc4516c
Summary by CodeRabbit
New Features
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.
Important
Auto-merge enabled.
This PR is set to merge automatically when all requirements are met.
Note
Auto-merge may have been disabled. Please check the PR status to confirm.