feat(cloud): Add CloudConnection methods: get_state_artifacts() and get_catalog_artifact()#906
Conversation
…n to CloudConnection Add two new methods to CloudConnection for retrieving connection artifacts: - get_state_artifact_json(): Returns the persisted state as JSON string - get_catalog_artifact_json(): Returns the configured catalog (syncCatalog) as JSON string These methods use the Config API endpoints: - /state/get for state retrieval - /web_backend/connections/get for catalog retrieval This enables live testing workflows to fetch connection artifacts without requiring direct backend database access. Co-Authored-By: AJ Steers <aj@airbyte.io>
Original prompt from AJ Steers |
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This PyAirbyte VersionYou can test this version of PyAirbyte using the following: # Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1765418767-cloud-connection-artifacts' pyairbyte --help
# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1765418767-cloud-connection-artifacts'Helpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
Community SupportQuestions? Join the #pyairbyte channel in our Slack workspace. |
CloudConnection methods: get_state_artifact_json and get_catalog_artifact_json
CloudConnection methods: get_state_artifact_json and get_catalog_artifact_jsonCloudConnection methods: get_state_artifact_json() and get_catalog_artifact_json()
|
Warning Rate limit exceededdevin-ai-integration[bot] has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 17 minutes and 18 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdds two Config API helpers in Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
airbyte/cloud/connections.py (2)
286-301: Clean implementation, but consider error context enrichment?The method correctly delegates to the API utility function and formats the response as JSON. The pretty-printing with indent=2 is a nice touch for readability.
One small consideration: if the API call fails, the error won't include the connection_id in its context. Would it be worth wrapping this in a try/except to add connection-specific context, wdyt? For example:
def get_state_artifact_json(self) -> str: """Get the connection state as a JSON string. Returns the persisted state for this connection, which can be used for incremental syncs or live testing. Returns: JSON string containing the connection state. """ - state_response = api_util.get_connection_state( - connection_id=self.connection_id, - api_root=self.workspace.api_root, - client_id=self.workspace.client_id, - client_secret=self.workspace.client_secret, - ) - return json.dumps(state_response, indent=2) + try: + state_response = api_util.get_connection_state( + connection_id=self.connection_id, + api_root=self.workspace.api_root, + client_id=self.workspace.client_id, + client_secret=self.workspace.client_secret, + ) + return json.dumps(state_response, indent=2) + except AirbyteError as ex: + raise AirbyteError( + message=f"Failed to retrieve state for connection {self.connection_id}", + context={"connection_id": self.connection_id}, + ) from exNot strictly necessary since the API already provides context, but it could help with debugging.
286-319: Consider adding tests for the new artifact methods?The PR description mentions that unit/integration tests are not included. Since these methods interact with external APIs and parse responses, testing would help ensure:
- Correct handling of successful responses
- Graceful handling when syncCatalog is missing
- Proper error propagation from the API layer
Would you like me to help generate test cases for these two methods? I can create test stubs that mock the api_util functions and verify the JSON serialization behavior.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte/_util/api_util.py(1 hunks)airbyte/cloud/connections.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
airbyte/_util/api_util.py (1)
airbyte/secrets/base.py (1)
SecretString(38-143)
airbyte/cloud/connections.py (1)
airbyte/_util/api_util.py (2)
get_connection_state(1780-1806)get_connection_catalog(1809-1838)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (No Creds)
🔇 Additional comments (3)
airbyte/cloud/connections.py (2)
6-6: LGTM! Standard library import added correctly.The json import is placed appropriately in the imports section.
303-319: Looks good! syncCatalog fallback is handled appropriately.The use of
.get("syncCatalog", {})provides a sensible default when the field is missing. This aligns with the documented behavior from the Config API endpoint, which returns connection info including the syncCatalog field. The defensive approach matches the pattern used elsewhere in the codebase for optional dictionary fields.Regarding error context enrichment—would adding logging or exception handling (e.g., if the API response is unexpectedly malformed) be helpful here, similar to other artifact retrieval patterns, wdyt?
airbyte/_util/api_util.py (1)
1780-1806: Clean implementation that follows established patterns.The function correctly delegates to
_make_config_api_requestand the endpoint path and payload structure align with the PR description. The docstring is clear and complete.One thought: Since the PR description mentions Config API endpoints previously returned 403 with standard OAuth credentials, would you want to add a quick integration test or manual verification that this specific endpoint works as expected with the current authentication setup? That way we can be confident it's compatible before merging. Wdyt?
…eturn types - Rename methods from get_*_artifact_json to get_*_artifact - get_state_artifact: returns list[dict[str, Any]] | None - Returns streamState array directly (not the envelope) - Returns None when stateType is 'not_set' - get_catalog_artifact: returns dict[str, Any] - Returns syncCatalog dict directly - Remove json import (no longer needed) Co-Authored-By: AJ Steers <aj@airbyte.io>
Pluralize method name since it returns a list of state objects. Co-Authored-By: AJ Steers <aj@airbyte.io>
CloudConnection methods: get_state_artifact_json() and get_catalog_artifact_json()CloudConnection methods: get_state_artifacts() and get_catalog_artifact()
…f {}
Return None when syncCatalog is not found in the response, instead of
returning an empty dict. This addresses bot feedback for consistency.
Co-Authored-By: AJ Steers <aj@airbyte.io>
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
airbyte/cloud/connections.py (2)
283-305: Clarify behavior for non-stream state types and the JSON/string vs. Python-object contract
get_state_artifacts()looks clean and lines up withapi_util.get_connection_state, but a couple of edge cases might be worth double‑checking:
- Line [302]: we only treat
stateType == "not_set"as meaning “no state” and returnNone. For otherstateTypevalues (e.g. legacy/global) the method will quietly return[]ifstreamStateisn’t present, which might silently drop valid state. Would it be safer either to (a) branch onstateTypeand surface the relevant field, or (b) return the fullstate_responsewhenstateTypeisn’t"stream", so callers can decide, wdyt?- Lines [294]-[305]: the docstring promises “persisted state for this connection”, but the implementation returns only
streamState. If non-stream state types can occur for these Cloud connections, should we document that this is intentionally “per‑stream artifacts only”, or broaden the return shape, wdyt?- PR objectives describe methods named
get_state_artifact_json()returning JSON strings; here we exposeget_state_artifacts()returning Python objects. Is this divergence intentional so that higher‑level callers canjson.dumps()as needed, or do you want the public API to stick with the original JSON-string contract and naming, wdyt?Given this is new surface area, would a small unit/integration test around the
stateType == "not_set"behavior and a “normal streamState present” case be worth adding to lock in expectations, wdyt?
306-323: Confirm semantics whensyncCatalogis missing and alignment with the original JSON-centric API designThe basic flow in
get_catalog_artifact()looks good and nicely reusesapi_util.get_connection_catalog, but a couple of points might merit alignment/clarification:
- Line [323]: returning
{}when"syncCatalog"is absent makes the return type alwaysdict[str, Any], but it doesn’t distinguish “no catalog configured” from “empty catalog”. Would it be clearer to returnNonefor the “no catalog” case (or raise) and document that, or is{}the intended sentinel here, wdyt?- Similar to the state method, the PR objectives talk about a
get_catalog_artifact_json()that returns a JSON string, whereas this method returns a Pythondict. Is that an intentional API design change (prefer structured access; let callers serialize if needed), or should this be renamed/adjusted to match the JSON-string contract, wdyt?If downstream tooling (e.g., your live testing workflows) expects a specific shape for
syncCatalog, it might be worth capturing that expectation in a small test around this helper as well, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte/cloud/connections.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte/cloud/connections.py (1)
airbyte/_util/api_util.py (2)
get_connection_state(1780-1806)get_connection_catalog(1809-1838)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (No Creds)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (1)
airbyte/cloud/connections.py (1)
6-6: ImportingAnyfits the new artifact-returning methodsPulling in
Anyhere to type the JSON-like artifact payloads from the Config API looks appropriate and consistent with the rest of the file; I don’t see any issues with this change.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
airbyte/cloud/connections.py (2)
302-304: Consider the empty list vs None semantics.When
stateTypeis"not_set", the method returnsNone(line 303), but whenstreamStateis missing from the response, it returns an empty list[](line 304 default value). This could create semantic ambiguity: does[]mean "no streams" or "state exists but is empty"?If you want strict None-only behavior for "no state," you might consider checking if
streamStateexists explicitly. But if empty list is intentional for missing keys, that's fine too—just worth clarifying in the docstring. Wdyt?Example alternative if you want consistent None behavior:
if state_response.get("stateType") == "not_set": return None - return state_response.get("streamState", []) + stream_state = state_response.get("streamState") + return stream_state if stream_state is not None else None
285-323: Optional: Consider naming consistency between the two methods.One method is plural (
get_state_artifacts()) while the other is singular (get_catalog_artifact()). This isn't wrong, but it might be clearer if they followed the same pattern—either both plural or both singular. For instance,get_state_artifacts()andget_catalog_artifacts(), orget_state_artifact()andget_catalog_artifact(). Just a thought—wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte/cloud/connections.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte/cloud/connections.py (1)
airbyte/_util/api_util.py (2)
get_connection_state(1780-1806)get_connection_catalog(1809-1838)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (No Creds)
🔇 Additional comments (3)
airbyte/cloud/connections.py (3)
6-6: LGTM! Import addition looks good.The
Anyimport is properly placed in theTYPE_CHECKINGblock and is necessary for the new method signatures below.
285-304: Heads up: Implementation differs from PR objectives.The PR objectives mention a method named
get_state_artifact_json()that returns a JSON string, but the implementation isget_state_artifacts()(plural, no_jsonsuffix) returning a Python list. While the current implementation looks functionally correct, this discrepancy might confuse reviewers or users expecting the API described in the PR summary. Wdyt about updating the PR description to match the actual implementation?
306-323: Same naming discrepancy with PR objectives.Similar to
get_state_artifacts(), the PR objectives describeget_catalog_artifact_json()returning a JSON string, but the implementation isget_catalog_artifact()returning a Python dict. The implementation looks solid, but the PR description should be updated to reflect what was actually built. Wdyt?
…catalog
Adds a new MCP tool that retrieves connection artifacts (state or catalog)
from Airbyte Cloud connections.
- get_connection_artifact(connection_id, workspace_id, artifact_type)
- Returns state as list[dict] or catalog as dict
- Returns {"ERROR": "..."} if artifact not found
Co-Authored-By: AJ Steers <aj@airbyte.io>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
airbyte/cloud/connections.py (2)
6-6: Import ofAnylooks appropriate; consider a lightweight alias for artifact shapes?Using
Anyhere makes sense given the flexible shapes coming back from the Config API. To make downstream usage a bit clearer, would it be worth introducing small type aliases (e.g.,StateArtifact = dict[str, Any],CatalogArtifact = dict[str, Any]) and using those in the method signatures, instead of rawdict[str, Any]everywhere, wdyt?
306-324: Catalog helper looks good; confirm public contract (Python structures vs JSON) and consider minimal testsThe
get_catalog_artifactimplementation itself looks straightforward and consistent with how other helpers wrapapi_util—it just returnsconnection_response.get("syncCatalog"), which aligns with the docstring (“Dictionary containing the configured catalog, orNoneif not found.”).Two things you might want to double‑check:
Both
get_state_artifactsandget_catalog_artifactcurrently return Python lists/dicts, not JSON strings. Given the PR description mentions “JSON” artifacts, is the intent that callers work with native Python structures (and calljson.dumpsthemselves if they truly need text), or should these helpers actually serialize to JSON before returning? If the native-structure contract is preferred (which seems cleaner from a Python SDK perspective), maybe we just ensure any external docs/examples are aligned, wdyt?Since these are new CloudConnection surface-area methods, would a tiny unit test that patches
api_util.get_connection_state/get_connection_catalogto return canned dicts be worthwhile, just to lock in the behavior aroundNonevs present values and avoid regressions later, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte/cloud/connections.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte/cloud/connections.py (1)
airbyte/_util/api_util.py (2)
get_connection_state(1780-1806)get_connection_catalog(1809-1838)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (No Creds)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
Summary
Adds two new methods to
CloudConnectionfor retrieving connection artifacts as JSON strings:get_state_artifact_json()- Returns the persisted connection state via Config API/state/getget_catalog_artifact_json()- Returns the configured catalog (syncCatalog) via Config API/web_backend/connections/getThese methods enable live testing workflows to fetch connection artifacts without requiring direct backend database access.
Updates since last revision
stateType,connectionIdfieldssyncCatalogcontaining 39 streamsReview & Testing Checklist for Human
stateType: not_set; verify behavior with actual incremental statesyncCatalogstructure matches expected format - Confirm the returned catalog is compatible with connector live test requirementsRecommended test plan:
Notes
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.