feat(mcp): Add get_api_docs_urls tool with data.externalDocumentationUrls support#858
Conversation
- Implement new MCP tool to retrieve API documentation URLs for connectors - Support both canonical connector IDs and API names as input - Extract docs URLs from multiple sources: - Registry metadata (Airbyte documentation) - Connector manifest.yaml files (description, metadata.assist.docsUrl, metadata.apiDocs) - Add comprehensive unit tests with 16 test cases - Propose metadata.apiDocs format for storing multiple API docs URLs in manifest.yaml Co-Authored-By: AJ Steers <aj@airbyte.io>
…tionUrls format - Add support for new data.externalDocumentationUrls format in manifest.yaml - Add doc_type field (api_reference, api_release_history, api_deprecations, other) - Add requires_login field for docs requiring authentication - Maintain backward compatibility with metadata.assist.docsUrl and metadata.apiDocs - Update ApiDocsUrl model with Field aliases for proper JSON serialization - Add 3 new unit tests for external docs format (19 total tests passing) - Extract docs from data.externalDocumentationUrls with type and requiresLogin support Co-Authored-By: AJ Steers <aj@airbyte.io>
Original prompt from AJ Steers |
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This PyAirbyte VersionYou can test this version of PyAirbyte using the following: # Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1762889439-get-api-docs-urls-tool' pyairbyte --help
# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1762889439-get-api-docs-urls-tool'Helpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
Community SupportQuestions? Join the #pyairbyte channel in our Slack workspace. |
- Replace 'in' list membership checks with set-based assertions - Use issubset() to verify expected URLs are present - Resolves CodeQL 'Incomplete URL substring sanitization' alert - No functional changes, test still validates URL extraction correctly Co-Authored-By: AJ Steers <aj@airbyte.io>
📝 WalkthroughWalkthroughAdds ApiDocsUrl model and manifest-driven API docs discovery: registry-derived + manifest.yaml parsing, manifest fetch helpers, deduplication, a public get_connector_api_docs_urls function, and an MCP-exposed get_api_docs_urls wrapper plus unit tests. wdyt? Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant MCP as get_api_docs_urls (MCP)
participant API as get_connector_api_docs_urls
participant Registry as Connector Registry
participant Fetcher as _fetch_manifest_dict
participant HTTP as HTTP Client
participant Parser as ApiDocsUrl.from_manifest_dict
Note over MCP,API: high-level flow for docs discovery
User->>MCP: get_api_docs_urls(connector_name)
MCP->>API: delegate and handle not-found
alt connector found
API->>Registry: read registry docs entries
Registry-->>API: registry_docs
API->>Fetcher: compute manifest URL & GET
Fetcher->>HTTP: GET manifest.yaml/json
HTTP-->>Fetcher: payload / 404
Fetcher->>Parser: parse manifest -> manifest_docs
Parser-->>API: manifest_docs
API->>API: combine registry_docs + manifest_docs
API->>API: deduplicate & normalize
API-->>MCP: list[ApiDocsUrl]
MCP-->>User: list[ApiDocsUrl]
else connector not found
API-->>MCP: raises AirbyteConnectorNotRegisteredError
MCP-->>User: "Connector not found."
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte/mcp/connector_registry.py(2 hunks)tests/unit_tests/test_mcp_connector_registry.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/unit_tests/test_mcp_connector_registry.py (1)
airbyte/mcp/connector_registry.py (6)
ApiDocsUrl(167-176)ApiDocsUrlsResult(179-184)_extract_urls_from_manifest_description(223-255)_fetch_manifest_docs_urls(308-328)_resolve_connector_name(187-220)get_api_docs_urls(336-395)
airbyte/mcp/connector_registry.py (3)
airbyte/sources/registry.py (2)
get_available_connectors(225-281)get_connector_metadata(195-222)airbyte/sources/util.py (1)
get_source(47-139)airbyte/_util/meta.py (1)
is_docker_installed(193-194)
🪛 GitHub Check: CodeQL
tests/unit_tests/test_mcp_connector_registry.py
[failure] 81-81: Incomplete URL substring sanitization
The string https://dashboard.example.com/ may be at an arbitrary position in the sanitized URL.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (No Creds)
- Split long set literal across multiple lines per ruff style - No functional changes Co-Authored-By: AJ Steers <aj@airbyte.io>
…registry - Add _extract_docs_from_registry() helper function to extract externalDocumentationUrls from connector registry - Integrate registry extraction into get_api_docs_urls tool - Update docstring to reflect new data sources - Tested with source-faker which now returns 3 docs URLs including Python Faker Library Documentation and Faker Changelog from registry externalDocumentationUrls - All 19 unit tests passing - All lint checks passing Co-Authored-By: AJ Steers <aj@airbyte.io>
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
airbyte/mcp/connector_registry.py (1)
204-218: Hyphenated API names still fail to resolve.This is the same issue flagged in the previous review. When a user supplies "Facebook-Marketing API", the hyphen remains in
search_term(line 204) whileconnector_name_clean(lines 214-216) replaces hyphens with spaces. The comparison then misses, returning "Connector not found." Could we normalize both values to handle hyphens/underscores consistently before comparing, wdyt?Apply this diff to normalize both search_term and connector_name_clean:
search_term = re.sub(r"\s+(api|rest api)$", "", connector_identifier_lower, flags=re.IGNORECASE) +search_term = re.sub(r"[-_]+", " ", search_term) +search_term = re.sub(r"\s+", " ", search_term).strip() for connector_name in available_connectors: metadata = None with contextlib.suppress(Exception): metadata = get_connector_metadata(connector_name) if metadata: pass connector_name_clean = ( connector_name.replace("source-", "").replace("destination-", "").replace("-", " ") ) + connector_name_clean = re.sub(r"\s+", " ", connector_name_clean.lower().strip()) if search_term in connector_name_clean or connector_name_clean in search_term: return connector_name
🧹 Nitpick comments (3)
airbyte/mcp/connector_registry.py (3)
207-212: Remove unused metadata retrieval.The metadata is fetched but never used—lines 211-212 contain a no-op
if metadata: pass. Should we remove this dead code or was there an intended use for it, wdyt?Apply this diff to remove the unused code:
for connector_name in available_connectors: - metadata = None - with contextlib.suppress(Exception): - metadata = get_connector_metadata(connector_name) - - if metadata: - pass - connector_name_clean = ( connector_name.replace("source-", "").replace("destination-", "").replace("-", " ") )
223-255: Consider the scope of standalone URL extraction.The standalone URL pattern (line 239) will match any
https?://URL in the description. If the description contains unrelated URLs (e.g., examples, company homepage), they'll be included as API docs. Would it make sense to limit extraction to certain sections or add filtering logic, wdyt?
331-374: Consider making the registry URL configurable.The registry URL is hardcoded at line 343. If the URL changes or if we need to support different registries (e.g., for testing), this could be inflexible. Would it make sense to extract this to a constant or configuration, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte/mcp/connector_registry.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte/mcp/connector_registry.py (3)
airbyte/sources/registry.py (2)
get_available_connectors(225-281)get_connector_metadata(195-222)airbyte/sources/util.py (1)
get_source(47-139)airbyte/_util/meta.py (1)
is_docker_installed(193-194)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (No Creds)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (5)
airbyte/mcp/connector_registry.py (5)
7-11: LGTM!The new imports (
re,requests,yaml) are appropriate for the URL extraction, HTTP fetching, and YAML parsing functionality being added.
167-184: LGTM!The data models are well-structured with appropriate field aliases (
doc_type→type,requires_login→requiresLogin) and thepopulate_by_nameconfiguration for flexible input handling.
258-305: LGTM!Comprehensive extraction from multiple manifest sources (
description,data.externalDocumentationUrls,metadata.assist.docsUrl,metadata.apiDocs) with appropriate defensive checks usingisinstance.
308-328: LGTM!Good error handling with specific 404 check, reasonable timeout (10s), and graceful degradation returning an empty list on failures. The broad exception catch is appropriate for this best-effort fetch.
434-439: LGTM!The URL deduplication logic correctly preserves the first occurrence of each unique URL, using a set for efficient lookup. This aligns with the PR objective of "exact-URL deduplication across sources."
- Remove ApiDocsUrlsResult wrapper class, return list[ApiDocsUrl] directly - Remove _resolve_connector_name helper, only accept canonical connector IDs - Remove _extract_urls_from_manifest_description, only parse YAML fields - Remove filtering conditions from externalDocumentationUrls parsing - Fix lint error by breaking long description line Co-Authored-By: AJ Steers <aj@airbyte.io>
- Remove metadata.assist.docsUrl and metadata.apiDocs parsing (non-existent specs) - Remove blanket try/except blocks, let errors raise naturally - Remove tests for hallucinated specs (test_manifest_with_assist_docs_url, test_manifest_with_api_docs, test_manifest_with_mixed_formats, test_manifest_request_error) - Remove test_successful_retrieval test (confusing test with outdated source references) - Update docstring to reflect only supported sources: registry and manifest data.externalDocumentationUrls - 5 focused unit tests now passing Co-Authored-By: AJ Steers <aj@airbyte.io>
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
airbyte/mcp/connector_registry.py (2)
221-259: Consider consistent error handling across sources.Similar to
_fetch_manifest_docs_urls, this function will raise on any HTTP error (line 232), which could preventget_api_docs_urlsfrom returning partial results when the registry is temporarily unavailable but manifest data is accessible.For consistency and resilience, should network errors here also be caught to allow aggregation from available sources? wdyt?
307-311: Should we handle failures per-source to enable partial aggregation?Currently, if
_extract_docs_from_registryfails (line 307), we never reach_fetch_manifest_docs_urls(line 310), and vice versa. This means a network issue with one source causes complete failure even if other sources are available.Given the function's goal to "aggregate URLs from multiple sources," would it make sense to wrap each source's extraction in a try/except to collect available data and only fail if all sources fail? This would align with the PR description's claim of graceful network error handling. wdyt?
- registry_urls = _extract_docs_from_registry(connector_name) - docs_urls.extend(registry_urls) - - manifest_urls = _fetch_manifest_docs_urls(connector_name) - docs_urls.extend(manifest_urls) + with contextlib.suppress(Exception): + registry_urls = _extract_docs_from_registry(connector_name) + docs_urls.extend(registry_urls) + + with contextlib.suppress(Exception): + manifest_urls = _fetch_manifest_docs_urls(connector_name) + docs_urls.extend(manifest_urls)Alternatively, if errors should propagate (per previous maintainer feedback), we could clarify the PR description to remove the "graceful network error handling" claim.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte/mcp/connector_registry.py(2 hunks)tests/unit_tests/test_mcp_connector_registry.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/unit_tests/test_mcp_connector_registry.py
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte/mcp/connector_registry.py (4)
airbyte/mcp/_tool_utils.py (1)
mcp_tool(102-148)airbyte/sources/registry.py (1)
get_available_connectors(225-281)airbyte/sources/util.py (1)
get_source(47-139)airbyte/_util/meta.py (1)
is_docker_installed(193-194)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (No Creds)
🔇 Additional comments (5)
airbyte/mcp/connector_registry.py (5)
9-10: LGTM!The
requestsandyamlimports are appropriate for fetching and parsing manifest files and registry data.
166-176: LGTM!The
ApiDocsUrlmodel is well-structured with sensible defaults and flexible field naming through aliases. The use ofpopulate_by_nameallows bothtype/doc_typeandrequiresLogin/requires_loginto work correctly.
178-199: LGTM!The manifest extraction logic correctly navigates the nested structure and applies appropriate defaults for optional fields (
typeandrequiresLogin).
202-218: Verify error handling aligns with PR objectives.The PR description states the tool "handles missing manifests (404s) and network errors gracefully," but currently only 404 responses return an empty list. Other network errors (timeouts, DNS failures, 5xx responses) will propagate via
raise_for_status()on line 215, potentially causingget_api_docs_urlsto fail completely even if registry data is available.Is this the intended behavior, or should we also handle other request exceptions gracefully to ensure partial results can be returned? wdyt?
287-320: The aggregation logic looks solid.The flow correctly:
- Validates connector exists
- Gathers docs from multiple sources
- Deduplicates by URL (keeping first occurrence)
The deduplication strategy (lines 313-318) only checks URL equality, which means if the same URL appears from multiple sources with different titles or types, only the first is kept. This seems intentional for avoiding duplicate links, which makes sense.
…assmethod - Add _manifest_url_for() helper to generate manifest URL - Add _fetch_manifest_dict() helper to fetch and parse manifest from URL - Add ApiDocsUrl.from_manifest_dict() classmethod to extract docs from manifest dict - Remove old _extract_docs_from_manifest() and _fetch_manifest_docs_urls() functions - Update unit tests to test new helper functions and classmethod separately - Add typing_extensions.Self for proper return type annotation Co-Authored-By: AJ Steers <aj@airbyte.io>
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/unit_tests/test_mcp_connector_registry.py (2)
28-58: Consider adding tests for non-404 error cases.The current tests cover the 404 case (returns empty dict) and the success case. It would be valuable to add test coverage for:
- Non-404 HTTP errors (should raise HTTPError after
raise_for_status())- Invalid YAML content (should raise YAMLError)
These would help verify the error propagation behavior mentioned in the PR summary ("errors raise naturally"), wdyt?
143-181: Consider adding tests for error scenarios in external calls.The deduplication test provides good coverage of the happy path. To improve resilience testing, consider adding test cases for:
_extract_docs_from_registryraising an exception (network failure, malformed JSON)_fetch_manifest_dictraising an exception for non-404 errors- Verifying that partial failures don't prevent returning available docs
This would validate that the function gracefully handles failures from individual sources while still aggregating what it can, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte/mcp/connector_registry.py(2 hunks)tests/unit_tests/test_mcp_connector_registry.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/unit_tests/test_mcp_connector_registry.py (1)
airbyte/mcp/connector_registry.py (5)
ApiDocsUrl(204-242)_fetch_manifest_dict(182-201)_manifest_url_for(167-179)get_api_docs_urls(291-346)from_manifest_dict(216-242)
airbyte/mcp/connector_registry.py (4)
airbyte/mcp/_tool_utils.py (1)
mcp_tool(102-148)airbyte/sources/registry.py (1)
get_available_connectors(225-281)airbyte/sources/util.py (1)
get_source(47-139)airbyte/_util/meta.py (1)
is_docker_installed(193-194)
🔇 Additional comments (9)
airbyte/mcp/connector_registry.py (6)
167-179: LGTM!The
_manifest_url_forhelper is straightforward and correctly formats the manifest URL using the connector name and "latest" version.
182-201: Good use ofyaml.safe_loadfor security.The function correctly uses
yaml.safe_load()instead ofyaml.load(), which prevents arbitrary code execution from malicious YAML files. The 404 handling and the fallback to empty dict on line 201 (whenyaml.safe_loadreturnsNonefor empty files) are both appropriate.
204-242: LGTM!The
ApiDocsUrlmodel is well-designed:
- Field aliases properly support both camelCase and snake_case naming conventions via
populate_by_name- The
from_manifest_dictclassmethod uses defensive programming with.get()calls and type checks- Correctly extracts from
data.externalDocumentationUrlsas specified in the PR requirements
339-346: URL-based deduplication preserves first occurrence metadata.The deduplication logic keeps only the first occurrence of each URL and discards subsequent entries with the same URL, even if they have different metadata (title, doc_type, etc.). This means if the registry and manifest both list the same URL with different metadata, only the first one encountered will be returned. Is this the intended behavior, wdyt?
9-10: No action needed—both dependencies are already declared.The
requestsandpyyamlpackages are already present inpyproject.tomlwith explicit version constraints (pyyaml = "^6.0.2"andrequests = "!=3.32.0"). The project also includes their type stubs. The imports are safe and properly backed by project dependencies.
259-264: No action needed—the search logic is sound.The registry data confirms that each
dockerRepositoryis unique, and they consistently follow theorg/connector-nameformat (e.g.,airbyte/source-100ms,farosai/airbyte-victorops-source). Sinceendswith(f"/{connector_name}")matches against the final component, the search will reliably identify the correct entry without ambiguity. The concern about potential duplicates or format inconsistency is not a real issue here.tests/unit_tests/test_mcp_connector_registry.py (3)
17-25: LGTM!Simple and effective smoke test for the URL generation helper. The assertions verify the essential components are present in the generated URL.
61-127: LGTM!Excellent test coverage for the
ApiDocsUrl.from_manifest_dictclassmethod:
- Verifies extraction of all fields including type aliases
- Tests default values for optional fields (
doc_typedefaults to "other",requires_logindefaults to False)- Handles edge case of empty manifest
160-172: Test fixture URL is flagged by security scanners but is safe.The security scanner flagged the URL at line 172 as "Incomplete URL substring sanitization". However, this is a test fixture URL used in mocking, not user-controlled input or production code. This is expected and safe in the test context. If the security scanner continues to flag this, you could add a comment like
# nosecor# test fixture URLto suppress the warning, wdyt?
Resolved conflicts: - Added yaml import alongside new logging and exceptions imports from main - Kept refactored manifest parsing helpers and classmethod - Kept new get_connector_version_history function from main Co-Authored-By: AJ Steers <aj@airbyte.io>
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
airbyte/mcp/connector_registry.py (1)
178-190: Consider inlining this helper?This is a thin wrapper around
DEFAULT_MANIFEST_URL.format(). Since it's only called once (line 345) and doesn't add much abstraction, would it be simpler to inline it directly at the call site, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte/mcp/connector_registry.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte/mcp/connector_registry.py (4)
airbyte/registry.py (1)
get_available_connectors(226-282)airbyte/sources/util.py (1)
get_source(47-139)airbyte/_util/meta.py (1)
is_docker_installed(193-194)airbyte/_executors/python.py (1)
docs_url(108-112)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (No Creds)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Windows)
🔇 Additional comments (4)
airbyte/mcp/connector_registry.py (4)
193-212: LGTM!Clean error handling that lets non-404 failures propagate naturally while gracefully handling missing manifests. The
or {}fallback on line 212 is a nice touch for empty responses.
350-356: Question about deduplication strategy?The deduplication only checks URL equality and keeps the first occurrence. If the same URL appears in both registry and manifest sources with different metadata (e.g., different
titleordoc_type), the first one wins and the other metadata is discarded. Is this the intended behavior, or should we consider merging/enriching metadata when the same URL appears multiple times, wdyt?For example, if registry says
title="API Reference"and manifest saystitle="Official API Docs"for the same URL, currently we'd keep "API Reference" and lose "Official API Docs".
342-343: Verify error handling is intentional?
_extract_docs_from_registryhas no error handling and will raise if the registry fetch fails. Based on past comments about letting errors raise naturally, this seems intentional—but wanted to confirm this is the desired behavior (vs. logging and continuing with partial results), wdyt?
297-357: Overall structure looks solid!The aggregation logic cleanly combines docs from multiple sources (Airbyte docs, registry, and manifest), and the deduplication ensures no duplicate URLs in the result. The error handling strategy of letting registry/manifest failures propagate while suppressing connector initialization errors is clear and intentional.
…module - Move ApiDocsUrl class, helper functions, and get_connector_api_docs_urls to airbyte/registry.py - Simplify MCP get_api_docs_urls tool to call registry function - Update _extract_docs_from_registry to use _get_registry_url() and include documentationUrl - Add _DEFAULT_MANIFEST_URL constant to registry.py to avoid circular import - Update tests to import from registry module - All 8 unit tests passing, all lint checks passing Co-Authored-By: AJ Steers <aj@airbyte.io>
There was a problem hiding this comment.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
airbyte/mcp/connector_registry.py(3 hunks)airbyte/registry.py(3 hunks)tests/unit_tests/test_mcp_connector_registry.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
airbyte/registry.py (1)
airbyte/exceptions.py (1)
AirbyteConnectorNotRegisteredError(288-297)
tests/unit_tests/test_mcp_connector_registry.py (3)
airbyte/mcp/connector_registry.py (1)
get_api_docs_urls(183-203)airbyte/registry.py (4)
ApiDocsUrl(303-341)_fetch_manifest_dict(359-378)_manifest_url_for(344-356)from_manifest_dict(315-341)airbyte/exceptions.py (1)
AirbyteConnectorNotRegisteredError(288-297)
airbyte/mcp/connector_registry.py (3)
airbyte/registry.py (4)
ApiDocsUrl(303-341)InstallType(48-54)get_available_connectors(232-288)get_connector_api_docs_urls(432-474)airbyte/mcp/_tool_utils.py (1)
mcp_tool(102-148)airbyte/exceptions.py (1)
AirbyteConnectorNotRegisteredError(288-297)
- Removed deduplication logic from get_connector_docs_urls function - Removed test_deduplication_of_urls test - Updated docstring to remove mention of deduplication The deduplication feature was previously introduced in PR #858 but is being removed per product direction. Duplicate URLs from registry and manifest sources will now be preserved in the returned list. Co-Authored-By: AJ Steers <aj@airbyte.io>
refactor(registry): Move manifest parsing logic from MCP to registry module
Summary
Refactored the manifest parsing and API docs URL extraction logic from the MCP module (
airbyte/mcp/connector_registry.py) into the core registry module (airbyte/registry.py) to improve separation of concerns and code organization.Key changes:
ApiDocsUrlclass, helper functions (_manifest_url_for,_fetch_manifest_dict,_extract_docs_from_registry), and created new public functionget_connector_api_docs_urlsinairbyte/registry.pyget_api_docs_urlstool to be a thin wrapper that callsget_connector_api_docs_urlsfrom registry_extract_docs_from_registryto use_get_registry_url()for consistency and includedocumentationUrlfrom registry metadata_DEFAULT_MANIFEST_URLconstant toregistry.pyto avoid circular import with_executors.utilReview & Testing Checklist for Human
_DEFAULT_MANIFEST_URLconstant inregistry.py(instead of importing from_executors.util) is acceptable. This was necessary to avoid a circular import since_executors.utilimports fromregistry.get_api_docs_urlstool with a real connector (e.g.,source-faker) to ensure the refactored code works correctly and returns the expected documentation URLs._DEFAULT_MANIFEST_URLfromregistry.pyin the MCP module is acceptable, or if it should be made public.Notes
poe fix-and-check)Requested by: AJ Steers (aj@airbyte.io) / Aaron ("AJ") Steers (@aaronsteers)
Session: https://app.devin.ai/sessions/02ae774e5a3a4052b5260e136bec5ea0
Summary by CodeRabbit
New Features
Tests