-
Notifications
You must be signed in to change notification settings - Fork 32
fix: Respect explicit use_cache: false for parent streams in declarative sources #864
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix: Respect explicit use_cache: false for parent streams in declarative sources #864
Conversation
…ive sources This change modifies the _initialize_cache_for_parent_streams method to respect explicit use_cache: false settings in the manifest while still defaulting to True for parent streams that don't specify a value. This is important for APIs that use scroll-based pagination (like Intercom's /companies/scroll endpoint), where caching must be disabled because the same scroll_param is returned in pagination responses, causing duplicate records and infinite pagination loops. Fixes: airbytehq/oncall#8346 Co-Authored-By: unknown <>
Original prompt from API User |
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1765227197-respect-use-cache-false#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1765227197-respect-use-cache-falseHelpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
Co-Authored-By: unknown <>
brianjlai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I wanted to note is that we actually do have what looks to be 3 community sources that explicitly set the requesters of some streams to use_cache: false. I don't have any context into why, but they'll be affected. Doesn't block this release since the changes make sense.
Also, we should edit the schema description in
airbyte-python-cdk/airbyte_cdk/sources/declarative/declarative_component_schema.yaml
Line 2289 in daf7d48
| description: Enables stream requests caching. This field is automatically set by the CDK. |
This field is automatically set by the CDK. And add something about only set this to false if they're absolute certain that requests do not need to be cached cuz it will have negative performance imlications when caching is turned off
| def _initialize_cache_for_parent_streams( | ||
| stream_configs: List[Dict[str, Any]], | ||
| ) -> List[Dict[str, Any]]: | ||
| """Enable caching for parent streams unless explicitly disabled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just noting that I didn't review this file. The legacy manifest_declarative_source.py shouldn't be used anywhere at all in our code so this code path should never be invoked as far as I'm aware.
I don't mind leaving the changes in, but they also may not be needed
| def _set_cache_if_not_disabled(requester: Dict[str, Any]) -> None: | ||
| """Set use_cache to True only if not explicitly disabled.""" | ||
| if _should_enable_cache(requester): | ||
| requester["use_cache"] = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we only use _should_enable_cache once, we probably don't need the separate helper method and can instead just do:
def _set_cache_if_not_disabled(requester: Dict[str, Any]) -> None:
"""Set use_cache to True only if not explicitly disabled."""
if requester.get("use_cache") is not False:
requester["use_cache"] = True
…review feedback Co-Authored-By: unknown <>
|
Thanks for the review feedback! I've addressed the comments in commit 41329e0:
This comment was created by Devin AI on behalf of unknown () |
fix: Respect explicit use_cache: false for parent streams in declarative sources
Summary
The
_initialize_cache_for_parent_streamsmethod was unconditionally overwritinguse_cache=Truefor any stream identified as a parent stream. This caused issues for APIs that use scroll-based pagination (like Intercom's/companies/scrollendpoint), where caching must be disabled because the samescroll_paramis returned in pagination responses, causing duplicate records and infinite pagination loops.This PR modifies the method to respect explicit
use_cache: falsesettings in the manifest while still defaulting toTruefor parent streams that don't specify a value. The key logic usesrequester.get("use_cache") is not Falseto distinguish between:False(explicit) → keepFalseNone(not set) → set toTrueTrue(explicit) → keepTrueThe fix is applied to both
concurrent_declarative_source.pyandmanifest_declarative_source.py.Updates since last revision
Per review feedback:
_should_enable_cacheand inlining the check directly in_set_cache_if_not_disableduse_cachefield description indeclarative_component_schema.yamlto remove "This field is automatically set by the CDK" and add warning about performance implications when disabling cachingReview & Testing Checklist for Human
is not Falseidentity check correctly handles all cases (explicit False, None/unset, explicit True)StateDelegatingStreambranches are properly handled (bothfull_refresh_streamandincremental_streamrequesters)test_only_parent_streams_use_cachetests still pass to ensure backward compatibilityRecommended test plan: After merging, update the Intercom connector's manifest to set
use_cache: falseon the companies stream and verify that syncs no longer produce duplicate records.Notes
use_cache: false- they will now have their settings respectedmanifest_declarative_source.pymay not be actively used but kept for consistency