@devin-ai-integration devin-ai-integration bot commented Dec 8, 2025

fix: Respect explicit use_cache: false for parent streams in declarative sources

Summary

The _initialize_cache_for_parent_streams method unconditionally set use_cache=True on any stream identified as a parent stream, overwriting whatever the manifest specified. This broke APIs that use scroll-based pagination (like Intercom's /companies/scroll endpoint), where caching must be disabled: the same scroll_param is returned in every pagination response, so a cached response is replayed, causing duplicate records and infinite pagination loops.

This PR modifies the method to respect explicit use_cache: false settings in the manifest while still defaulting to True for parent streams that don't specify a value. The key logic uses requester.get("use_cache") is not False to distinguish between:

  • False (explicit) → keep False
  • None (not set) → set to True
  • True (explicit) → keep True

The fix is applied to both concurrent_declarative_source.py and manifest_declarative_source.py.
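
To make the three cases above concrete, here is a minimal sketch that runs the helper quoted in the review thread against three hand-written requester dicts. The dicts are illustrative stand-ins for parsed manifest requesters, not real connector configuration.

```python
from typing import Any, Dict


def _set_cache_if_not_disabled(requester: Dict[str, Any]) -> None:
    """Set use_cache to True unless the manifest explicitly set it to False."""
    # Identity check: only the boolean False opts out. None (key missing)
    # and True both end up as True.
    if requester.get("use_cache") is not False:
        requester["use_cache"] = True


# Illustrative requester dicts (assumed shape: a plain dict with an optional use_cache key).
explicit_false: Dict[str, Any] = {"use_cache": False}
unset: Dict[str, Any] = {}
explicit_true: Dict[str, Any] = {"use_cache": True}

for requester in (explicit_false, unset, explicit_true):
    _set_cache_if_not_disabled(requester)

print(explicit_false, unset, explicit_true)
```

Only the requester that explicitly opted out keeps use_cache set to False; the other two end up with caching enabled.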

Updates since last revision

Per review feedback:

  • Simplified helper function by removing _should_enable_cache and inlining the check directly in _set_cache_if_not_disabled
  • Updated use_cache field description in declarative_component_schema.yaml to remove "This field is automatically set by the CDK" and add warning about performance implications when disabling caching

Review & Testing Checklist for Human

  • Verify the is not False identity check correctly handles all cases (explicit False, None/unset, explicit True)
  • Confirm the StateDelegatingStream branches are properly handled (both full_refresh_stream and incremental_stream requesters)
  • Test with the actual Intercom connector (or a similar scroll-based API) to verify the fix resolves the duplicate-records issue
  • Verify existing test_only_parent_streams_use_cache tests still pass to ensure backward compatibility
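
As a rough aid for the checklist above, the following sketch shows one way the parent-stream pass could look with the helper applied to every branch. It is not the CDK's actual implementation: the function name enable_cache_for_parent_streams is hypothetical, and the assumed config layout (a "type" of "StateDelegatingStream" with "full_refresh_stream" and "incremental_stream" sub-configs, each holding retriever.requester) is inferred from the names mentioned in this PR.

```python
from typing import Any, Dict, List


def _set_cache_if_not_disabled(requester: Dict[str, Any]) -> None:
    """Set use_cache to True only if not explicitly disabled."""
    if requester.get("use_cache") is not False:
        requester["use_cache"] = True


def enable_cache_for_parent_streams(
    stream_configs: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Hypothetical sketch: enable caching for parent streams unless disabled."""
    parent_names = set()
    for stream_config in stream_configs:
        parent_configs = (
            stream_config.get("retriever", {})
            .get("partition_router", {})
            .get("parent_stream_configs", [])
        )
        for parent_config in parent_configs:
            parent = parent_config.get("stream", {})
            parent_names.add(parent.get("name"))
            # Parent streams may also be defined inline with their own requester.
            inline_requester = parent.get("retriever", {}).get("requester")
            if inline_requester:
                _set_cache_if_not_disabled(inline_requester)

    for stream_config in stream_configs:
        if stream_config.get("name") not in parent_names:
            continue
        if stream_config.get("type") == "StateDelegatingStream":
            # Assumed layout: both branches carry their own requester.
            for branch in ("full_refresh_stream", "incremental_stream"):
                _set_cache_if_not_disabled(stream_config[branch]["retriever"]["requester"])
        else:
            _set_cache_if_not_disabled(stream_config["retriever"]["requester"])
    return stream_configs


# Made-up stream configs covering each branch of the checklist.
streams: List[Dict[str, Any]] = [
    {"name": "companies", "retriever": {"requester": {"use_cache": False}}},
    {"name": "accounts", "retriever": {"requester": {}}},
    {
        "name": "events",
        "type": "StateDelegatingStream",
        "full_refresh_stream": {"retriever": {"requester": {}}},
        "incremental_stream": {"retriever": {"requester": {"use_cache": False}}},
    },
    {
        "name": "children",
        "retriever": {
            "requester": {},
            "partition_router": {
                "parent_stream_configs": [
                    {"stream": {"name": "companies"}},
                    {"stream": {"name": "accounts"}},
                    {"stream": {"name": "events"}},
                ]
            },
        },
    },
]
result = enable_cache_for_parent_streams(streams)
```

In this sketch, companies keeps use_cache: False, accounts is switched on, each StateDelegatingStream branch is decided independently, and the non-parent children stream is left untouched.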

Recommended test plan: After merging, update the Intercom connector's manifest to set use_cache: false on the companies stream and verify that syncs no longer produce duplicate records.

Notes

  • Fixes: airbytehq/oncall#8346
  • Note: Reviewer mentioned 3 community sources already explicitly set use_cache: false - they will now have their settings respected
  • Note: Changes to legacy manifest_declarative_source.py may not be actively used but kept for consistency
  • Link to Devin run: https://app.devin.ai/sessions/e11958025bc64d19ba291fc6a0aa7511
  • Requested by: unknown ()

…ive sources

This change modifies the _initialize_cache_for_parent_streams method to
respect explicit use_cache: false settings in the manifest while still
defaulting to True for parent streams that don't specify a value.

This is important for APIs that use scroll-based pagination (like
Intercom's /companies/scroll endpoint), where caching must be disabled
because the same scroll_param is returned in pagination responses,
causing duplicate records and infinite pagination loops.

Fixes: airbytehq/oncall#8346
Co-Authored-By: unknown <>
@devin-ai-integration
Contributor Author

Original prompt from API User
Comment from @agarctfi: /ai-fix 
# Fix: Respect `use_cache: false` for Parent Streams in Declarative Sources

## Problem Statement

The `_initialize_cache_for_parent_streams` method in the Airbyte Python CDK unconditionally overwrites `use_cache=True` for any stream identified as a parent stream. This causes issues for APIs that use scroll-based pagination (like Intercom's `/companies/scroll` endpoint), where caching must be disabled because:

1. The same `scroll_param` is returned in pagination responses
2. Caching causes the same response to be returned repeatedly
3. This results in duplicate records and infinite pagination loops
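
The failure mode in the three points above can be reproduced with a toy model. This is a simplified simulation, not Intercom's API or the CDK's HTTP cache: ScrollServer, read_all, the "abc123" token, and the dict-based cache keyed by scroll_param are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional, Tuple

PAGES: List[List[str]] = [["c1", "c2"], ["c3", "c4"], []]


class ScrollServer:
    """Toy scroll endpoint: the cursor lives server-side and every
    response carries the same scroll_param token."""

    def __init__(self) -> None:
        self._position = 0

    def get(self, scroll_param: Optional[str]) -> Dict[str, Any]:
        page = PAGES[min(self._position, len(PAGES) - 1)]
        self._position += 1
        return {"data": page, "scroll_param": "abc123"}


def read_all(
    server: ScrollServer, use_cache: bool, max_requests: int = 10
) -> Tuple[List[str], bool]:
    """Return (records, finished). finished is False if the request guard was hit."""
    cache: Dict[Optional[str], Dict[str, Any]] = {}
    records: List[str] = []
    scroll_param: Optional[str] = None
    for _ in range(max_requests):
        if use_cache and scroll_param in cache:
            # Identical request parameters, so the cached response is replayed.
            response = cache[scroll_param]
        else:
            response = server.get(scroll_param)
            if use_cache:
                cache[scroll_param] = response
        if not response["data"]:
            return records, True
        records.extend(response["data"])
        scroll_param = response["scroll_param"]
    return records, False


records_without_cache, finished_without_cache = read_all(ScrollServer(), use_cache=False)
records_with_cache, finished_with_cache = read_all(ScrollServer(), use_cache=True)
```

Without the cache the reader sees the empty final page and stops. With the cache, the second page is replayed for every later request, so the loop only ends when the max_requests guard trips and the record list fills with duplicates.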

### Current Behavior (Bug)
Even when a manifest explicitly sets `use_cache: false`, the CDK overwrites it to `True` at runtime for parent streams.

### Expected Behavior
The CDK should respect explicit `use_cache: false` settings in the manifest while still defaulting to `True` for parent streams that don't specify a value.

## Files to Modify

1. `/airbyte_cdk/sources/declarative/concurrent_declarative_source.py`
2. `/airbyte_cdk/legacy/sources/declarative/manifest_declarative_source.py`

## Implementation

### Change 1: `concurrent_declarative_source.py`

Find the `_initialize_cache_for_parent_streams` static method and modify it to respect explicit `use_cache=False` settings.

**Before:**
```python
@staticmethod
def _initialize_cache_for_parent_streams(
    stream_configs: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    parent_streams = set()
    for stream_config in stream_configs:
        parent_configs = (
            stream_config.get("retriever", {})
            .get("partition_router", {})
            .get("parent_stream_configs", [])
        )
        for parent_config in parent_configs:
            parent_streams.add(parent_config.get("stream", {}).get("name"))

            parent_requester = (
                parent_config.get("stream", {}).get("retriever", {}).get("requester")
            )
            if parent_requester:
                parent_requester["use_cache"] = True

    for stream_config in stream_configs:
        if stream_config.get("name") in parent_s... (11236 chars truncated...)
```

@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions github-actions bot added the bug and security labels Dec 8, 2025
@github-actions

github-actions bot commented Dec 8, 2025

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1765227197-respect-use-cache-false#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1765227197-respect-use-cache-false

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /prerelease - Triggers a prerelease publish with default arguments
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment

Co-Authored-By: unknown <>
@github-actions

github-actions bot commented Dec 8, 2025

PyTest Results (Fast)

3 820 tests  +2   3 808 ✅ +2   6m 31s ⏱️ +16s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 41329e0. ± Comparison against base commit daf7d48.

♻️ This comment has been updated with latest results.

@github-actions

github-actions bot commented Dec 8, 2025

PyTest Results (Full)

3 823 tests  +2   3 811 ✅ +2   11m 1s ⏱️ +8s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 41329e0. ± Comparison against base commit daf7d48.

♻️ This comment has been updated with latest results.

Contributor

@brianjlai brianjlai left a comment

One thing I wanted to note is that we actually do have what looks to be 3 community sources that explicitly set the requesters of some streams to use_cache: false. I don't have any context into why, but they'll be affected. Doesn't block this release since the changes make sense.

Also, we should edit the schema description in

description: Enables stream requests caching. This field is automatically set by the CDK.
to remove "This field is automatically set by the CDK." and add something about only setting this to false if they're absolutely certain that requests do not need to be cached, because it will have negative performance implications when caching is turned off.

def _initialize_cache_for_parent_streams(
    stream_configs: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Enable caching for parent streams unless explicitly disabled.
Contributor

just noting that I didn't review this file. The legacy manifest_declarative_source.py shouldn't be used anywhere at all in our code so this code path should never be invoked as far as I'm aware.

I don't mind leaving the changes in, but they also may not be needed

def _set_cache_if_not_disabled(requester: Dict[str, Any]) -> None:
    """Set use_cache to True only if not explicitly disabled."""
    if _should_enable_cache(requester):
        requester["use_cache"] = True
Contributor

Since we only use _should_enable_cache once, we probably don't need the separate helper method and can instead just do:

def _set_cache_if_not_disabled(requester: Dict[str, Any]) -> None:
    """Set use_cache to True only if not explicitly disabled."""
    if requester.get("use_cache") is not False:
        requester["use_cache"] = True

@devin-ai-integration
Contributor Author

Thanks for the review feedback! I've addressed the comments in commit 41329e0:

  1. Simplified helper function: Removed the _should_enable_cache helper and inlined the check directly in _set_cache_if_not_disabled as suggested.

  2. Updated schema description: Updated the use_cache field description in declarative_component_schema.yaml to:

    • Remove "This field is automatically set by the CDK"
    • Add warning about only setting to false if absolutely certain, as it has negative performance implications
  3. Legacy file: Acknowledged the note about manifest_declarative_source.py - I've kept the changes there for consistency, but understand it may not be actively used.


This comment was created by Devin AI on behalf of unknown ()

