fix: revert connector builder limitation wrongly applied to all the streams #716
Conversation
👋 Greetings, Airbyte Team Member! You can test this version of the CDK using the following:

```shell
# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@maxi297/fix-faulty-merge#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch maxi297/fix-faulty-merge
```
📝 Walkthrough

The stream slicer passed to StreamSlicerPartitionGenerator in create_declarative_stream was changed from a StreamSlicerTestReadDecorator-wrapped combined_slicers (which enforced a per-slice limit) to the raw combined_slicers, effectively removing the per-slice fetch limit. No public signatures were altered.
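As a rough illustration of the wrapped-vs-raw slicer behavior described above (simplified, hypothetical classes — the real StreamSlicerTestReadDecorator and combined_slicers live in the CDK factory), a test-read decorator caps how many slices the wrapped slicer yields:

```python
from dataclasses import dataclass
from itertools import islice
from typing import Any, Iterable


@dataclass
class SimpleSlicer:
    """Stands in for combined_slicers: yields one slice per page."""
    pages: int

    def stream_slices(self) -> Iterable[dict[str, Any]]:
        return ({"page": n} for n in range(self.pages))


@dataclass
class TestReadSlicerDecorator:
    """Simplified analogue of StreamSlicerTestReadDecorator:
    caps the number of slices the wrapped slicer may emit."""
    wrapped_slicer: SimpleSlicer
    maximum_number_of_slices: int

    def stream_slices(self) -> Iterable[dict[str, Any]]:
        return islice(
            self.wrapped_slicer.stream_slices(),
            self.maximum_number_of_slices,
        )


raw = SimpleSlicer(pages=12)
capped = TestReadSlicerDecorator(wrapped_slicer=raw, maximum_number_of_slices=5)

print(len(list(raw.stream_slices())))     # 12 — all slices
print(len(list(capped.stream_slices())))  # 5 — capped
```

Passing the raw slicer (as this PR now does) corresponds to the uncapped case; the bug was that the capped variant was being used for real syncs as well.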
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Config
    participant Factory as model_to_component_factory
    participant Slicer as StreamSlicerPartitionGenerator
    participant Source
    Note over Factory: New flow (no decorator)
    Config->>Factory: create_declarative_stream()
    Factory->>Factory: build combined_slicers
    Factory->>Slicer: init(stream_slicer=combined_slicers)
    Slicer->>Source: generate partitions (no per-slice cap)
    Source-->>Slicer: slices/partitions
```

```mermaid
sequenceDiagram
    autonumber
    participant Config
    participant Factory as model_to_component_factory
    participant Decorator as StreamSlicerTestReadDecorator
    participant Slicer as StreamSlicerPartitionGenerator
    participant Source
    Note over Factory,Decorator: Previous flow (with test-read limit)
    Config->>Factory: create_declarative_stream()
    Factory->>Factory: build combined_slicers
    Factory->>Decorator: wrap(combined_slicers, max_number_of_slices)
    Decorator->>Slicer: init(stream_slicer=wrapped_slicer)
    Slicer->>Source: generate limited partitions
    Source-->>Slicer: slices/partitions
```
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

Would you like to add or update tests to cover behavior without the per-slice limit, including large-slice scenarios and regression checks for test-read paths, wdyt?
Actionable comments posted: 0
🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
2081-2090: Confirm intent: removing the StreamSlicerTestReadDecorator here also disables test slice limiting for DefaultStream

Passing combined_slicers directly fixes the over-eager "connector builder" limit being applied everywhere. However, it also disables honoring _limit_slices_fetched for streams that go through the DefaultStream path (where previously the decorator capped slices). Do you want to retain slice limiting only when _should_limit_slices_fetched() is true (i.e., in tests or builder), similar to create_simple_retriever/AsyncRetriever, while keeping it off by default for production? If yes, this conditional keeps test ergonomics without reintroducing the KB/CB bug, wdyt?
```diff
 partition_generator = StreamSlicerPartitionGenerator(
     DeclarativePartitionFactory(
         stream_name,
         schema_loader,
         retriever,
         self._message_repository,
     ),
-    stream_slicer=combined_slicers,
+    stream_slicer=(
+        cast(
+            StreamSlicer,
+            StreamSlicerTestReadDecorator(
+                wrapped_slicer=combined_slicers,
+                maximum_number_of_slices=self._limit_slices_fetched or 5,
+            ),
+        )
+        if self._should_limit_slices_fetched()
+        else combined_slicers
+    ),
 )
```

Can you confirm whether any tests rely on _limit_slices_fetched being enforced for streams that hit this DefaultStream path? If so, the conditional above will preserve that behavior only under test/CB modes. Otherwise, we can keep your current change as-is. Wdyt?
📒 Files selected for processing (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1 hunks)
lmossman
left a comment
Change LGTM, did not test
What
We are applying the Connector Builder stream slice limit to real syncs.
Repro steps:
Before the change:

```
{"type":"LOG","log":{"level":"INFO","message":"Read 10 records from pokemon stream"}}
```

After the change:

```
{"type":"LOG","log":{"level":"INFO","message":"Read 12 records from pokemon stream"}}
```

How
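The jump from 10 to 12 records corresponds to the slice cap being lifted: with the cap, later slices (and their records) were silently dropped. A toy model of that effect (assumed one record per slice — not the actual pokeapi pagination):

```python
from itertools import islice
from typing import Iterable, List, Optional


def records_read(slices: Iterable[List[str]], max_slices: Optional[int] = None) -> int:
    """Count records across slices, optionally capping the slice count
    the way a test-read decorator would."""
    capped = slices if max_slices is None else islice(slices, max_slices)
    return sum(len(records) for records in capped)


# 12 slices of one record each, standing in for the pokemon stream
slices = [[f"record-{i}"] for i in range(12)]

print(records_read(slices, max_slices=10))  # before the fix: 10
print(records_read(slices))                 # after the fix: 12
```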
Remove the casting of the StreamSlicer. This should be fine for the Connector Builder, because we only take this code path when it is not a Connector Builder read (see this condition).
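The gating the fix relies on can be sketched as follows (hypothetical helper and names mirroring _should_limit_slices_fetched and the default of 5 slices suggested in the review — not the CDK's actual code): cap slices only in test-read/Connector Builder mode, and hand real syncs the raw slicer.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, Optional

DEFAULT_TEST_READ_SLICE_LIMIT = 5  # assumed default, per the review suggestion


def build_slice_source(
    combined_slicers: Iterable[dict[str, Any]],
    limit_slices_fetched: Optional[int],
    is_test_read: bool,
) -> Iterator[dict[str, Any]]:
    """Only cap slices in test-read/Connector Builder mode; real syncs
    get the raw slicer, which is the behavior this PR restores."""
    if is_test_read:
        limit = limit_slices_fetched or DEFAULT_TEST_READ_SLICE_LIMIT
        return islice(iter(combined_slicers), limit)
    return iter(combined_slicers)


slices = [{"page": n} for n in range(8)]
print(len(list(build_slice_source(slices, None, is_test_read=True))))   # 5
print(len(list(build_slice_source(slices, None, is_test_read=False))))  # 8
```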
Summary by CodeRabbit
- Bug Fixes
- Performance
- Stability