feat(low code): Add GroupingPartitionRouter #354
Merged: Anatolii Yatsuk (tolik0) merged 16 commits into main from tolik0/add-grouping-partition-router on Mar 20, 2025
Changes from 1 commit
Commits
96ee457 Add GroupingPartitionRouter (tolik0)
cbf6328 Fix parameter name (tolik0)
26455ca Add extra fields (tolik0)
b35a165 Add unit tests (tolik0)
2348d25 Fix import in unit tests (tolik0)
cf782a2 Fix formatting (tolik0)
4d8c918 Add unit test for model_to_component_factory.py (tolik0)
6cb895e Fix mypy errors (tolik0)
fd9b225 Add unit tests for error scenarios (tolik0)
01fb6c8 Fix comments from review (tolik0)
1a1d407 Fix comment (tolik0)
fce9b9e Merge branch 'main' into tolik0/add-grouping-partition-router (tolik0)
e7bb9c0 Add correct state handling (tolik0)
85d041b Force global state for GroupingPartitionRouter (tolik0)
93cf934 Add error for partition with multiple values (tolik0)
e152d70 Fix error for partition with multiple values (tolik0)
116 changes: 116 additions & 0 deletions
airbyte_cdk/sources/declarative/partition_routers/grouping_partition_router.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,116 @@ | ||
| # | ||
| # Copyright (c) 2023 Airbyte, Inc., all rights reserved. | ||
| # | ||
| | ||
| from dataclasses import dataclass | ||
| from typing import Any, Iterable, Mapping, Optional | ||
| | ||
| from airbyte_cdk.sources.declarative.partition_routers.partition_router import PartitionRouter | ||
| from airbyte_cdk.sources.types import Config, StreamSlice, StreamState | ||
| | ||
| | ||
| @dataclass | ||
| class GroupingPartitionRouter(PartitionRouter): | ||
| """ | ||
| A partition router that groups partitions from an underlying partition router into batches of a specified size. | ||
| This is useful for APIs that support filtering by multiple partition keys in a single request. | ||
| | ||
| Attributes: | ||
| group_size (int): The number of partitions to include in each group. | ||
| underlying_partition_router (PartitionRouter): The partition router whose output will be grouped. | ||
| deduplicate (bool): If True, ensures unique partitions within each group by removing duplicates based on the partition key. | ||
| config (Config): The connector configuration. | ||
| parameters (Mapping[str, Any]): Additional parameters for interpolation and configuration. | ||
| """ | ||
| | ||
| group_size: int | ||
| underlying_partition_router: PartitionRouter | ||
| config: Config | ||
| deduplicate: bool = True | ||
| | ||
| def stream_slices(self) -> Iterable[StreamSlice]: | ||
| """ | ||
| Lazily groups partitions from the underlying partition router into batches of size `group_size`. | ||
| | ||
| This method processes partitions one at a time from the underlying router, maintaining a batch buffer. | ||
| When the buffer reaches `group_size` or the underlying router is exhausted, it yields a grouped slice. | ||
| If deduplication is enabled, it tracks the partition values already seen and skips repeats. The set is never reset, so uniqueness holds across all batches, not only within the current one. | ||
| | ||
| Yields: | ||
| StreamSlice: A grouped slice whose partition maps each key to the list of values collected in the batch. | ||
| """ | ||
| batch = [] | ||
| seen_keys = set() if self.deduplicate else None | ||
| | ||
| # Iterate over partitions lazily from the underlying router | ||
| for partition in self.underlying_partition_router.stream_slices(): | ||
| # Extract the partition key (assuming single key-value pair, e.g., {"board_ids": value}) | ||
| key = next(iter(partition.partition.values()), None) | ||
| | ||
| # Skip duplicates if deduplication is enabled | ||
| if self.deduplicate and key in seen_keys: | ||
| continue | ||
| | ||
| # Add partition to the batch | ||
| batch.append(partition) | ||
| if self.deduplicate: | ||
| seen_keys.add(key) | ||
| | ||
| # Yield the batch when it reaches the group_size | ||
| if len(batch) == self.group_size: | ||
| yield self._create_grouped_slice(batch) | ||
| batch = [] # Reset the batch | ||
| | ||
| # Yield any remaining partitions if the batch isn't empty | ||
| if batch: | ||
| yield self._create_grouped_slice(batch) | ||
| | ||
| def _create_grouped_slice(self, batch: list[StreamSlice]) -> StreamSlice: | ||
| # Combine partition values into a single dict with lists | ||
| grouped_partition = { | ||
| key: [p.partition.get(key) for p in batch] for key in batch[0].partition.keys() | ||
| } | ||
| return StreamSlice( | ||
| partition=grouped_partition, | ||
| cursor_slice={}, # Cursor is managed by the underlying router or incremental sync | ||
| ) | ||
| | ||
| def get_request_params( | ||
| self, | ||
| stream_state: Optional[StreamState] = None, | ||
| stream_slice: Optional[StreamSlice] = None, | ||
| next_page_token: Optional[Mapping[str, Any]] = None, | ||
| ) -> Mapping[str, Any]: | ||
| return {} | ||
| | ||
| def get_request_headers( | ||
| self, | ||
| stream_state: Optional[StreamState] = None, | ||
| stream_slice: Optional[StreamSlice] = None, | ||
| next_page_token: Optional[Mapping[str, Any]] = None, | ||
| ) -> Mapping[str, Any]: | ||
| return {} | ||
| | ||
| def get_request_body_data( | ||
| self, | ||
| stream_state: Optional[StreamState] = None, | ||
| stream_slice: Optional[StreamSlice] = None, | ||
| next_page_token: Optional[Mapping[str, Any]] = None, | ||
| ) -> Mapping[str, Any]: | ||
| return {} | ||
| | ||
| def get_request_body_json( | ||
| self, | ||
| stream_state: Optional[StreamState] = None, | ||
| stream_slice: Optional[StreamSlice] = None, | ||
| next_page_token: Optional[Mapping[str, Any]] = None, | ||
| ) -> Mapping[str, Any]: | ||
| return {} | ||
| | ||
| def set_initial_state(self, stream_state: StreamState) -> None: | ||
| """Delegate state initialization to the underlying partition router.""" | ||
| self.underlying_partition_router.set_initial_state(stream_state) | ||
| | ||
| def get_stream_state(self) -> Optional[Mapping[str, StreamState]]: | ||
| """Delegate state retrieval to the underlying partition router.""" | ||
| return self.underlying_partition_router.get_stream_state() | ||
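
The batching logic in `stream_slices` and `_create_grouped_slice` can be tried out without the CDK. The following is a minimal sketch, not the code from this PR: it works on plain dicts instead of `StreamSlice` objects, and the names `group_partitions` and `_merge` are hypothetical. It also rejects partitions that are not a single key-value pair, which is an assumption of the sketch (the commit titles mention such an error, but its implementation is not shown in this diff).

```python
from typing import Any, Dict, Iterable, Iterator, List, Mapping, Set


def _merge(batch: List[Mapping[str, Any]]) -> Dict[str, List[Any]]:
    # Mirrors _create_grouped_slice: keys are taken from the first partition in the batch.
    return {key: [p.get(key) for p in batch] for key in batch[0]}


def group_partitions(
    partitions: Iterable[Mapping[str, Any]],
    group_size: int,
    deduplicate: bool = True,
) -> Iterator[Dict[str, List[Any]]]:
    """Lazily batch single-key partitions into dicts of lists (hypothetical helper)."""
    if group_size < 1:
        raise ValueError("group_size must be >= 1")

    batch: List[Mapping[str, Any]] = []
    seen: Set[Any] = set()  # never reset, so duplicates are dropped across batches

    for partition in partitions:
        # Assumption of this sketch: only single key-value partitions are accepted.
        if len(partition) != 1:
            raise ValueError(f"expected exactly one key, got {sorted(partition)}")
        value = next(iter(partition.values()))  # must be hashable when deduplicating

        if deduplicate:
            if value in seen:
                continue
            seen.add(value)

        batch.append(partition)
        if len(batch) == group_size:
            yield _merge(batch)
            batch = []

    if batch:  # flush the final, possibly short, batch
        yield _merge(batch)


parts = [{"board_ids": i} for i in (1, 2, 2, 3, 4, 5)]
groups = list(group_partitions(parts, group_size=2))
no_dedup = list(group_partitions(parts, group_size=2, deduplicate=False))
```

With the sample input, `groups` holds three slices (`[1, 2]`, `[3, 4]`, `[5]`), because the repeated `2` is skipped before it reaches a batch. With `deduplicate=False` the same input yields `[1, 2]`, `[2, 3]`, `[4, 5]`.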