Aggregation joins #1569
Open
samukweku wants to merge 51 commits into dev from aggregation-joins
Added win-64 platform dependencies to the 'default' and 'chemistry' environments in the pixi.lock file, enabling Windows compatibility. Updated pyproject.toml to reflect these changes.
Refactored _multiple_conditional_join_ne to use _not_equal_indices directly and improved handling of empty index results. Removed unused parameters and streamlined output formatting for consistency.
Major refactor of conditional join internals for improved performance and maintainability. Adds optimized index calculation for equi and non-equi joins, introduces binary search helpers, and removes legacy pandas merge code. Updates error handling, code style, and test coverage for new join logic.
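As a rough illustration of the binary-search idea this commit mentions (not the code in this PR), a single `left >= right` condition can be resolved by sorting the right-hand values once and bisecting for each left-hand value. The function name `ge_join_indices` and the sample data are hypothetical, and plain lists stand in for DataFrame columns.

```python
from bisect import bisect_right


def ge_join_indices(left, right):
    """Return (left_pos, right_pos) pairs where left[i] >= right[j]."""
    # Sort right positions by value so that all matches form a prefix.
    order = sorted(range(len(right)), key=right.__getitem__)
    sorted_right = [right[j] for j in order]
    pairs = []
    for i, value in enumerate(left):
        # Every entry of sorted_right before `cut` is <= value.
        cut = bisect_right(sorted_right, value)
        pairs.extend((i, j) for j in order[:cut])
    return pairs


pairs = ge_join_indices([1, 5, 3], [2, 4, 0])
```

Each left value costs one O(log n) search to find where its matches end, instead of a comparison against every right value; emitting the matching pairs is still proportional to the number of matches.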
Lowered the max_examples parameter from 600 to 10 in all hypothesis-based tests in test_conditional_join.py. This shortens test runtime, presumably to speed up development cycles or avoid long CI runs.
Replaces local janitor_rs wheel references with platform-specific URLs from PyPI in pixi.lock. This ensures that the correct prebuilt wheels are used for various environments and platforms.
Moved several helper imports from utils to _helpers module for better organization. Updated type hints for clarity and consistency. Added deprecation notes and comments regarding numba support, indicating that numba-based implementations are no longer maintained or supported.
Deprecation warnings for the df_columns and right_columns arguments have been removed from the conditional_join function's docstring. This streamlines the documentation and removes redundant warning messages.
Added a deprecation warning for numba support in _conditional_join_preliminary_checks. Replaced deprecated 'select' method with 'select_columns' in _create_frame to ensure compatibility with updated DataFrame API.
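A minimal sketch of how such a warning could be raised from a validation step, using only the standard library. The function `preliminary_checks`, the warning category, and the message text are assumptions for illustration; the actual wording and category used in `_conditional_join_preliminary_checks` are not shown on this page.

```python
import warnings


def preliminary_checks(use_numba=False):
    """Hypothetical stand-in for an argument-validation step."""
    if use_numba:
        # stacklevel=2 points the warning at the caller, not this helper.
        warnings.warn(
            "numba support is deprecated and no longer maintained.",
            DeprecationWarning,
            stacklevel=2,
        )
    return use_numba


# Record the warning so it can be inspected instead of printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    preliminary_checks(use_numba=True)
```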
Introduces the join_agg method for computing aggregations (sum, min, max, size, prod) on the right DataFrame during conditional joins. Refactors and extends internal join logic to support aggregation, adds helper functions for aggregation, and updates input validation and tests accordingly.
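To make the idea concrete without guessing at the `join_agg` signature (which is not shown on this page), here is a pure-Python sketch of one way to aggregate during a non-equi join: for each left key, sum the right values whose key is `<=` it, using a sorted key list and prefix sums so the joined pairs are never materialised. The name `sum_where_le` and the sample data are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def sum_where_le(left_keys, right_keys, right_values):
    """For each left key, sum right_values whose right key is <= it."""
    order = sorted(range(len(right_keys)), key=right_keys.__getitem__)
    sorted_keys = [right_keys[j] for j in order]
    # prefix[k] holds the sum of the k smallest-keyed right values.
    prefix = [0, *accumulate(right_values[j] for j in order)]
    return [prefix[bisect_right(sorted_keys, key)] for key in left_keys]


totals = sum_where_le([1, 5, 3], [2, 4, 0], [10, 20, 30])
```

The prefix-sum trick only works for sum-like aggregations over a prefix of the sorted keys; min, max, and prod over arbitrary match sets would need a different accumulation strategy.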
Co-authored-by: Copilot <[email protected]>
Introduces janitor/functions/_conditional_join/_agg_functions.py with a comprehensive set of aggregation functions (sum, min, max, prod) for use in conditional joins, leveraging janitor_rs for efficient computation. This supports both start/end and match-based aggregations for various numeric types.
…ormat tests
Removed unnecessary print statements from _conditional_join_compute and added usage examples to the join_agg docstring. Reformatted and improved readability of test cases in test_conditional_join.py, including better line breaks for complex expressions and consistent formatting for DataFrame creation.
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
Replaces outdated references to the 'select' syntax with 'select_columns' in the conditional_join function documentation for clarity and accuracy.
Removed redundant line breaks in docstring for clarity.
Updated return types and docstring for clarity.
Clarify the docstring to explain the aggregation process after a join, including supported functions.
Updated the conditional_join function to accept variable arguments for conditions as tuples instead of a list. Adjusted related documentation to reflect the change.
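The calling convention described here can be sketched with a variadic parameter that collects each condition as its own tuple. The helper `collect_conditions`, the operator set, and the `(left_column, right_column, operator)` tuple layout are assumptions for illustration, not the validation code in this PR.

```python
def collect_conditions(*conditions):
    """Validate conditions passed as separate (left, right, op) tuples."""
    allowed = {"==", "!=", "<", "<=", ">", ">="}
    for condition in conditions:
        if not isinstance(condition, tuple) or len(condition) != 3:
            raise TypeError("each condition must be a 3-tuple")
        if condition[2] not in allowed:
            raise ValueError(f"unsupported operator: {condition[2]!r}")
    return list(conditions)


# Conditions are passed as separate positional tuples, not wrapped in a list.
checked = collect_conditions(("start", "date", "<="), ("end", "date", ">="))
```

With this shape, passing a single list of tuples is rejected early with a `TypeError`, which makes the change in calling convention visible to callers.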
Removed commented-out return statement for clarity.
…anitor-devs/pyjanitor into aggregation-joins
14 tasks
PR Description
Please describe the changes proposed in the pull request:
This PR resolves #1497.