@samukweku (Collaborator) commented Jan 12, 2026

PR Description

Please describe the changes proposed in the pull request:

  • Add support for aggregation after a join.
  • Similar, in a limited form, to `by=.EACHI` in R's data.table.
  • Limited to `sum`, `prod`, `size`, `min`, and `max`.

This PR resolves #1497.
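
To make the intended semantics concrete, here is a minimal plain-Python reference sketch of "aggregation after a join": for each left key, the right-hand values whose key satisfies the join condition are aggregated, one result per left row. The helper `aggregate_matches`, the `AGGREGATIONS` table, and the choice of `None` for unmatched rows are assumptions made for illustration; this is not the `join_agg` signature or implementation from this PR.

```python
import math
import operator

# Hypothetical lookup table for illustration; mirrors the five aggregations
# listed in the PR description, not pyjanitor's internals.
AGGREGATIONS = {
    "sum": sum,
    "prod": math.prod,
    "size": len,
    "min": min,
    "max": max,
}


def aggregate_matches(left_keys, right_rows, predicate, how):
    """Aggregate, per left key, the right values whose key satisfies predicate.

    right_rows is a list of (key, value) pairs. A left key with no match
    yields None, except for "size", which yields 0 (an assumption).
    """
    func = AGGREGATIONS[how]
    results = []
    for left_key in left_keys:
        matched = [
            value
            for right_key, value in right_rows
            if predicate(left_key, right_key)
        ]
        if matched or how == "size":
            results.append(func(matched))
        else:
            results.append(None)
    return results


left_keys = [1, 3, 10]
right_rows = [(2, 10.0), (4, 20.0), (5, 30.0)]

# Join condition: left_key < right_key.
sums = aggregate_matches(left_keys, right_rows, operator.lt, "sum")
sizes = aggregate_matches(left_keys, right_rows, operator.lt, "size")
maxima = aggregate_matches(left_keys, right_rows, operator.lt, "max")
```

This brute-force version compares every left key with every right row, so it is only useful as a correctness reference for small inputs.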

samukweku and others added 18 commits December 29, 2025 03:35
Added win-64 platform dependencies to the 'default' and 'chemistry' environments in the pixi.lock file, enabling Windows compatibility. Updated pyproject.toml to reflect these changes.
Refactored _multiple_conditional_join_ne to use _not_equal_indices directly and improved handling of empty index results. Removed unused parameters and streamlined output formatting for consistency.
wip
Major refactor of conditional join internals for improved performance and maintainability. Adds optimized index calculation for equi and non-equi joins, introduces binary search helpers, and removes legacy pandas merge code. Updates error handling, code style, and test coverage for new join logic.
Lowered the max_examples parameter from 600 to 10 in all hypothesis-based tests in test_conditional_join.py. This change speeds up test execution, likely for faster development cycles or to avoid long runtimes during CI.
Replaces local janitor_rs wheel references with platform-specific URLs from PyPI in pixi.lock. This ensures that the correct prebuilt wheels are used for various environments and platforms.
Moved several helper imports from utils to _helpers module for better organization. Updated type hints for clarity and consistency. Added deprecation notes and comments regarding numba support, indicating that numba-based implementations are no longer maintained or supported.
Removed the deprecation warnings for the df_columns and right_columns arguments from the conditional_join function's docstring. This streamlines the documentation and removes redundant warning messages.
Added a deprecation warning for numba support in _conditional_join_preliminary_checks. Replaced deprecated 'select' method with 'select_columns' in _create_frame to ensure compatibility with updated DataFrame API.
Introduces the join_agg method for computing aggregations (sum, min, max, size, prod) on the right DataFrame during conditional joins. Refactors and extends internal join logic to support aggregation, adds helper functions for aggregation, and updates input validation and tests accordingly.
Co-authored-by: Copilot
Introduces janitor/functions/_conditional_join/_agg_functions.py with a comprehensive set of aggregation functions (sum, min, max, prod) for use in conditional joins, leveraging janitor_rs for efficient computation. This supports both start/end and match-based aggregations for various numeric types.
…ormat tests

Removed unnecessary print statements from _conditional_join_compute and added usage examples to the join_agg docstring. Reformatted and improved readability of test cases in test_conditional_join.py, including better line breaks for complex expressions and consistent formatting for DataFrame creation.
Replaces outdated references to the 'select' syntax with 'select_columns' in the conditional_join function documentation for clarity and accuracy.
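
The commit messages above mention binary search helpers and "start/end" aggregations. One way such a scheme can work for a single inequality condition is sketched below, under the assumption that "start/end" refers to contiguous ranges in a sorted right-hand key. Once the right side is sorted, the rows matching `left_key < right_key` form a contiguous tail, so a binary search finds the start and prefix sums give the aggregate without materialising the join. The function name `range_sum_and_size` is hypothetical, and the actual implementation in this PR (which uses janitor_rs) may differ.

```python
from bisect import bisect_right
from itertools import accumulate


def range_sum_and_size(left_keys, right_keys, right_values):
    """Per left key, sum and count right values where right_key > left_key."""
    # Sort the right side once by key so matches form a contiguous tail.
    order = sorted(range(len(right_keys)), key=right_keys.__getitem__)
    sorted_keys = [right_keys[i] for i in order]
    sorted_values = [right_values[i] for i in order]

    # prefix[i] is the sum of the first i sorted values; prefix[0] == 0.
    prefix = [0, *accumulate(sorted_values)]
    end = len(sorted_keys)

    sums, sizes = [], []
    for key in left_keys:
        # Index of the first sorted key strictly greater than the left key.
        start = bisect_right(sorted_keys, key)
        sums.append(prefix[end] - prefix[start])
        sizes.append(end - start)
    return sums, sizes


sums, sizes = range_sum_and_size(
    left_keys=[1, 3, 10],
    right_keys=[5, 2, 4],
    right_values=[30, 10, 20],
)
```

In this sketch an unmatched left key gets a sum of 0 and a size of 0. `min` and `max` would need a different precomputed structure, such as running suffix minima and maxima, since they cannot be derived from prefix sums.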
@samukweku samukweku self-assigned this Jan 12, 2026
@samukweku samukweku marked this pull request as draft January 12, 2026 07:21
github-actions bot commented Jan 12, 2026

PR Preview Action v1.8.1

🚀 View preview at
https://pyjanitor-devs.github.io/pyjanitor/pr-preview/pr-1569/

Built to branch gh-pages at 2026-01-29 18:51 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@samukweku samukweku marked this pull request as ready for review January 29, 2026 07:24
Development

Successfully merging this pull request may close these issues.

Aggregation within a conditional_join

3 participants