danhoeflinger
Contributor

Add partitioning kernel to set APIs balanced path algorithm.

Adds a partitioning phase that makes a sparse pass over the input data to establish binary search boundaries for the main run. This lets the memory access pattern of the main kernels fit within L1 cache when they perform the binary searches that establish balanced path intersections.

This improves the performance of the set algorithms for large input sizes. (Combined with #2317, it improves performance for both large and small input sizes of the set algorithms.)
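The two-phase idea described above can be illustrated with a minimal host-side sketch. It is not the oneDPL implementation: it uses an ordinary merge path search rather than the balanced path variant, runs serially rather than as SYCL kernels, and every name in it (`search_diagonal`, `split_unbounded`, `make_partition_bounds`, `split_bounded`, `tile_size`) is hypothetical. A sparse pass records the split point on every `tile_size`-th diagonal, and the main pass then restricts each binary search to the window between the two recorded splits that enclose its diagonal.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Seq = std::vector<int>;

// Hypothetical helper: on diagonal d, find how many elements of `a` are among the
// first d elements of the stable merge of a and b, searching only within [lo, hi].
inline std::size_t search_diagonal(const Seq& a, const Seq& b, std::size_t d,
                                   std::size_t lo, std::size_t hi)
{
    assert(lo <= hi && hi <= a.size() && hi <= d);
    while (lo < hi)
    {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (a[mid] <= b[d - 1 - mid]) // a[mid] is merged before b[d - 1 - mid]
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

// Search with only the trivial bounds implied by the input sizes.
inline std::size_t split_unbounded(const Seq& a, const Seq& b, std::size_t d)
{
    const std::size_t lo = d > b.size() ? d - b.size() : 0;
    return search_diagonal(a, b, d, lo, std::min(d, a.size()));
}

// Partitioning phase: a sparse pass that stores the split on every tile_size-th diagonal.
inline std::vector<std::size_t> make_partition_bounds(const Seq& a, const Seq& b,
                                                      std::size_t tile_size)
{
    assert(tile_size > 0);
    const std::size_t total = a.size() + b.size();
    const std::size_t num_tiles = (total + tile_size - 1) / tile_size;
    std::vector<std::size_t> bounds(num_tiles + 1);
    for (std::size_t t = 0; t <= num_tiles; ++t)
        bounds[t] = split_unbounded(a, b, std::min(t * tile_size, total));
    return bounds;
}

// Main phase: the split is monotonic in d, so the search for diagonal d can be
// confined to the window between the stored splits of the enclosing tile.
inline std::size_t split_bounded(const Seq& a, const Seq& b,
                                 const std::vector<std::size_t>& bounds,
                                 std::size_t tile_size, std::size_t d)
{
    if (bounds.size() < 2)
        return 0; // both inputs are empty
    const std::size_t total = a.size() + b.size();
    const std::size_t t = std::min(d / tile_size, bounds.size() - 2);
    const std::size_t d_lo = t * tile_size;
    const std::size_t d_hi = std::min(d_lo + tile_size, total);
    const std::size_t i_lo = bounds[t], i_hi = bounds[t + 1];
    const std::size_t j_lo = d_lo - i_lo, j_hi = d_hi - i_hi;
    const std::size_t lo = std::max(i_lo, d > j_hi ? d - j_hi : std::size_t{0});
    const std::size_t hi = std::min(i_hi, d - j_lo);
    return search_diagonal(a, b, d, lo, hi);
}
```

In this sketch each bounded search touches at most `tile_size` consecutive elements of each input, which is the property that would let the working set stay cache-resident when the tile size is chosen to match the cache.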

@danhoeflinger danhoeflinger requested a review from Copilot June 20, 2025 18:42
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces a partitioning phase to the set algorithms, enhancing performance for large input sizes by establishing binary search boundaries that optimize cache usage. Key changes include updating the __gen_set_balanced_path template to accept an additional bounds provider parameter, adding new helper functions (__decode_balanced_path_temp_data, __encode_balanced_path_temp_data) for balanced path processing, and integrating a new partition kernel for the balanced path phase.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| test/general/implementation_details/device_copyable.pass.cpp | Updated static_asserts to include new bounds provider parameter |
| include/oneapi/dpl/pstl/hetero/dpcpp/sycl_traits.h | Modified __gen_set_balanced_path specialization to include _BoundsProvider |
| include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_reduce_then_scan.h | Added helper functions & modified balanced path computation to include partitioning support and safeguard against out-of-bound element access |
| include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl.h | Updated __parallel_set_reduce_then_scan integration with new bounds provider and partitioning kernel |

Comments suppressed due to low confidence (4)

include/oneapi/dpl/pstl/hetero/dpcpp/sycl_traits.h:458

  • The specialization of __gen_set_balanced_path now includes the _BoundsProvider parameter; please verify that all downstream usages are updated accordingly to maintain consistent API behavior.
template <typename _SetOpCount, typename _BoundsProvider, typename _Compare>

include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl.h:1086

  • [nitpick] When constructing _GenReduceInput with the new _BoundsProvider, it would be helpful to document the role of __diagonal_spacing and __partition_size in determining partition sizes, ensuring that readers understand how these values impact performance.
                                       _BoundsProvider{__diagonal_spacing, __partition_size}, __comp};

include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_reduce_then_scan.h:698

  • The change from returning 0 to clamping __i_elem to __rng1.size() + __rng2.size() - 1 may affect the algorithm's edge-case handling; please confirm that this adjusted behavior correctly reflects the intended semantics.
        if (__i_elem >= __rng1.size() + __rng2.size())

include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_reduce_then_scan.h:821

  • [nitpick] Retrieving __tile_size from __gen_input.__get_bounds is critical for partitioning; please ensure that __tile_size is always correctly initialized and consistent for various input sizes to avoid unexpected partition boundaries.
        std::size_t __tile_size = __gen_input.__get_bounds.__tile_size;

Contributor

@mmichel11 mmichel11 left a comment

I have made a first pass over the implementation. I like how the patch unifies handling between the partitioned and non-partitioned bounds.

@danhoeflinger danhoeflinger force-pushed the dev/dhoeflin/partition_set_algs branch from 4e4ca84 to b3c9236 Compare July 14, 2025 20:24
@danhoeflinger danhoeflinger added this to the 2022.10.0 milestone Aug 4, 2025
@danhoeflinger danhoeflinger force-pushed the dev/dhoeflin/partition_set_algs branch from 4b61133 to 42af6b8 Compare August 14, 2025 21:00
Contributor

@mmichel11 mmichel11 left a comment

This PR looks near ready to me. The conflicts with main need to be addressed.


// Calculate the max location to search in the second set for future repeats, limiting to the edge of the range
_Index __fwd_search_bound = std::min(__merge_path_rng2 + __fwd_search_count, __rng2.size());
using _SizeType = decltype(std::get<0>(__in_rng.tuple()).size());
Contributor

No action is needed on your side here, but I see that we frequently use something along the lines of decltype(__rng.size()) in this patch. It looks like we have oneapi::dpl::__internal::__range_size_t, but only for C++20 or later. Perhaps it would be worth extending it to C++17 with internal ranges.
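
As a rough illustration of the suggestion above, a C++17-compatible alias could be built directly on the `decltype(__rng.size())` pattern. This is only a sketch under assumptions: `range_size_sketch_t` and `ToyRange` are hypothetical names, and nothing here reflects how `oneapi::dpl::__internal::__range_size_t` is actually defined.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical C++17 alias: the type returned by calling size() on an lvalue of Range.
template <typename Range>
using range_size_sketch_t = decltype(std::declval<Range&>().size());

// Hypothetical stand-in for an internal range type that exposes a narrow size type.
struct ToyRange
{
    unsigned short size() const { return 0; }
};

static_assert(std::is_same_v<range_size_sketch_t<std::vector<int>>, std::vector<int>::size_type>);
static_assert(std::is_same_v<range_size_sketch_t<ToyRange>, unsigned short>);
```

With an alias like this, a declaration such as `using _SizeType = decltype(std::get<0>(__in_rng.tuple()).size());` could name the range type once instead of repeating the `decltype` expression, assuming the range type is available at that point.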

@danhoeflinger danhoeflinger force-pushed the dev/dhoeflin/partition_set_algs branch from d2e7888 to 45e9998 Compare August 29, 2025 16:25
@danhoeflinger danhoeflinger force-pushed the dev/dhoeflin/partition_set_algs branch from eac4b52 to b86cf8a Compare September 2, 2025 15:33
mmichel11 previously approved these changes Sep 2, 2025
Contributor

@mmichel11 mmichel11 left a comment

LGTM

Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
mmichel11 previously approved these changes Sep 2, 2025
Contributor

@mmichel11 mmichel11 left a comment

Reapproving

Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
SergeyKopienko previously approved these changes Sep 3, 2025
Contributor

@SergeyKopienko SergeyKopienko left a comment

LGTM

Signed-off-by: Dan Hoeflinger <[email protected]>
Contributor

@SergeyKopienko SergeyKopienko left a comment

LGTM

@danhoeflinger danhoeflinger merged commit 7c955a8 into main Sep 4, 2025
18 of 19 checks passed
@danhoeflinger danhoeflinger deleted the dev/dhoeflin/partition_set_algs branch September 4, 2025 12:13

3 participants