Skip to content

Conversation

danhoeflinger
Copy link
Contributor

@danhoeflinger danhoeflinger commented Jun 20, 2025

This PR makes some infrastructure changes to allow a runtime decision based upon number of elements to select the set algorithm selected, in addition to limitations based upon platform and environment.

  1. Relocate algorithm choice from hetero level to sycl backend and make __parallel_set_op blocking. (temporarily required for type alignment, see [Draft] RFC future type and keepalives #2300)
  2. Create sycl backend entry points to call "merge" (__parallel_merge_impl) and "copy" (__parallel_copy_impl).
  3. Add a threshold to switch algorithms from set_a_scan to balanced_path based upon set tag in __set_op_impl using __consider_write_a_alg

This results in an overall performance improvement, as small numbers of input elements do not perform as well with the balanced path algorithm.

edit: this incorporates a fix from #2386 for the reduce_then_scan temporary buffer for set_a_write

@danhoeflinger danhoeflinger requested a review from Copilot June 20, 2025 17:55
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR reworks the set algorithms in the SYCL backend to dynamically select between different implementation strategies based on input sizes. Key changes include removing the boolean template parameter from the set tag types, updating call sites and helper functions to use non-templated tags, and introducing a threshold switch in __set_op_impl to choose between the write_a_scan and balanced_path algorithms.

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Show a summary per file
File Description
test/general/implementation_details/balanced_path_unit_tests.pass.cpp Removed instantiations with std::true_type from set tag calls in tests
include/oneapi/dpl/pstl/hetero/dpcpp/unseq_backend_sycl.h Dropped the _IsOneShot template parameter from set tag types and updated __brick_set_op accordingly
include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_reduce_then_scan.h Updated comments and template parameters (renaming to _SetTag) to align with new tag type definitions
include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_merge.h Adjusted __parallel_merge overloads to remove extraction of _ExecutionPolicy and use a custom name parameter
include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_for.h Updated parallel_for overloads to use an explicit queue parameter and _CustomName for kernel naming
include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl.h Refactored the __set_op_impl and related functions to support algorithm switching based on input size
include/oneapi/dpl/pstl/hetero/algorithm_impl_hetero.h Modified set algorithm patterns to remove the boolean tag instantiation and use the new tag types
Comments suppressed due to low confidence (3)

include/oneapi/dpl/pstl/hetero/dpcpp/unseq_backend_sycl.h:1230

  • Since the _IsOneShot template parameter and the associated __can_write_from_rng2_v flag have been removed from the set tag types, please add a brief comment explaining that the write capability is now determined elsewhere for clarity and maintainability.
struct _IntersectionTag

include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_merge.h:532

  • With the introduction of the _CustomName parameter and the new overloads for __parallel_merge, please add inline documentation to clarify when each overload should be used and how _CustomName is derived from the execution policy.
template <typename _ExecutionPolicy, typename _Range1, typename _Range2, typename _Range3, typename _Compare,

include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl.h:1332

  • The magic number used as a threshold for switching algorithms (1024 * 1024) could benefit from an inline comment that explains its rationale and expected impact on performance, to improve code maintainability.
    static constexpr std::size_t __threshold = 1024 * 1024;

Copy link
Contributor

@mmichel11 mmichel11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only minor comments from me.

@danhoeflinger danhoeflinger force-pushed the dev/dhoeflin/balanced_path_small_n branch from 0271355 to 25654d0 Compare July 14, 2025 19:36
@danhoeflinger danhoeflinger added this to the 2022.10.0 milestone Aug 4, 2025
@danhoeflinger danhoeflinger force-pushed the dev/dhoeflin/balanced_path_small_n branch from b43a8d1 to 8b9fdb7 Compare August 20, 2025 16:08
@danhoeflinger danhoeflinger force-pushed the dev/dhoeflin/balanced_path_small_n branch from 22bd0b2 to 39eb107 Compare September 4, 2025 13:49
@danhoeflinger
Copy link
Contributor Author

danhoeflinger commented Sep 4, 2025

I fixed the rebase with main, Now the diff reasonable is reasonable. It was previously clashing with a partial implementation of the partitioning PR which made the diff messy and incorrect.

I have found a same-mangled name issue with set_union and set_symmetric_difference I am investigating.
Edit: this is now resolved.

@SergeyKopienko
Copy link
Contributor

I have found a same-mangled name issue with set_union and set_symmetric_difference I am investigating.

@danhoeflinger I can help you with this topic.

@danhoeflinger
Copy link
Contributor Author

I've resolved the same mangled name issue.

@danhoeflinger danhoeflinger force-pushed the dev/dhoeflin/balanced_path_small_n branch from 8393391 to f52a5b6 Compare September 5, 2025 18:15
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
@danhoeflinger danhoeflinger force-pushed the dev/dhoeflin/balanced_path_small_n branch from d901a22 to 653e24a Compare September 9, 2025 14:41
mmichel11
mmichel11 previously approved these changes Sep 9, 2025
Copy link
Contributor

@mmichel11 mmichel11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done another pass over this PR, and it LGTM assuming green CI.

Signed-off-by: Dan Hoeflinger <[email protected]>
Copy link
Contributor

@mmichel11 mmichel11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reapproving after bugfix.

Copy link
Contributor

@SergeyKopienko SergeyKopienko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@danhoeflinger danhoeflinger merged commit 71cbc66 into main Sep 11, 2025
19 checks passed
@danhoeflinger danhoeflinger deleted the dev/dhoeflin/balanced_path_small_n branch September 11, 2025 12:16
timmiesmith pushed a commit that referenced this pull request Sep 11, 2025
Relocate the algorithm selection to sycl backend, and use an empirical threshold to decide between reduce_then_scan algorithms set_a_write and partitioned balanced path.
---------

Signed-off-by: Dan Hoeflinger <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants