@danhoeflinger

This PR refactors the SYCL radix sort implementation to improve performance across multiple work-groups. The optimization focuses on kernel simplification, better memory access patterns, and improved work distribution.

Key Changes

Kernel Consolidation

  • Removed the separate __radix_sort_reorder_peer_kernel
  • Unified even/odd iteration logic to reduce kernel variants
  • Eliminated compile-time kernel specialization overhead

Count Kernel Optimization

  • Reorganized to support larger work-group sizes; the kernel was previously limited to a single subgroup, which resulted in poor occupancy
  • Changed from _CountT (uint32) to uint8 accumulation in shared local memory (SLM) for bucket counters
  • Improved SLM indexing with __index_views helper for better cache locality during counting phase
  • Implemented 8-way unrolling with fully strided memory access for better coalescing
  • Optimized reduction with tree-based aggregation where all work-items participate in parallel
  • Reduced register pressure by having each work-item handle 4 radix states during reduction
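
To illustrate the narrow-counter and strided-access ideas listed above, here is a minimal host-side C++ sketch. It is not the kernel code from this PR: the work-group is simulated with an ordinary loop, and the names `count_digits`, `kWorkGroupSize`, `kRadixBits`, and `kUnroll` are hypothetical, chosen only for this example. It also assumes the narrow counters are flushed to wide totals after every batch, so no `uint8` counter can exceed the unroll factor.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical parameters, chosen for illustration only.
constexpr std::uint32_t kRadixBits = 4;
constexpr std::uint32_t kRadixStates = 1u << kRadixBits; // 16 buckets
constexpr std::size_t kWorkGroupSize = 64;               // simulated work-items
constexpr std::size_t kUnroll = 8;                       // keys per work-item per batch

// Counts how many keys fall into each bucket for the digit starting at `shift`.
std::array<std::uint32_t, kRadixStates>
count_digits(const std::vector<std::uint32_t>& keys, std::uint32_t shift)
{
    std::array<std::uint32_t, kRadixStates> totals{};
    // One row of narrow counters per simulated work-item (stands in for SLM).
    std::vector<std::array<std::uint8_t, kRadixStates>> narrow(kWorkGroupSize);

    const std::size_t batch = kWorkGroupSize * kUnroll;
    for (std::size_t base = 0; base < keys.size(); base += batch)
    {
        for (auto& row : narrow)
            row.fill(0);

        for (std::size_t item = 0; item < kWorkGroupSize; ++item)
        {
            // Strided access: in unroll step u, adjacent work-items read adjacent keys.
            for (std::size_t u = 0; u < kUnroll; ++u)
            {
                const std::size_t idx = base + u * kWorkGroupSize + item;
                if (idx < keys.size())
                {
                    const std::uint32_t bucket = (keys[idx] >> shift) & (kRadixStates - 1);
                    ++narrow[item][bucket]; // at most kUnroll increments per batch, fits in uint8
                }
            }
        }

        // Assumed flush step: widen and accumulate before the next batch.
        for (const auto& row : narrow)
            for (std::uint32_t b = 0; b < kRadixStates; ++b)
                totals[b] += row[b];
    }
    return totals;
}
```

In this sketch the flush is a plain serial loop; the tree-based reduction described above would replace it on a device, with each work-item combining several radix states in parallel.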

Reorder Kernel Improvements

  • Redesigned to support larger work-group sizes; the kernel was previously limited to a single subgroup per work-group, which resulted in poor hardware utilization

Code Simplification

  • Removed complex peer prefix algorithm variants (subgroup_ballot, atomic_fetch_or, scan_then_broadcast)
  • Removed workload-based tuning (_ONEDPL_RADIX_WORKLOAD_TUNING) in favor of simpler unified approach
  • Net reduction of ~128 lines of code

Signed-off-by: Dan Hoeflinger <[email protected]>
@danhoeflinger danhoeflinger marked this pull request as draft January 28, 2026 20:09