

Implemented OneAPI Backend for Spatter Benchmark #247

Merged
plavin merged 18 commits into hpcgarage:spatter-devel from mjung76:main on May 5, 2025



@mjung76 commented Apr 19, 2025

Overview

This pull request introduces a new OneAPI (SYCL/DPC++) backend to the Spatter benchmark suite, enabling evaluation of memory access patterns on heterogeneous hardware, including modern CPUs, GPUs, and FPGAs, through Intel's DPC++ programming model. It extends Spatter beyond its CUDA and OpenMP backends with a portable, vendor-agnostic way to benchmark gather/scatter performance on the hardware platforms that SYCL supports.
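For readers new to Spatter, the two kernels that every backend implements can be sketched serially. This is a simplified host-side C++ sketch, assuming the usual Spatter parameters (an index `pattern`, a stride `delta`, and an iteration `count`); the names `gather_ref` and `scatter_ref` are hypothetical, and the real backends may differ in details such as how the dense buffer is indexed.

```cpp
#include <cstddef>
#include <vector>

// Gather: read scattered elements of `sparse` into a contiguous `dense` buffer.
// Iteration i reads sparse[pattern[j] + delta * i] for every j in the pattern.
std::vector<double> gather_ref(const std::vector<double>& sparse,
                               const std::vector<std::size_t>& pattern,
                               std::size_t delta, std::size_t count) {
  const std::size_t len = pattern.size();
  std::vector<double> dense(len * count);
  for (std::size_t i = 0; i < count; ++i)
    for (std::size_t j = 0; j < len; ++j)
      dense[i * len + j] = sparse[pattern[j] + delta * i];
  return dense;
}

// Scatter: the inverse access, writing contiguous `dense` values to
// sparse[pattern[j] + delta * i]. If two (i, j) pairs map to the same
// destination, a parallel version of this loop has a write collision.
void scatter_ref(std::vector<double>& sparse,
                 const std::vector<double>& dense,
                 const std::vector<std::size_t>& pattern,
                 std::size_t delta, std::size_t count) {
  const std::size_t len = pattern.size();
  for (std::size_t i = 0; i < count; ++i)
    for (std::size_t j = 0; j < len; ++j)
      sparse[pattern[j] + delta * i] = dense[i * len + j];
}
```

A parallel backend distributes these iterations across threads or work-items; the scatter loop is where colliding destinations make atomic writes necessary.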

✨ Change Description/Rationale

  • Added full implementation of the OneAPI backend to support gather and scatter memory access patterns.
  • Integrated hierarchical parallelism using nd_range and group_barrier constructs to mirror CUDA-style thread-block coordination.
  • Used dpct::atomic_exchange to keep concurrent scatter writes correct, following CUDA's atomic-exchange behavior with SYCL-compatible semantics.
  • Exposed tunable parameters such as block size, vector length (ILP), and work per thread in the OneAPI backend to allow platform-specific optimization.
  • Updated the build system (CMakeLists.txt) to detect the Intel DPC++ compiler (icpx) and link against the IntelSYCL target, falling back to manual linking if that target is unavailable.
  • This change provides parity with existing CUDA and OpenMP backends, while also unlocking performance portability and future extensibility for platforms supporting the SYCL ecosystem.
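How the tunable parameters map onto an nd_range can be illustrated with plain arithmetic. SYCL 2020 requires the global range of an nd_range to be a multiple of the local (work-group) range, so a backend typically rounds the global size up and guards the surplus work-items. The sketch below is a hypothetical host-side helper (`LaunchShape`, `make_launch_shape`) that assumes each work-item handles `work_per_item` gather/scatter elements; it is not taken from the PR's source.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical launch geometry for a 1-D nd_range kernel.
struct LaunchShape {
  std::size_t global;  // total work-items, always a multiple of `local`
  std::size_t local;   // work-group size (block_size)
};

// total_elems:   gather/scatter element operations (e.g. pattern length * count)
// block_size:    work-group size, the SYCL analogue of a CUDA thread-block size
// work_per_item: elements processed by each work-item
LaunchShape make_launch_shape(std::size_t total_elems, std::size_t block_size,
                              std::size_t work_per_item) {
  if (block_size == 0 || work_per_item == 0)
    throw std::invalid_argument("block_size and work_per_item must be > 0");
  // Work-items needed, rounding up so every element is covered.
  const std::size_t items = (total_elems + work_per_item - 1) / work_per_item;
  // Round up to whole work-groups. Surplus work-items must skip their work
  // inside the kernel; if the kernel calls group_barrier they must not return
  // early, because every work-item in the group has to reach the barrier.
  const std::size_t groups = (items + block_size - 1) / block_size;
  return {groups * block_size, block_size};
}
```

In a SYCL kernel launch, these two numbers would be passed as `sycl::nd_range<1>(sycl::range<1>(shape.global), sycl::range<1>(shape.local))`.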

👀 Reviewer Checklist

  • All GitHub actions and runners have passed
  • Commits are clean and relevant to the OneAPI backend
  • Implementation integrates with Spatter's backend interface and tuning framework
  • Backend produces correct and deterministic results on supported OneAPI platforms
  • Benchmark parameters (e.g., block_size, index_len, delta) are consistent across backends

✅ PR Checklist

  • [✅] Remove or update the template boilerplate text
  • [✅] Commits are relevant and combined where appropriate
  • [✅] Rebase off spatter-devel
  • [✅] Reviewers Requested
  • [✅] Projects associated
  • [✅] Commits mention issue and/or PR numbers at the bottom of the message
  • [✅] Relevant issues are linked into the PR
  • [✅] TODOs are completed
  • [✅] Reviewer checklist is updated

🚀 TODOs

  • [✅] Test and validate correctness on Intel GPU via DPC++ compiler (icpx)
  • [✅] Confirm atomic scatter correctness under concurrent access
  • [✅] Tune performance parameters on both CPU and GPU OneAPI targets
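Confirming atomic scatter correctness mostly comes down to what happens when two work-items write the same destination. The sketch below is a host-only analogue that uses `std::thread` and `std::atomic<double>::exchange` as a stand-in for `dpct::atomic_exchange` (which needs a SYCL toolchain); the helper name `concurrent_scatter` is hypothetical. It shows the property such a check can verify: with colliding indices, each destination ends up holding exactly one of the values written to it, without a data race. Which writer wins is not specified.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Host-side analogue of an atomic scatter: each thread writes its share of
// `src` to dst[idx[k]] with an atomic exchange, so colliding writes are
// race-free (one of them wins) instead of undefined behavior.
void concurrent_scatter(std::vector<std::atomic<double>>& dst,
                        const std::vector<double>& src,
                        const std::vector<std::size_t>& idx,
                        unsigned num_threads) {
  std::vector<std::thread> workers;
  const std::size_t n = src.size();
  for (unsigned t = 0; t < num_threads; ++t) {
    workers.emplace_back([&, t] {
      // Strided assignment: thread t handles k = t, t + num_threads, ...
      for (std::size_t k = t; k < n; k += num_threads)
        dst[idx[k]].exchange(src[k]);  // previous value is discarded
    });
  }
  for (auto& w : workers) w.join();
}
```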

📌 Future Work

  • Investigate support for hierarchical indexed access patterns in SYCL using subgroup-level operations for finer control over SIMD execution.
  • Extend benchmarking support to Intel FPGAs using SYCL where memory hierarchies and latency behavior differ significantly.
  • Incorporate performance-counter hooks to analyze cache behavior and memory throughput on OneAPI platforms, comparable to the profiling available for CUDA.

@mjung76 marked this pull request as ready for review April 21, 2025 04:15
@jyoung3131 changed the base branch from main to spatter-devel April 30, 2025 15:28
@plavin merged commit 5769c8e into hpcgarage:spatter-devel May 5, 2025
7 checks passed

Labels

None yet

Projects

None yet


2 participants