
Add AMD ROCm/HIP platform support #232

Draft
Copilot wants to merge 6 commits into LTS-C++17 from copilot/review-cmake-scripts-for-hipify

Copilot AI commented Mar 13, 2026

The library previously supported only NVIDIA CUDA and CPU backends. This PR adds AMD GPU support through CMake 4.x's native HIP language support. It relies on HIP's built-in CUDA compatibility layer, so existing CUDA API calls in the headers keep working without any changes to the .cu source files.

New CMake infrastructure

  • cmake/hip_init.cmake: calls enable_language(HIP) and wires up the HIP CMake modules
  • cmake/libs/hip/hip.cmake: calls find_package(hip); defines add_hip_to_target() and the ENABLE_HIP_LINE_INFO / ENABLE_HIP_DEBUG options
  • cmake/libs/hip/archs.cmake: GPU arch selection via the HIP_ARCH cache variable (defaults to native)
  • cmake/libs/hip/target_generation.cmake: target properties, debug/lineinfo helpers

CMakeLists.txt & test discovery

  • Root CMakeLists.txt: check_language(HIP) + ENABLE_HIP option; includes cmake/hip_init.cmake when found
  • cmake/discover_tests.cmake, cmake/tests/discover_tests.cmake, cmake/tests/add_generated_test.cmake: generate .hip launchers alongside .cu / .cpp when CMAKE_HIP_COMPILER AND ENABLE_HIP
  • benchmarks/CMakeLists.txt, utest_saturate/CMakeLists.txt: HIP equivalents added alongside existing CUDA blocks
  • cmake/discover_tests.cmake: scoped CUDA::cuda_driver link to CUDA-only targets

Header changes

All changes are additive #elif HIP_HOST_DEVICE / || HIP_HOST_DEVICE guards; no existing CUDA paths modified.
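As a rough illustration of what "additive" means here, the sketch below keeps a CUDA branch first and appends a HIP branch as `#elif`, and widens a CUDA-only condition with `||`. It is not the library's code: the `SKETCH_*` macros, `Backend`, `activeBackend()` and `hasGpuBackend()` are hypothetical names chosen for this example. The only parts taken from the description are that the HIP macro is 1 when `__HIP__` is defined and that the guards have the `#elif` / `||` shape.

```cpp
// Hypothetical stand-ins for the library's backend macros (names are assumptions).
// __CUDACC__ is defined by nvcc; __HIP__ is defined by clang when compiling HIP code.
#if defined(__CUDACC__)
#define SKETCH_CUDA_HOST_DEVICE 1
#else
#define SKETCH_CUDA_HOST_DEVICE 0
#endif

#if defined(__HIP__)
#define SKETCH_HIP_HOST_DEVICE 1
#else
#define SKETCH_HIP_HOST_DEVICE 0
#endif

enum class Backend { Cpu, Cuda, Hip };

// The CUDA branch stays first and unchanged; the HIP branch is appended as #elif.
constexpr Backend activeBackend() {
#if SKETCH_CUDA_HOST_DEVICE
    return Backend::Cuda;
#elif SKETCH_HIP_HOST_DEVICE
    return Backend::Hip;
#else
    return Backend::Cpu;
#endif
}

// A condition that used to test CUDA only is widened with `|| HIP`.
constexpr bool hasGpuBackend() {
#if SKETCH_CUDA_HOST_DEVICE || SKETCH_HIP_HOST_DEVICE
    return true;
#else
    return false;
#endif
}
```

Built with a plain host compiler, neither macro is set, so the CPU branch is selected and the CUDA branch is never touched by the HIP addition.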

  • compiler_macros.h: HIP_HOST_DEVICE macro (1 when __HIP__ is defined)
  • utils.h: #include <hip/hip_runtime.h> plus gpuAssert(hipError_t) / gpuErrchk for HIP
  • parallel_architectures.h: defaultParArch = GPU_AMD when HIP; adds HIP_HOST_DEVICE include-guard check
  • stream.h: Stream_<ParArch::GPU_AMD> specialisation using hipStream_t
  • executor_kernels.h: adds HIP_HOST_DEVICE to the compile guard (HIP supports __grid_constant__ since ROCm 5.4)
  • data_parallel_patterns.h: TransformDPP<GPU_AMD> and DivergentBatchTransformDPP<GPU_AMD> specialisations; DivergentBatchTransformDPPBase templated on ParArch so launchTransformDPP dispatches to the correct arch
  • executors.h: Executor<TransformDPP<GPU_AMD>> and Executor<DivergentBatchTransformDPP<GPU_AMD>> specialisations; static_assert updated to allow GPU_AMD
  • ptr_nd.h: defaultMemType = DeviceAndPinned for HIP; alloc/free/copy functions extended; upload/download overloads for Stream_<GPU_AMD>
  • vector_types.h: skips CPU-side fk:: aliases when HIP (HIP runtime already provides them via hip_vector_types.h)
  • ptr_utils.h: GPU_AMD branch in setTo
  • image.h: HIP upload/download overloads
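
Several of the items above follow the same pattern: a template parameterised on the parallel architecture gets a `GPU_AMD` specialisation, and free functions are overloaded on the resulting stream type. The sketch below shows that pattern in isolation, under stated assumptions. `FakeHipStream` is a made-up stand-in for `hipStream_t` so the example builds without ROCm, and `CopyPath`, the `upload()` signatures, the class members and the `CPU` / `GPU_NVIDIA` enumerators are all hypothetical. Only the names `ParArch`, `GPU_AMD`, `Stream_` and `defaultParArch` come from the description.

```cpp
// Stand-in enum: the PR names GPU_AMD; the other enumerators are assumptions.
enum class ParArch { CPU, GPU_NVIDIA, GPU_AMD };

// Default architecture chosen at compile time, with HIP mapping to GPU_AMD.
#if defined(__HIP__)
constexpr ParArch defaultParArch = ParArch::GPU_AMD;
#elif defined(__CUDACC__)
constexpr ParArch defaultParArch = ParArch::GPU_NVIDIA;
#else
constexpr ParArch defaultParArch = ParArch::CPU;
#endif

// Made-up handle standing in for hipStream_t so the sketch builds without ROCm.
struct FakeHipStream { int id; };

// Primary template is only declared; each backend supplies a specialisation.
template <ParArch PA> class Stream_;

template <> class Stream_<ParArch::CPU> {
public:
    void sync() const {}  // nothing to wait for on the CPU
};

template <> class Stream_<ParArch::GPU_AMD> {
public:
    explicit Stream_(FakeHipStream s) : stream_(s) {}
    FakeHipStream native() const { return stream_; }
    void sync() const {}  // real code would call hipStreamSynchronize here
private:
    FakeHipStream stream_;
};

enum class CopyPath { Host, Hip };

// Overload resolution on the stream type selects the backend-specific path,
// similar in spirit to the upload/download overloads listed above.
inline CopyPath upload(const Stream_<ParArch::CPU>&) { return CopyPath::Host; }
inline CopyPath upload(const Stream_<ParArch::GPU_AMD>&) { return CopyPath::Hip; }
```

Because the primary template has no definition, using `Stream_` with an architecture that has no specialisation fails at compile time rather than silently falling back to another backend.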

Invocation (Windows, ROCm 7.1)

cmake -G "Ninja" path\to\FusedKernelLibrary ^
  -DCMAKE_CXX_COMPILER="%HIP_PATH%bin\clang++.exe" ^
  -DCMAKE_HIP_COMPILER="%HIP_PATH%bin\clang++.exe" ^
  -DCMAKE_HIP_COMPILER_ROCM_ROOT="C:/Program Files/AMD/ROCm/7.1/" ^
  -DENABLE_CUDA=OFF


Copilot AI and others added 2 commits March 13, 2026 20:45
Co-authored-by: albertandaluz <176801700+albertandaluz@users.noreply.github.com>
…NFO)

Co-authored-by: albertandaluz <176801700+albertandaluz@users.noreply.github.com>
Copilot AI changed the title [WIP] Review and suggest changes for CMake scripts to support HIP Add AMD ROCm/HIP platform support Mar 13, 2026
Copilot AI requested a review from albertandaluz March 13, 2026 20:50
Albert Andaluz and others added 3 commits March 26, 2026 19:42
# Conflicts:
#	include/fused_kernel/algorithms/image_processing/image.h
#	include/fused_kernel/core/data/ptr_nd.h
#	include/fused_kernel/core/execution_model/data_parallel_patterns.h
#	include/fused_kernel/core/execution_model/executor_details/executor_kernels.h
#	include/fused_kernel/core/execution_model/executors.h
#	include/fused_kernel/core/execution_model/parallel_architectures.h
#	include/fused_kernel/core/utils/compiler_macros.h
#	include/fused_kernel/core/utils/utils.h