Skip to content

WIP: Add HIP backend support#6

Open
icemc wants to merge 8 commits intolattice-land:mainfrom
icemc:feature/hip
Open

WIP: Add HIP backend support#6
icemc wants to merge 8 commits intolattice-land:mainfrom
icemc:feature/hip

Conversation

@icemc
Copy link

@icemc icemc commented Feb 28, 2026

#Add HIP backend support (AMD ROCm)

Summary

This PR adds HIP as a first-class, compile-time-selectable GPU backend targeting AMD ROCm hardware. All existing CPU and CUDA behaviour is completely unchanged. HIP support is opt-in via -DHIP=ON -DGPU=OFF and affects no currently-passing tests.

Note on NVIDIA-via-HIP scaffolding: This PR includes temporary hip-nvidia-* CMake presets and associated build scaffolding that allow the HIP code to be compiled and tested on a NVIDIA machine (using the HIP portability headers that map hip* -> cuda*). This is a development convenience only - it is not a supported end-user configuration and will be removed once the code is validated on real AMD hardware. NVIDIA users should continue using the existing GPU=ON CUDA presets.

Test results (verified on NVIDIA hardware via HIP portability layer as a development proxy):

  • 35/35 CPU tests
  • 38/38 CUDA tests (gpu-debug)
  • 38/38 HIP tests (hip-nvidia-debug, temporary dev preset)

Motivation

Users running on AMD GPUs (ROCm) cannot use the library today. HIP provides portability across AMD and NVIDIA hardware with near-identical syntax to CUDA. The goal is to support AMD hardware without forking the codebase or breaking existing CUDA users.


Design approach

HIP is selected entirely at compile time. There is no runtime dispatch. The key principle is: *_hip.cpp files call the HIP API directly - on AMD this is native ROCm; on the temporary NVIDIA dev path the HIP portability headers map hip* to cuda* transparently, so no backend-switching logic is needed in test or application code.

The supported backends are:

Backend CMake option Compiler Required
CPU only GPU=OFF HIP=OFF Any C++20 C++20 compiler
CUDA / NVIDIA GPU=ON nvcc CUDA Toolkit 12+
HIP / AMD (this PR) HIP=ON hipcc ROCm 6+
HIP / NVIDIA (temporary dev scaffolding - not for end users) - -

Changes by area

include/battery/utility.hpp - backend detection macros

Replaces raw __CUDACC__ / __CUDA_ARCH__ checks throughout the codebase with clean, named macros:

  • BATTERY_GPU_ENABLED - any GPU compiler active
  • BATTERY_DEVICE_CODE - inside a device function body
  • BATTERY_CUDA_BACKEND - nvcc without HIP wrapper
  • BATTERY_HIP_BACKEND - hipcc (AMD or NVIDIA via hipcc)
  • BATTERY_HIP_BUILD - injected by CMake when HIP=ON, covering the NVIDIA-via-HIP dev path where nvcc is the compiler and __HIPCC__ is never set
  • HIPE(result) / HIPEX(result) - HIP error-check macros, parallel to the existing CUDAE/CUDAEX; hipAssert is host-only since hipGetErrorString is a host function; CPU no-op stubs provided so headers stay includable in CPU translation units

CUDAE/CUDAEX and all existing macros are unchanged.

include/battery/allocator.hpp - backend-dispatching allocation helpers

Five internal battery::impl helpers (gpu_malloc, gpu_free, gpu_malloc_managed, gpu_malloc_host, gpu_free_host) replace direct CUDA API calls in the allocator classes. Each uses a strict three-branch guard with an explicit #error fallback - no silent fallthrough:

#if defined(BATTERY_CUDA_BACKEND) && !defined(BATTERY_HIP_BUILD)  // -> cudaMalloc*
#elif defined(BATTERY_HIP_BACKEND) || defined(BATTERY_HIP_BUILD)  // -> hipMalloc*
#else
  #error "no GPU backend defined"
#endif

Uses ROCm 6.x non-deprecated names: hipHostMalloc / hipHostFree.

include/battery/memory.hpp - HIP scoped atomics

  • Scope constants: hip_scope_block/device/system mapping to __HIP_MEMORY_SCOPE_*
  • hip_atomic_wrapper<T, Scope> wrapping __hip_atomic_* intrinsics, compatible with the existing copyable_atomic interface
  • atomic_memory_block/grid/multi_grid aliases for HIP, mirroring the CUDA ones

include/battery/unique_ptr.hpp - HIP cooperative groups

  • Conditional include of <hip/hip_cooperative_groups.h> vs <cooperative_groups.h>
  • battery::invoke_one device shim for HIP (invoke_one is absent from HIP cooperative groups; implemented via g.thread_rank() == 0)
  • make_unique_block / make_unique_grid guard changed from BATTERY_CUDA_BACKEND to BATTERY_GPU_ENABLED so both backends compile
  • assert(ptr) instead of assert(ptr != nullptr) to avoid __nv_bool/nullptr_t clash when HIP portability headers are loaded alongside nvcc

CMakeLists.txt

  • option(HIP ...) with mutual exclusion guard against GPU=ON
  • project(... LANGUAGES HIP CXX) path
  • find_package(hip REQUIRED) + target_link_libraries(cuda_battery INTERFACE hip::host) - handles AMD (amdhip64) and the dev NVIDIA path (cudart wrapper) in one target
  • target_compile_definitions(cuda_battery INTERFACE BATTERY_HIP_BUILD) when HIP=ON
  • Section III test discovery: tests/*_hip.cpp compiled as LANGUAGE HIP
  • (temporary dev scaffolding) CUDA::cuda_driver linked conditionally when CUDAToolkit_FOUND - needed on the NVIDIA dev path because hipDeviceGetAttribute routes through libcuda.so; will be removed in Phase 5

CMakePresets.json

  • hip-debug and hip-release - the permanent AMD presets
  • (temporary) hip-nvidia-debug and hip-nvidia-release - NVIDIA dev presets, to be deleted after AMD validation

HIP test files (tests/*_hip.cpp)

Three new HIP test files mirroring the existing *_gpu.cpp suite. They call the HIP API directly with no GPU_* abstraction macros - on AMD this is native; on the NVIDIA dev path the portability headers handle the mapping:

  • tests/allocator_test_gpu_hip.cpp
  • tests/utility_test_cpu_gpu_hip.cpp
  • tests/unique_ptr_test_gpu_hip.cpp

Demo (demo/)

Four new HIP demo source files and test, demo/CMakeLists.txt extended with the same hip::host / CUDA::cuda_driver treatment, demo/README.md updated with AMD build instructions.


What remains after this PR (Last Phase)

Once tested on AMD hardware, a small follow-up is needed to remove the NVIDIA dev scaffolding:

  • Delete hip-nvidia-debug, hip-nvidia-release, default-hip-nvidia from CMakePresets.json
  • Remove find_package(CUDAToolkit ...) and CUDA::cuda_driver link blocks from CMakeLists.txt and demo/CMakeLists.txt

No header or test changes are needed - the *_hip.cpp files already call the HIP API natively.


Files changed

Category Files
Headers include/battery/utility.hpp, allocator.hpp, memory.hpp, unique_ptr.hpp, vector.hpp
Build CMakeLists.txt, CMakePresets.json, demo/CMakeLists.txt
Tests tests/allocator_test_gpu_hip.cpp, utility_test_cpu_gpu_hip.cpp, unique_ptr_test_gpu_hip.cpp
Demo demo/src/simple_hip.cpp, demo_hip.cpp, inkernel_allocation_hip.cpp, demo/tests/demo_test_gpu_hip.cpp
Docs README.md, demo/README.md, CHANGELOG.md

18 files changed, 1 446 insertions(+), 195 deletions(−).

TODO

  • Validate functionality on AMD hardware
  • Cleanup code to remove hip-nvidia-* related code

Copilot AI review requested due to automatic review settings February 28, 2026 22:45
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds compile-time HIP (AMD ROCm) as an additional GPU backend alongside existing CPU and CUDA backends, including build system wiring, HIP-specific tests, and demo targets.

Changes:

  • Introduces backend-detection macros and HIP error-check helpers (HIPE/HIPEX) in utility.hpp, and updates device/host guards across headers.
  • Adds HIP build/test infrastructure (CMake options, presets, and HIP test discovery).
  • Adds HIP-mirrored GPU tests and HIP demo executables/docs.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
CMakeLists.txt Adds HIP option, HIP language support, HIP test discovery/linking.
CMakePresets.json Adds HIP presets (AMD) and temporary HIP-on-NVIDIA dev presets.
include/battery/utility.hpp Adds backend detection macros and HIP error-check macros; updates device-code guards.
include/battery/allocator.hpp Adds backend-dispatching GPU allocation wrappers for CUDA vs HIP.
include/battery/memory.hpp Adds HIP scoped atomic abstractions and switches CUDA detection macro.
include/battery/unique_ptr.hpp Adds HIP cooperative groups support + shim; broadens GPU guard.
include/battery/vector.hpp Switches device-code detection to the new macro.
tests/allocator_test_gpu_hip.cpp New HIP allocator test mirroring CUDA GPU test behavior.
tests/utility_test_cpu_gpu_hip.cpp New HIP utility test mirroring CUDA CPU/GPU utility tests.
tests/unique_ptr_test_gpu_hip.cpp New HIP unique_ptr/cooperative launch tests.
demo/CMakeLists.txt Adds HIP demo targets and HIP test discovery.
demo/README.md Adds HIP build/run instructions and temporary HIP-on-NVIDIA notes.
demo/src/demo_hip.cpp HIP variant of the main demo executable.
demo/src/simple_hip.cpp HIP variant of the simple demo executable.
demo/src/inkernel_allocation_hip.cpp HIP variant of in-kernel allocation demo, incl. coop launch path.
demo/tests/demo_test_gpu_hip.cpp HIP variant of demo GPU test.
demo/src/inkernel_allocation.cpp Fixes a std::endl typo in CUDA demo.
README.md Updates top-level documentation to mention HIP support and backend macros.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@icemc
Copy link
Author

icemc commented Feb 28, 2026

Open Question

Area Risk Mitigation
Rename CUDA / CUDA_GLOBAL qualifiers? Now that HIP is a first-class backend, the macro names CUDA (__device__ __host__) and CUDA_GLOBAL (__global__) are misleading - they apply equally to HIP code but look CUDA-specific to a reader unfamiliar with the library. This creates confusion when browsing HIP source files that use CUDA on every function signature. Suggested replacements: GPU for CUDA and GPU_KERNEL for CUDA_GLOBAL. These names are backend-agnostic, self-explanatory, and consistent with BATTERY_GPU_ENABLED. The rename is purely mechanical but is a breaking API change for any out-of-tree code using the macros directly. A deprecation shim (#define CUDA GPU) could ease the transition. Decide before the first tagged release that includes HIP; do not break the names after that.

cc @ptal

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants