WIP: Add HIP backend support by icemc · Pull Request #6 · lattice-land/cuda-battery

icemc · 2026-02-28T22:45:36Z

#Add HIP backend support (AMD ROCm)

Summary

This PR adds HIP as a first-class, compile-time-selectable GPU backend targeting AMD ROCm hardware. All existing CPU and CUDA behaviour is completely unchanged. HIP support is opt-in via -DHIP=ON -DGPU=OFF and affects no currently-passing tests.

Note on NVIDIA-via-HIP scaffolding: This PR includes temporary hip-nvidia-* CMake presets and associated build scaffolding that allow the HIP code to be compiled and tested on a NVIDIA machine (using the HIP portability headers that map hip* -> cuda*). This is a development convenience only - it is not a supported end-user configuration and will be removed once the code is validated on real AMD hardware. NVIDIA users should continue using the existing GPU=ON CUDA presets.

Test results (verified on NVIDIA hardware via HIP portability layer as a development proxy):

35/35 CPU tests
38/38 CUDA tests (gpu-debug)
38/38 HIP tests (hip-nvidia-debug, temporary dev preset)

Motivation

Users running on AMD GPUs (ROCm) cannot use the library today. HIP provides portability across AMD and NVIDIA hardware with near-identical syntax to CUDA. The goal is to support AMD hardware without forking the codebase or breaking existing CUDA users.

Design approach

HIP is selected entirely at compile time. There is no runtime dispatch. The key principle is: *_hip.cpp files call the HIP API directly - on AMD this is native ROCm; on the temporary NVIDIA dev path the HIP portability headers map hip* to cuda* transparently, so no backend-switching logic is needed in test or application code.

The supported backends are:

Backend	CMake option	Compiler	Required
CPU only	`GPU=OFF HIP=OFF`	Any C++20	C++20 compiler
CUDA / NVIDIA	`GPU=ON`	nvcc	CUDA Toolkit 12+
HIP / AMD (this PR)	`HIP=ON`	hipcc	ROCm 6+
~~HIP / NVIDIA~~	(temporary dev scaffolding - not for end users)	-	-

Changes by area

`include/battery/utility.hpp` - backend detection macros

Replaces raw __CUDACC__ / __CUDA_ARCH__ checks throughout the codebase with clean, named macros:

BATTERY_GPU_ENABLED - any GPU compiler active
BATTERY_DEVICE_CODE - inside a device function body
BATTERY_CUDA_BACKEND - nvcc without HIP wrapper
BATTERY_HIP_BACKEND - hipcc (AMD or NVIDIA via hipcc)
BATTERY_HIP_BUILD - injected by CMake when HIP=ON, covering the NVIDIA-via-HIP dev path where nvcc is the compiler and __HIPCC__ is never set
HIPE(result) / HIPEX(result) - HIP error-check macros, parallel to the existing CUDAE/CUDAEX; hipAssert is host-only since hipGetErrorString is a host function; CPU no-op stubs provided so headers stay includable in CPU translation units

CUDAE/CUDAEX and all existing macros are unchanged.

`include/battery/allocator.hpp` - backend-dispatching allocation helpers

Five internal battery::impl helpers (gpu_malloc, gpu_free, gpu_malloc_managed, gpu_malloc_host, gpu_free_host) replace direct CUDA API calls in the allocator classes. Each uses a strict three-branch guard with an explicit #error fallback - no silent fallthrough:

#if defined(BATTERY_CUDA_BACKEND) && !defined(BATTERY_HIP_BUILD)  // -> cudaMalloc*
#elif defined(BATTERY_HIP_BACKEND) || defined(BATTERY_HIP_BUILD)  // -> hipMalloc*
#else
  #error "no GPU backend defined"
#endif

Uses ROCm 6.x non-deprecated names: hipHostMalloc / hipHostFree.

`include/battery/memory.hpp` - HIP scoped atomics

Scope constants: hip_scope_block/device/system mapping to __HIP_MEMORY_SCOPE_*
hip_atomic_wrapper<T, Scope> wrapping __hip_atomic_* intrinsics, compatible with the existing copyable_atomic interface
atomic_memory_block/grid/multi_grid aliases for HIP, mirroring the CUDA ones

`include/battery/unique_ptr.hpp` - HIP cooperative groups

Conditional include of <hip/hip_cooperative_groups.h> vs <cooperative_groups.h>
battery::invoke_one device shim for HIP (invoke_one is absent from HIP cooperative groups; implemented via g.thread_rank() == 0)
make_unique_block / make_unique_grid guard changed from BATTERY_CUDA_BACKEND to BATTERY_GPU_ENABLED so both backends compile
assert(ptr) instead of assert(ptr != nullptr) to avoid __nv_bool/nullptr_t clash when HIP portability headers are loaded alongside nvcc

`CMakeLists.txt`

option(HIP ...) with mutual exclusion guard against GPU=ON
project(... LANGUAGES HIP CXX) path
find_package(hip REQUIRED) + target_link_libraries(cuda_battery INTERFACE hip::host) - handles AMD (amdhip64) and the dev NVIDIA path (cudart wrapper) in one target
target_compile_definitions(cuda_battery INTERFACE BATTERY_HIP_BUILD) when HIP=ON
Section III test discovery: tests/*_hip.cpp compiled as LANGUAGE HIP
(temporary dev scaffolding) CUDA::cuda_driver linked conditionally when CUDAToolkit_FOUND - needed on the NVIDIA dev path because hipDeviceGetAttribute routes through libcuda.so; will be removed in Phase 5

`CMakePresets.json`

hip-debug and hip-release - the permanent AMD presets
(temporary) hip-nvidia-debug and hip-nvidia-release - NVIDIA dev presets, to be deleted after AMD validation

HIP test files (`tests/*_hip.cpp`)

Three new HIP test files mirroring the existing *_gpu.cpp suite. They call the HIP API directly with no GPU_* abstraction macros - on AMD this is native; on the NVIDIA dev path the portability headers handle the mapping:

tests/allocator_test_gpu_hip.cpp
tests/utility_test_cpu_gpu_hip.cpp
tests/unique_ptr_test_gpu_hip.cpp

Demo (`demo/`)

Four new HIP demo source files and test, demo/CMakeLists.txt extended with the same hip::host / CUDA::cuda_driver treatment, demo/README.md updated with AMD build instructions.

What remains after this PR (Last Phase)

Once tested on AMD hardware, a small follow-up is needed to remove the NVIDIA dev scaffolding:

Delete hip-nvidia-debug, hip-nvidia-release, default-hip-nvidia from CMakePresets.json
Remove find_package(CUDAToolkit ...) and CUDA::cuda_driver link blocks from CMakeLists.txt and demo/CMakeLists.txt

No header or test changes are needed - the *_hip.cpp files already call the HIP API natively.

Files changed

Category	Files
Headers	`include/battery/utility.hpp`, `allocator.hpp`, `memory.hpp`, `unique_ptr.hpp`, `vector.hpp`
Build	`CMakeLists.txt`, `CMakePresets.json`, `demo/CMakeLists.txt`
Tests	`tests/allocator_test_gpu_hip.cpp`, `utility_test_cpu_gpu_hip.cpp`, `unique_ptr_test_gpu_hip.cpp`
Demo	`demo/src/simple_hip.cpp`, `demo_hip.cpp`, `inkernel_allocation_hip.cpp`, `demo/tests/demo_test_gpu_hip.cpp`
Docs	`README.md`, `demo/README.md`, `CHANGELOG.md`

18 files changed, 1 446 insertions(+), 195 deletions(−).

TODO

Validate functionality on AMD hardware
Cleanup code to remove hip-nvidia-* related code

Copilot

Pull request overview

Adds compile-time HIP (AMD ROCm) as an additional GPU backend alongside existing CPU and CUDA backends, including build system wiring, HIP-specific tests, and demo targets.

Changes:

Introduces backend-detection macros and HIP error-check helpers (HIPE/HIPEX) in utility.hpp, and updates device/host guards across headers.
Adds HIP build/test infrastructure (CMake options, presets, and HIP test discovery).
Adds HIP-mirrored GPU tests and HIP demo executables/docs.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
`CMakeLists.txt`	Adds `HIP` option, HIP language support, HIP test discovery/linking.
`CMakePresets.json`	Adds HIP presets (AMD) and temporary HIP-on-NVIDIA dev presets.
`include/battery/utility.hpp`	Adds backend detection macros and HIP error-check macros; updates device-code guards.
`include/battery/allocator.hpp`	Adds backend-dispatching GPU allocation wrappers for CUDA vs HIP.
`include/battery/memory.hpp`	Adds HIP scoped atomic abstractions and switches CUDA detection macro.
`include/battery/unique_ptr.hpp`	Adds HIP cooperative groups support + shim; broadens GPU guard.
`include/battery/vector.hpp`	Switches device-code detection to the new macro.
`tests/allocator_test_gpu_hip.cpp`	New HIP allocator test mirroring CUDA GPU test behavior.
`tests/utility_test_cpu_gpu_hip.cpp`	New HIP utility test mirroring CUDA CPU/GPU utility tests.
`tests/unique_ptr_test_gpu_hip.cpp`	New HIP unique_ptr/cooperative launch tests.
`demo/CMakeLists.txt`	Adds HIP demo targets and HIP test discovery.
`demo/README.md`	Adds HIP build/run instructions and temporary HIP-on-NVIDIA notes.
`demo/src/demo_hip.cpp`	HIP variant of the main demo executable.
`demo/src/simple_hip.cpp`	HIP variant of the simple demo executable.
`demo/src/inkernel_allocation_hip.cpp`	HIP variant of in-kernel allocation demo, incl. coop launch path.
`demo/tests/demo_test_gpu_hip.cpp`	HIP variant of demo GPU test.
`demo/src/inkernel_allocation.cpp`	Fixes a `std::endl` typo in CUDA demo.
`README.md`	Updates top-level documentation to mention HIP support and backend macros.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

include/battery/memory.hpp

CMakeLists.txt

README.md

icemc · 2026-02-28T23:29:39Z

Open Question

Area	Risk	Mitigation
Rename `CUDA` / `CUDA_GLOBAL` qualifiers?	Now that HIP is a first-class backend, the macro names `CUDA` (`__device__ __host__`) and `CUDA_GLOBAL` (`__global__`) are misleading - they apply equally to HIP code but look CUDA-specific to a reader unfamiliar with the library. This creates confusion when browsing HIP source files that use `CUDA` on every function signature. Suggested replacements: `GPU` for `CUDA` and `GPU_KERNEL` for `CUDA_GLOBAL`. These names are backend-agnostic, self-explanatory, and consistent with `BATTERY_GPU_ENABLED`. The rename is purely mechanical but is a breaking API change for any out-of-tree code using the macros directly. A deprecation shim (`#define CUDA GPU`) could ease the transition.	Decide before the first tagged release that includes HIP; do not break the names after that.

cc @ptal

icemc added 7 commits February 28, 2026 13:20

Macro abstraction implementation

8eb44a4

Implemented HIP allocators

4dd0b5d

Implemented HIP atomics / memory.hpp

0d8fb9b

Completed test coverage for HIP

a6efc43

Added demo for HIP

871df70

Updated README

26ee05c

Removed GPU_* abstraction macros from *_hip.cpp files

e9ca1bc

Copilot AI review requested due to automatic review settings February 28, 2026 22:45

Copilot started reviewing on behalf of icemc February 28, 2026 22:46 View session

Copilot AI reviewed Feb 28, 2026

View reviewed changes

include/battery/memory.hpp Show resolved Hide resolved

CMakeLists.txt Show resolved Hide resolved

README.md Show resolved Hide resolved

Added recommended changes

8b2c46c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: Add HIP backend support#6

WIP: Add HIP backend support#6
icemc wants to merge 8 commits intolattice-land:mainfrom
icemc:feature/hip

icemc commented Feb 28, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

icemc commented Feb 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

icemc commented Feb 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Motivation

Design approach

Changes by area

include/battery/utility.hpp - backend detection macros

include/battery/allocator.hpp - backend-dispatching allocation helpers

include/battery/memory.hpp - HIP scoped atomics

include/battery/unique_ptr.hpp - HIP cooperative groups

CMakeLists.txt

CMakePresets.json

HIP test files (tests/*_hip.cpp)

Demo (demo/)

What remains after this PR (Last Phase)

Files changed

TODO

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

icemc commented Feb 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

icemc commented Feb 28, 2026 •

edited

Loading

`include/battery/utility.hpp` - backend detection macros

`include/battery/allocator.hpp` - backend-dispatching allocation helpers

`include/battery/memory.hpp` - HIP scoped atomics

`include/battery/unique_ptr.hpp` - HIP cooperative groups

`CMakeLists.txt`

`CMakePresets.json`

HIP test files (`tests/*_hip.cpp`)

Demo (`demo/`)