Open
Conversation
There was a problem hiding this comment.
Pull request overview
Adds compile-time HIP (AMD ROCm) as an additional GPU backend alongside existing CPU and CUDA backends, including build system wiring, HIP-specific tests, and demo targets.
Changes:
- Introduces backend-detection macros and HIP error-check helpers (
HIPE/HIPEX) inutility.hpp, and updates device/host guards across headers. - Adds HIP build/test infrastructure (CMake options, presets, and HIP test discovery).
- Adds HIP-mirrored GPU tests and HIP demo executables/docs.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
CMakeLists.txt |
Adds HIP option, HIP language support, HIP test discovery/linking. |
CMakePresets.json |
Adds HIP presets (AMD) and temporary HIP-on-NVIDIA dev presets. |
include/battery/utility.hpp |
Adds backend detection macros and HIP error-check macros; updates device-code guards. |
include/battery/allocator.hpp |
Adds backend-dispatching GPU allocation wrappers for CUDA vs HIP. |
include/battery/memory.hpp |
Adds HIP scoped atomic abstractions and switches CUDA detection macro. |
include/battery/unique_ptr.hpp |
Adds HIP cooperative groups support + shim; broadens GPU guard. |
include/battery/vector.hpp |
Switches device-code detection to the new macro. |
tests/allocator_test_gpu_hip.cpp |
New HIP allocator test mirroring CUDA GPU test behavior. |
tests/utility_test_cpu_gpu_hip.cpp |
New HIP utility test mirroring CUDA CPU/GPU utility tests. |
tests/unique_ptr_test_gpu_hip.cpp |
New HIP unique_ptr/cooperative launch tests. |
demo/CMakeLists.txt |
Adds HIP demo targets and HIP test discovery. |
demo/README.md |
Adds HIP build/run instructions and temporary HIP-on-NVIDIA notes. |
demo/src/demo_hip.cpp |
HIP variant of the main demo executable. |
demo/src/simple_hip.cpp |
HIP variant of the simple demo executable. |
demo/src/inkernel_allocation_hip.cpp |
HIP variant of in-kernel allocation demo, incl. coop launch path. |
demo/tests/demo_test_gpu_hip.cpp |
HIP variant of demo GPU test. |
demo/src/inkernel_allocation.cpp |
Fixes a std::endl typo in CUDA demo. |
README.md |
Updates top-level documentation to mention HIP support and backend macros. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Author
|
Open Question
cc @ptal |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
#Add HIP backend support (AMD ROCm)
Summary
This PR adds HIP as a first-class, compile-time-selectable GPU backend targeting AMD ROCm hardware. All existing CPU and CUDA behaviour is completely unchanged. HIP support is opt-in via
-DHIP=ON -DGPU=OFFand affects no currently-passing tests.Test results (verified on NVIDIA hardware via HIP portability layer as a development proxy):
gpu-debug)hip-nvidia-debug, temporary dev preset)Motivation
Users running on AMD GPUs (ROCm) cannot use the library today. HIP provides portability across AMD and NVIDIA hardware with near-identical syntax to CUDA. The goal is to support AMD hardware without forking the codebase or breaking existing CUDA users.
Design approach
HIP is selected entirely at compile time. There is no runtime dispatch. The key principle is:
*_hip.cppfiles call the HIP API directly - on AMD this is native ROCm; on the temporary NVIDIA dev path the HIP portability headers maphip*tocuda*transparently, so no backend-switching logic is needed in test or application code.The supported backends are:
GPU=OFF HIP=OFFGPU=ONHIP=ONHIP / NVIDIAChanges by area
include/battery/utility.hpp- backend detection macrosReplaces raw
__CUDACC__/__CUDA_ARCH__checks throughout the codebase with clean, named macros:BATTERY_GPU_ENABLED- any GPU compiler activeBATTERY_DEVICE_CODE- inside a device function bodyBATTERY_CUDA_BACKEND- nvcc without HIP wrapperBATTERY_HIP_BACKEND- hipcc (AMD or NVIDIA via hipcc)BATTERY_HIP_BUILD- injected by CMake whenHIP=ON, covering the NVIDIA-via-HIP dev path where nvcc is the compiler and__HIPCC__is never setHIPE(result)/HIPEX(result)- HIP error-check macros, parallel to the existingCUDAE/CUDAEX;hipAssertis host-only sincehipGetErrorStringis a host function; CPU no-op stubs provided so headers stay includable in CPU translation unitsCUDAE/CUDAEXand all existing macros are unchanged.include/battery/allocator.hpp- backend-dispatching allocation helpersFive internal
battery::implhelpers (gpu_malloc,gpu_free,gpu_malloc_managed,gpu_malloc_host,gpu_free_host) replace direct CUDA API calls in the allocator classes. Each uses a strict three-branch guard with an explicit#errorfallback - no silent fallthrough:Uses ROCm 6.x non-deprecated names:
hipHostMalloc/hipHostFree.include/battery/memory.hpp- HIP scoped atomicship_scope_block/device/systemmapping to__HIP_MEMORY_SCOPE_*hip_atomic_wrapper<T, Scope>wrapping__hip_atomic_*intrinsics, compatible with the existingcopyable_atomicinterfaceatomic_memory_block/grid/multi_gridaliases for HIP, mirroring the CUDA onesinclude/battery/unique_ptr.hpp- HIP cooperative groups<hip/hip_cooperative_groups.h>vs<cooperative_groups.h>battery::invoke_onedevice shim for HIP (invoke_oneis absent from HIP cooperative groups; implemented viag.thread_rank() == 0)make_unique_block/make_unique_gridguard changed fromBATTERY_CUDA_BACKENDtoBATTERY_GPU_ENABLEDso both backends compileassert(ptr)instead ofassert(ptr != nullptr)to avoid__nv_bool/nullptr_tclash when HIP portability headers are loaded alongside nvccCMakeLists.txtoption(HIP ...)with mutual exclusion guard againstGPU=ONproject(... LANGUAGES HIP CXX)pathfind_package(hip REQUIRED)+target_link_libraries(cuda_battery INTERFACE hip::host)- handles AMD (amdhip64) and the dev NVIDIA path (cudartwrapper) in one targettarget_compile_definitions(cuda_battery INTERFACE BATTERY_HIP_BUILD)whenHIP=ONtests/*_hip.cppcompiled asLANGUAGE HIPCUDA::cuda_driverlinked conditionally whenCUDAToolkit_FOUND- needed on the NVIDIA dev path becausehipDeviceGetAttributeroutes throughlibcuda.so; will be removed in Phase 5CMakePresets.jsonhip-debugandhip-release- the permanent AMD presetship-nvidia-debugandhip-nvidia-release- NVIDIA dev presets, to be deleted after AMD validationHIP test files (
tests/*_hip.cpp)Three new HIP test files mirroring the existing
*_gpu.cppsuite. They call the HIP API directly with noGPU_*abstraction macros - on AMD this is native; on the NVIDIA dev path the portability headers handle the mapping:tests/allocator_test_gpu_hip.cpptests/utility_test_cpu_gpu_hip.cpptests/unique_ptr_test_gpu_hip.cppDemo (
demo/)Four new HIP demo source files and test,
demo/CMakeLists.txtextended with the samehip::host/CUDA::cuda_drivertreatment,demo/README.mdupdated with AMD build instructions.What remains after this PR (Last Phase)
Once tested on AMD hardware, a small follow-up is needed to remove the NVIDIA dev scaffolding:
hip-nvidia-debug,hip-nvidia-release,default-hip-nvidiafromCMakePresets.jsonfind_package(CUDAToolkit ...)andCUDA::cuda_driverlink blocks fromCMakeLists.txtanddemo/CMakeLists.txtNo header or test changes are needed - the
*_hip.cppfiles already call the HIP API natively.Files changed
include/battery/utility.hpp,allocator.hpp,memory.hpp,unique_ptr.hpp,vector.hppCMakeLists.txt,CMakePresets.json,demo/CMakeLists.txttests/allocator_test_gpu_hip.cpp,utility_test_cpu_gpu_hip.cpp,unique_ptr_test_gpu_hip.cppdemo/src/simple_hip.cpp,demo_hip.cpp,inkernel_allocation_hip.cpp,demo/tests/demo_test_gpu_hip.cppREADME.md,demo/README.md,CHANGELOG.md18 files changed, 1 446 insertions(+), 195 deletions(−).
TODO
hip-nvidia-*related code