Description
I recently tested Chapel's AMD GPU support with ROCm 6.3 (not yet officially supported by Chapel). This issue captures the successes and failures I had with it.
By editing util/chplenv/chpl_gpu.py, I could build Chapel and run make check with ROCm 6.3.0.
patch file
diff --git a/util/chplenv/chpl_gpu.py b/util/chplenv/chpl_gpu.py
index 41890d8c469..132ef95c1dc 100644
--- a/util/chplenv/chpl_gpu.py
+++ b/util/chplenv/chpl_gpu.py
@@ -541,7 +541,7 @@ def _validate_rocm_version_impl():
MIN_REQ_VERSION = "5.0"
MAX_REQ_VERSION = "5.5"
MIN_ROCM6_REQ_VERSION = "6"
- MAX_ROCM6_REQ_VERSION = "6.3"
+ MAX_ROCM6_REQ_VERSION = "6.4"
rocm_version = get_sdk_version()
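For readers unfamiliar with the check being patched, here is a minimal sketch of how a bound like this might behave. It assumes the upper bound is exclusive, which the patch suggests (6.3.0 was rejected while the maximum was "6.3"), but I have not copied this from chpl_gpu.py. The helper names `parse_version` and `rocm6_version_supported` are hypothetical.

```python
# Sketch only: not the actual code in util/chplenv/chpl_gpu.py.
# Assumption: the supported range is min <= version < max (exclusive upper bound).

def parse_version(text):
    """Turn a dotted version string such as "6.3.0" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def rocm6_version_supported(version, min_req="6", max_req="6.4"):
    """Return True if `version` falls in the half-open range [min_req, max_req)."""
    v = parse_version(version)
    return parse_version(min_req) <= v < parse_version(max_req)


if __name__ == "__main__":
    for candidate in ("5.7.1", "6.3.0", "6.3.3", "6.4.0"):
        print(candidate, rocm6_version_supported(candidate))
```

Under that assumption, any 6.3.x release passes once the maximum is raised to "6.4", while the old maximum of "6.3" excludes 6.3.0 itself.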
As a spot check, I ran the following tests. These were selected because I have seen them fail when upgrading ROCm versions without additional effort:
- test/gpu/native/jacobi/jacobi.chpl
- test/gpu/native/reduction/basic.chpl
- test/gpu/native/mathOps.chpl
- test/gpu/native/gpuWritelnAndAssertOnGpu.chpl
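To repeat this spot check, something like the following sketch could build the commands. It assumes the tests are run through util/start_test from a Chapel checkout located by CHPL_HOME; the function name `spot_check_commands` is hypothetical, and the script only prints the commands rather than running them.

```python
# Sketch only: builds (does not execute) one start_test command per spot-check test.
# Assumption: CHPL_HOME points at a Chapel source checkout containing util/start_test.
import os

SPOT_CHECK_TESTS = [
    "test/gpu/native/jacobi/jacobi.chpl",
    "test/gpu/native/reduction/basic.chpl",
    "test/gpu/native/mathOps.chpl",
    "test/gpu/native/gpuWritelnAndAssertOnGpu.chpl",
]


def spot_check_commands(chpl_home):
    """Return one argv list per test, each invoking util/start_test on that test."""
    start_test = os.path.join(chpl_home, "util", "start_test")
    return [[start_test, os.path.join(chpl_home, test)] for test in SPOT_CHECK_TESTS]


if __name__ == "__main__":
    home = os.environ.get("CHPL_HOME", ".")
    for argv in spot_check_commands(home):
        print(" ".join(argv))
```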
The only one that failed was gpuWritelnAndAssertOnGpu, which hit a segfault. It is a usual suspect for ROCm failures: the test relies on interop and printf/varargs, which frequently trigger edge cases. Note that this is one of the tests that led us to rely on an AMD LLVM for several ROCm versions in the 5.x era.
The caveat is that I tested only with ROCm 6.3.0 and do not have access to other versions right now (the latest at the time of writing is 6.3.3).