ROCm 6.3 support #26934

@jabraham17

I recently tested Chapel's AMD GPU support with ROCm 6.3, which Chapel does not yet officially support. This issue records what worked and what failed.

By editing util/chplenv/chpl_gpu.py to raise the maximum accepted ROCm 6 version, I could build Chapel and run make check with ROCm 6.3.0.

patch file
diff --git a/util/chplenv/chpl_gpu.py b/util/chplenv/chpl_gpu.py
index 41890d8c469..132ef95c1dc 100644
--- a/util/chplenv/chpl_gpu.py
+++ b/util/chplenv/chpl_gpu.py
@@ -541,7 +541,7 @@ def _validate_rocm_version_impl():
     MIN_REQ_VERSION = "5.0"
     MAX_REQ_VERSION = "5.5"
     MIN_ROCM6_REQ_VERSION = "6"
-    MAX_ROCM6_REQ_VERSION = "6.3"
+    MAX_ROCM6_REQ_VERSION = "6.4"
 
     rocm_version = get_sdk_version()
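
The patch bumps the limit from "6.3" to "6.4" in order to admit 6.3.0, which suggests the upper bound is treated as exclusive. The validation logic itself is not shown in this issue, so the following is only a minimal sketch of that assumed behavior. The helper names parse_version and in_supported_range are hypothetical and are not taken from chpl_gpu.py.

```python
# Hypothetical sketch of a half-open version range check.
# This is NOT the code in util/chplenv/chpl_gpu.py; the exclusive upper
# bound is an assumption inferred from the patch above.

def parse_version(text):
    """Turn a dotted version such as '6.3.0' into (6, 3, 0)."""
    return tuple(int(part) for part in text.split("."))


def in_supported_range(version, min_version, max_version):
    """Assumed semantics: min_version <= version < max_version."""
    return (parse_version(min_version)
            <= parse_version(version)
            < parse_version(max_version))


# With the old limit, 6.3.0 falls outside the range; with the new one it fits.
print(in_supported_range("6.3.0", "6", "6.3"))
print(in_supported_range("6.3.0", "6", "6.4"))
```

Under this assumption, any 6.3.x release (including 6.3.3) would pass the check after the patch, while 6.4.0 would still be rejected.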
 

As a spot check, I ran the following tests. These were selected because I have seen them fail in past ROCm version upgrades until additional work was done:

  • test/gpu/native/jacobi/jacobi.chpl
  • test/gpu/native/reduction/basic.chpl
  • test/gpu/native/mathOps.chpl
  • test/gpu/native/gpuWritelnAndAssertOnGpu.chpl

The only one that failed was gpuWritelnAndAssertOnGpu, which segfaulted. This test is a usual suspect for ROCm failures: it relies on interop and printf/varargs, which frequently trigger edge cases. Note that this is one of the tests that led us to rely on AMD's LLVM for several ROCm versions in the 5.x era.

The caveat is that I tested only with ROCm 6.3.0 and do not have access to other versions right now (the latest at the time of writing is 6.3.3).
