[Offload][cmake] Add GPU test job limit for AMDGPU buildbot cmake cache #146611
Adds a GPU test job limit to the AMDGPU libc bot cache file so that it matches the current buildbot config: https://github.com/llvm/llvm-zorg/blob/main/buildbot/osuosl/master/config/builders.py#L2027C31-L2027C77
@llvm/pr-subscribers-offload @llvm/pr-subscribers-backend-amdgpu
Author: None (Kewen12)
Changes
Added GPU test job limit to make it consistent with current config https://github.com/llvm/llvm-zorg/blob/main/buildbot/osuosl/master/config/builders.py#L2027C31-L2027C77
Full diff: https://github.com/llvm/llvm-project/pull/146611.diff
1 Files Affected:
diff --git a/offload/cmake/caches/AMDGPULibcBot.cmake b/offload/cmake/caches/AMDGPULibcBot.cmake
index 728dfe3f0a3f1..a772043c79669 100644
--- a/offload/cmake/caches/AMDGPULibcBot.cmake
+++ b/offload/cmake/caches/AMDGPULibcBot.cmake
@@ -18,3 +18,4 @@ set(CLANG_DEFAULT_RTLIB "compiler-rt" STRING "")
set(LLVM_RUNTIME_TARGETS default;amdgcn-amd-amdhsa CACHE STRING "")
set(RUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES "compiler-rt;libc" CACHE STRING "")
+set(RUNTIMES_amdgcn-amd-amdhsa_LIBC_GPU_TEST_JOBS 4 CACHE STRING "")
Still not entirely sure this is safe, but it's worth a shot. Maybe they fixed some HSA bugs since I last checked.
Thanks for the help @jhuber6! I might not have full context here. Do you mean that enabling this flag may not be safe?
Yes, the HSA runtime would routinely crash when many of these tests were run in parallel. I poked at it through https://github.com/jhuber6/hsa_test a while back and pretty much just found that loading binaries in parallel would crash, depending on the machine.
"routinely crash" love it :-D
This sounds like a loader issue. CC @kzhuravl
Who knows, maybe they fixed it, haven't checked in a while.
We've been running this config on the current libc bot for about six months now (ROCm 6.2 and ROCm 6.3) and have not seen spurious failures in that time.
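For anyone who does hit the parallel-loading crashes described above on their own machine, one option is to tighten the limit locally rather than change the bot cache. The fragment below is only a sketch under assumptions: the wrapper file name `LocalAMDGPULibc.cmake` and the value `1` are hypothetical and not part of this PR, and it assumes the wrapper sits next to `AMDGPULibcBot.cmake` and is passed to CMake with `-C`.

```cmake
# LocalAMDGPULibc.cmake (hypothetical file name, not part of this PR).
# Load the upstream bot cache first so every other setting stays identical.
include(${CMAKE_CURRENT_LIST_DIR}/AMDGPULibcBot.cmake)

# Serialize the GPU tests. The value 1 is an illustrative choice for a
# machine where loading binaries in parallel crashes. FORCE is needed
# because the included file has already created this cache entry with 4.
set(RUNTIMES_amdgcn-amd-amdhsa_LIBC_GPU_TEST_JOBS 1 CACHE STRING "" FORCE)
```

The `RUNTIMES_amdgcn-amd-amdhsa_` prefix is the same one used in the diff, so the variable is forwarded to the runtimes build for that target only and the host configuration is unaffected.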