-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[libc] Handle differing wavefront sizes correctly in the AMDHSA loader #117788
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary: The AMDGPU backend can handle wavefront sizes of 32 and 64, with the native hardware preferring one or the other. The user can override the hardware with `-mwavefrontsize64` or `-mwavefrontsize32` which previously wasn't handled. We need to know the wavefront size to know how much memory to allocate and how to index the RPC buffer. There isn't a good way to do this with ROCm so we just use the LLVM support for offloading to check this from the image.
|
@llvm/pr-subscribers-libc @llvm/pr-subscribers-backend-amdgpu Author: Joseph Huber (jhuber6) ChangesSummary: Full diff: https://github.com/llvm/llvm-project/pull/117788.diff 2 Files Affected:
diff --git a/libc/utils/gpu/loader/amdgpu/CMakeLists.txt b/libc/utils/gpu/loader/amdgpu/CMakeLists.txt
index 46c5631046ce20..80c5ae357416af 100644
--- a/libc/utils/gpu/loader/amdgpu/CMakeLists.txt
+++ b/libc/utils/gpu/loader/amdgpu/CMakeLists.txt
@@ -3,6 +3,7 @@ set(LLVM_LINK_COMPONENTS
Object
Option
Support
+ FrontendOffloading
)
add_llvm_executable(amdhsa-loader amdhsa-loader.cpp)
diff --git a/libc/utils/gpu/loader/amdgpu/amdhsa-loader.cpp b/libc/utils/gpu/loader/amdgpu/amdhsa-loader.cpp
index 13a13668335471..5a9fe87077328e 100644
--- a/libc/utils/gpu/loader/amdgpu/amdhsa-loader.cpp
+++ b/libc/utils/gpu/loader/amdgpu/amdhsa-loader.cpp
@@ -28,6 +28,8 @@
#include "hsa/hsa_ext_amd.h"
#endif
+#include "llvm/Frontend/Offloading/Utility.h"
+
#include <atomic>
#include <cstdio>
#include <cstdlib>
@@ -163,17 +165,13 @@ hsa_status_t launch_kernel(hsa_agent_t dev_agent, hsa_executable_t executable,
hsa_queue_t *queue, rpc::Server &server,
const LaunchParameters ¶ms,
const char *kernel_name, args_t kernel_args,
- bool print_resource_usage) {
+ uint32_t wavefront_size, bool print_resource_usage) {
// Look up the kernel in the loaded executable.
hsa_executable_symbol_t symbol;
if (hsa_status_t err = hsa_executable_get_symbol_by_name(
executable, kernel_name, &dev_agent, &symbol))
return err;
- uint32_t wavefront_size = 0;
- if (hsa_status_t err = hsa_agent_get_info(
- dev_agent, HSA_AGENT_INFO_WAVEFRONT_SIZE, &wavefront_size))
- handle_error(err);
// Retrieve different properties of the kernel symbol used for launch.
uint64_t kernel;
uint32_t args_size;
@@ -419,6 +417,16 @@ int load(int argc, const char **argv, const char **envp, void *image,
dev_agent, &coarsegrained_pool))
handle_error(err);
+ // The AMDGPU target can change its wavefront size. There currently isn't a
+ // good way to look this up through the HSA API so we use the LLVM interface.
+ uint16_t abi_version;
+ llvm::StringRef image_ref(reinterpret_cast<char *>(image), size);
+ llvm::StringMap<llvm::offloading::amdgpu::AMDGPUKernelMetaData> info_map;
+ if (llvm::Error err = llvm::offloading::amdgpu::getAMDGPUMetaDataFromImage(
+ llvm::MemoryBufferRef(image_ref, ""), info_map, abi_version)) {
+ handle_error(llvm::toString(std::move(err)).c_str());
+ }
+
// Allocate fine-grained memory on the host to hold the pointer array for the
// copied argv and allow the GPU agent to access it.
auto allocator = [&](uint64_t size) -> void * {
@@ -448,10 +456,10 @@ int load(int argc, const char **argv, const char **envp, void *image,
hsa_amd_memory_fill(dev_ret, 0, /*count=*/1);
// Allocate finegrained memory for the RPC server and client to share.
- uint32_t wavefront_size = 0;
- if (hsa_status_t err = hsa_agent_get_info(
- dev_agent, HSA_AGENT_INFO_WAVEFRONT_SIZE, &wavefront_size))
- handle_error(err);
+ uint32_t wavefront_size =
+ llvm::max_element(info_map, [](auto &&x, auto &&y) {
+ return x.second.WavefrontSize < y.second.WavefrontSize;
+ })->second.WavefrontSize;
// Set up the RPC server.
void *rpc_buffer;
@@ -513,7 +521,6 @@ int load(int argc, const char **argv, const char **envp, void *image,
if (HSA_STATUS_SUCCESS ==
hsa_executable_get_symbol_by_name(executable, "__llvm_libc_clock_freq",
&dev_agent, &freq_sym)) {
-
void *host_clock_freq;
if (hsa_status_t err =
hsa_amd_memory_pool_allocate(finegrained_pool, sizeof(uint64_t),
@@ -553,16 +560,17 @@ int load(int argc, const char **argv, const char **envp, void *image,
LaunchParameters single_threaded_params = {1, 1, 1, 1, 1, 1};
begin_args_t init_args = {argc, dev_argv, dev_envp};
- if (hsa_status_t err = launch_kernel(dev_agent, executable, kernargs_pool,
- coarsegrained_pool, queue, server,
- single_threaded_params, "_begin.kd",
- init_args, print_resource_usage))
+ if (hsa_status_t err = launch_kernel(
+ dev_agent, executable, kernargs_pool, coarsegrained_pool, queue,
+ server, single_threaded_params, "_begin.kd", init_args,
+ info_map["_begin"].WavefrontSize, print_resource_usage))
handle_error(err);
start_args_t args = {argc, dev_argv, dev_envp, dev_ret};
if (hsa_status_t err = launch_kernel(
dev_agent, executable, kernargs_pool, coarsegrained_pool, queue,
- server, params, "_start.kd", args, print_resource_usage))
+ server, params, "_start.kd", args, info_map["_start"].WavefrontSize,
+ print_resource_usage))
handle_error(err);
void *host_ret;
@@ -580,10 +588,10 @@ int load(int argc, const char **argv, const char **envp, void *image,
int ret = *static_cast<int *>(host_ret);
end_args_t fini_args = {ret};
- if (hsa_status_t err = launch_kernel(dev_agent, executable, kernargs_pool,
- coarsegrained_pool, queue, server,
- single_threaded_params, "_end.kd",
- fini_args, print_resource_usage))
+ if (hsa_status_t err = launch_kernel(
+ dev_agent, executable, kernargs_pool, coarsegrained_pool, queue,
+ server, single_threaded_params, "_end.kd", fini_args,
+ info_map["_end"].WavefrontSize, print_resource_usage))
handle_error(err);
if (hsa_status_t err = hsa_amd_memory_pool_free(rpc_buffer))
|
shiltian
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no test?
Was kind of annoying, but it should work. |
| ASSERT_TRUE(cnt == gpu::get_lane_id() + 1 && "Incorrect sum"); | ||
| ASSERT_TRUE(gpu::get_thread_id() == gpu::get_lane_id() && "Not in same lane"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this just gtest? Should be EXPECT_EQ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little confusing, the libc has integration tests that simply run main with a really small subset of gtest. However, EXPECT_EQ does exist there, so I can use that instead.
| }); | ||
| port.close(); | ||
| EXPECT_EQ(cnt, gpu::get_lane_id() + 1); | ||
| EXPECT_EQ(gpu::get_thread_id(), gpu::get_lane_id()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| EXPECT_EQ(gpu::get_thread_id(), gpu::get_lane_id()); | |
| EXPECT_EQ(gpu::get_thread_id(), gpu::get_lane_id()) << "Not in same lane"; |
Preserve the messages from before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The macro doesn't take a message and since it's not a boolean I can't use the && trick where a const char pointer is evaluated as true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's a << after the failure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That macro expands to if (x == y) { ... } which won't accept an operator.
Summary:
The AMDGPU backend can handle wavefront sizes of 32 and 64, with the
native hardware preferring one or the other. The user can override the
hardware with
`-mwavefrontsize64` or `-mwavefrontsize32`, which previously wasn't handled. We need to know the wavefront size to know
how much memory to allocate and how to index the RPC buffer. There isn't
a good way to do this with ROCm so we just use the LLVM support for
offloading to check this from the image.