Commit 9e209a4

[Offload] Use the kernel argument size directly in AMDGPU offloading (llvm#94667)
Summary: The old COV3 implementation of HSA used to omit the implicit arguments from the kernel argument size. For COV4 and COV5 this is no longer the case so we can simply use the size reported from the symbol information. See ROCm/ROCR-Runtime#117 (comment)
1 parent 9293fc7 commit 9e209a4

File tree

1 file changed (+1, -7 lines changed)
  • offload/plugins-nextgen/amdgpu/src

offload/plugins-nextgen/amdgpu/src/rtl.cpp

Lines changed: 1 addition & 7 deletions
@@ -3272,19 +3272,13 @@ Error AMDGPUKernelTy::launchImpl(GenericDeviceTy &GenericDevice,
   if (ArgsSize < KernelArgsSize)
     return Plugin::error("Mismatch of kernel arguments size");
 
-  // The args size reported by HSA may or may not contain the implicit args.
-  // For now, assume that HSA does not consider the implicit arguments when
-  // reporting the arguments of a kernel. In the worst case, we can waste
-  // 56 bytes per allocation.
-  uint32_t AllArgsSize = KernelArgsSize + ImplicitArgsSize;
-
   AMDGPUPluginTy &AMDGPUPlugin =
       static_cast<AMDGPUPluginTy &>(GenericDevice.Plugin);
   AMDHostDeviceTy &HostDevice = AMDGPUPlugin.getHostDevice();
   AMDGPUMemoryManagerTy &ArgsMemoryManager = HostDevice.getArgsMemoryManager();
 
   void *AllArgs = nullptr;
-  if (auto Err = ArgsMemoryManager.allocate(AllArgsSize, &AllArgs))
+  if (auto Err = ArgsMemoryManager.allocate(ArgsSize, &AllArgs))
     return Err;
 
   // Account for user requested dynamic shared memory.
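
The effect of the change on the allocation size can be sketched in isolation. The snippet below is a minimal standalone illustration, not the plugin's code: `oldAllocSize` and `newAllocSize` are hypothetical helpers, and the value 56 for `ImplicitArgsSize` is an assumption taken from the removed "56 bytes" comment (the real plugin defines its own constant).

```cpp
#include <cstdint>
#include <optional>

// Assumption: illustrative value taken from the removed "56 bytes" comment;
// the real plugin defines its own ImplicitArgsSize.
constexpr uint32_t ImplicitArgsSize = 56;

// Hypothetical helper mirroring the old logic: the allocation size is the
// explicit kernel argument size plus the implicit arguments, regardless of
// what the symbol information reported.
std::optional<uint32_t> oldAllocSize(uint32_t ArgsSize,
                                     uint32_t KernelArgsSize) {
  if (ArgsSize < KernelArgsSize)
    return std::nullopt; // corresponds to "Mismatch of kernel arguments size"
  return KernelArgsSize + ImplicitArgsSize;
}

// Hypothetical helper mirroring the new logic: allocate exactly the size
// reported by the symbol information.
std::optional<uint32_t> newAllocSize(uint32_t ArgsSize,
                                     uint32_t KernelArgsSize) {
  if (ArgsSize < KernelArgsSize)
    return std::nullopt;
  return ArgsSize;
}
```

With made-up numbers, a kernel with 24 bytes of explicit arguments whose reported size is 80 (implicit arguments already included) gets 80 bytes from both helpers, so the manual addition adds nothing once the reported size can be trusted. If the reported size were only 24, the old helper would still allocate 80 while the new one allocates 24, which is why the change relies on the COV4 and COV5 behaviour described in the summary.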
