Closed
Changes from 1 commit
8 changes: 1 addition & 7 deletions llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -991,13 +991,7 @@ void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
// dispatch registers are function args.
unsigned WaveDispatchNumSGPR = 0, WaveDispatchNumVGPR = 0;

-  // Entry functions need to count input arguments even if they're not used
-  // (i.e. not reported by AMDGPUResourceUsageAnalysis). Other functions can
-  // skip including them. This is especially important for shaders that use the
-  // init.whole.wave intrinsic, since they sometimes have VGPR arguments that
-  // are only added for the purpose of preserving their inactive lanes and
-  // should not be included in the vgpr-count.
-  if (isShader(F.getCallingConv()) && isEntryFunctionCC(F.getCallingConv())) {
+  if (AMDGPU::shouldReportUnusedFuncArgs(F.getCallingConv())) {
bool IsPixelShader =
F.getCallingConv() == CallingConv::AMDGPU_PS && !STM.isAmdHsaOS();

22 changes: 22 additions & 0 deletions llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
@@ -1351,6 +1351,28 @@ constexpr bool isEntryFunctionCC(CallingConv::ID CC) {
}
}

+// Shaders that are entry functions need to count input arguments even if
+// they're not used (i.e. not reported by AMDGPUResourceUsageAnalysis). Other
+// functions can skip including them. This is especially important for shaders
+// that use the init.whole.wave intrinsic, since they sometimes have VGPR
+// arguments that are only added for the purpose of preserving their inactive
+// lanes and should not be included in the vgpr-count.
+LLVM_READNONE
+constexpr bool shouldReportUnusedFuncArgs(CallingConv::ID CC) {
Contributor

The name should express the reason, not the usage context. That said, I don't understand why you're going out of your way to exclude kernels. The same reasoning should apply when using preloaded arguments.

Collaborator Author

Can you suggest a better name? This is mostly just an implementation detail. Maybe it shouldn't be in AMDGPUBaseInfo in the first place. Should I just move it to AMDGPUAsmPrinter.cpp?

> Although here I don't understand why you're going out of your way to exclude kernels. The same reasoning should apply when using preloaded arguments

Graphics and kernels handle hardware-initialized registers a bit differently. For graphics, we're putting them as arguments to the IR functions, and for compute we track them in SIMachineFunctionInfo instead. We do handle the preloaded arguments in the same place in AMDGPUAsmPrinter, just on the else branch of where this helper is used.

Contributor

> Can you suggest a better name? This is mostly just an implementation detail. Maybe it shouldn't be in AMDGPUBaseInfo in the first place. Should I just move it to AMDGPUAsmPrinter.cpp?

Just handle all entry points. I don't see any sensible reason why this would exclude compute entry points.

> Graphics and kernels handle hardware-initialized registers a bit differently. For graphics, we're putting them as arguments to the IR functions

I think you misunderstand. Compute now has a preloading kernel argument optimization, where the values appear in the IR kernel argument list exactly the same way as graphics. There is no fundamental difference here; even if that weren't the case, it's programming the same registers.

Collaborator Author

Ok, I get that, but the current code already treats kernels differently and changing that would cause a lot of test churn that's not related to this patch. At the moment we don't include unused VGPR arguments for kernels and I'm trying to preserve that behavior.
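
A minimal sketch of the behavior being preserved, not the actual AMDGPUAsmPrinter code. It assumes the reported VGPR count is the larger of what the function body uses and what its arguments occupy, and that the argument term is applied only when the helper returns true. `VGPRUsage`, `reportedNumVGPR`, and the field names are hypothetical and exist only for illustration.

```cpp
#include <algorithm>

// Hypothetical model of one function's VGPR footprint; not an LLVM type.
struct VGPRUsage {
  unsigned UsedByBody; // VGPRs the resource usage analysis would report
  unsigned HeldByArgs; // VGPRs occupied by IR arguments, used or not
};

// Assumption: unused arguments are folded in with a max() only for the
// calling conventions accepted by shouldReportUnusedFuncArgs.
constexpr unsigned reportedNumVGPR(bool ReportUnusedArgs, VGPRUsage U) {
  return ReportUnusedArgs ? std::max(U.UsedByBody, U.HeldByArgs)
                          : U.UsedByBody;
}

// A shader entry point with 8 argument VGPRs but only 3 used by the body.
static_assert(reportedNumVGPR(true, {3, 8}) == 8, "arguments are counted");
// The same footprint where the helper returns false, e.g. a kernel today.
static_assert(reportedNumVGPR(false, {3, 8}) == 3, "arguments are skipped");
```

Under this model, switching kernels to the `true` path would raise the reported count for any kernel with unused VGPR arguments, which is the test churn mentioned above.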

+  switch (CC) {
+  case CallingConv::AMDGPU_VS:
+  case CallingConv::AMDGPU_LS:
+  case CallingConv::AMDGPU_HS:
+  case CallingConv::AMDGPU_ES:
+  case CallingConv::AMDGPU_GS:
+  case CallingConv::AMDGPU_PS:
+  case CallingConv::AMDGPU_CS:
+    return true;
+  default:
+    return false;
+  }
+}
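
The helper can be exercised outside LLVM with a stand-in enum. This is a sketch under that assumption: `CC` below is hypothetical and only mirrors the names of `llvm::CallingConv::ID`, while the switch body matches the one in this diff. The `static_assert`s show the effect debated in the thread: graphics entry points report unused arguments, kernels and ordinary functions do not.

```cpp
// Hypothetical stand-in for llvm::CallingConv::ID. The real enum lives in
// llvm/IR/CallingConv.h and uses different numeric values.
enum class CC {
  C,
  AMDGPU_KERNEL,
  AMDGPU_VS,
  AMDGPU_LS,
  AMDGPU_HS,
  AMDGPU_ES,
  AMDGPU_GS,
  AMDGPU_PS,
  AMDGPU_CS,
  AMDGPU_Gfx,
};

// Same case list as the helper added in this patch.
constexpr bool shouldReportUnusedFuncArgs(CC Conv) {
  switch (Conv) {
  case CC::AMDGPU_VS:
  case CC::AMDGPU_LS:
  case CC::AMDGPU_HS:
  case CC::AMDGPU_ES:
  case CC::AMDGPU_GS:
  case CC::AMDGPU_PS:
  case CC::AMDGPU_CS:
    return true;
  default:
    return false;
  }
}

// Graphics shader entry points count unused arguments.
static_assert(shouldReportUnusedFuncArgs(CC::AMDGPU_PS), "pixel shader");
static_assert(shouldReportUnusedFuncArgs(CC::AMDGPU_CS), "compute shader");
// Kernels and callable functions fall through to the default case.
static_assert(!shouldReportUnusedFuncArgs(CC::AMDGPU_KERNEL), "kernel");
static_assert(!shouldReportUnusedFuncArgs(CC::AMDGPU_Gfx), "gfx callable");
```

Handling all entry points, as the reviewer suggests, would amount to adding the kernel calling conventions to the `true` case list.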

LLVM_READNONE
constexpr bool isChainCC(CallingConv::ID CC) {
switch (CC) {