-
Notifications
You must be signed in to change notification settings - Fork 15.3k
[AMDGPU] Ignore inactive VGPRs in .vgpr_count #144855
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1351,6 +1351,28 @@ constexpr bool isEntryFunctionCC(CallingConv::ID CC) { | |
| } | ||
| } | ||
|
|
||
| // Shaders that are entry functions need to count input arguments even if | ||
| // they're not used (i.e. not reported by AMDGPUResourceUsageAnalysis). Other | ||
| // functions can skip including them. This is especially important for shaders | ||
| // that use the init.whole.wave intrinsic, since they sometimes have VGPR | ||
| // arguments that are only added for the purpose of preserving their inactive | ||
| // lanes and should not be included in the vgpr-count. | ||
| LLVM_READNONE | ||
| constexpr bool shouldReportUnusedFuncArgs(CallingConv::ID CC) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Name should express the reason, not the usage context. Although here I don't understand why you're going out of your way to exclude kernels. The same reasoning should apply when using preloaded arguments
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you suggest a better name? This is mostly just an implementation detail. Maybe it shouldn't be in
Graphics and kernels handle hardware-initialized registers a bit differently. For graphics, we're putting them as arguments to the IR functions, and for compute we track them in
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Just handle all entry points. I don't see any sensible reason why this would exclude compute entry points.
I think you misunderstand. Compute know has a preloading kernel argument optimization, where the values appear in the IR kernel argument list exactly the same way as graphics. There is no fundamental difference here, it's programming the same registers even if that weren't the case.
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, I get that, but the current code already treats kernels differently and changing that would cause a lot of test churn that's not related to this patch. At the moment we don't include unused VGPR arguments for kernels and I'm trying to preserve that behavior. |
||
| switch (CC) { | ||
| case CallingConv::AMDGPU_VS: | ||
| case CallingConv::AMDGPU_LS: | ||
| case CallingConv::AMDGPU_HS: | ||
| case CallingConv::AMDGPU_ES: | ||
| case CallingConv::AMDGPU_GS: | ||
| case CallingConv::AMDGPU_PS: | ||
| case CallingConv::AMDGPU_CS: | ||
| return true; | ||
| default: | ||
| return false; | ||
| } | ||
| } | ||
|
|
||
| LLVM_READNONE | ||
| constexpr bool isChainCC(CallingConv::ID CC) { | ||
| switch (CC) { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing this huge switch is great but does it have to be part of this patch? Can it be a separate NFC thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I did consider removing it in a separate patch, but then there would be no reason for the whole loop to exist. So I'd end up removing the loop in one patch and then re-adding the next, which felt kind of silly.