[AMDGPU] ISel & PEI for whole wave functions #131334
Whole wave functions are functions that will run with a full EXEC mask. They will not be invoked directly, but instead will be launched by way of a new intrinsic, `llvm.amdgcn.call.whole.wave` (to be added in a future patch). These functions are meant as an alternative to the `llvm.amdgcn.init.whole.wave` or `llvm.amdgcn.strict.wwm` intrinsics.

Whole wave functions will set EXEC to -1 in the prologue and restore the original value of EXEC in the epilogue. They must have a special first argument, `i1 %active`, that is going to be mapped to EXEC. They may have either the default calling convention or amdgpu_gfx. The inactive lanes need to be preserved for all registers used, active lanes only for the CSRs.

At the IR level, arguments to a whole wave function (other than `%active`) contain poison in their inactive lanes. Likewise, the return value for the inactive lanes is poison.

This patch contains the following work:

* 2 new pseudos, SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN, used for managing the EXEC mask. SI_SETUP_WHOLE_WAVE_FUNC will return a SReg_1 representing `%active`, which needs to be passed into SI_WHOLE_WAVE_FUNC_RETURN.
* SelectionDAG support for generating these 2 new pseudos and the special handling of `%active`. Since the return may be in a different basic block, it's difficult to add the virtual reg for `%active` to SI_WHOLE_WAVE_FUNC_RETURN, so we initially generate an IMPLICIT_DEF which is later replaced via a custom inserter.
* Expansion of the 2 pseudos during prolog/epilog insertion. PEI also marks any used VGPRs as WWM registers, which are then spilled and restored with the usual logic.

I'm still working on the GlobalISel support and on adding some docs in AMDGPUUsage.rst.

Future patches will include the `llvm.amdgcn.call.whole.wave` intrinsic, a codegen prepare patch that looks for the callees of that intrinsic and marks them as whole wave functions, and probably a lot of optimization work.
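To make the EXEC handling in the description concrete, here is a rough toy model in plain C++. It is not LLVM code: the `Wave` struct, the 32-lane width, and the helper names are assumptions made only for illustration. It sketches what the two pseudos are described as doing: the setup step hands back the caller's EXEC as `%active` and enables every lane, and the return step restores EXEC from `%active`.

```cpp
#include <cstdint>

// Toy model only; names and the 32-lane wave size are hypothetical.
constexpr std::uint32_t kAllLanes = 0xFFFFFFFFu;

struct Wave {
  std::uint32_t exec; // one bit per lane, 1 = lane enabled
};

// Models SI_SETUP_WHOLE_WAVE_FUNC as described above: return the original
// EXEC (the value of %active) and set EXEC to -1.
inline std::uint32_t setupWholeWaveFunc(Wave &w) {
  std::uint32_t active = w.exec;
  w.exec = kAllLanes;
  return active;
}

// Models SI_WHOLE_WAVE_FUNC_RETURN as described above: restore the original
// EXEC from %active.
inline void wholeWaveFuncReturn(Wave &w, std::uint32_t active) {
  w.exec = active;
}

// Counts the lanes enabled in a mask.
inline int countEnabledLanes(std::uint32_t mask) {
  int n = 0;
  for (; mask != 0; mask &= mask - 1)
    ++n;
  return n;
}

// A stand-in "whole wave function" body: reports how many lanes were enabled
// while the body ran, then restores the caller's EXEC.
inline int runWholeWaveBody(Wave &w) {
  std::uint32_t active = setupWholeWaveFunc(w);
  int lanesDuringBody = countEnabledLanes(w.exec);
  wholeWaveFuncReturn(w, active);
  return lanesDuringBody;
}
```

With a caller that has only four lanes enabled, the body in this model still sees all 32 lanes, and the caller's mask is back in place afterwards.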
✅ With the latest revision this PR passed the C/C++ code formatter.
perlfu
left a comment
Had a quick look.
This seems pretty clean as is.
However, I had assumed this would introduce a new calling convention, and WWM functions would use that calling convention?
Then there is no SubtargetFeature, etc
    // In practice, all the VGPRs are WWM registers, and we will need to save at
    // least their inactive lanes. Add them to WWMReservedRegs.
    assert(!NeedExecCopyReservedReg && "Whole wave functions can use the reg mapped for their i1 argument");
    for (MCRegister Reg : AMDGPU::VGPR_32RegClass)
This is going to be expensive.
It's probably fine for making this work, but long term I think we'd need to do this differently.
This reverts commit c6e9211d5644061521cbce8edac7c475c83b01d6.
Reopened with graphite #145858
Whole wave functions are functions that will run with a full EXEC mask.
They will not be invoked directly, but instead will be launched by way
of a new intrinsic, `llvm.amdgcn.call.whole.wave` (to be added in
a future patch). These functions are meant as an alternative to the
`llvm.amdgcn.init.whole.wave` or `llvm.amdgcn.strict.wwm` intrinsics.

Whole wave functions will have the `amdgpu_gfx_whole_wave` calling
convention and will set EXEC to -1 in the prologue and restore the
original value of EXEC in the epilogue. They must have a special first
argument, `i1 %active`, that is going to be mapped to EXEC. The inactive
lanes need to be preserved for all registers used, active lanes only for
the CSRs.

At the IR level, arguments to a whole wave function (other than
`%active`) contain poison in their inactive lanes. Likewise, the return
value for the inactive lanes is poison.
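The preservation rule stated above (inactive lanes for every register used, active lanes only for the CSRs) can be sketched as a small lane-mask computation. This is a toy model, not the code in the patch: `RegInfo`, `lanesToPreserve`, and the 32-lane width are hypothetical names and assumptions chosen for illustration.

```cpp
#include <cstdint>

// Toy description of a register from the whole wave function's point of view.
// Hypothetical type, not an LLVM data structure.
struct RegInfo {
  bool used;        // the function writes this register
  bool calleeSaved; // the register is a CSR under the calling convention
};

// Returns the lanes whose original contents must survive the call, given the
// caller's EXEC mask (the value of %active), assuming a 32-lane wave.
inline std::uint32_t lanesToPreserve(RegInfo reg, std::uint32_t active) {
  if (!reg.used)
    return 0; // untouched registers need no saving at all
  std::uint32_t inactive = ~active;
  // Inactive lanes are preserved for every used register; active lanes are
  // added only when the register is callee-saved.
  return reg.calleeSaved ? (inactive | active) : inactive;
}
```

For a caller with `active = 0x0000000F`, a used non-CSR register needs lanes `0xFFFFFFF0` preserved in this model, while a used CSR needs all 32 lanes.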
This patch contains the following work:

* A new calling convention, `amdgpu_gfx_whole_wave`.
* 2 new pseudos, SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN,
  used for managing the EXEC mask. SI_SETUP_WHOLE_WAVE_FUNC will return
  a SReg_1 representing `%active`, which needs to be passed into
  SI_WHOLE_WAVE_FUNC_RETURN.
* Instruction selection support for generating these 2 new pseudos,
  including the special handling of `%active`. Since the return may be in a
  different basic block, it's difficult to add the virtual reg for `%active`
  to SI_WHOLE_WAVE_FUNC_RETURN, so we initially generate an IMPLICIT_DEF
  which is later replaced via a custom inserter.
* Expansion of the 2 pseudos during prolog/epilog insertion. PEI also
  marks any used VGPRs as WWM registers, which are then spilled and
  restored with the usual logic.
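The placeholder-then-patch approach described for `%active` can be illustrated with a toy two-pass model. This is only a sketch under loud assumptions: the `Instr` struct, the flat instruction list, and `patchActiveOperand` are hypothetical and do not reflect the actual MachineInstr or custom inserter APIs. The first pass is assumed to have emitted the return with an operand fed by an IMPLICIT_DEF; the second pass rewires it to the register defined by the setup pseudo.

```cpp
#include <string>
#include <vector>

constexpr int kNoReg = -1;

// Hypothetical, heavily simplified instruction: one optional def, one
// optional use, identified by virtual register number.
struct Instr {
  std::string opcode;
  int def;
  int use;
};

// Second pass: find the vreg defined by the setup pseudo and make the return
// pseudo use it instead of the placeholder. Returns true if a return was
// patched.
inline bool patchActiveOperand(std::vector<Instr> &fn) {
  int activeReg = kNoReg;
  for (const Instr &i : fn)
    if (i.opcode == "SI_SETUP_WHOLE_WAVE_FUNC")
      activeReg = i.def;
  if (activeReg == kNoReg)
    return false; // no setup pseudo, nothing to wire up

  bool patched = false;
  for (Instr &i : fn) {
    if (i.opcode == "SI_WHOLE_WAVE_FUNC_RETURN") {
      i.use = activeReg; // replace the IMPLICIT_DEF placeholder operand
      patched = true;
    }
  }
  return patched;
}
```

The point of the design in this model is that the first pass never needs to know where the setup instruction lives; only the later pass, which can see the whole function, connects the two.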
Future patches will include the `llvm.amdgcn.call.whole.wave` intrinsic,
and probably a lot of optimization work.