@rovka rovka commented Mar 14, 2025

Whole wave functions are functions that will run with a full EXEC mask.
They will not be invoked directly, but instead will be launched by way
of a new intrinsic, llvm.amdgcn.call.whole.wave (to be added in
a future patch). These functions are meant as an alternative to the
llvm.amdgcn.init.whole.wave or llvm.amdgcn.strict.wwm intrinsics.

Whole wave functions will have the amdgpu_gfx_whole_wave calling
convention and will set EXEC to -1 in the prologue and restore the
original value of EXEC in the epilogue. They must have a special first
argument, i1 %active, that is going to be mapped to EXEC. The inactive
lanes need to be preserved for all registers used, active lanes only for
the CSRs.
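
As a rough illustration of the EXEC handling described above, here is a toy C++ model, not the actual backend code. It assumes a 64-lane wave, and the names `Wave`, `enterWholeWave`, `leaveWholeWave` and `runWholeWave` are hypothetical, made up for this sketch only.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a wave: one EXEC bit per lane (64 lanes assumed here).
struct Wave {
  std::uint64_t Exec;
};

// Hypothetical stand-in for the prologue: remember the incoming EXEC
// (this plays the role of %active) and enable every lane.
inline std::uint64_t enterWholeWave(Wave &W) {
  std::uint64_t Active = W.Exec;
  W.Exec = ~std::uint64_t{0}; // EXEC = -1
  return Active;
}

// Hypothetical stand-in for the epilogue: restore the original EXEC.
inline void leaveWholeWave(Wave &W, std::uint64_t Active) {
  W.Exec = Active;
}

// Runs Body with all lanes enabled and returns the mask that was
// active on entry. Body receives the wave and the %active mask.
template <typename Fn>
std::uint64_t runWholeWave(Wave &W, Fn Body) {
  std::uint64_t Active = enterWholeWave(W);
  assert(W.Exec == ~std::uint64_t{0} && "body must see a full EXEC mask");
  Body(W, Active);
  leaveWholeWave(W, Active);
  return Active;
}
```

In this model the body always observes a full mask, while the caller sees its own EXEC unchanged once the function returns.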

At the IR level, arguments to a whole wave function (other than
%active) contain poison in their inactive lanes. Likewise, the return
value for the inactive lanes is poison.

This patch contains the following work:

  • A new calling convention, amdgpu_gfx_whole_wave
  • 2 new pseudos, SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN
    used for managing the EXEC mask. SI_SETUP_WHOLE_WAVE_FUNC will return
    a SReg_1 representing %active, which needs to be passed into
    SI_WHOLE_WAVE_FUNC_RETURN.
  • SelectionDAG support for generating these 2 new pseudos, including
    the special handling of %active. Since the return may be in a
    different basic block, it's difficult to add the virtual reg for
    %active to SI_WHOLE_WAVE_FUNC_RETURN, so we initially generate an
    IMPLICIT_DEF which is later replaced via a custom inserter.
  • GlobalISel support (pretty straightforward)
  • Expansion of the 2 pseudos during prolog/epilog insertion. PEI also
    marks any used VGPRs as WWM registers, which are then spilled and
    restored with the usual logic.
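
The preservation rule from the description (inactive lanes for every register used, active lanes only for the CSRs) can be sketched with a small C++ model. This is only an illustration of the contract, not how PEI implements the spills; the 64-lane width and the names `VGPR`, `SavedReg`, `lanesToPreserve`, `saveUsedRegs` and `restoreRegs` are assumptions made for the example.

```cpp
#include <array>
#include <cstdint>
#include <vector>

constexpr unsigned NumLanes = 64; // assumed wave size for this sketch
using VGPR = std::array<std::uint32_t, NumLanes>;

// Hypothetical record of one saved register and the lanes to restore.
struct SavedReg {
  unsigned Index;
  std::uint64_t LaneMask;
  VGPR Value;
};

// Inactive lanes are always preserved; active lanes only for CSRs.
inline std::uint64_t lanesToPreserve(std::uint64_t Active, bool IsCSR) {
  return IsCSR ? ~std::uint64_t{0} : ~Active;
}

// Save every register the function uses, with its per-register lane mask.
inline std::vector<SavedReg> saveUsedRegs(const std::vector<VGPR> &Regs,
                                          const std::vector<unsigned> &Used,
                                          const std::vector<bool> &IsCSR,
                                          std::uint64_t Active) {
  std::vector<SavedReg> Saved;
  for (unsigned Index : Used)
    Saved.push_back(
        {Index, lanesToPreserve(Active, IsCSR[Index]), Regs[Index]});
  return Saved;
}

// Restore only the lanes recorded in each saved register's mask.
inline void restoreRegs(std::vector<VGPR> &Regs,
                        const std::vector<SavedReg> &Saved) {
  for (const SavedReg &S : Saved)
    for (unsigned Lane = 0; Lane < NumLanes; ++Lane)
      if ((S.LaneMask >> Lane) & 1)
        Regs[S.Index][Lane] = S.Value[Lane];
}
```

With `Active = 0x3`, a non-CSR register keeps whatever the function wrote to lanes 0 and 1 but gets lanes 2 to 63 back, while a CSR gets all 64 lanes back.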

Future patches will include the llvm.amdgcn.call.whole.wave intrinsic,
and probably a lot of optimization work.

rovka added 2 commits March 14, 2025 14:17
Whole wave functions are functions that will run with a full EXEC mask.
They will not be invoked directly, but instead will be launched by way
of a new intrinsic, `llvm.amdgcn.call.whole.wave` (to be added in
a future patch). These functions are meant as an alternative to the
`llvm.amdgcn.init.whole.wave` or `llvm.amdgcn.strict.wwm` intrinsics.

Whole wave functions will set EXEC to -1 in the prologue and restore the
original value of EXEC in the epilogue. They must have a special first
argument, `i1 %active`, that is going to be mapped to EXEC. They may
have either the default calling convention or amdgpu_gfx. The inactive
lanes need to be preserved for all registers used, active lanes only for
the CSRs.

At the IR level, arguments to a whole wave function (other than
`%active`) contain poison in their inactive lanes. Likewise, the return
value for the inactive lanes is poison.

This patch contains the following work:
* 2 new pseudos, SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN
  used for managing the EXEC mask. SI_SETUP_WHOLE_WAVE_FUNC will return
  a SReg_1 representing `%active`, which needs to be passed into
  SI_WHOLE_WAVE_FUNC_RETURN.
* SelectionDAG support for generating these 2 new pseudos and the
  special handling of `%active`. Since the return may be in a different
  basic block, it's difficult to add the virtual reg for `%active` to
  SI_WHOLE_WAVE_FUNC_RETURN, so we initially generate an IMPLICIT_DEF
  which is later replaced via a custom inserter.
* Expansion of the 2 pseudos during prolog/epilog insertion. PEI also
  marks any used VGPRs as WWM registers, which are then spilled and
  restored with the usual logic.

I'm still working on the GlobalISel support and on adding some docs in
AMDGPUUsage.rst.

Future patches will include the `llvm.amdgcn.call.whole.wave` intrinsic,
a codegen prepare patch that looks for the callees of that intrinsic and
marks them as whole wave functions, and probably a lot of optimization
work.
@rovka rovka requested a review from perlfu March 14, 2025 13:48

github-actions bot commented Mar 14, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@perlfu perlfu left a comment

Had a quick look.
This seems pretty clean as is.
However, I had assumed this would introduce a new calling convention, and WWM functions would use that calling convention?
Then there is no SubtargetFeature, etc

// In practice, all the VGPRs are WWM registers, and we will need to save at
// least their inactive lanes. Add them to WWMReservedRegs.
assert(!NeedExecCopyReservedReg && "Whole wave functions can use the reg mapped for their i1 argument");
for (MCRegister Reg : AMDGPU::VGPR_32RegClass)

This is going to be expensive.
It's probably fine for making this work, but long term I think we'd need to do this differently.

@rovka rovka marked this pull request as ready for review May 20, 2025 11:08

rovka commented Jun 26, 2025

Reopened with graphite #145858

@rovka rovka closed this Jun 26, 2025