[AMDGPU] ISel & PEI for whole wave functions #131334

rovka · 2025-03-14T13:48:00Z

Whole wave functions are functions that will run with a full EXEC mask.
They will not be invoked directly, but instead will be launched by way
of a new intrinsic, llvm.amdgcn.call.whole.wave (to be added in
a future patch). These functions are meant as an alternative to the
llvm.amdgcn.init.whole.wave or llvm.amdgcn.strict.wwm intrinsics.

Whole wave functions will have the amdgpu_gfx_whole_wave calling
convention and will set EXEC to -1 in the prologue and restore the
original value of EXEC in the epilogue. They must have a special first
argument, i1 %active, that is going to be mapped to EXEC. The inactive
lanes need to be preserved for all registers used, active lanes only for
the CSRs.
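
To make the EXEC handling concrete, here is a toy C++ model of a single 32-lane wave. It is only a sketch of the semantics described above, not code from this patch: `Wave`, `writeVgpr` and `wholeWaveFunction` are hypothetical names, and the model tracks one non-CSR VGPR, so only its inactive lanes are restored.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model (assumption, not LLVM code): one 32-lane wave with one VGPR.
constexpr int kLanes = 32;

struct Wave {
  uint32_t exec = 0;                    // bit i set => lane i is active
  std::array<uint32_t, kLanes> vgpr{};  // one 32-bit value per lane
};

// Write `value` to every lane currently enabled in EXEC.
void writeVgpr(Wave &w, uint32_t value) {
  for (int lane = 0; lane < kLanes; ++lane)
    if (w.exec & (1u << lane))
      w.vgpr[lane] = value;
}

// Hypothetical whole wave function that clobbers the VGPR in all lanes.
void wholeWaveFunction(Wave &w) {
  // Prologue: remember the caller's EXEC (what %active maps to),
  // then enable every lane.
  const uint32_t origExec = w.exec;
  w.exec = ~0u;
  // Save the VGPR so the inactive lanes can be restored later.
  const std::array<uint32_t, kLanes> saved = w.vgpr;

  writeVgpr(w, 0xDEADBEEF);  // the body runs with a full EXEC mask

  // Epilogue: restore only the lanes that were inactive on entry
  // (the register is not a CSR in this model), then restore EXEC.
  for (int lane = 0; lane < kLanes; ++lane)
    if (!(origExec & (1u << lane)))
      w.vgpr[lane] = saved[lane];
  w.exec = origExec;
}
```

With a caller EXEC of 0x0000FFFF, lanes 0-15 see the new value after the call, while lanes 16-31 keep whatever they held before, and EXEC is back to 0x0000FFFF.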

At the IR level, arguments to a whole wave function (other than
%active) contain poison in their inactive lanes. Likewise, the return
value for the inactive lanes is poison.
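
Because the inactive lanes of the arguments are poison, a function body would typically pick a defined value for them before doing any cross-lane work, for example with a per-lane select on %active. The sketch below models that idea in plain C++. It is an illustration under assumptions, not code from this patch: `std::nullopt` stands in for poison, and `LaneValues` and `selectActive` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

constexpr int kNumLanes = 32;

// std::nullopt stands in for a poison lane; this is only a model.
using LaneValues = std::array<std::optional<uint32_t>, kNumLanes>;

// Per-lane analogue of `select i1 %active, i32 %arg, i32 fallback`:
// active lanes keep the argument, inactive lanes get a defined fallback.
LaneValues selectActive(uint32_t active, const LaneValues &arg,
                        uint32_t fallback) {
  LaneValues out;
  for (int lane = 0; lane < kNumLanes; ++lane) {
    const bool isActive = (active & (1u << lane)) != 0;
    out[lane] = isActive ? arg[lane] : std::optional<uint32_t>(fallback);
  }
  return out;
}
```

After the select, every lane holds a defined value as long as the active lanes of the argument were defined, so the result is safe to feed into operations that read all lanes.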

This patch contains the following work:

* A new calling convention, amdgpu_gfx_whole_wave.
* 2 new pseudos, SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN,
  used for managing the EXEC mask. SI_SETUP_WHOLE_WAVE_FUNC will return
  a SReg_1 representing %active, which needs to be passed into
  SI_WHOLE_WAVE_FUNC_RETURN.
* SelectionDAG support for generating these 2 new pseudos, including
  the special handling of %active. Since the return may be in a
  different basic block, it's difficult to add the virtual reg for
  %active to SI_WHOLE_WAVE_FUNC_RETURN, so we initially generate an
  IMPLICIT_DEF which is later replaced via a custom inserter.
* GlobalISel support (pretty straightforward).
* Expansion of the 2 pseudos during prolog/epilog insertion. PEI also
  marks any used VGPRs as WWM registers, which are then spilled and
  restored with the usual logic.

Future patches will include the llvm.amdgcn.call.whole.wave intrinsic,
and probably a lot of optimization work.

Whole wave functions are functions that will run with a full EXEC mask. They will not be invoked directly, but instead will be launched by way of a new intrinsic, `llvm.amdgcn.call.whole.wave` (to be added in a future patch). These functions are meant as an alternative to the `llvm.amdgcn.init.whole.wave` or `llvm.amdgcn.strict.wwm` intrinsics.

Whole wave functions will set EXEC to -1 in the prologue and restore the original value of EXEC in the epilogue. They must have a special first argument, `i1 %active`, that is going to be mapped to EXEC. They may have either the default calling convention or amdgpu_gfx. The inactive lanes need to be preserved for all registers used, active lanes only for the CSRs.

At the IR level, arguments to a whole wave function (other than `%active`) contain poison in their inactive lanes. Likewise, the return value for the inactive lanes is poison.

This patch contains the following work:

* 2 new pseudos, SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN, used for managing the EXEC mask. SI_SETUP_WHOLE_WAVE_FUNC will return a SReg_1 representing `%active`, which needs to be passed into SI_WHOLE_WAVE_FUNC_RETURN.
* SelectionDAG support for generating these 2 new pseudos and the special handling of `%active`. Since the return may be in a different basic block, it's difficult to add the virtual reg for `%active` to SI_WHOLE_WAVE_FUNC_RETURN, so we initially generate an IMPLICIT_DEF which is later replaced via a custom inserter.
* Expansion of the 2 pseudos during prolog/epilog insertion. PEI also marks any used VGPRs as WWM registers, which are then spilled and restored with the usual logic.

I'm still working on the GlobalISel support and on adding some docs in AMDGPUUsage.rst.

Future patches will include the `llvm.amdgcn.call.whole.wave` intrinsic, a codegen prepare patch that looks for the callees of that intrinsic and marks them as whole wave functions, and probably a lot of optimization work.

github-actions · 2025-03-14T13:51:36Z

✅ With the latest revision this PR passed the C/C++ code formatter.

perlfu

Had a quick look.
This seems pretty clean as is.
However, I had assumed this would introduce a new calling convention, and WWM functions would use that calling convention?
Then there would be no need for a SubtargetFeature, etc.

perlfu · 2025-03-15T07:23:02Z

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp

+    // In practice, all the VGPRs are WWM registers, and we will need to save at
+    // least their inactive lanes. Add them to WWMReservedRegs.
+    assert(!NeedExecCopyReservedReg && "Whole wave functions can use the reg mapped for their i1 argument");
+    for (MCRegister Reg : AMDGPU::VGPR_32RegClass)


This is going to be expensive.
It's probably fine for making this work, but long term I think we'd need to do this differently.

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

This reverts commit c6e9211d5644061521cbce8edac7c475c83b01d6.

rovka · 2025-06-26T09:04:34Z

Reopened with graphite #145858

rovka added 2 commits March 14, 2025 14:17

Add subtarget feature

a525bba

rovka requested a review from perlfu March 14, 2025 13:48

perlfu reviewed Mar 15, 2025


rovka added 16 commits March 17, 2025 12:47

Use MF instead of MBB

399e08c

Revert "Add subtarget feature"

8f72b59

This reverts commit c6e9211d5644061521cbce8edac7c475c83b01d6.

Add new CC. Do nothing

accbe8e

Replace SubtargetFeature with CallingConv

1a82d88

Enable gisel in tests

ea3821b

GISel support

1b20edd

Rename pseudo to match others

5e97750

Rename CC

be094ce

Fix formatting

b1a17c6

Merge branch 'main' into whole-wave-funcs

75017e9

Merge remote-tracking branch 'remotes/origin/main' into whole-wave-funcs

4c6beec

Update tests after merge

80e6433

Fix bug in testcase

552e220

Test inreg args

7ed7e96

Merge remote-tracking branch 'remotes/origin/main' into whole-wave-funcs

8325ef1

Add docs and fixme

e1f133e

rovka marked this pull request as ready for review May 20, 2025 11:08

rovka mentioned this pull request May 26, 2025

[AMDGPU] Skip register uses in AMDGPUResourceUsageAnalysis #133242

Merged

rovka added 7 commits June 24, 2025 10:56

Remove kill flags on orig exec mask

ac70a87

Add helper to add orig exec to return

08102a3

Test with single use of orig exec

1cd402f

Test calling gfx func from wwf

e8fc4bd

Test wave64

8feed10

Merge remote-tracking branch 'remotes/origin/main' into whole-wave-funcs

bc7b9ef

Merge remote-tracking branch 'remotes/origin/main' into whole-wave-funcs

ba08290

rovka added 2 commits June 24, 2025 13:14

Fix a few missed spots

bc8d8ce

clang-format

0eb6c66

rovka closed this Jun 26, 2025
