16 commits
0e4dd51
[InstCombine] Optimize AMDGPU ballot + assume uniformity patterns
TejaX-Alaghari Sep 25, 2025
258ecfe
[InstCombine] Add constant folding for AMDGPU ballot intrinsics
TejaX-Alaghari Sep 29, 2025
408bc09
[InstCombine] Implement generic assume-based uniformity optimization
TejaX-Alaghari Sep 29, 2025
f0ed709
[InstCombine] Add focused assume-based optimizations
TejaX-Alaghari Oct 2, 2025
2f5587e
Address @ssahasra's review feedback
TejaX-Alaghari Oct 5, 2025
b7a013c
Address feedback on the location of the opt
TejaX-Alaghari Oct 7, 2025
1fbc805
Refactored the ballot optimization condition into propagateEquality m…
TejaX-Alaghari Oct 10, 2025
c7a1f6a
Implement reviewer's suggestions to -
TejaX-Alaghari Oct 31, 2025
08b6db2
Moved the assume based ballot folding logic to AMDGPUInstCombineIntri…
TejaX-Alaghari Nov 10, 2025
20ad9d2
Simplified the logic to check for CompareValue and return nullptr whe…
TejaX-Alaghari Nov 11, 2025
5e3d47e
Skip the optimization when ballot size < wave size
TejaX-Alaghari Nov 11, 2025
9b6e186
Only proceed with the optimization if ballot arg is an instruction
TejaX-Alaghari Nov 12, 2025
0312bf5
Add additional check to demonstrate E2E impact of this optimization
TejaX-Alaghari Nov 12, 2025
bc961db
Address feedback:1. Add the condition to make sure ballot width match…
TejaX-Alaghari Nov 13, 2025
f4f1277
Removed unnecessary temp variable for holding wave front size
TejaX-Alaghari Nov 13, 2025
b19b765
Skip only constant args
TejaX-Alaghari Nov 13, 2025
70 changes: 70 additions & 0 deletions llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
@@ -18,6 +18,7 @@
#include "AMDGPUTargetTransformInfo.h"
#include "GCNSubtarget.h"
#include "llvm/ADT/FloatingPointMode.h"
#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"
#include "llvm/Transforms/InstCombine/InstCombiner.h"
@@ -1341,6 +1342,75 @@ GCNTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
      Call->takeName(&II);
      return IC.replaceInstUsesWith(II, Call);
    }

    // Fold ballot intrinsic based on llvm.assume hint about the result.
    //
    // assume(ballot(x) == ballot(true)) -> x = true
    // assume(ballot(x) == -1) -> x = true
    // assume(ballot(x) == 0) -> x = false
Comment on lines +1348 to +1350
Contributor

We allow ballot.i32 in wave64 mode, which would break these optimizations.

Author

@TejaX-Alaghari TejaX-Alaghari Nov 11, 2025

I was not aware of this! Thanks for pointing it out.

> We allow ballot.i32 in wave64 mode, which would break these optimizations.

If you're referring to this condition in instruction selection, doesn't it mean that "emitting i64 ballots in wave32 mode is supported" and not the other way around?

Please let me know if I'm missing something in understanding this logic.

Contributor

Hmm. I thought there was also a plan to support "emitting i32 ballots in wave64 mode". Maybe that part was never implemented. I guess your patch is OK then, but personally I would add a check that the ballot size matches the wave size just for safety.

Author

@TejaX-Alaghari TejaX-Alaghari Nov 12, 2025

I added a check in a recent commit for proceeding with this optimization only when ballotWidth >= waveSize.

Collaborator

If you are going to all that trouble, then the tests should check all four combinations: i32 ballot in wave32, i32 ballot in wave64, etc.
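
To make the width concern in this thread concrete, here is a small standalone C++ model of the four ballot-width and wave-size combinations. It is a sketch, not LLVM code: `Wave`, `lowMask`, `ballot`, and the helper predicates are hypothetical names, every lane is assumed active, and a ballot narrower or wider than the wave is assumed to truncate or zero-extend the per-lane mask. Whether an i32 ballot in wave64 is actually supported was left open above.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative model only; none of these names exist in LLVM.
// Bit i of `pred` is the value of x in lane i. All lanes are assumed active.
struct Wave {
  unsigned size;  // wave size: 32 or 64
  uint64_t pred;  // per-lane predicate bits
};

// Mask with the low `bits` bits set (bits must be 32 or 64).
uint64_t lowMask(unsigned bits) {
  assert(bits == 32 || bits == 64);
  return bits == 64 ? ~0ULL : ((1ULL << bits) - 1);
}

// Assumed semantics: one bit per lane, truncated or zero-extended to `width`.
uint64_t ballot(const Wave &w, unsigned width) {
  return w.pred & lowMask(w.size) & lowMask(width);
}

bool allLanesTrue(const Wave &w) {
  return (w.pred & lowMask(w.size)) == lowMask(w.size);
}

bool allLanesFalse(const Wave &w) {
  return (w.pred & lowMask(w.size)) == 0;
}

// ballot(x) == -1, with -1 taken at the ballot's own width.
bool ballotIsMinusOne(const Wave &w, unsigned width) {
  return ballot(w, width) == lowMask(width);
}

// ballot(x) == 0.
bool ballotIsZero(const Wave &w, unsigned width) {
  return ballot(w, width) == 0;
}
```

Under these assumptions, a matching width makes `ballotIsMinusOne` equivalent to `allLanesTrue`. A 32-bit ballot on a 64-lane wave can be all ones (or zero) while the upper 32 lanes disagree, so the inference would be unsound. A 64-bit ballot on a 32-lane wave can never equal -1, so only the `== 0` case carries information. That is consistent with bailing out whenever the widths differ.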

    //
    // Skip if Arg is a constant.
    if (isa<Constant>(Arg))
      break;

    // Skip if ballot width doesn't match wave size.
    if (ST->getWavefrontSize() != II.getType()->getIntegerBitWidth())
      break;

    // For each llvm.assume that references the ballot intrinsic, try to infer
    // the value of the ballot's condition argument from the assumed relation.
    for (auto &AssumeVH : IC.getAssumptionCache().assumptionsFor(&II)) {
      if (!AssumeVH)
        continue;

      auto *Assume = cast<AssumeInst>(AssumeVH);
      Value *Cond = Assume->getArgOperand(0);

      // Pattern match: assume(icmp eq ballot, CompareValue)
      ICmpInst *ICI = dyn_cast<ICmpInst>(Cond);
      if (!ICI || ICI->getPredicate() != ICmpInst::ICMP_EQ)
        continue;

      Value *CompareValue;
      if (!match(ICI, m_c_ICmp(m_Specific(&II), m_Value(CompareValue))))
        continue;

      // Determine the inferred value of the ballot's condition argument.
      bool InferredCondValue;
      if (auto *CI = dyn_cast<ConstantInt>(CompareValue)) {
        if (CI->isMinusOne()) {
          // ballot(x) == -1 means all lanes have x = true.
          InferredCondValue = true;
        } else if (CI->isZero()) {
          // ballot(x) == 0 means all lanes have x = false.
          InferredCondValue = false;
        } else {
          continue;
        }
      } else if (match(CompareValue,
                       m_Intrinsic<Intrinsic::amdgcn_ballot>(m_One()))) {
        // ballot(x) == ballot(true) means x = true (EXEC mask comparison).
        InferredCondValue = true;
      } else {
        continue;
      }

      Constant *ReplacementValue =
          ConstantInt::getBool(Arg->getContext(), InferredCondValue);

      // Replace uses of the condition argument dominated by the assume.
      bool Changed = false;
      Arg->replaceUsesWithIf(ReplacementValue, [&](Use &U) {
        Instruction *UserInst = dyn_cast<Instruction>(U.getUser());
        bool Dominates = UserInst && IC.getDominatorTree().dominates(Assume, U);
        Changed |= Dominates;
        return Dominates;
      });

      if (Changed)
        return nullptr;
    }

    break;
  }
case Intrinsic::amdgcn_wavefrontsize: {
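
The decision at the core of the new ballot fold can be read as a three-way mapping from the compared value to an inferred condition. The standalone C++ sketch below mirrors that mapping outside of LLVM. `CompareOperand` and `inferCondition` are hypothetical names introduced only for illustration, and the ballot width is assumed to already match the wave size, as the patch requires.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the value the ballot is compared against.
struct CompareOperand {
  enum Kind { Constant, BallotOfTrue, Other } kind;
  uint64_t bits = 0;  // constant value; only meaningful for Kind::Constant
};

// Returns the inferred value of x from assume(ballot(x) == rhs), or
// std::nullopt when nothing can be inferred. `width` is the ballot width
// in bits, assumed equal to the wave size.
std::optional<bool> inferCondition(const CompareOperand &rhs, unsigned width) {
  assert(width >= 1 && width <= 64);
  const uint64_t allOnes = width == 64 ? ~0ULL : ((1ULL << width) - 1);
  switch (rhs.kind) {
  case CompareOperand::Constant:
    if ((rhs.bits & allOnes) == allOnes)
      return true;   // ballot(x) == -1: x is true in every lane
    if ((rhs.bits & allOnes) == 0)
      return false;  // ballot(x) == 0: x is false in every lane
    return std::nullopt;  // mixed mask: x is not uniform
  case CompareOperand::BallotOfTrue:
    return true;     // ballot(x) == ballot(true): x is true in active lanes
  case CompareOperand::Other:
    return std::nullopt;
  }
  return std::nullopt;
}
```

A mixed constant such as `0x0000FFFF` yields no inference because the condition differs between lanes, which corresponds to the `continue` branches in the patch.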