Commit b692ae9

[InstCombine] Implement generic assume-based uniformity optimization
Implement a generic InstCombine optimization that extracts uniformity information from assume intrinsics and optimizes dominated uses. The optimization recognizes several patterns that establish value uniformity and replaces dominated uses of those values with the corresponding constants. This addresses optimization opportunities identified in AMDGPU ballot/readfirstlane + assume patterns, improving code generation through constant propagation.
1 parent 29607f6 commit b692ae9
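
The commit message describes replacing uses that are dominated by an assume with the constant the assume establishes. A minimal sketch of that idea follows, assuming a single straight-line block where dominance reduces to program order. `ToyInst` and `replaceDominatedUses` are hypothetical names used only for illustration; they are not part of LLVM or of this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy instruction: either an assume that pins `operand` to
// `value`, or an ordinary use of an operand (a variable name, or a literal
// once it has been replaced).
struct ToyInst {
  bool isAssume;        // true: models "assume(operand == value)"
  std::string operand;  // variable name, or a literal after replacement
  int value;            // only meaningful when isAssume is true
};

// Replace uses of the assumed variable that come after the assume.
// In one straight-line block, "dominated by the assume" reduces to
// "appears later in the list"; real IR needs a dominator tree instead.
std::size_t replaceDominatedUses(std::vector<ToyInst> &block) {
  std::size_t replaced = 0;
  for (std::size_t i = 0; i < block.size(); ++i) {
    if (!block[i].isAssume)
      continue;
    const std::string name = block[i].operand;
    const std::string literal = std::to_string(block[i].value);
    for (std::size_t j = i + 1; j < block.size(); ++j) {
      if (!block[j].isAssume && block[j].operand == name) {
        block[j].operand = literal;
        ++replaced;
      }
    }
  }
  return replaced;
}
```

In this toy model a use that appears before the assume is left untouched, which mirrors why the patch filters uses through a dominance check before rewriting them.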

6 files changed: +195 -187 lines changed


.github/copilot-instructions.md

Lines changed: 4 additions & 74 deletions
@@ -1,74 +1,4 @@
-# LLVM Project AI Coding Agent Instructions
-
-## Architecture Overview
-
-LLVM is a compiler infrastructure with modular components:
-- **Core LLVM** (`llvm/`): IR processing, optimizations, code generation
-- **Clang** (`clang/`): C/C++/Objective-C frontend
-- **LLD** (`lld/`): Linker
-- **libc++** (`libcxx/`): C++ standard library
-- **Target backends** (`llvm/lib/Target/{AMDGPU,X86,ARM,...}/`): Architecture-specific code generation
-
-## Essential Development Workflows
-
-### Build System (CMake + Ninja)
-```bash
-# Configure with common options for development
-cmake -G Ninja -S llvm-project/llvm -B build \
-  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
-  -DLLVM_ENABLE_PROJECTS="clang;lld" \
-  -DLLVM_TARGETS_TO_BUILD="AMDGPU;X86" \
-  -DLLVM_ENABLE_ASSERTIONS=ON
-
-# Build and install
-cmake --build build
-cmake --install build --prefix install/
-```
-
-### Testing with LIT
-- Use `opt < file.ll -passes=instcombine -S | FileCheck %s` pattern for IR transforms
-- Test files go in `llvm/test/Transforms/{PassName}/` with `.ll` extension
-- Always include both positive and negative test cases
-- Use `CHECK-LABEL:` for function boundaries, `CHECK-NEXT:` for strict sequence
-
-### Key Patterns for Transforms
-
-**InstCombine Pattern** (`llvm/lib/Transforms/InstCombine/`):
-- Implement in `InstCombine*.cpp` using visitor pattern (`visitCallInst`, `visitBinaryOperator`)
-- Use `PatternMatch.h` matchers: `match(V, m_Add(m_Value(X), m_ConstantInt()))`
-- Return `nullptr` for no change, modified instruction, or replacement
-- Add to worklist with `Worklist.pushValue()` for dependent values
-
-**Target-Specific Intrinsics**:
-- AMDGPU: `@llvm.amdgcn.*` intrinsics in `llvm/include/llvm/IR/IntrinsicsAMDGPU.td`
-- Pattern: `if (II->getIntrinsicID() == Intrinsic::amdgcn_ballot)`
-
-## Code Quality Standards
-
-### Control Flow & Debug Info
-When modifying control flow, ensure changes don't corrupt:
-- Performance profiling data (branch weights, call counts)
-- Debug information for branches and calls
-- Exception handling unwind information
-
-### Target-Specific Considerations
-- **AMDGPU**: Wavefront uniformity analysis affects ballot intrinsics
-- **X86**: Vector width and ISA feature dependencies
-- Use `TargetTransformInfo` for cost models and capability queries
-
-### Testing Requirements
-- Every optimization needs regression tests showing before/after IR
-- Include edge cases: constants, undef, poison values
-- Test target-specific intrinsics with appropriate triple
-- Use `; RUN: opt < %s -passes=... -S | FileCheck %s` format
-
-## Common Development Pitfalls
-- Don't assume instruction operand order without checking `isCommutative()`
-- Verify type compatibility before creating new instructions
-- Consider poison/undef propagation in optimizations
-- Check for side effects before eliminating instructions
-
-## Pass Pipeline Context
-- InstCombine runs early and multiple times in the pipeline
-- Subsequent passes like SimplifyCFG will clean up control flow
-- Use `replaceAllUsesWith()` carefully to maintain SSA form
+When performing a code review, pay close attention to code modifying a function's
+control flow. Could the change result in the corruption of performance profile
+data? Could the change result in invalid debug information, in particular for
+branches and calls?

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp

Lines changed: 10 additions & 6 deletions
@@ -1322,12 +1322,7 @@ GCNTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
     if (isa<PoisonValue>(Arg))
       return IC.replaceInstUsesWith(II, PoisonValue::get(II.getType()));

-    if (auto *Src = dyn_cast<ConstantInt>(Arg)) {
-      if (Src->isZero()) {
-        // amdgcn.ballot(i1 0) is zero.
-        return IC.replaceInstUsesWith(II, Constant::getNullValue(II.getType()));
-      }
-    }
+    // For Wave32 targets, convert i64 ballot to i32 ballot + zext
     if (ST->isWave32() && II.getType()->getIntegerBitWidth() == 64) {
       // %b64 = call i64 ballot.i64(...)
       // =>
@@ -1341,6 +1336,15 @@ GCNTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
       Call->takeName(&II);
       return IC.replaceInstUsesWith(II, Call);
     }
+
+    if (auto *Src = dyn_cast<ConstantInt>(Arg)) {
+      if (Src->isZero()) {
+        // amdgcn.ballot(i1 0) is zero.
+        return IC.replaceInstUsesWith(II, Constant::getNullValue(II.getType()));
+      }
+      // Note: ballot(true) is NOT constant folded because the result depends
+      // on the active lanes in the wavefront, not just the condition value.
+    }
     break;
   }
   case Intrinsic::amdgcn_wavefrontsize: {
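
The note added in this hunk says that `ballot(true)` is not folded because its result depends on which lanes are active. A minimal sketch of that reasoning follows, using a toy lane model rather than the real intrinsic. `toyBallot` and the explicit `active` vector (standing in for the execution mask) are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy ballot: bit i of the result is set when lane i is active and its
// condition is true. `active` plays the role of the execution mask.
// This is an illustrative model, not the AMDGPU intrinsic itself.
std::uint64_t toyBallot(const std::vector<bool> &cond,
                        const std::vector<bool> &active) {
  assert(cond.size() == active.size() && cond.size() <= 64);
  std::uint64_t mask = 0;
  for (std::size_t lane = 0; lane < cond.size(); ++lane)
    if (active[lane] && cond[lane])
      mask |= std::uint64_t{1} << lane;
  return mask;
}
```

In this model an all-false condition yields zero for every execution mask, while an all-true condition simply reproduces the execution mask, so only the zero case has a result that is independent of the active lanes.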

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

Lines changed: 117 additions & 19 deletions
@@ -85,8 +85,6 @@ using namespace PatternMatch;

 STATISTIC(NumSimplified, "Number of library calls simplified");

-
-
 static cl::opt<unsigned> GuardWideningWindow(
     "instcombine-guard-widening-window",
     cl::init(3),
@@ -2989,20 +2987,6 @@ Instruction *InstCombinerImpl::visitCallInst(CallInst &CI) {
     }
     break;
   }
-  case Intrinsic::amdgcn_ballot: {
-    // Optimize ballot intrinsics when the condition is known to be uniform
-    Value *Condition = II->getArgOperand(0);
-
-    // If the condition is a constant, we can evaluate the ballot directly
-    if (auto *ConstCond = dyn_cast<ConstantInt>(Condition)) {
-      // ballot(true) -> -1 (all lanes active)
-      // ballot(false) -> 0 (no lanes active)
-      uint64_t Result = ConstCond->isOne() ? ~0ULL : 0ULL;
-      return replaceInstUsesWith(*II, ConstantInt::get(II->getType(), Result));
-    }
-
-    break;
-  }
   case Intrinsic::ldexp: {
     // ldexp(ldexp(x, a), b) -> ldexp(x, a + b)
     //
@@ -3556,8 +3540,6 @@ Instruction *InstCombinerImpl::visitCallInst(CallInst &CI) {
       }
     }

-
-
     // If there is a dominating assume with the same condition as this one,
     // then this one is redundant, and should be removed.
     KnownBits Known(1);
@@ -3571,7 +3553,9 @@ Instruction *InstCombinerImpl::visitCallInst(CallInst &CI) {
       return eraseInstFromFunction(*II);
     }

-
+    // Try to extract uniformity information from the assume and optimize
+    // dominated uses of any variables that are established as uniform.
+    optimizeAssumedUniformValues(cast<AssumeInst>(II));

     // Update the cache of affected values for this assumption (we might be
     // here because we just simplified the condition).
@@ -5026,3 +5010,117 @@ InstCombinerImpl::transformCallThroughTrampoline(CallBase &Call,
   Call.setCalledFunction(FTy, NestF);
   return &Call;
 }
+
+/// Extract uniformity information from assume and optimize dominated uses.
+/// This works with any assume pattern that establishes value uniformity.
+void InstCombinerImpl::optimizeAssumedUniformValues(AssumeInst *Assume) {
+  Value *AssumedCondition = Assume->getArgOperand(0);
+
+  // Map of uniform values to their uniform constants
+  SmallDenseMap<Value *, Constant *> UniformValues;
+
+  // Pattern 1: assume(icmp eq (X, C)) -> X is uniform and equals C
+  if (auto *ICmp = dyn_cast<ICmpInst>(AssumedCondition)) {
+    if (ICmp->getPredicate() == ICmpInst::ICMP_EQ) {
+      Value *LHS = ICmp->getOperand(0);
+      Value *RHS = ICmp->getOperand(1);
+
+      // X == constant -> X is uniform and equals constant
+      if (auto *C = dyn_cast<Constant>(RHS)) {
+        UniformValues[LHS] = C;
+      } else if (auto *C = dyn_cast<Constant>(LHS)) {
+        UniformValues[RHS] = C;
+      }
+
+      // Handle intrinsic patterns in equality comparisons
+      // Pattern: assume(ballot(cmp) == -1) -> cmp is uniform and true
+      if (auto *IntrinsicCall = dyn_cast<IntrinsicInst>(LHS)) {
+        if (IntrinsicCall->getIntrinsicID() == Intrinsic::amdgcn_ballot) {
+          if (match(RHS, m_AllOnes())) {
+            Value *BallotArg = IntrinsicCall->getArgOperand(0);
+            if (BallotArg->getType()->isIntegerTy(1)) {
+              UniformValues[BallotArg] = ConstantInt::getTrue(BallotArg->getType());
+
+              // Special case: if BallotArg is an equality comparison,
+              // we know the operands are equal
+              if (auto *CmpInst = dyn_cast<ICmpInst>(BallotArg)) {
+                if (CmpInst->getPredicate() == ICmpInst::ICMP_EQ) {
+                  Value *CmpLHS = CmpInst->getOperand(0);
+                  Value *CmpRHS = CmpInst->getOperand(1);
+
+                  // If one operand is constant, the other is uniform and equals that constant
+                  if (auto *C = dyn_cast<Constant>(CmpRHS)) {
+                    UniformValues[CmpLHS] = C;
+                  } else if (auto *C = dyn_cast<Constant>(CmpLHS)) {
+                    UniformValues[CmpRHS] = C;
+                  }
+                  // TODO: Handle case where both operands are variables
+                }
+              }
+            }
+          }
+        } else if (IntrinsicCall->getIntrinsicID() == Intrinsic::amdgcn_readfirstlane) {
+          // assume(readfirstlane(x) == c) -> x is uniform and equals c
+          if (auto *C = dyn_cast<Constant>(RHS)) {
+            Value *ReadFirstLaneArg = IntrinsicCall->getArgOperand(0);
+            UniformValues[ReadFirstLaneArg] = C;
+          }
+        }
+      }
+
+      // Handle the reverse case too
+      if (auto *IntrinsicCall = dyn_cast<IntrinsicInst>(RHS)) {
+        if (IntrinsicCall->getIntrinsicID() == Intrinsic::amdgcn_ballot) {
+          if (match(LHS, m_AllOnes())) {
+            Value *BallotArg = IntrinsicCall->getArgOperand(0);
+            if (BallotArg->getType()->isIntegerTy(1)) {
+              UniformValues[BallotArg] = ConstantInt::getTrue(BallotArg->getType());
+            }
+          }
+        } else if (IntrinsicCall->getIntrinsicID() == Intrinsic::amdgcn_readfirstlane) {
+          if (auto *C = dyn_cast<Constant>(LHS)) {
+            Value *ReadFirstLaneArg = IntrinsicCall->getArgOperand(0);
+            UniformValues[ReadFirstLaneArg] = C;
+          }
+        }
+      }
+    }
+  }
+
+  // Pattern 2: assume(X) where X is i1 -> X is uniform and equals true
+  if (AssumedCondition->getType()->isIntegerTy(1) && !isa<ICmpInst>(AssumedCondition)) {
+    UniformValues[AssumedCondition] = ConstantInt::getTrue(AssumedCondition->getType());
+  }
+
+  // Now optimize dominated uses of all discovered uniform values
+  for (auto &[UniformValue, UniformConstant] : UniformValues) {
+    SmallVector<Use *, 8> DominatedUses;
+
+    // Find all uses dominated by the assume
+    // Skip if the value doesn't have a use list (e.g., constants)
+    if (!UniformValue->hasUseList())
+      continue;
+
+    for (Use &U : UniformValue->uses()) {
+      Instruction *UseInst = dyn_cast<Instruction>(U.getUser());
+      if (!UseInst || UseInst == Assume)
+        continue;
+
+      // Critical: Check dominance using InstCombine's infrastructure
+      if (isValidAssumeForContext(Assume, UseInst, &DT)) {
+        DominatedUses.push_back(&U);
+      }
+    }
+
+    // Replace only dominated uses with the uniform constant
+    for (Use *U : DominatedUses) {
+      U->set(UniformConstant);
+      Worklist.pushValue(U->getUser());
+    }
+
+    // Mark for further optimization if we made changes
+    if (!DominatedUses.empty()) {
+      Worklist.pushValue(UniformValue);
+    }
+  }
+}
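
Pattern 1 in `optimizeAssumedUniformValues` checks both operand orders of the equality compare before recording a value-to-constant mapping. A minimal sketch of that commutative matching follows, using a toy expression model instead of LLVM IR. `ToyOperand`, `ToyEq`, and `collectKnownValues` are hypothetical names. Unlike the patch, this sketch deliberately skips compares where both sides are constants.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy operand: a named variable or an integer constant.
struct ToyOperand {
  bool isConstant;   // true for an integer constant, false for a variable
  int value;         // meaningful only when isConstant is true
  std::string name;  // meaningful only when isConstant is false
};

// Toy equality compare, standing in for `icmp eq lhs, rhs` under an assume.
struct ToyEq {
  ToyOperand lhs;
  ToyOperand rhs;
};

// Record "variable -> constant" when exactly one side is a constant,
// regardless of which side it appears on. Compares between two variables
// or two constants record nothing in this sketch.
void collectKnownValues(const ToyEq &eq, std::map<std::string, int> &known) {
  if (eq.rhs.isConstant && !eq.lhs.isConstant)
    known[eq.lhs.name] = eq.rhs.value;
  else if (eq.lhs.isConstant && !eq.rhs.isConstant)
    known[eq.rhs.name] = eq.lhs.value;
}
```

Collecting the mappings first and rewriting uses afterwards, as the patch does with `UniformValues`, keeps the pattern matching separate from the dominance-filtered replacement step.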

llvm/lib/Transforms/InstCombine/InstCombineInternal.h

Lines changed: 3 additions & 2 deletions
@@ -124,8 +124,6 @@ class LLVM_LIBRARY_VISIBILITY InstCombinerImpl final
       BinaryOperator &I);
   Instruction *foldVariableSignZeroExtensionOfVariableHighBitExtract(
       BinaryOperator &OldAShr);
-
-
   Instruction *visitAShr(BinaryOperator &I);
   Instruction *visitLShr(BinaryOperator &I);
   Instruction *commonShiftTransforms(BinaryOperator &I);
@@ -231,6 +229,9 @@ class LLVM_LIBRARY_VISIBILITY InstCombinerImpl final
 private:
   bool annotateAnyAllocSite(CallBase &Call, const TargetLibraryInfo *TLI);
   bool isDesirableIntType(unsigned BitWidth) const;
+
+  /// Optimize uses of variables that are established as uniform by assume intrinsics.
+  void optimizeAssumedUniformValues(AssumeInst *Assume);
   bool shouldChangeType(unsigned FromBitWidth, unsigned ToBitWidth) const;
   bool shouldChangeType(Type *From, Type *To) const;
   Value *dyn_castNegVal(Value *V) const;
