Commit 2ad519c

[TritonGPU] Augment FuseNestedLoops to handle dependent inner loop bounds (#8132)
This PR teaches FuseNestedLoops to handle inner loops whose loop bounds are some (pure) function of the outer loop bounds. FuseNestedLoops slices the inner loop bound computations into a loop placed before the fused loop, which computes the total number of fused iterations. Then, inside the first fused prologue, the inner loop lengths for the current outer loop iteration are computed. The pass supports a mix of inner loops whose bounds can be made outer-loop invariant and inner loops whose bounds cannot. This patch also adds a small hack that pattern-matches `tl.assume(ub > lb)` on the inner loop bounds to allow speculation (i.e. it lets the pass assume that all inner loops execute at least once).
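The scheme described above can be illustrated on plain scalar loops. The following is only a sketch of the idea, not the pass itself: `innerUpperBound`, `runNested`, and `runFused` are hypothetical names, the inner lower bound is fixed at 0, and the `assert` stands in for the `tl.assume(ub > lb)` guarantee that every inner loop runs at least once.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Each entry records one executed (outer index, inner index) pair.
using Trace = std::vector<std::pair<int64_t, int64_t>>;

// Hypothetical pure function of the outer induction variable that
// yields the inner loop's upper bound (the inner lower bound is 0).
inline int64_t innerUpperBound(int64_t i) { return i + 1; }

// Reference: the original nested loops.
Trace runNested(int64_t outerLb, int64_t outerUb) {
  Trace trace;
  for (int64_t i = outerLb; i < outerUb; ++i)
    for (int64_t j = 0; j < innerUpperBound(i); ++j)
      trace.emplace_back(i, j);
  return trace;
}

// Fused form: one flat loop over the total iteration count.
Trace runFused(int64_t outerLb, int64_t outerUb) {
  // Pre-loop: re-evaluate the sliced bound computation for every outer
  // iteration to obtain the total number of fused iterations.
  int64_t total = 0;
  for (int64_t i = outerLb; i < outerUb; ++i) {
    int64_t len = innerUpperBound(i);
    assert(len > 0 && "models tl.assume(ub > lb)");
    total += len;
  }

  Trace trace;
  int64_t i = outerLb; // current outer index
  int64_t j = 0;       // current inner index
  int64_t len = 0;     // inner trip count for the current outer index
  for (int64_t t = 0; t < total; ++t) {
    // Prologue: at the start of each inner loop, compute its length.
    if (j == 0)
      len = innerUpperBound(i);

    trace.emplace_back(i, j); // inner loop body

    // Epilogue: after the last inner iteration, advance the outer loop.
    if (++j == len) {
      ++i;
      j = 0;
    }
  }
  return trace;
}
```

For outer bounds `[0, 4)` the pre-loop sums the inner lengths 1 + 2 + 3 + 4, so the fused loop runs 10 iterations and visits the same `(i, j)` pairs in the same order as the nested version. Because every inner loop is assumed to run at least once, the sketch never has to skip an outer iteration inside the fused loop.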
1 parent 27f406c commit 2ad519c

5 files changed: +419 -180 lines changed

include/triton/Dialect/TritonGPU/Transforms/PipeliningUtility.h

Lines changed: 4 additions & 1 deletion
@@ -40,7 +40,10 @@ bool isPureScalarOp(Operation *op);
 bool getDominatingValueSetOpsToHoist(
     DominanceInfo &domInfo, Operation *refOp, ArrayRef<Value> valueSet,
     llvm::SetVector<Operation *> &toHoist,
-    function_ref<bool(Operation *)> canHoist = isPureScalarOp);
+    function_ref<bool(Operation *)> canHoist = isPureScalarOp,
+    function_ref<bool(BlockArgument)> canUseArg = [](BlockArgument) {
+      return false;
+    });
 
 // Hoist the given set of operations above the reference operation.
 void hoistOpsBefore(Operation *refOp,
