Commit c966e3b

Speedup RewriteStackPtr by avoiding one mod.walk call (#4803)
Timings were obtained by compiling the flex attention backward kernel:

Before: `1.0694 ( 6.3%) 1.0694 ( 10.7%) TritonIntelGPURewriteStackPtr`
After: `0.8938 ( 5.2%) 0.8938 ( 8.8%) TritonIntelGPURewriteStackPtr`

Relates to #4062

Signed-off-by: Anatoly Myachev <[email protected]>
1 parent 4f35147 commit c966e3b

1 file changed: +3 -5 lines changed

third_party/intel/lib/TritonIntelGPUTransforms/RewriteStackPtr.cpp

Lines changed: 3 additions & 5 deletions
@@ -38,11 +38,9 @@ struct TritonIntelGPURewriteStackPtrPass
 
     // 1: Process function arguments for root functions
     if (!usePoison) {
-      mod.walk([&](FunctionOpInterface funcOp) {
-        if (allocation.isRoot(funcOp)) {
-          insertFuncArguments(funcOp, ptrTy);
-        }
-      });
+      for (auto &root : allocation.getRoots()) {
+        insertFuncArguments(root, ptrTy);
+      }
     }
 
     // 2: Collect all AddressOfOp that need updating
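
The idea behind the change can be illustrated outside of MLIR. The following is a minimal, self-contained sketch, not the pass itself: `Function`, `Module`, `Allocation`, `rewriteByWalk`, and `rewriteByRoots` are hypothetical stand-ins invented for illustration, and the `numBodyOps` counter models the assumption that a module walk visits every nested operation before filtering for functions. Under that assumption, iterating over an already-computed root list touches only the roots, while the walk pays for the whole module.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a function op (not MLIR's FunctionOpInterface).
struct Function {
  std::string name;
  bool isRoot;            // true if no other function calls it
  std::size_t numBodyOps; // operations nested inside the function body
  int extraArgs;          // number of arguments appended by the rewrite
};

// Hypothetical stand-in for the module: a flat list of functions.
struct Module {
  std::vector<Function> functions;
};

// Hypothetical stand-in for the allocation analysis. It records the root
// functions once, so later passes do not need to rediscover them.
class Allocation {
public:
  explicit Allocation(Module &mod) {
    for (Function &f : mod.functions)
      if (f.isRoot)
        roots.push_back(&f);
  }
  bool isRoot(const Function &f) const { return f.isRoot; }
  const std::vector<Function *> &getRoots() const { return roots; }

private:
  std::vector<Function *> roots;
};

// Placeholder for the real argument insertion: just counts the call.
void insertFuncArguments(Function &f) {
  assert(f.isRoot && "only root functions receive the extra argument");
  ++f.extraArgs;
}

// Old shape: traverse everything, filter for roots.
// Returns how many operations the traversal had to visit.
std::size_t rewriteByWalk(Module &mod, const Allocation &alloc) {
  std::size_t visited = 0;
  for (Function &f : mod.functions) {
    visited += 1 + f.numBodyOps; // the function itself plus its body
    if (alloc.isRoot(f))
      insertFuncArguments(f);
  }
  return visited;
}

// New shape: iterate only over the cached roots.
std::size_t rewriteByRoots(const Allocation &alloc) {
  std::size_t visited = 0;
  for (Function *root : alloc.getRoots()) {
    ++visited;
    insertFuncArguments(*root);
  }
  return visited;
}
```

Both variants apply the rewrite to the same set of functions; only the amount of traversal differs, which is consistent with the timing difference reported in the commit message.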
