-
Notifications
You must be signed in to change notification settings - Fork 15k
[Passes] Move LoopInterchange into optimization pipeline #145503
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -690,9 +690,6 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level, | |
|
|
||
| LPM2.addPass(LoopDeletionPass()); | ||
|
|
||
| if (PTO.LoopInterchange) | ||
| LPM2.addPass(LoopInterchangePass()); | ||
|
|
||
| // Do not enable unrolling in PreLinkThinLTO phase during sample PGO | ||
| // because it changes IR to makes profile annotation in back compile | ||
| // inaccurate. The normal unroller doesn't pay attention to forced full unroll | ||
|
|
@@ -1547,6 +1544,10 @@ PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level, | |
| // this may need to be revisited once we run GVN before loop deletion | ||
| // in the simplification pipeline. | ||
| LPM.addPass(LoopDeletionPass()); | ||
|
|
||
| if (PTO.LoopInterchange) | ||
| LPM.addPass(LoopInterchangePass()); | ||
|
Comment on lines
+1548
to
+1549
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure where LoopInterchange should be placed within the optimization pipeline... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, would be interesting to get some more data about the impact on different positions, but current palcement seems reasonable to me at least There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, collecting diverse data would be ideal, but since the current LoopInterchange is rarely triggered in practice, I haven’t been able to gather enough data to have a meaningful discussion about the optimal placement... TSVC s235 might be an interesting example. If LoopDistribute were extended to support non-innermost loops and it ran before LoopInterchange, the following could become possible: Original: for (int nl = 0; nl < 200*(ntimes/LEN2); nl++)
for (int i = 0; i < LEN2; i++) {
a[i] += b[i] * c[i];
for (int j = 1; j < LEN2; j++) {
aa[j][i] = aa[j-1][i] + bb[j][i] * a[i];
}
}Distributed: for (int nl = 0; nl < 200*(ntimes/LEN2); nl++)
for (int i = 0; i < LEN2; i++)
a[i] += b[i] * c[i];
for (int nl = 0; nl < 200*(ntimes/LEN2); nl++)
for (int i = 0; i < LEN2; i++)
for (int j = 1; j < LEN2; j++)
aa[j][i] = aa[j-1][i] + bb[j][i] * a[i];Interchanged: for (int nl = 0; nl < 200*(ntimes/LEN2); nl++)
for (int i = 0; i < LEN2; i++)
a[i] += b[i] * c[i];
// The i-loop and j-loop are interchanged.
for (int nl = 0; nl < 200*(ntimes/LEN2); nl++)
for (int j = 1; j < LEN2; j++)
for (int i = 0; i < LEN2; i++)
aa[j][i] = aa[j-1][i] + bb[j][i] * a[i];IIRC, when I ran the last code in the past, it was about 8x faster than the original. But achieving this would require a lot of work on LoopDistribute... |
||
|
|
||
| OptimizePM.addPass(createFunctionToLoopPassAdaptor( | ||
| std::move(LPM), /*UseMemorySSA=*/false, /*UseBlockFrequencyInfo=*/false)); | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,48 @@ | ||
| ; RUN: opt -passes='default<O3>' -enable-loopinterchange -disable-output \ | ||
| ; RUN: -disable-verify -verify-analysis-invalidation=0 \ | ||
| ; RUN: -debug-pass-manager=quiet %s 2>&1 | FileCheck %s | ||
|
|
||
| ; Test the position of LoopInterchange in the pass pipeline. | ||
|
|
||
| ; CHECK-NOT: Running pass: LoopInterchangePass | ||
| ; CHECK: Running pass: ControlHeightReductionPass | ||
| ; CHECK-NEXT: Running pass: LoopSimplifyPass | ||
| ; CHECK-NEXT: Running pass: LCSSAPass | ||
| ; CHECK-NEXT: Running pass: LoopRotatePass | ||
| ; CHECK-NEXT: Running pass: LoopDeletionPass | ||
| ; CHECK-NEXT: Running pass: LoopRotatePass | ||
| ; CHECK-NEXT: Running pass: LoopDeletionPass | ||
| ; CHECK-NEXT: Running pass: LoopInterchangePass | ||
| ; CHECK-NEXT: Running pass: LoopDistributePass | ||
| ; CHECK-NEXT: Running pass: InjectTLIMappings | ||
| ; CHECK-NEXT: Running pass: LoopVectorizePass | ||
|
Comment on lines
+1
to
+18
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please let me know if there is a better way... |
||
|
|
||
|
|
||
| define void @foo(ptr %a, i32 %n) { | ||
| entry: | ||
| br label %for.i.header | ||
|
|
||
| for.i.header: | ||
| %i = phi i32 [ 0, %entry ], [ %i.next, %for.i.latch ] | ||
| br label %for.j | ||
|
|
||
| for.j: | ||
| %j = phi i32 [ 0, %for.i.header ], [ %j.next, %for.j ] | ||
| %tmp = mul i32 %i, %n | ||
| %offset = add i32 %tmp, %j | ||
| %idx = getelementptr inbounds i32, ptr %a, i32 %offset | ||
| %load = load i32, ptr %idx, align 4 | ||
| %inc = add i32 %load, 1 | ||
| store i32 %inc, ptr %idx, align 4 | ||
| %j.next = add i32 %j, 1 | ||
| %j.exit = icmp eq i32 %j.next, %n | ||
| br i1 %j.exit, label %for.i.latch, label %for.j | ||
|
|
||
| for.i.latch: | ||
| %i.next = add i32 %i, 1 | ||
| %i.exit = icmp eq i32 %i.next, %n | ||
| br i1 %i.exit, label %for.i.header, label %exit | ||
|
|
||
| exit: | ||
| ret void | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be reasonable to keep this, at least for now?