[StructurizeCFG] Hoist and simplify zero-cost incoming else phi values #139605
Changes from 2 commits
d7da7dd
95c47d2
e7c1f9c
05c24b8
d279104
dc9330f
44614f6
cf34c41
65b72ae
0e13ed3
c442538
875ecd2
f49e20b
| Original file line number | Diff line number | Diff line change | ||
|---|---|---|---|---|
|
|
@@ -307,6 +307,8 @@ class StructurizeCFG { | |||
|
|
||||
| RegionNode *PrevNode; | ||||
|
|
||||
| void reorderIfElseBlock(BasicBlock *BB, unsigned Idx); | ||||
|
|
||||
| void orderNodes(); | ||||
|
|
||||
| void analyzeLoops(RegionNode *N); | ||||
|
|
@@ -409,6 +411,31 @@ class StructurizeCFGLegacyPass : public RegionPass { | |||
|
|
||||
| } // end anonymous namespace | ||||
|
|
||||
| /// Helper for the heuristic that orders if/else blocks. | ||||
| /// Checks whether an instruction is a potential vector copy instruction and, | ||||
| /// if so, whether its operands are defined in a different BB. If both hold, | ||||
| /// returns true: coalescing without interference may then be possible when | ||||
| /// this block is ordered first. | ||||
| static bool hasAffectingInstructions(Instruction *I, BasicBlock *BB) { | ||||
I don't understand why this looks for special-case instruction types.
These heuristics work best if we know which instructions will become copy instructions after lowering. Here I look only at these two because they are mostly lowered to copy instructions. Another way to compute the ordering heuristic would be to check whether the incoming phi value is an instruction whose operands are defined in the same block or in a different block, but that is too broad and can cause unintended reorders.
Use a range-based for loop.
Can we just sort blocks in topological order or something? I'm not sure I understand this heuristic
The blocks are already sorted in topological order. But for an if-then-else, the orders then,else and else,then are topologically equivalent. Depending on which order is chosen, there can be extra copies, which lead to spills in large cases.
This shows up specifically when one block just copies values and the other block modifies them.
godbolt link
if:
%vec = <4 x i32> ...
br i1 %c, label %then, label %else
then:
%x = extractelement <4 x i32> %vec, i32 0
%z = add i32 %x, 1
br label %merge
else:
%a = extractelement <4 x i32> %vec, i32 0
br label %merge
merge:
%phi = phi i32 [ %z, %then ], [ %a, %else ]
store i32 %phi, ptr %ptr
ret void
After structurization and lowering to machine code, if the 'then' block is ordered first, then at the register coalescer:
if:
[v0 - v4] = ...
cond_branch_scc then
tmp:
v5 = 0
branch flow
then:
v5 = copy v0 ; eliminated in reg coalescer
v5 = v5 + 1 ; v5 = v0 + 1
flow:
v6 = v5 ; eliminated
cond_branch final
else:
v6 = copy v0 ; v5 = copy v0
; this copy (v5 = copy v0) can't be coalesced further because of the then->flow->else live range
final:
store v6 ; store v5
But at the same time, if the 'else' block is ordered first,
if:
[v0 - v4] = ...
cond_branch_scc else
tmp:
v5 = 0
branch flow
else:
v5 = copy v0 ; eliminated
flow:
v6 = copy v5 ; eliminated
cond_branch final
then:
v6 = copy v0 ; eliminated
v6 = v6 + 1 ; v6 = v0 + 1 ; can be coalesced to v0 = v0 + 1
final:
store v6 ; coalesced to store v0
There's an extra copy in the first case just because of the ordering. We see this even though the then and else blocks won't be live at the same time after structurization.
This heuristic only looks at instructions that can be lowered to copy instructions.
Anything can be lowered to a copy; initially most cross-block values emit copies. Does this really mean "0"-cost instructions?
Yes, I changed the code to check for zero-cost instructions.
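To make the ordering rule discussed in this thread concrete, here is a minimal, self-contained sketch. It is not the code from this PR and does not use LLVM's classes: `ToyInst`, `isZeroCostWithOutsideOperands`, and `pickFirstBlock` are hypothetical names, and the `Cost` field stands in for whatever target cost query the real pass would use. It only assumes what the thread states: a block whose phi incoming value is a zero-cost instruction with operands defined in other blocks is a candidate to be ordered first.

```cpp
#include <string>
#include <vector>

// Toy stand-in for an IR instruction; this is NOT llvm::Instruction.
struct ToyInst {
  std::string Name;
  std::string Block;                // name of the block defining this value
  unsigned Cost;                    // assumed target cost; 0 means "free"
  std::vector<const ToyInst *> Ops; // operands that are instructions
};

// True if I is assumed zero-cost and none of its operands are defined
// in I's own block (hypothetical helper for illustration).
bool isZeroCostWithOutsideOperands(const ToyInst &I) {
  if (I.Cost != 0)
    return false;
  for (const ToyInst *Op : I.Ops)
    if (Op->Block == I.Block)
      return false;
  return true;
}

// Given the phi incoming values of the two branch blocks in their current
// order, return the name of the block that should be ordered first.
// The order is swapped only when the second value qualifies and the
// first does not; otherwise the current order is kept.
std::string pickFirstBlock(const ToyInst &CurFirst, const ToyInst &CurSecond) {
  bool FirstOk = isZeroCostWithOutsideOperands(CurFirst);
  bool SecondOk = isZeroCostWithOutsideOperands(CurSecond);
  if (SecondOk && !FirstOk)
    return CurSecond.Block;
  return CurFirst.Block;
}
```

Modelling the example above, with `%z` (an add, nonzero cost) coming from `then` and `%a` (an extractelement, assumed free) coming from `else`, `pickFirstBlock` returns `"else"`, which matches the order the thread argues for. The cost values are assumptions made for the sketch; on a real target they depend on the cost model.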
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,129 @@ | ||
| ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5 | ||
| ; RUN: opt -S -structurizecfg %s -o - | FileCheck %s | ||
| ; RUN: opt -S -passes=structurizecfg %s -o - | FileCheck %s | ||
|
|
||
| define amdgpu_kernel void @test_extractelement_1(<4 x i32> %vec, i1 %cond, ptr %ptr) { | ||
| ; CHECK-LABEL: define amdgpu_kernel void @test_extractelement_1( | ||
| ; CHECK-SAME: <4 x i32> [[VEC:%.*]], i1 [[COND:%.*]], ptr [[PTR:%.*]]) { | ||
| ; CHECK-NEXT: [[ENTRY:.*]]: | ||
| ; CHECK-NEXT: [[COND_INV:%.*]] = xor i1 [[COND]], true | ||
| ; CHECK-NEXT: br i1 [[COND_INV]], label %[[ELSE:.*]], label %[[FLOW:.*]] | ||
| ; CHECK: [[FLOW]]: | ||
| ; CHECK-NEXT: [[TMP0:%.*]] = phi i32 [ [[A:%.*]], %[[ELSE]] ], [ poison, %[[ENTRY]] ] | ||
| ; CHECK-NEXT: [[TMP1:%.*]] = phi i1 [ false, %[[ELSE]] ], [ true, %[[ENTRY]] ] | ||
| ; CHECK-NEXT: br i1 [[TMP1]], label %[[THEN:.*]], label %[[MERGE:.*]] | ||
| ; CHECK: [[THEN]]: | ||
| ; CHECK-NEXT: [[X:%.*]] = extractelement <4 x i32> [[VEC]], i32 0 | ||
| ; CHECK-NEXT: [[Z:%.*]] = add i32 [[X]], 1 | ||
| ; CHECK-NEXT: br label %[[MERGE]] | ||
| ; CHECK: [[ELSE]]: | ||
| ; CHECK-NEXT: [[A]] = extractelement <4 x i32> [[VEC]], i32 1 | ||
| ; CHECK-NEXT: br label %[[FLOW]] | ||
| ; CHECK: [[MERGE]]: | ||
| ; CHECK-NEXT: [[PHI:%.*]] = phi i32 [ [[TMP0]], %[[FLOW]] ], [ [[Z]], %[[THEN]] ] | ||
| ; CHECK-NEXT: store i32 [[PHI]], ptr [[PTR]], align 4 | ||
| ; CHECK-NEXT: ret void | ||
| ; | ||
| entry: | ||
| br i1 %cond, label %then, label %else | ||
|
|
||
| then: | ||
| %x = extractelement <4 x i32> %vec, i32 0 | ||
| %z = add i32 %x, 1 | ||
| br label %merge | ||
|
|
||
| else: | ||
| %a = extractelement <4 x i32> %vec, i32 1 | ||
| br label %merge | ||
|
|
||
| merge: | ||
| %phi = phi i32 [ %z, %then ], [ %a, %else ] | ||
| store i32 %phi, ptr %ptr | ||
| ret void | ||
| } | ||
|
|
||
| define amdgpu_kernel void @test_extractelement_2(<4 x i32> %vec, i1 %cond, ptr %ptr) { | ||
| ; CHECK-LABEL: define amdgpu_kernel void @test_extractelement_2( | ||
| ; CHECK-SAME: <4 x i32> [[VEC:%.*]], i1 [[COND:%.*]], ptr [[PTR:%.*]]) { | ||
| ; CHECK-NEXT: [[ENTRY:.*]]: | ||
| ; CHECK-NEXT: [[COND_INV:%.*]] = xor i1 [[COND]], true | ||
| ; CHECK-NEXT: br i1 [[COND_INV]], label %[[ELSE:.*]], label %[[FLOW:.*]] | ||
| ; CHECK: [[FLOW]]: | ||
| ; CHECK-NEXT: [[TMP0:%.*]] = phi i32 [ [[A:%.*]], %[[ELSE]] ], [ poison, %[[ENTRY]] ] | ||
| ; CHECK-NEXT: [[TMP1:%.*]] = phi i1 [ false, %[[ELSE]] ], [ true, %[[ENTRY]] ] | ||
| ; CHECK-NEXT: br i1 [[TMP1]], label %[[THEN:.*]], label %[[MERGE:.*]] | ||
| ; CHECK: [[THEN]]: | ||
| ; CHECK-NEXT: [[X:%.*]] = extractelement <4 x i32> [[VEC]], i32 1 | ||
| ; CHECK-NEXT: [[Y:%.*]] = add i32 [[X]], 1 | ||
| ; CHECK-NEXT: [[VEC1:%.*]] = insertelement <4 x i32> poison, i32 [[Y]], i32 0 | ||
| ; CHECK-NEXT: [[Z:%.*]] = extractelement <4 x i32> [[VEC1]], i32 0 | ||
| ; CHECK-NEXT: br label %[[MERGE]] | ||
| ; CHECK: [[ELSE]]: | ||
| ; CHECK-NEXT: [[A]] = extractelement <4 x i32> [[VEC]], i32 1 | ||
| ; CHECK-NEXT: br label %[[FLOW]] | ||
| ; CHECK: [[MERGE]]: | ||
| ; CHECK-NEXT: [[PHI:%.*]] = phi i32 [ [[TMP0]], %[[FLOW]] ], [ [[Z]], %[[THEN]] ] | ||
| ; CHECK-NEXT: store i32 [[PHI]], ptr [[PTR]], align 4 | ||
| ; CHECK-NEXT: ret void | ||
| ; | ||
| entry: | ||
| br i1 %cond, label %then, label %else | ||
|
|
||
| then: | ||
| %x = extractelement <4 x i32> %vec, i32 1 | ||
| %y = add i32 %x, 1 | ||
| %vec1 = insertelement <4 x i32> poison, i32 %y, i32 0 | ||
| %z = extractelement <4 x i32> %vec1, i32 0 | ||
| br label %merge | ||
|
|
||
| else: | ||
| %a = extractelement <4 x i32> %vec, i32 1 | ||
| br label %merge | ||
|
|
||
| merge: | ||
| %phi = phi i32 [ %z, %then ], [ %a, %else ] | ||
| store i32 %phi, ptr %ptr | ||
| ret void | ||
| } | ||
|
|
||
| %pair = type { i32, i32 } | ||
| define amdgpu_kernel void @test_extractvalue(ptr %ptr, i1 %cond) { | ||
| ; CHECK-LABEL: define amdgpu_kernel void @test_extractvalue( | ||
| ; CHECK-SAME: ptr [[PTR:%.*]], i1 [[COND:%.*]]) { | ||
| ; CHECK-NEXT: [[ENTRY:.*]]: | ||
| ; CHECK-NEXT: [[LOAD_THEN:%.*]] = load [[PAIR:%.*]], ptr [[PTR]], align 4 | ||
| ; CHECK-NEXT: br i1 [[COND]], label %[[THEN:.*]], label %[[FLOW:.*]] | ||
| ; CHECK: [[THEN]]: | ||
| ; CHECK-NEXT: [[A_THEN:%.*]] = extractvalue [[PAIR]] [[LOAD_THEN]], 0 | ||
| ; CHECK-NEXT: br label %[[FLOW]] | ||
| ; CHECK: [[FLOW]]: | ||
| ; CHECK-NEXT: [[TMP0:%.*]] = phi i32 [ [[A_THEN]], %[[THEN]] ], [ poison, %[[ENTRY]] ] | ||
| ; CHECK-NEXT: [[TMP1:%.*]] = phi i1 [ false, %[[THEN]] ], [ true, %[[ENTRY]] ] | ||
| ; CHECK-NEXT: br i1 [[TMP1]], label %[[ELSE:.*]], label %[[MERGE:.*]] | ||
| ; CHECK: [[ELSE]]: | ||
| ; CHECK-NEXT: [[A_ELSE:%.*]] = extractvalue [[PAIR]] [[LOAD_THEN]], 0 | ||
| ; CHECK-NEXT: [[SUM_ELSE:%.*]] = add i32 [[A_ELSE]], 1 | ||
| ; CHECK-NEXT: br label %[[MERGE]] | ||
| ; CHECK: [[MERGE]]: | ||
| ; CHECK-NEXT: [[PHI:%.*]] = phi i32 [ [[TMP0]], %[[FLOW]] ], [ [[SUM_ELSE]], %[[ELSE]] ] | ||
| ; CHECK-NEXT: store i32 [[PHI]], ptr [[PTR]], align 4 | ||
| ; CHECK-NEXT: ret void | ||
| ; | ||
| entry: | ||
| %load_then = load %pair, ptr %ptr | ||
| br i1 %cond, label %then, label %else | ||
|
|
||
| then: | ||
| %a_then = extractvalue %pair %load_then, 0 | ||
| br label %merge | ||
|
|
||
| else: | ||
| %a_else = extractvalue %pair %load_then, 0 | ||
| %sum_else = add i32 %a_else, 1 | ||
| br label %merge | ||
|
|
||
| merge: | ||
| %phi = phi i32 [ %a_then, %then ], [ %sum_else, %else ] | ||
| store i32 %phi, ptr %ptr | ||
| ret void | ||
| } | ||