-
Notifications
You must be signed in to change notification settings - Fork 15.4k
[StructurizeCFG] Hoist and simplify zero-cost incoming else phi values #139605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 8 commits
d7da7dd
95c47d2
e7c1f9c
05c24b8
d279104
dc9330f
44614f6
cf34c41
65b72ae
0e13ed3
c442538
875ecd2
f49e20b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -19,6 +19,7 @@ | |||||
| #include "llvm/Analysis/RegionInfo.h" | ||||||
| #include "llvm/Analysis/RegionIterator.h" | ||||||
| #include "llvm/Analysis/RegionPass.h" | ||||||
| #include "llvm/Analysis/TargetTransformInfo.h" | ||||||
| #include "llvm/Analysis/UniformityAnalysis.h" | ||||||
| #include "llvm/IR/BasicBlock.h" | ||||||
| #include "llvm/IR/CFG.h" | ||||||
|
|
@@ -128,6 +129,7 @@ struct PredInfo { | |||||
| using BBPredicates = DenseMap<BasicBlock *, PredInfo>; | ||||||
| using PredMap = DenseMap<BasicBlock *, BBPredicates>; | ||||||
| using BB2BBMap = DenseMap<BasicBlock *, BasicBlock *>; | ||||||
| using Val2BBMap = DenseMap<Value *, BasicBlock *>; | ||||||
|
|
||||||
| // A traits type that is intended to be used in graph algorithms. The graph | ||||||
| // traits starts at an entry node, and traverses the RegionNodes that are in | ||||||
|
|
@@ -279,7 +281,7 @@ class StructurizeCFG { | |||||
| ConstantInt *BoolTrue; | ||||||
| ConstantInt *BoolFalse; | ||||||
| Value *BoolPoison; | ||||||
|
|
||||||
| TargetTransformInfo *TTI; | ||||||
| Function *Func; | ||||||
| Region *ParentRegion; | ||||||
|
|
||||||
|
|
@@ -301,8 +303,12 @@ class StructurizeCFG { | |||||
| PredMap LoopPreds; | ||||||
| BranchVector LoopConds; | ||||||
|
|
||||||
| Val2BBMap HoistedValues; | ||||||
|
|
||||||
| RegionNode *PrevNode; | ||||||
|
|
||||||
| void hoistZeroCostElseBlockPhiValues(BasicBlock *ElseBB, BasicBlock *ThenBB); | ||||||
|
|
||||||
| void orderNodes(); | ||||||
|
|
||||||
| void analyzeLoops(RegionNode *N); | ||||||
|
|
@@ -332,6 +338,8 @@ class StructurizeCFG { | |||||
|
|
||||||
| void simplifyAffectedPhis(); | ||||||
|
|
||||||
| void simplifyHoistedPhis(); | ||||||
|
|
||||||
| DebugLoc killTerminator(BasicBlock *BB); | ||||||
|
|
||||||
| void changeExit(RegionNode *Node, BasicBlock *NewExit, | ||||||
|
|
@@ -359,7 +367,7 @@ class StructurizeCFG { | |||||
|
|
||||||
| public: | ||||||
| void init(Region *R); | ||||||
| bool run(Region *R, DominatorTree *DT); | ||||||
| bool run(Region *R, DominatorTree *DT, TargetTransformInfo *TTI); | ||||||
| bool makeUniformRegion(Region *R, UniformityInfo &UA); | ||||||
| }; | ||||||
|
|
||||||
|
|
@@ -385,16 +393,21 @@ class StructurizeCFGLegacyPass : public RegionPass { | |||||
| if (SCFG.makeUniformRegion(R, UA)) | ||||||
| return false; | ||||||
| } | ||||||
| Function *F = R->getEntry()->getParent(); | ||||||
| TargetTransformInfo *TTI = | ||||||
|
||||||
| TargetTransformInfo *TTI = | |
| const TargetTransformInfo *TTI = |
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why code size metric?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we are looking for instructions that are just cross-block zero cost copies, I thought Latency or recipThroughput did not matter. But if I am wrong in my understanding, I can change the metric.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The AMDGPU cost model has seen close to 0 effort on getting accurate code sizes. You'll probably get better results for the default latency
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use range loop
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talking about the arbitrary order of then/else block is a little bit confusing here. Because we treated the first visited successor as Then block, and the other one as Else block throughout the code implementation. Maybe just remove it to avoid such confusion.
VigneshwarJ marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added hoisting logic here, because the else block for structurized CFG is getting defined here. And there's no point in hoisting then block zero-cost instruction phis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.