[AMDGPU] Insert inliner anchor earlier #169478
base: users/rovka/machine-inlining-mfi
Add a new hook for inserting passes right after the last DummyCGSCC pass and use it to insert the anchor. This changes the last FunctionPass manager to be an inlining pass manager, thus preserving some of the analyses that might be computed before the inliner and used after it (to be fair that's never going to be a lot of analyses, since inlining is pretty plastic, but at least some of the IR-level analyses that have absolutely no reason to change can be computed only once). This is how I originally designed the code, but I don't feel like I have a good name/abstraction for this exact point in the pipeline, hence the separate patch.
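The hook described above can be sketched outside LLVM as a small toy model. This is only an illustration: ToyPassConfig, ToyGCNPassConfig and buildToyPipeline are hypothetical names, passes are represented as plain strings, and the ObjCARCContract step is added unconditionally for brevity (the real code checks the optimization level). Only the ordering of the hook call relative to DummyCGSCCPass is taken from the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for TargetPassConfig (hypothetical): it only records pass names.
class ToyPassConfig {
public:
  virtual ~ToyPassConfig() = default;

  void addPass(const std::string &Name) { Pipeline.push_back(Name); }

  virtual bool requiresCodeGenSCCOrder() const { return false; }

  // New hook: a no-op by default, so targets that do not override it
  // see no change in their pipeline.
  virtual void addInitialCGSCCCodeGenPasses() {}

  void addISelPrepare() {
    addPass("PreISel");
    if (requiresCodeGenSCCOrder()) {
      addPass("DummyCGSCCPass");
      // The hook runs right after the CGSCC boundary is established.
      addInitialCGSCCCodeGenPasses();
    }
    addPass("ObjCARCContract"); // Unconditional here; an assumption for brevity.
  }

  std::vector<std::string> Pipeline;
};

// Toy stand-in for GCNPassConfig (hypothetical).
class ToyGCNPassConfig : public ToyPassConfig {
public:
  explicit ToyGCNPassConfig(bool EnableInliner)
      : EnableInliner(EnableInliner) {}

  bool requiresCodeGenSCCOrder() const override { return true; }

  void addInitialCGSCCCodeGenPasses() override {
    if (EnableInliner)
      addPass("AMDGPU Inlining Anchor");
  }

private:
  bool EnableInliner;
};

// Builds the toy pipeline and returns the recorded pass names in order.
std::vector<std::string> buildToyPipeline(bool EnableInliner) {
  ToyGCNPassConfig Config(EnableInliner);
  Config.addISelPrepare();
  assert(!Config.Pipeline.empty());
  return Config.Pipeline;
}
```

With the inliner flag set, the anchor lands immediately after DummyCGSCCPass in the toy pipeline; with the flag cleared, the pipeline is the same as it would be without the hook.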
@llvm/pr-subscribers-backend-amdgpu
Author: Diana Picus (rovka)
Full diff: https://github.com/llvm/llvm-project/pull/169478.diff
4 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/TargetPassConfig.h b/llvm/include/llvm/CodeGen/TargetPassConfig.h
index 5e0e641a981f9..e8c85246b0049 100644
--- a/llvm/include/llvm/CodeGen/TargetPassConfig.h
+++ b/llvm/include/llvm/CodeGen/TargetPassConfig.h
@@ -364,6 +364,15 @@ class LLVM_ABI TargetPassConfig : public ImmutablePass {
     return true;
   }
 
+  /// Add passes at the start of the function pass manager created after
+  /// enforcing CGSCC ordering. This hook is only called when
+  /// requiresCodeGenSCCOrder() returns true.
+  ///
+  /// Targets can use this to insert passes that need to run at the beginning
+  /// of the codegen function pass manager, after the CGSCC boundary has been
+  /// established.
+  virtual void addInitialCGSCCCodeGenPasses() {}
+
   /// addMachineSSAOptimization - Add standard passes that optimize machine
   /// instructions in SSA form.
   virtual void addMachineSSAOptimization();
diff --git a/llvm/lib/CodeGen/TargetPassConfig.cpp b/llvm/lib/CodeGen/TargetPassConfig.cpp
index ceae0d29eea90..40458407b733d 100644
--- a/llvm/lib/CodeGen/TargetPassConfig.cpp
+++ b/llvm/lib/CodeGen/TargetPassConfig.cpp
@@ -966,8 +966,10 @@ void TargetPassConfig::addISelPrepare() {
   addPreISel();
 
   // Force codegen to run according to the callgraph.
-  if (requiresCodeGenSCCOrder())
+  if (requiresCodeGenSCCOrder()) {
     addPass(new DummyCGSCCPass);
+    addInitialCGSCCCodeGenPasses();
+  }
 
   if (getOptLevel() != CodeGenOptLevel::None)
     addPass(createObjCARCContractPass());
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index 60d495d809236..f49f817d80c8d 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -1245,6 +1245,7 @@ class GCNPassConfig final : public AMDGPUPassConfig {
   }
 
   bool addPreISel() override;
+  void addInitialCGSCCCodeGenPasses() override;
   void addMachineSSAOptimization() override;
   bool addILPOpts() override;
   bool addInstSelector() override;
@@ -1504,6 +1505,11 @@ bool GCNPassConfig::addPreISel() {
   return false;
 }
 
+void GCNPassConfig::addInitialCGSCCCodeGenPasses() {
+  if (EnableAMDGPUMachineLevelInliner)
+    addPass(createAMDGPUInliningAnchorPass());
+}
+
 void GCNPassConfig::addMachineSSAOptimization() {
   TargetPassConfig::addMachineSSAOptimization();
diff --git a/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll b/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
index 8d7409c11537f..01a7c51bfcc40 100644
--- a/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
+++ b/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
@@ -1614,7 +1614,8 @@
; INLINER-NEXT: Loop-Closed SSA Form Pass
; INLINER-NEXT: Analysis if a function is memory bound
; INLINER-NEXT: DummyCGSCCPass
-; INLINER-NEXT: FunctionPass Manager
+; INLINER-NEXT: AMDGPU Inlining Pass Manager
+; INLINER-NEXT: AMDGPU Inlining Anchor
; INLINER-NEXT: Dominator Tree Construction
; INLINER-NEXT: Basic Alias Analysis (stateless AA impl)
; INLINER-NEXT: Function Alias Analysis Results
@@ -1745,15 +1746,11 @@
; INLINER-NEXT: Machine Copy Propagation Pass
; INLINER-NEXT: Post-RA pseudo instruction expansion pass
; INLINER-NEXT: SI Shrink Instructions
-; INLINER-NEXT: AMDGPU Inlining Pass Manager
; INLINER-NEXT: AMDGPU Inlining Anchor
; INLINER-NEXT: AMDGPU Machine Level Inliner
; INLINER-NEXT: SI post-RA bundler
; INLINER-NEXT: MachineDominator Tree Construction
; INLINER-NEXT: Machine Natural Loop Construction
-; INLINER-NEXT: Dominator Tree Construction
-; INLINER-NEXT: Basic Alias Analysis (stateless AA impl)
-; INLINER-NEXT: Function Alias Analysis Results
; INLINER-NEXT: PostRA Machine Instruction Scheduler
; INLINER-NEXT: Machine Block Frequency Analysis
; INLINER-NEXT: MachinePostDominator Tree Construction
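One way to read the llc-pipeline.ll change above: the second run of Dominator Tree Construction and the alias analyses disappears once everything after DummyCGSCCPass sits under a single pass manager. The sketch below is a deliberately simplified toy, not LLVM's pass manager. ToyManager, ToyStage and countComputations are hypothetical names, and the model assumes that an analysis is computed once per manager and that nothing invalidates it in between.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy manager (hypothetical): caches analyses by name and counts how many
// times an analysis actually had to be computed.
struct ToyManager {
  std::set<std::string> Cached;
  int Computations = 0;

  void require(const std::string &Analysis) {
    // insert().second is true only when the analysis was not cached yet.
    if (Cached.insert(Analysis).second)
      ++Computations;
  }
};

// A group of passes with the analyses they require. StartsNewManager models
// a point in the pipeline where a fresh manager (and empty cache) begins.
struct ToyStage {
  std::vector<std::string> Required;
  bool StartsNewManager;
};

// Returns the total number of analysis computations over all stages.
int countComputations(const std::vector<ToyStage> &Stages) {
  ToyManager Manager;
  int Total = 0;
  for (const ToyStage &Stage : Stages) {
    if (Stage.StartsNewManager) {
      Total += Manager.Computations;
      Manager = ToyManager(); // The cache is dropped with the old manager.
    }
    for (const std::string &Analysis : Stage.Required)
      Manager.require(Analysis);
  }
  return Total + Manager.Computations;
}
```

In this toy, two stages that both require "Dominator Tree" and "Basic AA" cost four computations when the second stage starts a new manager, and two when both stages share one manager. Real pipelines also invalidate analyses when passes modify the code, which the toy ignores.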
🐧 Linux x64 Test Results
Failed Tests:
LLVM
LLVM.CodeGen/AArch64/arm64-opt-remarks-lazy-bfi.ll
LLVM.tools/UpdateTestChecks/update_givaluetracking_test_checks/knownbits-const.test
If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the