Commit d746acc
slinder1 authored and David Salinas committed
[AMDGPU] Push amdgpu-preload-kern-arg-prolog after livedebugvalues (llvm#126148)
This is effectively a workaround for a bug in livedebugvalues, but it also seems to be a general improvement: BB sections looks like it could break the special 256-byte prelude scheme that amdgpu-preload-kern-arg-prolog requires anyway. Moving the pass even later doesn't seem to have any material impact, and it adds livedebugvalues to the list of passes that no longer have to deal with pseudo multiple-entry functions. AMDGPU debug-info isn't supported upstream yet, so the bug being avoided isn't testable here. I am posting the patch upstream to avoid an unnecessary diff with AMD's fork.
1 parent c273851 commit d746acc
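
The reordering described in the commit message can be illustrated with a small, self-contained model. This is only a sketch, not LLVM code: `ToyPassConfig`, `BeforeConfig`, `AfterConfig`, and `indexOf` are hypothetical names, passes are plain strings, and the generic passes between the two hooks are assumed from the `llc-pipeline.ll` expectations in this commit. Only the hook names `addPreEmitPass` and `addPostBBSections` and the pass names come from the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a target pass configuration (hypothetical, not LLVM's API).
// The two virtual hooks borrow their names from the diff.
class ToyPassConfig {
public:
  virtual ~ToyPassConfig() = default;

  // Builds the late part of the pipeline in a fixed order: the target's
  // pre-emit hook, then generic passes (order assumed from llc-pipeline.ll),
  // then the target's post-BB-sections hook.
  std::vector<std::string> buildLatePipeline() {
    Passes.clear();
    addPreEmitPass();
    addPass("Register Usage Information Collector Pass");
    addPass("Remove Loads Into Fake Uses");
    addPass("Live DEBUG_VALUE analysis");
    addPass("Machine Sanitizer Binary Metadata");
    addPostBBSections();
    return Passes;
  }

protected:
  void addPass(const std::string &Name) { Passes.push_back(Name); }
  virtual void addPreEmitPass() {}
  virtual void addPostBBSections() {}

private:
  std::vector<std::string> Passes;
};

// Ordering before the commit: the prolog pass ends the pre-emit hook.
class BeforeConfig : public ToyPassConfig {
  void addPreEmitPass() override {
    addPass("Branch relaxation pass");
    addPass("AMDGPU Preload Kernel Arguments Prolog");
  }
};

// Ordering after the commit: the prolog pass moves to the later hook.
class AfterConfig : public ToyPassConfig {
  void addPreEmitPass() override { addPass("Branch relaxation pass"); }
  void addPostBBSections() override {
    addPass("AMDGPU Preload Kernel Arguments Prolog");
  }
};

// Returns the position of Name in Passes, or -1 if it is absent.
int indexOf(const std::vector<std::string> &Passes, const std::string &Name) {
  for (std::size_t I = 0; I < Passes.size(); ++I)
    if (Passes[I] == Name)
      return static_cast<int>(I);
  return -1;
}
```

Calling `buildLatePipeline()` on each configuration and comparing `indexOf` for the prolog pass against `"Live DEBUG_VALUE analysis"` shows the prolog pass moving from before that analysis to after it, which is the same shift the test expectations record.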

2 files changed: 11 additions and 5 deletions

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

Lines changed: 6 additions & 0 deletions
@@ -1147,6 +1147,7 @@ class GCNPassConfig final : public AMDGPUPassConfig {
   void addPostRegAlloc() override;
   void addPreSched2() override;
   void addPreEmitPass() override;
+  void addPostBBSections() override;
 };

 } // end anonymous namespace
@@ -1718,6 +1719,11 @@ void GCNPassConfig::addPreEmitPass() {
     addPass(&AMDGPUInsertDelayAluID);

   addPass(&BranchRelaxationPassID);
+}
+
+void GCNPassConfig::addPostBBSections() {
+  // We run this later to avoid passes like livedebugvalues and BBSections
+  // having to deal with the apparent multi-entry functions we may generate.
   addPass(createAMDGPUPreloadKernArgPrologLegacyPass());
 }

llvm/test/CodeGen/AMDGPU/llc-pipeline.ll

Lines changed: 5 additions & 5 deletions
@@ -145,11 +145,11 @@
 ; GCN-O0-NEXT: SI Final Branch Preparation
 ; GCN-O0-NEXT: Post RA hazard recognizer
 ; GCN-O0-NEXT: Branch relaxation pass
-; GCN-O0-NEXT: AMDGPU Preload Kernel Arguments Prolog
 ; GCN-O0-NEXT: Register Usage Information Collector Pass
 ; GCN-O0-NEXT: Remove Loads Into Fake Uses
 ; GCN-O0-NEXT: Live DEBUG_VALUE analysis
 ; GCN-O0-NEXT: Machine Sanitizer Binary Metadata
+; GCN-O0-NEXT: AMDGPU Preload Kernel Arguments Prolog
 ; GCN-O0-NEXT: Lazy Machine Block Frequency Analysis
 ; GCN-O0-NEXT: Machine Optimization Remark Emitter
 ; GCN-O0-NEXT: Stack Frame Layout Analysis
@@ -429,11 +429,11 @@
 ; GCN-O1-NEXT: Post RA hazard recognizer
 ; GCN-O1-NEXT: AMDGPU Insert Delay ALU
 ; GCN-O1-NEXT: Branch relaxation pass
-; GCN-O1-NEXT: AMDGPU Preload Kernel Arguments Prolog
 ; GCN-O1-NEXT: Register Usage Information Collector Pass
 ; GCN-O1-NEXT: Remove Loads Into Fake Uses
 ; GCN-O1-NEXT: Live DEBUG_VALUE analysis
 ; GCN-O1-NEXT: Machine Sanitizer Binary Metadata
+; GCN-O1-NEXT: AMDGPU Preload Kernel Arguments Prolog
 ; GCN-O1-NEXT: Lazy Machine Block Frequency Analysis
 ; GCN-O1-NEXT: Machine Optimization Remark Emitter
 ; GCN-O1-NEXT: Stack Frame Layout Analysis
@@ -741,11 +741,11 @@
 ; GCN-O1-OPTS-NEXT: Post RA hazard recognizer
 ; GCN-O1-OPTS-NEXT: AMDGPU Insert Delay ALU
 ; GCN-O1-OPTS-NEXT: Branch relaxation pass
-; GCN-O1-OPTS-NEXT: AMDGPU Preload Kernel Arguments Prolog
 ; GCN-O1-OPTS-NEXT: Register Usage Information Collector Pass
 ; GCN-O1-OPTS-NEXT: Remove Loads Into Fake Uses
 ; GCN-O1-OPTS-NEXT: Live DEBUG_VALUE analysis
 ; GCN-O1-OPTS-NEXT: Machine Sanitizer Binary Metadata
+; GCN-O1-OPTS-NEXT: AMDGPU Preload Kernel Arguments Prolog
 ; GCN-O1-OPTS-NEXT: Lazy Machine Block Frequency Analysis
 ; GCN-O1-OPTS-NEXT: Machine Optimization Remark Emitter
 ; GCN-O1-OPTS-NEXT: Stack Frame Layout Analysis
@@ -1069,11 +1069,11 @@
 ; GCN-O2-NEXT: Post RA hazard recognizer
 ; GCN-O2-NEXT: AMDGPU Insert Delay ALU
 ; GCN-O2-NEXT: Branch relaxation pass
-; GCN-O2-NEXT: AMDGPU Preload Kernel Arguments Prolog
 ; GCN-O2-NEXT: Register Usage Information Collector Pass
 ; GCN-O2-NEXT: Remove Loads Into Fake Uses
 ; GCN-O2-NEXT: Live DEBUG_VALUE analysis
 ; GCN-O2-NEXT: Machine Sanitizer Binary Metadata
+; GCN-O2-NEXT: AMDGPU Preload Kernel Arguments Prolog
 ; GCN-O2-NEXT: Lazy Machine Block Frequency Analysis
 ; GCN-O2-NEXT: Machine Optimization Remark Emitter
 ; GCN-O2-NEXT: Stack Frame Layout Analysis
@@ -1410,11 +1410,11 @@
 ; GCN-O3-NEXT: Post RA hazard recognizer
 ; GCN-O3-NEXT: AMDGPU Insert Delay ALU
 ; GCN-O3-NEXT: Branch relaxation pass
-; GCN-O3-NEXT: AMDGPU Preload Kernel Arguments Prolog
 ; GCN-O3-NEXT: Register Usage Information Collector Pass
 ; GCN-O3-NEXT: Remove Loads Into Fake Uses
 ; GCN-O3-NEXT: Live DEBUG_VALUE analysis
 ; GCN-O3-NEXT: Machine Sanitizer Binary Metadata
+; GCN-O3-NEXT: AMDGPU Preload Kernel Arguments Prolog
 ; GCN-O3-NEXT: Lazy Machine Block Frequency Analysis
 ; GCN-O3-NEXT: Machine Optimization Remark Emitter
 ; GCN-O3-NEXT: Stack Frame Layout Analysis
