-
Notifications
You must be signed in to change notification settings - Fork 15.3k
[PowerPC] Add flag to DAG combiner to improve compile time. #86884
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PowerPC] Add flag to DAG combiner to improve compile time. #86884
Conversation
When combining loads we automatically add any new nodes to the list to be combined in turn. This has a positive effect on overall performance but a negative effect on compile time. We have an internal example where compile time becomes much too long to be acceptable. This option (off by default) disables the addition of the new nodes.
|
@llvm/pr-subscribers-llvm-selectiondag Author: Stefan Pintilie (stefanp-ibm) ChangesWhen combining loads we automatically add any new nodes to the list to be Full diff: https://github.com/llvm/llvm-project/pull/86884.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 36abe27d262176..a80b0fdc804ab8 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -150,6 +150,12 @@ static cl::opt<bool> EnableVectorFCopySignExtendRound(
cl::desc(
"Enable merging extends and rounds into FCOPYSIGN on vector types"));
+static cl::opt<bool>
+AddLoadBack("combiner-add-load-back", cl::Hidden,
+ cl::desc("When combining a load are new nodes added back in"),
+ cl::init(true));
+
+
namespace {
class DAGCombiner {
@@ -18808,7 +18814,7 @@ SDValue DAGCombiner::visitLOAD(SDNode *N) {
MVT::Other, Chain, ReplLoad.getValue(1));
// Replace uses with load result and token factor
- return CombineTo(N, ReplLoad.getValue(0), Token);
+ return CombineTo(N, ReplLoad.getValue(0), Token, AddLoadBack);
}
}
diff --git a/llvm/test/CodeGen/PowerPC/legalize-vaarg.ll b/llvm/test/CodeGen/PowerPC/legalize-vaarg.ll
index b7f8b8af2472aa..264da54f8e6149 100644
--- a/llvm/test/CodeGen/PowerPC/legalize-vaarg.ll
+++ b/llvm/test/CodeGen/PowerPC/legalize-vaarg.ll
@@ -1,6 +1,8 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
;RUN: llc < %s --mtriple=powerpc64-unknown-linux-gnu -mattr=+altivec | FileCheck %s -check-prefix=BE
;RUN: llc < %s --mtriple=powerpc64le-unknown-linux-gnu -mattr=+altivec | FileCheck %s -check-prefix=LE
+;RUN: llc < %s --mtriple=powerpc64-unknown-linux-gnu -mattr=+altivec -combiner-add-load-back=false | FileCheck %s -check-prefix=BENOLOAD
+;RUN: llc < %s --mtriple=powerpc64le-unknown-linux-gnu -mattr=+altivec -combiner-add-load-back=false | FileCheck %s -check-prefix=LENOLOAD
define <8 x i32> @test_large_vec_vaarg(i32 %n, ...) {
; BE-LABEL: test_large_vec_vaarg:
@@ -9,13 +11,14 @@ define <8 x i32> @test_large_vec_vaarg(i32 %n, ...) {
; BE-NEXT: addi 3, 3, 15
; BE-NEXT: rldicr 3, 3, 0, 59
; BE-NEXT: addi 4, 3, 16
-; BE-NEXT: addi 5, 3, 31
; BE-NEXT: std 4, -8(1)
-; BE-NEXT: rldicr 4, 5, 0, 59
+; BE-NEXT: ld 4, -8(1)
; BE-NEXT: lvx 2, 0, 3
-; BE-NEXT: addi 3, 4, 16
-; BE-NEXT: std 3, -8(1)
-; BE-NEXT: lvx 3, 0, 4
+; BE-NEXT: addi 4, 4, 15
+; BE-NEXT: rldicr 3, 4, 0, 59
+; BE-NEXT: addi 4, 3, 16
+; BE-NEXT: std 4, -8(1)
+; BE-NEXT: lvx 3, 0, 3
; BE-NEXT: blr
;
; LE-LABEL: test_large_vec_vaarg:
@@ -35,6 +38,40 @@ define <8 x i32> @test_large_vec_vaarg(i32 %n, ...) {
; LE-NEXT: lxvd2x 0, 0, 3
; LE-NEXT: xxswapd 35, 0
; LE-NEXT: blr
+;
+; BENOLOAD-LABEL: test_large_vec_vaarg:
+; BENOLOAD: # %bb.0:
+; BENOLOAD-NEXT: ld 3, -8(1)
+; BENOLOAD-NEXT: addi 3, 3, 15
+; BENOLOAD-NEXT: rldicr 3, 3, 0, 59
+; BENOLOAD-NEXT: addi 4, 3, 16
+; BENOLOAD-NEXT: std 4, -8(1)
+; BENOLOAD-NEXT: ld 4, -8(1)
+; BENOLOAD-NEXT: lvx 2, 0, 3
+; BENOLOAD-NEXT: addi 4, 4, 15
+; BENOLOAD-NEXT: rldicr 3, 4, 0, 59
+; BENOLOAD-NEXT: addi 4, 3, 16
+; BENOLOAD-NEXT: std 4, -8(1)
+; BENOLOAD-NEXT: lvx 3, 0, 3
+; BENOLOAD-NEXT: blr
+;
+; LENOLOAD-LABEL: test_large_vec_vaarg:
+; LENOLOAD: # %bb.0:
+; LENOLOAD-NEXT: ld 3, -8(1)
+; LENOLOAD-NEXT: addi 3, 3, 15
+; LENOLOAD-NEXT: rldicr 3, 3, 0, 59
+; LENOLOAD-NEXT: addi 4, 3, 16
+; LENOLOAD-NEXT: std 4, -8(1)
+; LENOLOAD-NEXT: lxvd2x 0, 0, 3
+; LENOLOAD-NEXT: ld 3, -8(1)
+; LENOLOAD-NEXT: addi 3, 3, 15
+; LENOLOAD-NEXT: rldicr 3, 3, 0, 59
+; LENOLOAD-NEXT: addi 4, 3, 16
+; LENOLOAD-NEXT: std 4, -8(1)
+; LENOLOAD-NEXT: xxswapd 34, 0
+; LENOLOAD-NEXT: lxvd2x 0, 0, 3
+; LENOLOAD-NEXT: xxswapd 35, 0
+; LENOLOAD-NEXT: blr
%args = alloca ptr, align 4
%x = va_arg ptr %args, <8 x i32>
ret <8 x i32> %x
|
|
@llvm/pr-subscribers-backend-powerpc Author: Stefan Pintilie (stefanp-ibm) ChangesWhen combining loads we automatically add any new nodes to the list to be Full diff: https://github.com/llvm/llvm-project/pull/86884.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 36abe27d262176..a80b0fdc804ab8 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -150,6 +150,12 @@ static cl::opt<bool> EnableVectorFCopySignExtendRound(
cl::desc(
"Enable merging extends and rounds into FCOPYSIGN on vector types"));
+static cl::opt<bool>
+AddLoadBack("combiner-add-load-back", cl::Hidden,
+ cl::desc("When combining a load are new nodes added back in"),
+ cl::init(true));
+
+
namespace {
class DAGCombiner {
@@ -18808,7 +18814,7 @@ SDValue DAGCombiner::visitLOAD(SDNode *N) {
MVT::Other, Chain, ReplLoad.getValue(1));
// Replace uses with load result and token factor
- return CombineTo(N, ReplLoad.getValue(0), Token);
+ return CombineTo(N, ReplLoad.getValue(0), Token, AddLoadBack);
}
}
diff --git a/llvm/test/CodeGen/PowerPC/legalize-vaarg.ll b/llvm/test/CodeGen/PowerPC/legalize-vaarg.ll
index b7f8b8af2472aa..264da54f8e6149 100644
--- a/llvm/test/CodeGen/PowerPC/legalize-vaarg.ll
+++ b/llvm/test/CodeGen/PowerPC/legalize-vaarg.ll
@@ -1,6 +1,8 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
;RUN: llc < %s --mtriple=powerpc64-unknown-linux-gnu -mattr=+altivec | FileCheck %s -check-prefix=BE
;RUN: llc < %s --mtriple=powerpc64le-unknown-linux-gnu -mattr=+altivec | FileCheck %s -check-prefix=LE
+;RUN: llc < %s --mtriple=powerpc64-unknown-linux-gnu -mattr=+altivec -combiner-add-load-back=false | FileCheck %s -check-prefix=BENOLOAD
+;RUN: llc < %s --mtriple=powerpc64le-unknown-linux-gnu -mattr=+altivec -combiner-add-load-back=false | FileCheck %s -check-prefix=LENOLOAD
define <8 x i32> @test_large_vec_vaarg(i32 %n, ...) {
; BE-LABEL: test_large_vec_vaarg:
@@ -9,13 +11,14 @@ define <8 x i32> @test_large_vec_vaarg(i32 %n, ...) {
; BE-NEXT: addi 3, 3, 15
; BE-NEXT: rldicr 3, 3, 0, 59
; BE-NEXT: addi 4, 3, 16
-; BE-NEXT: addi 5, 3, 31
; BE-NEXT: std 4, -8(1)
-; BE-NEXT: rldicr 4, 5, 0, 59
+; BE-NEXT: ld 4, -8(1)
; BE-NEXT: lvx 2, 0, 3
-; BE-NEXT: addi 3, 4, 16
-; BE-NEXT: std 3, -8(1)
-; BE-NEXT: lvx 3, 0, 4
+; BE-NEXT: addi 4, 4, 15
+; BE-NEXT: rldicr 3, 4, 0, 59
+; BE-NEXT: addi 4, 3, 16
+; BE-NEXT: std 4, -8(1)
+; BE-NEXT: lvx 3, 0, 3
; BE-NEXT: blr
;
; LE-LABEL: test_large_vec_vaarg:
@@ -35,6 +38,40 @@ define <8 x i32> @test_large_vec_vaarg(i32 %n, ...) {
; LE-NEXT: lxvd2x 0, 0, 3
; LE-NEXT: xxswapd 35, 0
; LE-NEXT: blr
+;
+; BENOLOAD-LABEL: test_large_vec_vaarg:
+; BENOLOAD: # %bb.0:
+; BENOLOAD-NEXT: ld 3, -8(1)
+; BENOLOAD-NEXT: addi 3, 3, 15
+; BENOLOAD-NEXT: rldicr 3, 3, 0, 59
+; BENOLOAD-NEXT: addi 4, 3, 16
+; BENOLOAD-NEXT: std 4, -8(1)
+; BENOLOAD-NEXT: ld 4, -8(1)
+; BENOLOAD-NEXT: lvx 2, 0, 3
+; BENOLOAD-NEXT: addi 4, 4, 15
+; BENOLOAD-NEXT: rldicr 3, 4, 0, 59
+; BENOLOAD-NEXT: addi 4, 3, 16
+; BENOLOAD-NEXT: std 4, -8(1)
+; BENOLOAD-NEXT: lvx 3, 0, 3
+; BENOLOAD-NEXT: blr
+;
+; LENOLOAD-LABEL: test_large_vec_vaarg:
+; LENOLOAD: # %bb.0:
+; LENOLOAD-NEXT: ld 3, -8(1)
+; LENOLOAD-NEXT: addi 3, 3, 15
+; LENOLOAD-NEXT: rldicr 3, 3, 0, 59
+; LENOLOAD-NEXT: addi 4, 3, 16
+; LENOLOAD-NEXT: std 4, -8(1)
+; LENOLOAD-NEXT: lxvd2x 0, 0, 3
+; LENOLOAD-NEXT: ld 3, -8(1)
+; LENOLOAD-NEXT: addi 3, 3, 15
+; LENOLOAD-NEXT: rldicr 3, 3, 0, 59
+; LENOLOAD-NEXT: addi 4, 3, 16
+; LENOLOAD-NEXT: std 4, -8(1)
+; LENOLOAD-NEXT: xxswapd 34, 0
+; LENOLOAD-NEXT: lxvd2x 0, 0, 3
+; LENOLOAD-NEXT: xxswapd 35, 0
+; LENOLOAD-NEXT: blr
%args = alloca ptr, align 4
%x = va_arg ptr %args, <8 x i32>
ret <8 x i32> %x
|
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
| cl::desc( | ||
| "Enable merging extends and rounds into FCOPYSIGN on vector types")); | ||
|
|
||
| static cl::opt<bool> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These flags aren't really intended for end users, and are difficult to discover. Is there a threshold or something you can add instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know what you mean about these options being mostly for debug. However, this compile time issue is more of a special case. My fear with adding a threshold is that this is target indep code and it can negatively impact performance on a whole set of targets. To avoid this I would still prefer to keep the default as it is now and add an option that would add a threshold value. So, either way, I get a debug option like this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fear that fear of touching other targets results in an increasing set of single purpose knobs like this, the weight of which is quite high.
|
Can you give any more information on what nodes you are adding that causes such a problem? Are you adding a large number of TokenFactors or something? I should also mention #83422 which will/would/maybe help with compile times in the DAG :) |
Yes, there are quite a few TokenFactors in the test case. Also, we have noticed that reducing the |
|
I think we should close this now? |
It's not worth adding the extra flag considering that this is very much a special case. |
When combining loads we automatically add any new nodes to the list to be
combined in turn. This has a positive effect on overall performance but a
negative effect on compile time. We have an internal example where compile time
becomes much too long to be acceptable. This option (off by default) disables
the addition of the new nodes.