[PowerPC] Add flag to DAG combiner to improve compile time. #86884

stefanp-synopsys · 2024-03-27T22:32:30Z

When combining loads, the DAG combiner automatically adds any new nodes to the
list of nodes to be combined in turn. This has a positive effect on the
performance of the generated code but a negative effect on compile time. We
have an internal example where compile time becomes unacceptably long. This
patch adds a hidden option, -combiner-add-load-back, which defaults to true so
existing behavior is unchanged; setting it to false disables the addition of
the new nodes.
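
To make the trade-off described above concrete, here is a minimal, self-contained sketch of a worklist loop with an "add back" switch. It is a toy model, not the DAGCombiner code: `ToyNode`, `pending`, and `runToyCombiner` are hypothetical names invented for illustration, and the "combine" step is just a counter decrement.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Toy stand-in for a DAG node (hypothetical, not an LLVM type).
struct ToyNode {
  int pending;                     // number of rewrites still available on this node
  std::vector<std::size_t> users;  // indices of nodes that use this node
};

// Runs a worklist loop and returns how many node visits were performed.
// With addBack == true, a rewritten node and its users are queued again,
// which can expose more rewrites but costs extra visits.
std::size_t runToyCombiner(std::vector<ToyNode> nodes, bool addBack) {
  std::deque<std::size_t> worklist;
  std::vector<bool> queued(nodes.size(), true);
  for (std::size_t i = 0; i < nodes.size(); ++i)
    worklist.push_back(i);

  // Queue a node only if it is not already waiting in the worklist.
  auto enqueue = [&](std::size_t id) {
    if (!queued[id]) {
      queued[id] = true;
      worklist.push_back(id);
    }
  };

  std::size_t visits = 0;
  while (!worklist.empty()) {
    std::size_t id = worklist.front();
    worklist.pop_front();
    queued[id] = false;
    ++visits;

    if (nodes[id].pending == 0)
      continue;            // nothing to rewrite on this visit
    --nodes[id].pending;   // toy "combine"

    if (addBack) {
      enqueue(id);
      for (std::size_t user : nodes[id].users)
        enqueue(user);
    }
  }
  return visits;
}
```

In this toy, turning `addBack` off caps the work at one visit per node, at the price of leaving some rewrites undone, which is the same shape of trade-off the description reports.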

llvmbot · 2024-03-27T22:32:57Z

@llvm/pr-subscribers-llvm-selectiondag

Author: Stefan Pintilie (stefanp-ibm)

Changes

When combining loads we automatically add any new nodes to the list to be
combined in turn. This has a positive effect on overall performance but a
negative effect on compile time. We have an internal example where compile time
becomes much too long to be acceptable. This option (off by default) disables
the addition of the new nodes.

Full diff: https://github.com/llvm/llvm-project/pull/86884.diff

2 Files Affected:

(modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+7-1)
(modified) llvm/test/CodeGen/PowerPC/legalize-vaarg.ll (+42-5)

diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 36abe27d262176..a80b0fdc804ab8 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -150,6 +150,12 @@ static cl::opt<bool> EnableVectorFCopySignExtendRound(
     cl::desc(
         "Enable merging extends and rounds into FCOPYSIGN on vector types"));
 
+static cl::opt<bool>
+AddLoadBack("combiner-add-load-back", cl::Hidden,
+            cl::desc("When combining a load are new nodes added back in"),
+            cl::init(true));
+
+
 namespace {
 
   class DAGCombiner {
@@ -18808,7 +18814,7 @@ SDValue DAGCombiner::visitLOAD(SDNode *N) {
                                   MVT::Other, Chain, ReplLoad.getValue(1));
 
       // Replace uses with load result and token factor
-      return CombineTo(N, ReplLoad.getValue(0), Token);
+      return CombineTo(N, ReplLoad.getValue(0), Token, AddLoadBack);
     }
   }
 
diff --git a/llvm/test/CodeGen/PowerPC/legalize-vaarg.ll b/llvm/test/CodeGen/PowerPC/legalize-vaarg.ll
index b7f8b8af2472aa..264da54f8e6149 100644
--- a/llvm/test/CodeGen/PowerPC/legalize-vaarg.ll
+++ b/llvm/test/CodeGen/PowerPC/legalize-vaarg.ll
@@ -1,6 +1,8 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ;RUN: llc < %s --mtriple=powerpc64-unknown-linux-gnu -mattr=+altivec | FileCheck %s -check-prefix=BE
 ;RUN: llc < %s --mtriple=powerpc64le-unknown-linux-gnu -mattr=+altivec | FileCheck %s -check-prefix=LE
+;RUN: llc < %s --mtriple=powerpc64-unknown-linux-gnu -mattr=+altivec -combiner-add-load-back=false | FileCheck %s -check-prefix=BENOLOAD
+;RUN: llc < %s --mtriple=powerpc64le-unknown-linux-gnu -mattr=+altivec -combiner-add-load-back=false | FileCheck %s -check-prefix=LENOLOAD
 
 define <8 x i32> @test_large_vec_vaarg(i32 %n, ...) {
 ; BE-LABEL: test_large_vec_vaarg:
@@ -9,13 +11,14 @@ define <8 x i32> @test_large_vec_vaarg(i32 %n, ...) {
 ; BE-NEXT:    addi 3, 3, 15
 ; BE-NEXT:    rldicr 3, 3, 0, 59
 ; BE-NEXT:    addi 4, 3, 16
-; BE-NEXT:    addi 5, 3, 31
 ; BE-NEXT:    std 4, -8(1)
-; BE-NEXT:    rldicr 4, 5, 0, 59
+; BE-NEXT:    ld 4, -8(1)
 ; BE-NEXT:    lvx 2, 0, 3
-; BE-NEXT:    addi 3, 4, 16
-; BE-NEXT:    std 3, -8(1)
-; BE-NEXT:    lvx 3, 0, 4
+; BE-NEXT:    addi 4, 4, 15
+; BE-NEXT:    rldicr 3, 4, 0, 59
+; BE-NEXT:    addi 4, 3, 16
+; BE-NEXT:    std 4, -8(1)
+; BE-NEXT:    lvx 3, 0, 3
 ; BE-NEXT:    blr
 ;
 ; LE-LABEL: test_large_vec_vaarg:
@@ -35,6 +38,40 @@ define <8 x i32> @test_large_vec_vaarg(i32 %n, ...) {
 ; LE-NEXT:    lxvd2x 0, 0, 3
 ; LE-NEXT:    xxswapd 35, 0
 ; LE-NEXT:    blr
+;
+; BENOLOAD-LABEL: test_large_vec_vaarg:
+; BENOLOAD:       # %bb.0:
+; BENOLOAD-NEXT:    ld 3, -8(1)
+; BENOLOAD-NEXT:    addi 3, 3, 15
+; BENOLOAD-NEXT:    rldicr 3, 3, 0, 59
+; BENOLOAD-NEXT:    addi 4, 3, 16
+; BENOLOAD-NEXT:    std 4, -8(1)
+; BENOLOAD-NEXT:    ld 4, -8(1)
+; BENOLOAD-NEXT:    lvx 2, 0, 3
+; BENOLOAD-NEXT:    addi 4, 4, 15
+; BENOLOAD-NEXT:    rldicr 3, 4, 0, 59
+; BENOLOAD-NEXT:    addi 4, 3, 16
+; BENOLOAD-NEXT:    std 4, -8(1)
+; BENOLOAD-NEXT:    lvx 3, 0, 3
+; BENOLOAD-NEXT:    blr
+;
+; LENOLOAD-LABEL: test_large_vec_vaarg:
+; LENOLOAD:       # %bb.0:
+; LENOLOAD-NEXT:    ld 3, -8(1)
+; LENOLOAD-NEXT:    addi 3, 3, 15
+; LENOLOAD-NEXT:    rldicr 3, 3, 0, 59
+; LENOLOAD-NEXT:    addi 4, 3, 16
+; LENOLOAD-NEXT:    std 4, -8(1)
+; LENOLOAD-NEXT:    lxvd2x 0, 0, 3
+; LENOLOAD-NEXT:    ld 3, -8(1)
+; LENOLOAD-NEXT:    addi 3, 3, 15
+; LENOLOAD-NEXT:    rldicr 3, 3, 0, 59
+; LENOLOAD-NEXT:    addi 4, 3, 16
+; LENOLOAD-NEXT:    std 4, -8(1)
+; LENOLOAD-NEXT:    xxswapd 34, 0
+; LENOLOAD-NEXT:    lxvd2x 0, 0, 3
+; LENOLOAD-NEXT:    xxswapd 35, 0
+; LENOLOAD-NEXT:    blr
   %args = alloca ptr, align 4
   %x = va_arg ptr %args, <8 x i32>
   ret <8 x i32> %x

llvmbot · 2024-03-27T22:32:57Z

@llvm/pr-subscribers-backend-powerpc

github-actions · 2024-03-27T22:35:06Z

✅ With the latest revision this PR passed the C/C++ code formatter.

arsenm · 2024-03-28T12:20:59Z

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

    cl::desc(
        "Enable merging extends and rounds into FCOPYSIGN on vector types"));

+static cl::opt<bool>


These flags aren't really intended for end users, and are difficult to discover. Is there a threshold or something you can add instead?

stefanp-synopsys · Mar 28, 2024

I know what you mean about these options being mostly for debug. However, this compile time issue is more of a special case. My fear with adding a threshold is that this is target indep code and it can negatively impact performance on a whole set of targets. To avoid this I would still prefer to keep the default as it is now and add an option that would add a threshold value. So, either way, I get a debug option like this one.

arsenm · Mar 28, 2024

I fear that fear of touching other targets results in an increasing set of single purpose knobs like this, the weight of which is quite high.
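
As a rough illustration of the threshold alternative raised in this thread, the sketch below caps how many times new nodes may be re-queued instead of switching re-queuing off entirely. `RequeueBudget` is a hypothetical helper that appears neither in this patch nor in LLVM; where such a check would sit and what limit would be reasonable are left open.

```cpp
#include <cstddef>

// Hypothetical helper: a fixed budget for re-queuing newly created nodes.
class RequeueBudget {
public:
  explicit RequeueBudget(std::size_t limit) : remaining_(limit) {}

  // Returns true and spends one unit while budget remains; false once exhausted.
  bool tryConsume() {
    if (remaining_ == 0)
      return false;
    --remaining_;
    return true;
  }

  std::size_t remaining() const { return remaining_; }

private:
  std::size_t remaining_;
};
```

A caller would re-queue new nodes only while `tryConsume()` returns true, so small functions behave as before and only pathological inputs hit the cap.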

RKSimon · 2024-03-28T17:26:01Z

Can you give any more information on what nodes you are adding that causes such a problem? Are you adding a large number of TokenFactors or something?

I should also mention #83422 which will/would/maybe help with compile times in the DAG :)

stefanp-synopsys · 2024-03-28T22:37:09Z

> Can you give any more information on what nodes you are adding that causes such a problem? Are you adding a large number of TokenFactors or something?
>
> I should also mention #83422 which will/would/maybe help with compile times in the DAG :)

Yes, there are quite a few TokenFactors in the test case. Also, we have noticed that reducing the GatherAllAliasesMaxDepth from 18 down to 9 seems to help. However, I'm not sure what kind of performance impact that would have. We are investigating in that direction at the moment.
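
For readers unfamiliar with why a depth limit affects compile time, the following toy shows a depth-bounded walk up a chain of nodes. It is only a sketch of the general idea, not LLVM's alias-gathering code: `walkChain`, the `parent` array encoding, and the give-up behavior are assumptions made for illustration.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// parent[i] is the chain predecessor of node i, or -1 for the chain root.
// Returns the nodes visited while walking up from `start`, or std::nullopt
// if the walk would need more than maxDepth steps. On nullopt, the caller is
// assumed to keep its conservative starting point rather than search further.
std::optional<std::vector<int>> walkChain(const std::vector<int>& parent,
                                          int start, std::size_t maxDepth) {
  std::vector<int> visited;
  for (int cur = start; cur != -1;
       cur = parent[static_cast<std::size_t>(cur)]) {
    if (visited.size() >= maxDepth)
      return std::nullopt;  // depth limit hit: give up
    visited.push_back(cur);
  }
  return visited;
}
```

A smaller limit bounds the work per query but gives up on longer chains, which is why the performance impact of lowering it from 18 to 9 would need to be measured.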

chenzheng1030 · 2024-08-20T02:51:27Z

I think we should close this now?

stefanp-synopsys · 2024-11-29T20:46:06Z

> I think we should close this now?

It's not worth adding the extra flag considering that this is very much a special case.
Closing this issue now.

llvmbot added the backend:PowerPC and llvm:SelectionDAG (SelectionDAGISel as well) labels Mar 27, 2024

Forgot to run git clang format. This fixes the format.

84fe3fd

stefanp-synopsys self-assigned this Mar 27, 2024

stefanp-synopsys requested review from amy-kwan, chenzheng1030 and lei137 March 27, 2024 23:05

Updated the test case.

b08dba0

arsenm reviewed Mar 28, 2024

stefanp-synopsys closed this Nov 29, 2024


5 participants
