Skip to content

Conversation

@stefanp-synopsys
Copy link
Contributor

When combining loads we automatically add any new nodes to the list to be
combined in turn. This has a positive effect on overall performance but a
negative effect on compile time. We have an internal example where compile time
becomes much too long to be acceptable. This option (off by default) disables
the addition of the new nodes.

When combining loads we automatically add any new nodes to the list to be
combined in turn. This has a positive effect on overall performance but a
negative effect on compile time. We have an internal example where compile time
becomes much too long to be acceptable. This option (off by default) disables
the addition of the new nodes.
@llvmbot llvmbot added backend:PowerPC llvm:SelectionDAG SelectionDAGISel as well labels Mar 27, 2024
@llvmbot
Copy link
Member

llvmbot commented Mar 27, 2024

@llvm/pr-subscribers-llvm-selectiondag

Author: Stefan Pintilie (stefanp-ibm)

Changes

When combining loads we automatically add any new nodes to the list to be
combined in turn. This has a positive effect on overall performance but a
negative effect on compile time. We have an internal example where compile time
becomes much too long to be acceptable. This option (off by default) disables
the addition of the new nodes.


Full diff: https://github.com/llvm/llvm-project/pull/86884.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+7-1)
  • (modified) llvm/test/CodeGen/PowerPC/legalize-vaarg.ll (+42-5)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 36abe27d262176..a80b0fdc804ab8 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -150,6 +150,12 @@ static cl::opt<bool> EnableVectorFCopySignExtendRound(
     cl::desc(
         "Enable merging extends and rounds into FCOPYSIGN on vector types"));
 
+static cl::opt<bool>
+AddLoadBack("combiner-add-load-back", cl::Hidden,
+            cl::desc("When combining a load are new nodes added back in"),
+            cl::init(true));
+
+
 namespace {
 
   class DAGCombiner {
@@ -18808,7 +18814,7 @@ SDValue DAGCombiner::visitLOAD(SDNode *N) {
                                   MVT::Other, Chain, ReplLoad.getValue(1));
 
       // Replace uses with load result and token factor
-      return CombineTo(N, ReplLoad.getValue(0), Token);
+      return CombineTo(N, ReplLoad.getValue(0), Token, AddLoadBack);
     }
   }
 
diff --git a/llvm/test/CodeGen/PowerPC/legalize-vaarg.ll b/llvm/test/CodeGen/PowerPC/legalize-vaarg.ll
index b7f8b8af2472aa..264da54f8e6149 100644
--- a/llvm/test/CodeGen/PowerPC/legalize-vaarg.ll
+++ b/llvm/test/CodeGen/PowerPC/legalize-vaarg.ll
@@ -1,6 +1,8 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ;RUN: llc < %s --mtriple=powerpc64-unknown-linux-gnu -mattr=+altivec | FileCheck %s -check-prefix=BE
 ;RUN: llc < %s --mtriple=powerpc64le-unknown-linux-gnu -mattr=+altivec | FileCheck %s -check-prefix=LE
+;RUN: llc < %s --mtriple=powerpc64-unknown-linux-gnu -mattr=+altivec -combiner-add-load-back=false | FileCheck %s -check-prefix=BENOLOAD
+;RUN: llc < %s --mtriple=powerpc64le-unknown-linux-gnu -mattr=+altivec -combiner-add-load-back=false | FileCheck %s -check-prefix=LENOLOAD
 
 define <8 x i32> @test_large_vec_vaarg(i32 %n, ...) {
 ; BE-LABEL: test_large_vec_vaarg:
@@ -9,13 +11,14 @@ define <8 x i32> @test_large_vec_vaarg(i32 %n, ...) {
 ; BE-NEXT:    addi 3, 3, 15
 ; BE-NEXT:    rldicr 3, 3, 0, 59
 ; BE-NEXT:    addi 4, 3, 16
-; BE-NEXT:    addi 5, 3, 31
 ; BE-NEXT:    std 4, -8(1)
-; BE-NEXT:    rldicr 4, 5, 0, 59
+; BE-NEXT:    ld 4, -8(1)
 ; BE-NEXT:    lvx 2, 0, 3
-; BE-NEXT:    addi 3, 4, 16
-; BE-NEXT:    std 3, -8(1)
-; BE-NEXT:    lvx 3, 0, 4
+; BE-NEXT:    addi 4, 4, 15
+; BE-NEXT:    rldicr 3, 4, 0, 59
+; BE-NEXT:    addi 4, 3, 16
+; BE-NEXT:    std 4, -8(1)
+; BE-NEXT:    lvx 3, 0, 3
 ; BE-NEXT:    blr
 ;
 ; LE-LABEL: test_large_vec_vaarg:
@@ -35,6 +38,40 @@ define <8 x i32> @test_large_vec_vaarg(i32 %n, ...) {
 ; LE-NEXT:    lxvd2x 0, 0, 3
 ; LE-NEXT:    xxswapd 35, 0
 ; LE-NEXT:    blr
+;
+; BENOLOAD-LABEL: test_large_vec_vaarg:
+; BENOLOAD:       # %bb.0:
+; BENOLOAD-NEXT:    ld 3, -8(1)
+; BENOLOAD-NEXT:    addi 3, 3, 15
+; BENOLOAD-NEXT:    rldicr 3, 3, 0, 59
+; BENOLOAD-NEXT:    addi 4, 3, 16
+; BENOLOAD-NEXT:    std 4, -8(1)
+; BENOLOAD-NEXT:    ld 4, -8(1)
+; BENOLOAD-NEXT:    lvx 2, 0, 3
+; BENOLOAD-NEXT:    addi 4, 4, 15
+; BENOLOAD-NEXT:    rldicr 3, 4, 0, 59
+; BENOLOAD-NEXT:    addi 4, 3, 16
+; BENOLOAD-NEXT:    std 4, -8(1)
+; BENOLOAD-NEXT:    lvx 3, 0, 3
+; BENOLOAD-NEXT:    blr
+;
+; LENOLOAD-LABEL: test_large_vec_vaarg:
+; LENOLOAD:       # %bb.0:
+; LENOLOAD-NEXT:    ld 3, -8(1)
+; LENOLOAD-NEXT:    addi 3, 3, 15
+; LENOLOAD-NEXT:    rldicr 3, 3, 0, 59
+; LENOLOAD-NEXT:    addi 4, 3, 16
+; LENOLOAD-NEXT:    std 4, -8(1)
+; LENOLOAD-NEXT:    lxvd2x 0, 0, 3
+; LENOLOAD-NEXT:    ld 3, -8(1)
+; LENOLOAD-NEXT:    addi 3, 3, 15
+; LENOLOAD-NEXT:    rldicr 3, 3, 0, 59
+; LENOLOAD-NEXT:    addi 4, 3, 16
+; LENOLOAD-NEXT:    std 4, -8(1)
+; LENOLOAD-NEXT:    xxswapd 34, 0
+; LENOLOAD-NEXT:    lxvd2x 0, 0, 3
+; LENOLOAD-NEXT:    xxswapd 35, 0
+; LENOLOAD-NEXT:    blr
   %args = alloca ptr, align 4
   %x = va_arg ptr %args, <8 x i32>
   ret <8 x i32> %x

@llvmbot
Copy link
Member

llvmbot commented Mar 27, 2024

@llvm/pr-subscribers-backend-powerpc

Author: Stefan Pintilie (stefanp-ibm)

Changes

When combining loads we automatically add any new nodes to the list to be
combined in turn. This has a positive effect on overall performance but a
negative effect on compile time. We have an internal example where compile time
becomes much too long to be acceptable. This option (off by default) disables
the addition of the new nodes.


Full diff: https://github.com/llvm/llvm-project/pull/86884.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+7-1)
  • (modified) llvm/test/CodeGen/PowerPC/legalize-vaarg.ll (+42-5)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 36abe27d262176..a80b0fdc804ab8 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -150,6 +150,12 @@ static cl::opt<bool> EnableVectorFCopySignExtendRound(
     cl::desc(
         "Enable merging extends and rounds into FCOPYSIGN on vector types"));
 
+static cl::opt<bool>
+AddLoadBack("combiner-add-load-back", cl::Hidden,
+            cl::desc("When combining a load are new nodes added back in"),
+            cl::init(true));
+
+
 namespace {
 
   class DAGCombiner {
@@ -18808,7 +18814,7 @@ SDValue DAGCombiner::visitLOAD(SDNode *N) {
                                   MVT::Other, Chain, ReplLoad.getValue(1));
 
       // Replace uses with load result and token factor
-      return CombineTo(N, ReplLoad.getValue(0), Token);
+      return CombineTo(N, ReplLoad.getValue(0), Token, AddLoadBack);
     }
   }
 
diff --git a/llvm/test/CodeGen/PowerPC/legalize-vaarg.ll b/llvm/test/CodeGen/PowerPC/legalize-vaarg.ll
index b7f8b8af2472aa..264da54f8e6149 100644
--- a/llvm/test/CodeGen/PowerPC/legalize-vaarg.ll
+++ b/llvm/test/CodeGen/PowerPC/legalize-vaarg.ll
@@ -1,6 +1,8 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ;RUN: llc < %s --mtriple=powerpc64-unknown-linux-gnu -mattr=+altivec | FileCheck %s -check-prefix=BE
 ;RUN: llc < %s --mtriple=powerpc64le-unknown-linux-gnu -mattr=+altivec | FileCheck %s -check-prefix=LE
+;RUN: llc < %s --mtriple=powerpc64-unknown-linux-gnu -mattr=+altivec -combiner-add-load-back=false | FileCheck %s -check-prefix=BENOLOAD
+;RUN: llc < %s --mtriple=powerpc64le-unknown-linux-gnu -mattr=+altivec -combiner-add-load-back=false | FileCheck %s -check-prefix=LENOLOAD
 
 define <8 x i32> @test_large_vec_vaarg(i32 %n, ...) {
 ; BE-LABEL: test_large_vec_vaarg:
@@ -9,13 +11,14 @@ define <8 x i32> @test_large_vec_vaarg(i32 %n, ...) {
 ; BE-NEXT:    addi 3, 3, 15
 ; BE-NEXT:    rldicr 3, 3, 0, 59
 ; BE-NEXT:    addi 4, 3, 16
-; BE-NEXT:    addi 5, 3, 31
 ; BE-NEXT:    std 4, -8(1)
-; BE-NEXT:    rldicr 4, 5, 0, 59
+; BE-NEXT:    ld 4, -8(1)
 ; BE-NEXT:    lvx 2, 0, 3
-; BE-NEXT:    addi 3, 4, 16
-; BE-NEXT:    std 3, -8(1)
-; BE-NEXT:    lvx 3, 0, 4
+; BE-NEXT:    addi 4, 4, 15
+; BE-NEXT:    rldicr 3, 4, 0, 59
+; BE-NEXT:    addi 4, 3, 16
+; BE-NEXT:    std 4, -8(1)
+; BE-NEXT:    lvx 3, 0, 3
 ; BE-NEXT:    blr
 ;
 ; LE-LABEL: test_large_vec_vaarg:
@@ -35,6 +38,40 @@ define <8 x i32> @test_large_vec_vaarg(i32 %n, ...) {
 ; LE-NEXT:    lxvd2x 0, 0, 3
 ; LE-NEXT:    xxswapd 35, 0
 ; LE-NEXT:    blr
+;
+; BENOLOAD-LABEL: test_large_vec_vaarg:
+; BENOLOAD:       # %bb.0:
+; BENOLOAD-NEXT:    ld 3, -8(1)
+; BENOLOAD-NEXT:    addi 3, 3, 15
+; BENOLOAD-NEXT:    rldicr 3, 3, 0, 59
+; BENOLOAD-NEXT:    addi 4, 3, 16
+; BENOLOAD-NEXT:    std 4, -8(1)
+; BENOLOAD-NEXT:    ld 4, -8(1)
+; BENOLOAD-NEXT:    lvx 2, 0, 3
+; BENOLOAD-NEXT:    addi 4, 4, 15
+; BENOLOAD-NEXT:    rldicr 3, 4, 0, 59
+; BENOLOAD-NEXT:    addi 4, 3, 16
+; BENOLOAD-NEXT:    std 4, -8(1)
+; BENOLOAD-NEXT:    lvx 3, 0, 3
+; BENOLOAD-NEXT:    blr
+;
+; LENOLOAD-LABEL: test_large_vec_vaarg:
+; LENOLOAD:       # %bb.0:
+; LENOLOAD-NEXT:    ld 3, -8(1)
+; LENOLOAD-NEXT:    addi 3, 3, 15
+; LENOLOAD-NEXT:    rldicr 3, 3, 0, 59
+; LENOLOAD-NEXT:    addi 4, 3, 16
+; LENOLOAD-NEXT:    std 4, -8(1)
+; LENOLOAD-NEXT:    lxvd2x 0, 0, 3
+; LENOLOAD-NEXT:    ld 3, -8(1)
+; LENOLOAD-NEXT:    addi 3, 3, 15
+; LENOLOAD-NEXT:    rldicr 3, 3, 0, 59
+; LENOLOAD-NEXT:    addi 4, 3, 16
+; LENOLOAD-NEXT:    std 4, -8(1)
+; LENOLOAD-NEXT:    xxswapd 34, 0
+; LENOLOAD-NEXT:    lxvd2x 0, 0, 3
+; LENOLOAD-NEXT:    xxswapd 35, 0
+; LENOLOAD-NEXT:    blr
   %args = alloca ptr, align 4
   %x = va_arg ptr %args, <8 x i32>
   ret <8 x i32> %x

@github-actions
Copy link

github-actions bot commented Mar 27, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

cl::desc(
"Enable merging extends and rounds into FCOPYSIGN on vector types"));

static cl::opt<bool>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These flags aren't really intended for end users, and are difficult to discover. Is there a threshold or something you can add instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know what you mean about these options being mostly for debug. However, this compile time issue is more of a special case. My fear with adding a threshold is that this is target indep code and it can negatively impact performance on a whole set of targets. To avoid this I would still prefer to keep the default as it is now and add an option that would add a threshold value. So, either way, I get a debug option like this one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fear that fear of touching other targets results in an increasing set of single purpose knobs like this, the weight of which is quite high.

@RKSimon
Copy link
Collaborator

RKSimon commented Mar 28, 2024

Can you give any more information on what nodes you are adding that causes such a problem? Are you adding a large number of TokenFactors or something?

I should also mention #83422 which will/would/maybe help with compile times in the DAG :)

@stefanp-synopsys
Copy link
Contributor Author

Can you give any more information on what nodes you are adding that causes such a problem? Are you adding a large number of TokenFactors or something?

I should also mention #83422 which will/would/maybe help with compile times in the DAG :)

Yes, there are quite a few TokenFactors in the test case. Also, we have noticed that reducing the GatherAllAliasesMaxDepth from 18 down to 9 seems to help. However, I'm not sure what kind of performance impact that would have. We are investigating in that direction at the moment.

@chenzheng1030
Copy link
Collaborator

I think we should close this now?

@stefanp-synopsys
Copy link
Contributor Author

I think we should close this now?

It's not worth adding the extra flag considering that this is very much a special case.
Closing this issue now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend:PowerPC llvm:SelectionDAG SelectionDAGISel as well

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants