Closed
7 changes: 6 additions & 1 deletion llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -150,6 +150,11 @@ static cl::opt<bool> EnableVectorFCopySignExtendRound(
cl::desc(
"Enable merging extends and rounds into FCOPYSIGN on vector types"));

static cl::opt<bool>
Contributor

These flags aren't really intended for end users, and are difficult to discover. Is there a threshold or something you can add instead?

Contributor Author

I know what you mean about these options being mostly for debugging. However, this compile-time issue is more of a special case. My fear with adding a threshold is that this is target-independent code, so a threshold could negatively impact performance on a whole set of targets. To avoid that, I would still prefer to keep the default as it is now and add an option that sets the threshold value. So, either way, I end up with a debug option like this one.

Contributor

I worry that the fear of touching other targets results in a growing set of single-purpose knobs like this one, the weight of which is quite high.

AddLoadBack("combiner-add-load-back", cl::Hidden,
cl::desc("When combining a load are new nodes added back in"),
cl::init(true));

namespace {

class DAGCombiner {
@@ -18808,7 +18813,7 @@ SDValue DAGCombiner::visitLOAD(SDNode *N) {
MVT::Other, Chain, ReplLoad.getValue(1));

// Replace uses with load result and token factor
- return CombineTo(N, ReplLoad.getValue(0), Token);
+ return CombineTo(N, ReplLoad.getValue(0), Token, AddLoadBack);
}
}
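
For readers unfamiliar with the combiner, the hunk above forwards the new flag as the trailing boolean argument of `CombineTo`. As far as I understand it, that argument decides whether the replacement nodes are pushed back onto the combiner's worklist for another visit. The following is only a toy model of that bookkeeping, not LLVM code: `ToyNode`, `ToyCombiner`, `addToWorklist`, `combineTo`, and `pendingAfterCombine` are hypothetical names made up for illustration.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Toy stand-in for a DAG node; this is NOT LLVM's SDNode.
struct ToyNode {
  std::string Name;
};

// Toy combiner that models only the worklist bookkeeping.
class ToyCombiner {
  std::deque<ToyNode *> Worklist;

public:
  // Queue a node for a later visit; duplicates are ignored.
  void addToWorklist(ToyNode *N) {
    for (ToyNode *Queued : Worklist)
      if (Queued == N)
        return;
    Worklist.push_back(N);
  }

  // Replace N with two results. When AddTo is false the replacements
  // are not queued, so this toy combiner never revisits them.
  void combineTo(ToyNode *N, ToyNode *Res0, ToyNode *Res1, bool AddTo = true) {
    (void)N; // A real combiner would rewrite the uses of N here.
    if (AddTo) {
      addToWorklist(Res0);
      addToWorklist(Res1);
    }
  }

  std::size_t pending() const { return Worklist.size(); }
};

// Hypothetical helper: number of nodes queued after one combine,
// for a given value of the (assumed) add-back flag.
std::size_t pendingAfterCombine(bool AddLoadBack) {
  ToyNode Load{"load"}, ReplLoad{"repl-load"}, Token{"token-factor"};
  ToyCombiner C;
  C.combineTo(&Load, &ReplLoad, &Token, AddLoadBack);
  return C.pending();
}
```

In this sketch `pendingAfterCombine(true)` queues both replacement nodes and `pendingAfterCombine(false)` leaves the worklist empty. That is the trade-off under discussion: less work for the combiner, at the cost of follow-up combines that never run.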

36 changes: 36 additions & 0 deletions llvm/test/CodeGen/PowerPC/legalize-vaarg.ll
@@ -1,6 +1,8 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
;RUN: llc < %s --mtriple=powerpc64-unknown-linux-gnu -mattr=+altivec | FileCheck %s -check-prefix=BE
;RUN: llc < %s --mtriple=powerpc64le-unknown-linux-gnu -mattr=+altivec | FileCheck %s -check-prefix=LE
;RUN: llc < %s --mtriple=powerpc64-unknown-linux-gnu -mattr=+altivec -combiner-add-load-back=false | FileCheck %s -check-prefix=BENOLOAD
;RUN: llc < %s --mtriple=powerpc64le-unknown-linux-gnu -mattr=+altivec -combiner-add-load-back=false | FileCheck %s -check-prefix=LENOLOAD

define <8 x i32> @test_large_vec_vaarg(i32 %n, ...) {
; BE-LABEL: test_large_vec_vaarg:
@@ -35,6 +37,40 @@ define <8 x i32> @test_large_vec_vaarg(i32 %n, ...) {
; LE-NEXT: lxvd2x 0, 0, 3
; LE-NEXT: xxswapd 35, 0
; LE-NEXT: blr
;
; BENOLOAD-LABEL: test_large_vec_vaarg:
; BENOLOAD: # %bb.0:
; BENOLOAD-NEXT: ld 3, -8(1)
; BENOLOAD-NEXT: addi 3, 3, 15
; BENOLOAD-NEXT: rldicr 3, 3, 0, 59
; BENOLOAD-NEXT: addi 4, 3, 16
; BENOLOAD-NEXT: std 4, -8(1)
; BENOLOAD-NEXT: ld 4, -8(1)
; BENOLOAD-NEXT: lvx 2, 0, 3
; BENOLOAD-NEXT: addi 4, 4, 15
; BENOLOAD-NEXT: rldicr 3, 4, 0, 59
; BENOLOAD-NEXT: addi 4, 3, 16
; BENOLOAD-NEXT: std 4, -8(1)
; BENOLOAD-NEXT: lvx 3, 0, 3
; BENOLOAD-NEXT: blr
;
; LENOLOAD-LABEL: test_large_vec_vaarg:
; LENOLOAD: # %bb.0:
; LENOLOAD-NEXT: ld 3, -8(1)
; LENOLOAD-NEXT: addi 3, 3, 15
; LENOLOAD-NEXT: rldicr 3, 3, 0, 59
; LENOLOAD-NEXT: addi 4, 3, 16
; LENOLOAD-NEXT: std 4, -8(1)
; LENOLOAD-NEXT: lxvd2x 0, 0, 3
; LENOLOAD-NEXT: ld 3, -8(1)
; LENOLOAD-NEXT: addi 3, 3, 15
; LENOLOAD-NEXT: rldicr 3, 3, 0, 59
; LENOLOAD-NEXT: addi 4, 3, 16
; LENOLOAD-NEXT: std 4, -8(1)
; LENOLOAD-NEXT: xxswapd 34, 0
; LENOLOAD-NEXT: lxvd2x 0, 0, 3
; LENOLOAD-NEXT: xxswapd 35, 0
; LENOLOAD-NEXT: blr
%args = alloca ptr, align 4
%x = va_arg ptr %args, <8 x i32>
ret <8 x i32> %x