57 changes: 57 additions & 0 deletions llvm/lib/Target/PowerPC/PPCISelLowering.cpp
@@ -15556,6 +15556,63 @@ SDValue PPCTargetLowering::combineSetCC(SDNode *N,
SDValue Add = DAG.getNode(ISD::ADD, DL, OpVT, LHS, RHS.getOperand(1));
return DAG.getSetCC(DL, VT, Add, DAG.getConstant(0, DL, OpVT), CC);
}
if (Subtarget.hasVSX()) {
Contributor: Please add documentation describing the type of optimization being done in this block.

Collaborator:
vcmpequb is an Altivec vector instruction, not VSX.

if (LHS.getOpcode() == ISD::LOAD && RHS.getOpcode() == ISD::LOAD &&
LHS.hasOneUse() && RHS.hasOneUse() &&
LHS.getValueType() == MVT::i128 && RHS.getValueType() == MVT::i128) {
Contributor:
why the type restriction?

Contributor Author @diggerlin, Sep 19, 2025:

In the expand-memcmp pass,

 %bcmp = tail call i32 @bcmp(ptr noundef nonnull dereferenceable(16) %a, ptr noundef nonnull dereferenceable(16) %b, i64 16)
  %cmp = icmp eq i32 %bcmp, 0
  %conv = zext i1 %cmp to i32
  ret i32 %conv

is changed to

%0 = load i128, ptr %a, align 1
 %1 = load i128, ptr %b, align 1
 %2 = icmp ne i128 %0, %1
 %3 = zext i1 %2 to i32
 %cmp = icmp eq i32 %3, 0
 %conv = zext i1 %cmp to i32
 ret i32 %conv

but in the original code, the `load i128, ptr %a, align 1` is lowered to

t27: i64,ch = load<(load (s64) from %ir.a, align 1)> t0, t2, undef:i64
t32: i64,ch = load<(load (s64) from %ir.b, align 1)> t0, t4, undef:i64

This is not efficient: it takes two ld instructions in 64-bit mode or four lwz instructions in 32-bit mode.

We want the i128 to be converted to a vector load, so there is a type restriction.
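For reference, a minimal source pattern (a hypothetical example, not taken from the PR's tests) that expand-memcmp rewrites into the i128 load/compare form shown above:

```cpp
#include <cstring>

// Hypothetical example: a 16-byte equality test. expand-memcmp turns the
// memcmp call into two i128 loads plus an i128 icmp, which this patch then
// lowers to a single 16-byte vector load pair and vcmpequb.
int cmpeq16(const void *a, const void *b) {
  return std::memcmp(a, b, 16) == 0;
}
```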

SDLoc DL(N);
SelectionDAG &DAG = DCI.DAG;
auto *LA = dyn_cast<LoadSDNode>(LHS);
auto *LB = dyn_cast<LoadSDNode>(RHS);
if (!LA || !LB)
return SDValue();
Contributor:
Shouldn't all conditions that are not met for this optimization result in the default behaviour of this function, on line 15618 below?

Contributor Author:
Good catch, I will fix it, thanks. I thought ISD::SETCC only returns i1, and PPCTargetLowering::DAGCombineTruncBoolExt only deals with i32 and i64, so I returned SDValue() here directly. But ISD::SETCC may return i32/i64 too.


// If either memory operation (LA or LB) is volatile, do not perform any
// optimization or transformation. Volatile operations must be preserved
// as written to ensure correct program behavior, so we return an empty
// SDValue to indicate no action.
if (LA->isVolatile() || LB->isVolatile())
return SDValue();

// Only combine loads if both use the unindexed addressing mode.
// PowerPC AltiVec/VMX does not support vector loads or stores with
// pre/post-increment addressing. Indexed modes may imply implicit
// pointer updates, which are not compatible with AltiVec vector
// instructions.
if (LA->getAddressingMode() != ISD::UNINDEXED ||
LB->getAddressingMode() != ISD::UNINDEXED)
return SDValue();

// Only combine loads if both are non-extending loads
// (ISD::NON_EXTLOAD). Extending loads (such as ISD::ZEXTLOAD or
// ISD::SEXTLOAD) perform zero or sign extension, which may change the
// loaded value's semantics and are not compatible with vector loads.
if (LA->getExtensionType() != ISD::NON_EXTLOAD ||
LB->getExtensionType() != ISD::NON_EXTLOAD)
return SDValue();
// Build new v16i8 loads using the same chain/base/MMO (no extra memory
// op).
SDValue LHSVec = DAG.getLoad(MVT::v16i8, DL, LA->getChain(),
LA->getBasePtr(), LA->getMemOperand());
SDValue RHSVec = DAG.getLoad(MVT::v16i8, DL, LB->getChain(),
LB->getBasePtr(), LB->getMemOperand());

SDValue IntrID =
DAG.getTargetConstant(Intrinsic::ppc_altivec_vcmpequb_p, DL,
Collaborator:
Can just use getConstant.

Subtarget.isPPC64() ? MVT::i64 : MVT::i32);
SDValue CRSel =
DAG.getConstant(2, DL, MVT::i32); // which CR6 predicate field
SDValue Ops[] = {IntrID, CRSel, LHSVec, RHSVec};
SDValue PredResult =
DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, MVT::i32, Ops);
Contributor:
Ops[] is not needed; just inline it into the call?


// ppc_altivec_vcmpequb_p returns 1 when two vectors are the same,
// so we need to invert the CC opcode.
return DAG.getSetCC(DL, N->getValueType(0), PredResult,
DAG.getConstant(0, DL, MVT::i32),
CC == ISD::SETNE ? ISD::SETEQ : ISD::SETNE);
}
}
}

return DAGCombineTruncBoolExt(N, DCI);
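The combine in this hunk can be modeled scalar-side as follows. This is a sketch of the `ppc_altivec_vcmpequb_p` all-equal predicate semantics as described in the code comments above (the function name and loop are illustrative, not LLVM code):

```cpp
#include <cstdint>

// Model of the vcmpequb. "all bytes equal" predicate (CR6[lt] with CRSel=2):
// returns 1 iff every byte of the two 16-byte vectors matches. Because the
// predicate is 1 on equality, comparing it against 0 requires inverting the
// original condition code (SETNE becomes SETEQ and vice versa).
int vcmpequb_p_model(const uint8_t a[16], const uint8_t b[16]) {
  for (int i = 0; i < 16; ++i)
    if (a[i] != b[i])
      return 0;
  return 1;
}
// (LHS != RHS) thus lowers to setcc(vcmpequb_p_model(LHS, RHS), 0, SETEQ).
```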
2 changes: 1 addition & 1 deletion llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
@@ -439,7 +439,7 @@ bool PPCTTIImpl::enableAggressiveInterleaving(bool LoopHasReductions) const {
PPCTTIImpl::TTI::MemCmpExpansionOptions
PPCTTIImpl::enableMemCmpExpansion(bool OptSize, bool IsZeroCmp) const {
TTI::MemCmpExpansionOptions Options;
Options.LoadSizes = {8, 4, 2, 1};
Options.LoadSizes = {16, 8, 4, 2, 1};
Options.MaxNumLoads = TLI->getMaxExpandSizeMemcmp(OptSize);
return Options;
}
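Adding 16 to LoadSizes lets the memcmp expansion use a single 16-byte load per operand. As a rough sketch of the effect (an assumption about the expansion's greedy size selection, not the exact LLVM implementation, which also uses overlapping loads):

```cpp
#include <vector>

// Hypothetical sketch: greedily split a memcmp length into load widths,
// preferring the largest permitted size. With 16 allowed, a 16-byte compare
// becomes one load pair instead of two 8-byte (or four 4-byte) pairs.
std::vector<int> splitLoads(int len) {
  static const int Sizes[] = {16, 8, 4, 2, 1};
  std::vector<int> out;
  for (int s : Sizes)
    while (len >= s) {
      out.push_back(s);
      len -= s;
    }
  return out;
}
```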
45 changes: 17 additions & 28 deletions llvm/test/CodeGen/PowerPC/memCmpUsedInZeroEqualityComparison.ll
@@ -35,18 +35,13 @@ define signext i32 @zeroEqualityTest02(ptr %x, ptr %y) {
define signext i32 @zeroEqualityTest01(ptr %x, ptr %y) {
; CHECK-LABEL: zeroEqualityTest01:
; CHECK: # %bb.0:
; CHECK-NEXT: ld 5, 0(3)
; CHECK-NEXT: ld 6, 0(4)
; CHECK-NEXT: cmpld 5, 6
; CHECK-NEXT: bne 0, .LBB1_2
; CHECK-NEXT: # %bb.1: # %loadbb1
; CHECK-NEXT: ld 5, 8(3)
; CHECK-NEXT: ld 4, 8(4)
; CHECK-NEXT: li 3, 0
; CHECK-NEXT: cmpld 5, 4
; CHECK-NEXT: beqlr 0
; CHECK-NEXT: .LBB1_2: # %res_block
; CHECK-NEXT: li 3, 1
; CHECK-NEXT: lxvd2x 34, 0, 4
; CHECK-NEXT: lxvd2x 35, 0, 3
; CHECK-NEXT: vcmpequb. 2, 3, 2
; CHECK-NEXT: mfocrf 3, 2
; CHECK-NEXT: rlwinm 3, 3, 25, 31, 31
; CHECK-NEXT: cntlzw 3, 3
; CHECK-NEXT: srwi 3, 3, 5
Collaborator:
Extra instruction? I think isolating and flipping the bit can just be rlwinm/xori.

; CHECK-NEXT: blr
%call = tail call signext i32 @memcmp(ptr %x, ptr %y, i64 16)
%not.tobool = icmp ne i32 %call, 0
@@ -85,7 +80,7 @@ define signext i32 @zeroEqualityTest03(ptr %x, ptr %y) {
; Validate with > 0
define signext i32 @zeroEqualityTest04() {
; CHECK-LABEL: zeroEqualityTest04:
; CHECK: # %bb.0: # %loadbb
; CHECK: # %bb.0:
; CHECK-NEXT: li 3, 0
; CHECK-NEXT: blr
%call = tail call signext i32 @memcmp(ptr @zeroEqualityTest02.buffer1, ptr @zeroEqualityTest02.buffer2, i64 16)
@@ -97,7 +92,7 @@ define signext i32 @zeroEqualityTest04() {
; Validate with < 0
define signext i32 @zeroEqualityTest05() {
; CHECK-LABEL: zeroEqualityTest05:
; CHECK: # %bb.0: # %loadbb
; CHECK: # %bb.0:
; CHECK-NEXT: li 3, 0
; CHECK-NEXT: blr
%call = tail call signext i32 @memcmp(ptr @zeroEqualityTest03.buffer1, ptr @zeroEqualityTest03.buffer2, i64 16)
@@ -109,7 +104,7 @@ define signext i32 @zeroEqualityTest05() {
; Validate with memcmp()?:
define signext i32 @equalityFoldTwoConstants() {
; CHECK-LABEL: equalityFoldTwoConstants:
; CHECK: # %bb.0: # %loadbb
; CHECK: # %bb.0:
; CHECK-NEXT: li 3, 1
; CHECK-NEXT: blr
%call = tail call signext i32 @memcmp(ptr @zeroEqualityTest04.buffer1, ptr @zeroEqualityTest04.buffer2, i64 16)
@@ -122,23 +117,17 @@ define signext i32 @equalityFoldOneConstant(ptr %X) {
; CHECK-LABEL: equalityFoldOneConstant:
; CHECK: # %bb.0:
; CHECK-NEXT: li 5, 1
; CHECK-NEXT: ld 4, 0(3)
; CHECK-NEXT: ld 4, 8(3)
; CHECK-NEXT: ld 3, 0(3)
; CHECK-NEXT: rldic 5, 5, 32, 31
; CHECK-NEXT: cmpld 4, 5
; CHECK-NEXT: bne 0, .LBB6_2
; CHECK-NEXT: # %bb.1: # %loadbb1
; CHECK-NEXT: xor 3, 3, 5
; CHECK-NEXT: lis 5, -32768
; CHECK-NEXT: ld 4, 8(3)
; CHECK-NEXT: li 3, 0
; CHECK-NEXT: ori 5, 5, 1
; CHECK-NEXT: rldic 5, 5, 1, 30
; CHECK-NEXT: cmpld 4, 5
; CHECK-NEXT: beq 0, .LBB6_3
; CHECK-NEXT: .LBB6_2: # %res_block
; CHECK-NEXT: li 3, 1
; CHECK-NEXT: .LBB6_3: # %endblock
; CHECK-NEXT: cntlzw 3, 3
; CHECK-NEXT: srwi 3, 3, 5
; CHECK-NEXT: xor 4, 4, 5
; CHECK-NEXT: or 3, 3, 4
; CHECK-NEXT: cntlzd 3, 3
; CHECK-NEXT: rldicl 3, 3, 58, 63
Collaborator:
Can we not change this sequence? It seems like a side effect and I'm not sure it's faster or slower.

; CHECK-NEXT: blr
%call = tail call signext i32 @memcmp(ptr @zeroEqualityTest04.buffer1, ptr %X, i64 16)
%not.tobool = icmp eq i32 %call, 0
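The rlwinm/xori suggestion above can be modeled as follows (an assumption about the reviewer's intent; the helper names are illustrative). Both sequences turn a 0/1 predicate bit into its complement, but the second avoids the count-leading-zeros step:

```cpp
// Software model of cntlzw: count leading zeros of a 32-bit value
// (returns 32 when x == 0, matching the PowerPC instruction).
unsigned cntlzw(unsigned x) {
  unsigned n = 0;
  for (unsigned bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

// Current sequence: cntlzw then srwi 5 computes (x == 0).
unsigned zeroTestViaCntlz(unsigned x) { return cntlzw(x) >> 5; }

// Suggested sequence: rlwinm isolates the bit, xori flips it.
// Assumes x is already a 0/1 predicate, as the extracted CR6 bit is.
unsigned zeroTestViaXori(unsigned x) { return (x & 1u) ^ 1u; }
```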
112 changes: 20 additions & 92 deletions llvm/test/CodeGen/PowerPC/memcmp32_fixsize.ll
@@ -14,110 +14,38 @@
define dso_local signext range(i32 0, 2) i32 @cmpeq16(ptr noundef readonly captures(none) %a, ptr noundef readonly captures(none) %b) {
; CHECK-AIX32-P8-LABEL: cmpeq16:
; CHECK-AIX32-P8: # %bb.0: # %entry
; CHECK-AIX32-P8-NEXT: lwz r5, 4(r3)
; CHECK-AIX32-P8-NEXT: lwz r6, 0(r3)
; CHECK-AIX32-P8-NEXT: lwz r7, 4(r4)
; CHECK-AIX32-P8-NEXT: lwz r8, 0(r4)
; CHECK-AIX32-P8-NEXT: xor r6, r6, r8
; CHECK-AIX32-P8-NEXT: xor r5, r5, r7
; CHECK-AIX32-P8-NEXT: or. r5, r5, r6
; CHECK-AIX32-P8-NEXT: bne cr0, L..BB0_2
; CHECK-AIX32-P8-NEXT: # %bb.1: # %loadbb1
; CHECK-AIX32-P8-NEXT: lwz r5, 12(r3)
; CHECK-AIX32-P8-NEXT: lwz r3, 8(r3)
; CHECK-AIX32-P8-NEXT: lwz r6, 12(r4)
; CHECK-AIX32-P8-NEXT: lwz r4, 8(r4)
; CHECK-AIX32-P8-NEXT: xor r3, r3, r4
; CHECK-AIX32-P8-NEXT: xor r4, r5, r6
; CHECK-AIX32-P8-NEXT: or. r3, r4, r3
; CHECK-AIX32-P8-NEXT: li r3, 0
; CHECK-AIX32-P8-NEXT: beq cr0, L..BB0_3
; CHECK-AIX32-P8-NEXT: L..BB0_2: # %res_block
; CHECK-AIX32-P8-NEXT: li r3, 1
; CHECK-AIX32-P8-NEXT: L..BB0_3: # %endblock
; CHECK-AIX32-P8-NEXT: cntlzw r3, r3
; CHECK-AIX32-P8-NEXT: rlwinm r3, r3, 27, 31, 31
; CHECK-AIX32-P8-NEXT: lxvw4x vs34, 0, r4
; CHECK-AIX32-P8-NEXT: lxvw4x vs35, 0, r3
; CHECK-AIX32-P8-NEXT: vcmpequb. v2, v3, v2
; CHECK-AIX32-P8-NEXT: mfocrf r3, 2
; CHECK-AIX32-P8-NEXT: rlwinm r3, r3, 25, 31, 31
; CHECK-AIX32-P8-NEXT: blr
;
; CHECK-AIX32-P10-LABEL: cmpeq16:
; CHECK-AIX32-P10: # %bb.0: # %entry
; CHECK-AIX32-P10-NEXT: lwz r5, 4(r3)
; CHECK-AIX32-P10-NEXT: lwz r6, 0(r3)
; CHECK-AIX32-P10-NEXT: lwz r7, 4(r4)
; CHECK-AIX32-P10-NEXT: xor r5, r5, r7
; CHECK-AIX32-P10-NEXT: lwz r8, 0(r4)
; CHECK-AIX32-P10-NEXT: xor r6, r6, r8
; CHECK-AIX32-P10-NEXT: or. r5, r5, r6
; CHECK-AIX32-P10-NEXT: bne cr0, L..BB0_2
; CHECK-AIX32-P10-NEXT: # %bb.1: # %loadbb1
; CHECK-AIX32-P10-NEXT: lwz r5, 12(r3)
; CHECK-AIX32-P10-NEXT: lwz r3, 8(r3)
; CHECK-AIX32-P10-NEXT: lwz r6, 12(r4)
; CHECK-AIX32-P10-NEXT: lwz r4, 8(r4)
; CHECK-AIX32-P10-NEXT: xor r3, r3, r4
; CHECK-AIX32-P10-NEXT: xor r4, r5, r6
; CHECK-AIX32-P10-NEXT: or. r3, r4, r3
; CHECK-AIX32-P10-NEXT: li r3, 0
; CHECK-AIX32-P10-NEXT: beq cr0, L..BB0_3
; CHECK-AIX32-P10-NEXT: L..BB0_2: # %res_block
; CHECK-AIX32-P10-NEXT: li r3, 1
; CHECK-AIX32-P10-NEXT: L..BB0_3: # %endblock
; CHECK-AIX32-P10-NEXT: cntlzw r3, r3
; CHECK-AIX32-P10-NEXT: rlwinm r3, r3, 27, 31, 31
; CHECK-AIX32-P10-NEXT: lxv vs34, 0(r4)
; CHECK-AIX32-P10-NEXT: lxv vs35, 0(r3)
; CHECK-AIX32-P10-NEXT: vcmpequb. v2, v3, v2
; CHECK-AIX32-P10-NEXT: setbc r3, 4*cr6+lt
; CHECK-AIX32-P10-NEXT: blr
;
; CHECK-LINUX32-P8-LABEL: cmpeq16:
; CHECK-LINUX32-P8: # %bb.0: # %entry
; CHECK-LINUX32-P8-NEXT: lwz r5, 0(r3)
; CHECK-LINUX32-P8-NEXT: lwz r6, 4(r3)
; CHECK-LINUX32-P8-NEXT: lwz r7, 0(r4)
; CHECK-LINUX32-P8-NEXT: lwz r8, 4(r4)
; CHECK-LINUX32-P8-NEXT: xor r6, r6, r8
; CHECK-LINUX32-P8-NEXT: xor r5, r5, r7
; CHECK-LINUX32-P8-NEXT: or. r5, r5, r6
; CHECK-LINUX32-P8-NEXT: bne cr0, .LBB0_2
; CHECK-LINUX32-P8-NEXT: # %bb.1: # %loadbb1
; CHECK-LINUX32-P8-NEXT: lwz r5, 8(r3)
; CHECK-LINUX32-P8-NEXT: lwz r3, 12(r3)
; CHECK-LINUX32-P8-NEXT: lwz r6, 8(r4)
; CHECK-LINUX32-P8-NEXT: lwz r4, 12(r4)
; CHECK-LINUX32-P8-NEXT: xor r3, r3, r4
; CHECK-LINUX32-P8-NEXT: xor r4, r5, r6
; CHECK-LINUX32-P8-NEXT: or. r3, r4, r3
; CHECK-LINUX32-P8-NEXT: li r3, 0
; CHECK-LINUX32-P8-NEXT: beq cr0, .LBB0_3
; CHECK-LINUX32-P8-NEXT: .LBB0_2: # %res_block
; CHECK-LINUX32-P8-NEXT: li r3, 1
; CHECK-LINUX32-P8-NEXT: .LBB0_3: # %endblock
; CHECK-LINUX32-P8-NEXT: cntlzw r3, r3
; CHECK-LINUX32-P8-NEXT: rlwinm r3, r3, 27, 31, 31
; CHECK-LINUX32-P8-NEXT: lxvd2x vs0, 0, r4
; CHECK-LINUX32-P8-NEXT: xxswapd vs34, vs0
; CHECK-LINUX32-P8-NEXT: lxvd2x vs0, 0, r3
; CHECK-LINUX32-P8-NEXT: xxswapd vs35, vs0
; CHECK-LINUX32-P8-NEXT: vcmpequb. v2, v3, v2
; CHECK-LINUX32-P8-NEXT: mfocrf r3, 2
; CHECK-LINUX32-P8-NEXT: rlwinm r3, r3, 25, 31, 31
; CHECK-LINUX32-P8-NEXT: blr
;
; CHECK-LINUX32-P10-LABEL: cmpeq16:
; CHECK-LINUX32-P10: # %bb.0: # %entry
; CHECK-LINUX32-P10-NEXT: lwz r5, 0(r3)
; CHECK-LINUX32-P10-NEXT: lwz r6, 4(r3)
; CHECK-LINUX32-P10-NEXT: lwz r7, 0(r4)
; CHECK-LINUX32-P10-NEXT: xor r5, r5, r7
; CHECK-LINUX32-P10-NEXT: lwz r8, 4(r4)
; CHECK-LINUX32-P10-NEXT: xor r6, r6, r8
; CHECK-LINUX32-P10-NEXT: or. r5, r5, r6
; CHECK-LINUX32-P10-NEXT: bne cr0, .LBB0_2
; CHECK-LINUX32-P10-NEXT: # %bb.1: # %loadbb1
; CHECK-LINUX32-P10-NEXT: lwz r5, 8(r3)
; CHECK-LINUX32-P10-NEXT: lwz r3, 12(r3)
; CHECK-LINUX32-P10-NEXT: lwz r6, 8(r4)
; CHECK-LINUX32-P10-NEXT: lwz r4, 12(r4)
; CHECK-LINUX32-P10-NEXT: xor r3, r3, r4
; CHECK-LINUX32-P10-NEXT: xor r4, r5, r6
; CHECK-LINUX32-P10-NEXT: or. r3, r4, r3
; CHECK-LINUX32-P10-NEXT: li r3, 0
; CHECK-LINUX32-P10-NEXT: beq cr0, .LBB0_3
; CHECK-LINUX32-P10-NEXT: .LBB0_2: # %res_block
; CHECK-LINUX32-P10-NEXT: li r3, 1
; CHECK-LINUX32-P10-NEXT: .LBB0_3: # %endblock
; CHECK-LINUX32-P10-NEXT: cntlzw r3, r3
; CHECK-LINUX32-P10-NEXT: rlwinm r3, r3, 27, 31, 31
; CHECK-LINUX32-P10-NEXT: lxv vs34, 0(r4)
; CHECK-LINUX32-P10-NEXT: lxv vs35, 0(r3)
; CHECK-LINUX32-P10-NEXT: vcmpequb. v2, v3, v2
; CHECK-LINUX32-P10-NEXT: setbc r3, 4*cr6+lt
; CHECK-LINUX32-P10-NEXT: blr
entry:
%bcmp = tail call i32 @bcmp(ptr noundef nonnull dereferenceable(16) %a, ptr noundef nonnull dereferenceable(16) %b, i32 16)
78 changes: 18 additions & 60 deletions llvm/test/CodeGen/PowerPC/memcmp64_fixsize.ll
@@ -14,78 +14,36 @@
define dso_local signext range(i32 0, 2) i32 @cmpeq16(ptr noundef readonly captures(none) %a, ptr noundef readonly captures(none) %b) {
; CHECK-AIX64-32-P8-LABEL: cmpeq16:
; CHECK-AIX64-32-P8: # %bb.0: # %entry
; CHECK-AIX64-32-P8-NEXT: ld r5, 0(r3)
; CHECK-AIX64-32-P8-NEXT: ld r6, 0(r4)
; CHECK-AIX64-32-P8-NEXT: cmpld r5, r6
; CHECK-AIX64-32-P8-NEXT: bne cr0, L..BB0_2
; CHECK-AIX64-32-P8-NEXT: # %bb.1: # %loadbb1
; CHECK-AIX64-32-P8-NEXT: ld r5, 8(r3)
; CHECK-AIX64-32-P8-NEXT: ld r4, 8(r4)
; CHECK-AIX64-32-P8-NEXT: li r3, 0
; CHECK-AIX64-32-P8-NEXT: cmpld r5, r4
; CHECK-AIX64-32-P8-NEXT: beq cr0, L..BB0_3
; CHECK-AIX64-32-P8-NEXT: L..BB0_2: # %res_block
; CHECK-AIX64-32-P8-NEXT: li r3, 1
; CHECK-AIX64-32-P8-NEXT: L..BB0_3: # %endblock
; CHECK-AIX64-32-P8-NEXT: cntlzw r3, r3
; CHECK-AIX64-32-P8-NEXT: srwi r3, r3, 5
; CHECK-AIX64-32-P8-NEXT: lxvw4x vs34, 0, r4
; CHECK-AIX64-32-P8-NEXT: lxvw4x vs35, 0, r3
; CHECK-AIX64-32-P8-NEXT: vcmpequb. v2, v3, v2
; CHECK-AIX64-32-P8-NEXT: mfocrf r3, 2
; CHECK-AIX64-32-P8-NEXT: rlwinm r3, r3, 25, 31, 31
; CHECK-AIX64-32-P8-NEXT: blr
;
; CHECK-AIX64-32-P10-LABEL: cmpeq16:
; CHECK-AIX64-32-P10: # %bb.0: # %entry
; CHECK-AIX64-32-P10-NEXT: ld r5, 0(r3)
; CHECK-AIX64-32-P10-NEXT: ld r6, 0(r4)
; CHECK-AIX64-32-P10-NEXT: cmpld r5, r6
; CHECK-AIX64-32-P10-NEXT: bne cr0, L..BB0_2
; CHECK-AIX64-32-P10-NEXT: # %bb.1: # %loadbb1
; CHECK-AIX64-32-P10-NEXT: ld r5, 8(r3)
; CHECK-AIX64-32-P10-NEXT: ld r4, 8(r4)
; CHECK-AIX64-32-P10-NEXT: li r3, 0
; CHECK-AIX64-32-P10-NEXT: cmpld r5, r4
; CHECK-AIX64-32-P10-NEXT: beq cr0, L..BB0_3
; CHECK-AIX64-32-P10-NEXT: L..BB0_2: # %res_block
; CHECK-AIX64-32-P10-NEXT: li r3, 1
; CHECK-AIX64-32-P10-NEXT: L..BB0_3: # %endblock
; CHECK-AIX64-32-P10-NEXT: cntlzw r3, r3
; CHECK-AIX64-32-P10-NEXT: rlwinm r3, r3, 27, 31, 31
; CHECK-AIX64-32-P10-NEXT: lxv vs34, 0(r4)
; CHECK-AIX64-32-P10-NEXT: lxv vs35, 0(r3)
; CHECK-AIX64-32-P10-NEXT: vcmpequb. v2, v3, v2
; CHECK-AIX64-32-P10-NEXT: setbc r3, 4*cr6+lt
; CHECK-AIX64-32-P10-NEXT: blr
;
; CHECK-LINUX64-P8-LABEL: cmpeq16:
; CHECK-LINUX64-P8: # %bb.0: # %entry
; CHECK-LINUX64-P8-NEXT: ld r5, 0(r3)
; CHECK-LINUX64-P8-NEXT: ld r6, 0(r4)
; CHECK-LINUX64-P8-NEXT: cmpld r5, r6
; CHECK-LINUX64-P8-NEXT: bne cr0, .LBB0_2
; CHECK-LINUX64-P8-NEXT: # %bb.1: # %loadbb1
; CHECK-LINUX64-P8-NEXT: ld r5, 8(r3)
; CHECK-LINUX64-P8-NEXT: ld r4, 8(r4)
; CHECK-LINUX64-P8-NEXT: li r3, 0
; CHECK-LINUX64-P8-NEXT: cmpld r5, r4
; CHECK-LINUX64-P8-NEXT: beq cr0, .LBB0_3
; CHECK-LINUX64-P8-NEXT: .LBB0_2: # %res_block
; CHECK-LINUX64-P8-NEXT: li r3, 1
; CHECK-LINUX64-P8-NEXT: .LBB0_3: # %endblock
; CHECK-LINUX64-P8-NEXT: cntlzw r3, r3
; CHECK-LINUX64-P8-NEXT: srwi r3, r3, 5
; CHECK-LINUX64-P8-NEXT: lxvd2x vs34, 0, r4
; CHECK-LINUX64-P8-NEXT: lxvd2x vs35, 0, r3
; CHECK-LINUX64-P8-NEXT: vcmpequb. v2, v3, v2
; CHECK-LINUX64-P8-NEXT: mfocrf r3, 2
; CHECK-LINUX64-P8-NEXT: rlwinm r3, r3, 25, 31, 31
; CHECK-LINUX64-P8-NEXT: blr
;
; CHECK-LINUX64-P10-LABEL: cmpeq16:
; CHECK-LINUX64-P10: # %bb.0: # %entry
; CHECK-LINUX64-P10-NEXT: ld r5, 0(r3)
; CHECK-LINUX64-P10-NEXT: ld r6, 0(r4)
; CHECK-LINUX64-P10-NEXT: cmpld r5, r6
; CHECK-LINUX64-P10-NEXT: bne cr0, .LBB0_2
; CHECK-LINUX64-P10-NEXT: # %bb.1: # %loadbb1
; CHECK-LINUX64-P10-NEXT: ld r5, 8(r3)
; CHECK-LINUX64-P10-NEXT: ld r4, 8(r4)
; CHECK-LINUX64-P10-NEXT: li r3, 0
; CHECK-LINUX64-P10-NEXT: cmpld r5, r4
; CHECK-LINUX64-P10-NEXT: beq cr0, .LBB0_3
; CHECK-LINUX64-P10-NEXT: .LBB0_2: # %res_block
; CHECK-LINUX64-P10-NEXT: li r3, 1
; CHECK-LINUX64-P10-NEXT: .LBB0_3: # %endblock
; CHECK-LINUX64-P10-NEXT: cntlzw r3, r3
; CHECK-LINUX64-P10-NEXT: rlwinm r3, r3, 27, 31, 31
; CHECK-LINUX64-P10-NEXT: lxv vs34, 0(r4)
; CHECK-LINUX64-P10-NEXT: lxv vs35, 0(r3)
; CHECK-LINUX64-P10-NEXT: vcmpequb. v2, v3, v2
; CHECK-LINUX64-P10-NEXT: setbc r3, 4*cr6+lt
; CHECK-LINUX64-P10-NEXT: blr
entry:
%bcmp = tail call i32 @bcmp(ptr noundef nonnull dereferenceable(16) %a, ptr noundef nonnull dereferenceable(16) %b, i64 16)