
Commit 3fc1aad

tonykuttai and Tony Varghese authored
[PowerPC] Merge vsr(vsro(input, byte_shift), bit_shift) to vsrq(input, res_bit_shift) (#154388)
This change implements a patfrag based pattern matching ~dag combiner~ that combines consecutive `VSRO (Vector Shift Right Octet)` and `VSR (Vector Shift Right)` instructions into a single `VSRQ (Vector Shift Right Quadword)` instruction on Power10+ processors.

Vector right shift operations like `vec_srl(vec_sro(input, byte_shift), bit_shift)` generate two separate instructions `(VSRO + VSR)` when they could be optimised into a single `VSRQ` instruction that performs the equivalent operation.

```
vsr(vsro (input, vsro_byte_shift), vsr_bit_shift) to vsrq(input, vsrq_bit_shift)
where vsrq_bit_shift = (vsro_byte_shift * 8) + vsr_bit_shift
```

Note:
```
vsro : Vector Shift Right by Octet VX-form
- vsro VRT, VRA, VRB
- The contents of VSR[VRA+32] are shifted right by the number of bytes specified in bits 121:124 of VSR[VRB+32].
- Bytes shifted out of byte 15 are lost.
- Zeros are supplied to the vacated bytes on the left.
- The result is placed into VSR[VRT+32].

vsr : Vector Shift Right VX-form
- vsr VRT, VRA, VRB
- The contents of VSR[VRA+32] are shifted right by the number of bits specified in bits 125:127 of VSR[VRB+32]. 3 bits.
- Bits shifted out of bit 127 are lost.
- Zeros are supplied to the vacated bits on the left.
- The result is placed into VSR[VRT+32], except if, for any byte element in VSR[VRB+32], the low-order 3 bits are not equal to the shift amount, then VSR[VRT+32] is undefined.

vsrq : Vector Shift Right Quadword VX-form
- vsrq VRT,VRA,VRB
- Let src1 be the contents of VSR[VRA+32]. Let src2 be the contents of VSR[VRB+32].
- src1 is shifted right by the number of bits specified in the low-order 7 bits of src2.
- Bits shifted out of the least-significant bit are lost.
- Zeros are supplied to the vacated bits on the left.
- The result is placed into VSR[VRT+32].
```

---------

Co-authored-by: Tony Varghese <[email protected]>
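The shift arithmetic in the commit message can be checked with a small software model. The sketch below is not LLVM code and does not execute the real instructions: `U128`, `shiftRightBits`, `modelVsro`, `modelVsr`, `modelVsrq`, and `mergeHolds` are hypothetical names introduced here for illustration. It assumes, as the commit message describes, that all three instructions read their shift amount from the same low-order bits of the shift operand, represented here by a single byte `sh` (byte count in the 4 bits above the low 3 bits, bit count in the low 3 bits).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit value: hi holds bits 0:63 and lo holds bits 64:127,
// using the big-endian bit numbering of the commit message.
struct U128 {
  uint64_t hi;
  uint64_t lo;
};

inline bool operator==(const U128 &a, const U128 &b) {
  return a.hi == b.hi && a.lo == b.lo;
}

// Logical right shift of a 128-bit value by n bits, with zeros shifted in.
inline U128 shiftRightBits(U128 v, unsigned n) {
  assert(n < 128 && "shift amount out of range");
  if (n == 0)
    return v;
  if (n >= 64)
    return U128{0, v.hi >> (n - 64)};
  return U128{v.hi >> n, (v.lo >> n) | (v.hi << (64 - n))};
}

// Assumption: sh stands for the low-order byte of the shift operand.
// vsro model: shift by whole bytes, count taken from the 4 bits above the low 3.
inline U128 modelVsro(U128 v, uint8_t sh) {
  return shiftRightBits(v, ((sh >> 3) & 0xF) * 8);
}

// vsr model: shift by 0..7 bits, count taken from the low 3 bits.
inline U128 modelVsr(U128 v, uint8_t sh) {
  return shiftRightBits(v, sh & 0x7);
}

// vsrq model: shift by 0..127 bits, count taken from the low 7 bits.
inline U128 modelVsrq(U128 v, uint8_t sh) {
  return shiftRightBits(v, sh & 0x7F);
}

// Checks vsr(vsro(v, sh), sh) == vsrq(v, sh) for every 7-bit shift amount.
inline bool mergeHolds(U128 v) {
  for (unsigned sh = 0; sh < 128; ++sh) {
    uint8_t s = static_cast<uint8_t>(sh);
    if (!(modelVsr(modelVsro(v, s), s) == modelVsrq(v, s)))
      return false;
  }
  return true;
}
```

Calling `mergeHolds` on any input value exercises all 128 shift amounts. Under the stated assumption, the byte shift followed by the bit shift matches the single quadword shift because `(byte_shift * 8) + bit_shift` is exactly the 7-bit value that `modelVsrq` reads.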
1 parent 2e7ea9c commit 3fc1aad

6 files changed: +360 -1 lines changed


llvm/lib/Target/PowerPC/PPCISelLowering.cpp

Lines changed: 2 additions & 0 deletions
@@ -1693,6 +1693,8 @@ const char *PPCTargetLowering::getTargetNodeName(unsigned Opcode) const {
   case PPCISD::XXPERM:
     return "PPCISD::XXPERM";
   case PPCISD::VECSHL: return "PPCISD::VECSHL";
+  case PPCISD::VSRQ:
+    return "PPCISD::VSRQ";
   case PPCISD::CMPB: return "PPCISD::CMPB";
   case PPCISD::Hi: return "PPCISD::Hi";
   case PPCISD::Lo: return "PPCISD::Lo";

llvm/lib/Target/PowerPC/PPCISelLowering.h

Lines changed: 3 additions & 0 deletions
@@ -498,6 +498,9 @@ namespace llvm {
     /// SETBCR - The ISA 3.1 (P10) SETBCR instruction.
     SETBCR,
 
+    /// VSRQ - The ISA 3.1 (P10) Vector Shift right quadword instruction
+    VSRQ,
+
     // NOTE: The nodes below may require PC-Rel specific patterns if the
     // address could be PC-Relative. When adding new nodes below, consider
     // whether or not the address can be PC-Relative and add the corresponding

llvm/lib/Target/PowerPC/PPCInstrAltivec.td

Lines changed: 7 additions & 0 deletions
@@ -261,6 +261,13 @@ def immEQOneV : PatLeaf<(build_vector), [{
     return C->isOne();
   return false;
 }]>;
+
+def VSRVSRO : PatFrag<(ops node:$input, node:$shift),
+                      (int_ppc_altivec_vsr
+                        (int_ppc_altivec_vsro node:$input, node:$shift),
+                        node:$shift),
+                      [{ return N->getOperand(1).hasOneUse(); }]>;
+
 //===----------------------------------------------------------------------===//
 // Helpers for defining instructions that directly correspond to intrinsics.
 

llvm/lib/Target/PowerPC/PPCInstrInfo.td

Lines changed: 6 additions & 0 deletions
@@ -58,6 +58,10 @@ def SDT_PPCVecShift : SDTypeProfile<1, 3, [ SDTCisVec<0>,
   SDTCisVec<1>, SDTCisVec<2>, SDTCisPtrTy<3>
 ]>;
 
+def SDT_PPCVecShiftQuad : SDTypeProfile<1, 2, [
+  SDTCisVec<0>, SDTCisSameAs<0,1>, SDTCisSameAs<0,2>
+]>;
+
 def SDT_PPCVecInsert : SDTypeProfile<1, 3, [ SDTCisVec<0>,
   SDTCisVec<1>, SDTCisVec<2>, SDTCisInt<3>
 ]>;
@@ -157,6 +161,8 @@ def PPCfctiwz : SDNode<"PPCISD::FCTIWZ", SDTFPUnaryOp, []>;
 def PPCfctiduz: SDNode<"PPCISD::FCTIDUZ",SDTFPUnaryOp, []>;
 def PPCfctiwuz: SDNode<"PPCISD::FCTIWUZ",SDTFPUnaryOp, []>;
 
+def PPCvsrq: SDNode<"PPCISD::VSRQ", SDT_PPCVecShiftQuad, []>;
+
 def PPCstrict_fcfid : SDNode<"PPCISD::STRICT_FCFID",
                              SDTFPUnaryOp, [SDNPHasChain]>;
 def PPCstrict_fcfidu : SDNode<"PPCISD::STRICT_FCFIDU",

llvm/lib/Target/PowerPC/PPCInstrP10.td

Lines changed: 5 additions & 1 deletion
@@ -1918,7 +1918,8 @@ let Predicates = [IsISA3_1] in {
                   RegConstraint<"$VDi = $VD">;
   def VSLQ : VX1_VT5_VA5_VB5<261, "vslq", []>;
   def VSRAQ : VX1_VT5_VA5_VB5<773, "vsraq", []>;
-  def VSRQ : VX1_VT5_VA5_VB5<517, "vsrq", []>;
+  def VSRQ : VX1_VT5_VA5_VB5<517, "vsrq",
+                             [(set v4i32:$VD, (PPCvsrq v4i32:$VA, v4i32:$VB))]>;
   def VRLQ : VX1_VT5_VA5_VB5<5, "vrlq", []>;
   def XSCVQPUQZ : X_VT5_XO5_VB5<63, 0, 836, "xscvqpuqz", []>;
   def XSCVQPSQZ : X_VT5_XO5_VB5<63, 8, 836, "xscvqpsqz", []>;
@@ -2053,6 +2054,9 @@ let Predicates = [IsISA3_1, HasFPU] in {
 
 //---------------------------- Anonymous Patterns ----------------------------//
 let Predicates = [IsISA3_1] in {
+  // Exploit vsrq instruction to optimize VSR(VSRO (input, vsro_byte_shift), vsr_bit_shift)
+  // to VSRQ(input, vsrq_bit_shift)
+  def : Pat<(VSRVSRO v4i32:$vA, v4i32:$vB), (VSRQ $vA, $vB)>;
   // Exploit the vector multiply high instructions using intrinsics.
   def : Pat<(v4i32 (int_ppc_altivec_vmulhsw v4i32:$vA, v4i32:$vB)),
             (v4i32 (VMULHSW $vA, $vB))>;
