[GlobalIsel] Combine zext of trunc (episode II) #108305

Conversation
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-backend-aarch64

Author: Thorsten Schütt (tschuett)

Changes

The One with the Sonogram at the End

Either replace zext(trunc(x)) with x or, if we're actually extending zero bits:

SrcSize < DstSize: zext(a & mask)
SrcSize == DstSize: a & mask
SrcSize > DstSize: trunc(a) & mask

Credits: https://reviews.llvm.org/D96031

Test: AMDGPU/GlobalISel/combine-zext-trunc.mir

Patch is 564.62 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/108305.diff

68 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
index 828532dcffb7d3..bf32dcf5f2c85a 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
@@ -387,9 +387,6 @@ class CombinerHelper {
/// Transform anyext(trunc(x)) to x.
bool matchCombineAnyExtTrunc(MachineInstr &MI, Register &Reg);
- /// Transform zext(trunc(x)) to x.
- bool matchCombineZextTrunc(MachineInstr &MI, Register &Reg);
-
/// Transform trunc (shl x, K) to shl (trunc x), K
/// if K < VT.getScalarSizeInBits().
///
@@ -909,6 +906,10 @@ class CombinerHelper {
bool matchCastOfBuildVector(const MachineInstr &CastMI,
const MachineInstr &BVMI, BuildFnTy &MatchInfo);
+ /// Transform zext of truncate to x or and(x, mask).
+ bool matchCombineZextTrunc(const MachineInstr &ZextMI,
+ const MachineInstr &TruncMI, BuildFnTy &MatchInfo);
+
private:
/// Checks for legality of an indexed variant of \p LdSt.
bool isIndexedLoadStoreLegal(GLoadStore &LdSt) const;
diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td b/llvm/include/llvm/Target/GlobalISel/Combine.td
index a595a51d7b01ff..587dbe20e94c35 100644
--- a/llvm/include/llvm/Target/GlobalISel/Combine.td
+++ b/llvm/include/llvm/Target/GlobalISel/Combine.td
@@ -758,15 +758,6 @@ def anyext_trunc_fold: GICombineRule <
(apply [{ Helper.replaceSingleDefInstWithReg(*${root}, ${matchinfo}); }])
>;
-// Fold (zext (trunc x)) -> x if the source type is same as the destination type
-// and truncated bits are known to be zero.
-def zext_trunc_fold: GICombineRule <
- (defs root:$root, register_matchinfo:$matchinfo),
- (match (wip_match_opcode G_ZEXT):$root,
- [{ return Helper.matchCombineZextTrunc(*${root}, ${matchinfo}); }]),
- (apply [{ Helper.replaceSingleDefInstWithReg(*${root}, ${matchinfo}); }])
->;
-
def not_cmp_fold_matchinfo : GIDefMatchData<"SmallVector<Register, 4>">;
def not_cmp_fold : GICombineRule<
(defs root:$d, not_cmp_fold_matchinfo:$info),
@@ -1894,6 +1885,15 @@ class integer_of_opcode<Instruction castOpcode> : GICombineRule <
def integer_of_truncate : integer_of_opcode<G_TRUNC>;
+/// Transform zext of truncate to x or and(x, mask).
+def zext_of_truncate : GICombineRule <
+ (defs root:$root, build_fn_matchinfo:$matchinfo),
+ (match (G_TRUNC $trunc, $src):$TruncMI,
+ (G_ZEXT $root, $trunc):$ZextMI,
+ [{ return Helper.matchCombineZextTrunc(*${ZextMI}, *${TruncMI}, ${matchinfo}); }]),
+ (apply [{ Helper.applyBuildFn(*${ZextMI}, ${matchinfo}); }])>;
+
+
def cast_combines: GICombineGroup<[
truncate_of_zext,
truncate_of_sext,
@@ -1915,7 +1915,8 @@ def cast_combines: GICombineGroup<[
narrow_binop_and,
narrow_binop_or,
narrow_binop_xor,
- integer_of_truncate
+ integer_of_truncate,
+ zext_of_truncate
]>;
@@ -1951,7 +1952,7 @@ def const_combines : GICombineGroup<[constant_fold_fp_ops, const_ptradd_to_i2p,
def known_bits_simplifications : GICombineGroup<[
redundant_and, redundant_sext_inreg, redundant_or, urem_pow2_to_mask,
- zext_trunc_fold, icmp_to_true_false_known_bits, icmp_to_lhs_known_bits,
+ icmp_to_true_false_known_bits, icmp_to_lhs_known_bits,
sext_inreg_to_zext_inreg]>;
def width_reduction_combines : GICombineGroup<[reduce_shl_of_extend,
diff --git a/llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp b/llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp
index 547529bbe699ab..5addf93599085a 100644
--- a/llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp
@@ -333,8 +333,10 @@ MachineInstrBuilder CSEMIRBuilder::buildConstant(const DstOp &Res,
// For vectors, CSE the element only for now.
LLT Ty = Res.getLLTTy(*getMRI());
- if (Ty.isVector())
+ if (Ty.isFixedVector())
return buildSplatBuildVector(Res, buildConstant(Ty.getElementType(), Val));
+ if (Ty.isScalableVector())
+ return buildSplatVector(Res, buildConstant(Ty.getElementType(), Val));
FoldingSetNodeID ID;
GISelInstProfileBuilder ProfBuilder(ID, *getMRI());
diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
index df9c12bc9c97bd..14d4e413456403 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
@@ -2524,20 +2524,6 @@ bool CombinerHelper::matchCombineAnyExtTrunc(MachineInstr &MI, Register &Reg) {
m_GTrunc(m_all_of(m_Reg(Reg), m_SpecificType(DstTy))));
}
-bool CombinerHelper::matchCombineZextTrunc(MachineInstr &MI, Register &Reg) {
- assert(MI.getOpcode() == TargetOpcode::G_ZEXT && "Expected a G_ZEXT");
- Register DstReg = MI.getOperand(0).getReg();
- Register SrcReg = MI.getOperand(1).getReg();
- LLT DstTy = MRI.getType(DstReg);
- if (mi_match(SrcReg, MRI,
- m_GTrunc(m_all_of(m_Reg(Reg), m_SpecificType(DstTy))))) {
- unsigned DstSize = DstTy.getScalarSizeInBits();
- unsigned SrcSize = MRI.getType(SrcReg).getScalarSizeInBits();
- return KB->getKnownBits(Reg).countMinLeadingZeros() >= DstSize - SrcSize;
- }
- return false;
-}
-
static LLT getMidVTForTruncRightShiftCombine(LLT ShiftTy, LLT TruncTy) {
const unsigned ShiftSize = ShiftTy.getScalarSizeInBits();
const unsigned TruncSize = TruncTy.getScalarSizeInBits();
diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp
index 30557e6a2304e6..2171f2f6feb7eb 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp
@@ -359,3 +359,94 @@ bool CombinerHelper::matchCastOfInteger(const MachineInstr &CastMI,
return false;
}
}
+
+bool CombinerHelper::matchCombineZextTrunc(const MachineInstr &ZextMI,
+ const MachineInstr &TruncMI,
+ BuildFnTy &MatchInfo) {
+ const GZext *Zext = cast<GZext>(&ZextMI);
+ const GTrunc *Trunc = cast<GTrunc>(&TruncMI);
+
+ Register Dst = Zext->getReg(0);
+ Register Mid = Zext->getSrcReg();
+ Register Src = Trunc->getSrcReg();
+
+ LLT DstTy = MRI.getType(Dst);
+ LLT SrcTy = MRI.getType(Src);
+
+ if (!MRI.hasOneNonDBGUse(Mid))
+ return false;
+
+ unsigned DstSize = DstTy.getScalarSizeInBits();
+ unsigned MidSize = MRI.getType(Mid).getScalarSizeInBits();
+ unsigned SrcSize = SrcTy.getScalarSizeInBits();
+
+ // Are the truncated bits known to be zero?
+ if (DstTy == SrcTy &&
+ (KB->getKnownBits(Src).countMinLeadingZeros() >= DstSize - MidSize)) {
+ MatchInfo = [=](MachineIRBuilder &B) { B.buildCopy(Dst, Src); };
+ return true;
+ }
+
+ // If the sizes are just right we can convert this into a logical
+ // 'and', which will be much cheaper than the pair of casts.
+
+ // If we're actually extending zero bits, then if
+ // SrcSize < DstSize: zext(Src & mask)
+ // SrcSize == DstSize: Src & mask
+ // SrcSize > DstSize: trunc(Src) & mask
+
+ if (DstSize == SrcSize) {
+ // Src & mask.
+
+ if (!isLegalOrBeforeLegalizer({TargetOpcode::G_AND, {DstTy}}) ||
+ !isConstantLegalOrBeforeLegalizer(DstTy))
+ return false;
+
+ // build mask.
+ APInt AndValue(APInt::getLowBitsSet(SrcSize, MidSize));
+
+ MatchInfo = [=](MachineIRBuilder &B) {
+ auto Mask = B.buildConstant(DstTy, AndValue);
+ B.buildAnd(Dst, Src, Mask);
+ };
+ return true;
+ }
+
+ if (SrcSize < DstSize) {
+ // zext(Src & mask).
+
+ if (!isLegalOrBeforeLegalizer({TargetOpcode::G_AND, {SrcTy}}) ||
+ !isConstantLegalOrBeforeLegalizer(SrcTy) ||
+ !isLegalOrBeforeLegalizer({TargetOpcode::G_ZEXT, {DstTy, SrcTy}}))
+ return false;
+
+ APInt AndValue(APInt::getLowBitsSet(SrcSize, MidSize));
+
+ MatchInfo = [=](MachineIRBuilder &B) {
+ auto Mask = B.buildConstant(SrcTy, AndValue);
+ auto And = B.buildAnd(SrcTy, Src, Mask);
+ B.buildZExt(Dst, And);
+ };
+ return true;
+ }
+
+ if (SrcSize > DstSize) {
+ // trunc(Src) & mask.
+
+ if (!isLegalOrBeforeLegalizer({TargetOpcode::G_AND, {DstTy}}) ||
+ !isConstantLegalOrBeforeLegalizer(DstTy) ||
+ !isLegalOrBeforeLegalizer({TargetOpcode::G_TRUNC, {DstTy, SrcTy}}))
+ return false;
+
+ APInt AndValue(APInt::getLowBitsSet(DstSize, MidSize));
+
+ MatchInfo = [=](MachineIRBuilder &B) {
+ auto Mask = B.buildConstant(DstTy, AndValue);
+ auto Trunc = B.buildTrunc(DstTy, Src);
+ B.buildAnd(Dst, Trunc, Mask);
+ };
+ return true;
+ }
+
+ return false;
+}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
index b2a3f9392157d1..25db0e678f49ce 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
@@ -168,6 +168,6 @@ def AMDGPUPostLegalizerCombiner: GICombiner<
def AMDGPURegBankCombiner : GICombiner<
"AMDGPURegBankCombinerImpl",
[unmerge_merge, unmerge_cst, unmerge_undef,
- zext_trunc_fold, int_minmax_to_med3, ptr_add_immed_chain,
+ int_minmax_to_med3, ptr_add_immed_chain,
fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp]> {
}
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll
index de3f323891a36a..ddcc31d23b56d2 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll
@@ -1938,14 +1938,14 @@ define i8 @atomicrmw_add_i8(ptr %ptr, i8 %rhs) {
define i8 @atomicrmw_xchg_i8(ptr %ptr, i8 %rhs) {
; CHECK-NOLSE-O1-LABEL: atomicrmw_xchg_i8:
; CHECK-NOLSE-O1: ; %bb.0:
-; CHECK-NOLSE-O1-NEXT: ; kill: def $w1 killed $w1 def $x1
+; CHECK-NOLSE-O1-NEXT: mov x8, x0
; CHECK-NOLSE-O1-NEXT: LBB28_1: ; %atomicrmw.start
; CHECK-NOLSE-O1-NEXT: ; =>This Inner Loop Header: Depth=1
-; CHECK-NOLSE-O1-NEXT: ldxrb w8, [x0]
-; CHECK-NOLSE-O1-NEXT: stxrb w9, w1, [x0]
+; CHECK-NOLSE-O1-NEXT: ldxrb w0, [x8]
+; CHECK-NOLSE-O1-NEXT: stxrb w9, w1, [x8]
; CHECK-NOLSE-O1-NEXT: cbnz w9, LBB28_1
; CHECK-NOLSE-O1-NEXT: ; %bb.2: ; %atomicrmw.end
-; CHECK-NOLSE-O1-NEXT: mov w0, w8
+; CHECK-NOLSE-O1-NEXT: ; kill: def $w0 killed $w0 killed $x0
; CHECK-NOLSE-O1-NEXT: ret
;
; CHECK-OUTLINE-O1-LABEL: atomicrmw_xchg_i8:
@@ -2993,14 +2993,14 @@ define i16 @atomicrmw_add_i16(ptr %ptr, i16 %rhs) {
define i16 @atomicrmw_xchg_i16(ptr %ptr, i16 %rhs) {
; CHECK-NOLSE-O1-LABEL: atomicrmw_xchg_i16:
; CHECK-NOLSE-O1: ; %bb.0:
-; CHECK-NOLSE-O1-NEXT: ; kill: def $w1 killed $w1 def $x1
+; CHECK-NOLSE-O1-NEXT: mov x8, x0
; CHECK-NOLSE-O1-NEXT: LBB38_1: ; %atomicrmw.start
; CHECK-NOLSE-O1-NEXT: ; =>This Inner Loop Header: Depth=1
-; CHECK-NOLSE-O1-NEXT: ldxrh w8, [x0]
-; CHECK-NOLSE-O1-NEXT: stxrh w9, w1, [x0]
+; CHECK-NOLSE-O1-NEXT: ldxrh w0, [x8]
+; CHECK-NOLSE-O1-NEXT: stxrh w9, w1, [x8]
; CHECK-NOLSE-O1-NEXT: cbnz w9, LBB38_1
; CHECK-NOLSE-O1-NEXT: ; %bb.2: ; %atomicrmw.end
-; CHECK-NOLSE-O1-NEXT: mov w0, w8
+; CHECK-NOLSE-O1-NEXT: ; kill: def $w0 killed $w0 killed $x0
; CHECK-NOLSE-O1-NEXT: ret
;
; CHECK-OUTLINE-O1-LABEL: atomicrmw_xchg_i16:
@@ -5996,7 +5996,6 @@ define { i8, i1 } @cmpxchg_i8(ptr %ptr, i8 %desired, i8 %new) {
; CHECK-NOLSE-O1-LABEL: cmpxchg_i8:
; CHECK-NOLSE-O1: ; %bb.0:
; CHECK-NOLSE-O1-NEXT: mov x8, x0
-; CHECK-NOLSE-O1-NEXT: ; kill: def $w2 killed $w2 def $x2
; CHECK-NOLSE-O1-NEXT: LBB67_1: ; %cmpxchg.start
; CHECK-NOLSE-O1-NEXT: ; =>This Inner Loop Header: Depth=1
; CHECK-NOLSE-O1-NEXT: ldxrb w0, [x8]
@@ -6103,7 +6102,6 @@ define { i16, i1 } @cmpxchg_i16(ptr %ptr, i16 %desired, i16 %new) {
; CHECK-NOLSE-O1-LABEL: cmpxchg_i16:
; CHECK-NOLSE-O1: ; %bb.0:
; CHECK-NOLSE-O1-NEXT: mov x8, x0
-; CHECK-NOLSE-O1-NEXT: ; kill: def $w2 killed $w2 def $x2
; CHECK-NOLSE-O1-NEXT: LBB68_1: ; %cmpxchg.start
; CHECK-NOLSE-O1-NEXT: ; =>This Inner Loop Header: Depth=1
; CHECK-NOLSE-O1-NEXT: ldxrh w0, [x8]
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-pcsections.ll b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-pcsections.ll
index c6819ff39ed33e..c02390c4df12dd 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-pcsections.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-pcsections.ll
@@ -746,20 +746,20 @@ define i8 @atomicrmw_xchg_i8(ptr %ptr, i8 %rhs) {
; CHECK-NEXT: successors: %bb.1(0x80000000)
; CHECK-NEXT: liveins: $w1, $x0
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: renamable $w1 = KILL $w1, implicit-def $x1
+ ; CHECK-NEXT: $x8 = ORRXrs $xzr, $x0, 0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1.atomicrmw.start:
; CHECK-NEXT: successors: %bb.1(0x7c000000), %bb.2(0x04000000)
- ; CHECK-NEXT: liveins: $x0, $x1
+ ; CHECK-NEXT: liveins: $w1, $x8
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: renamable $w8 = LDXRB renamable $x0, implicit-def $x8, pcsections !0 :: (volatile load (s8) from %ir.ptr)
- ; CHECK-NEXT: early-clobber renamable $w9 = STXRB renamable $w1, renamable $x0, pcsections !0 :: (volatile store (s8) into %ir.ptr)
+ ; CHECK-NEXT: renamable $w0 = LDXRB renamable $x8, implicit-def $x0, pcsections !0 :: (volatile load (s8) from %ir.ptr)
+ ; CHECK-NEXT: early-clobber renamable $w9 = STXRB renamable $w1, renamable $x8, pcsections !0 :: (volatile store (s8) into %ir.ptr)
; CHECK-NEXT: CBNZW killed renamable $w9, %bb.1, pcsections !0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2.atomicrmw.end:
- ; CHECK-NEXT: liveins: $x8
+ ; CHECK-NEXT: liveins: $x0
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: $w0 = ORRWrs $wzr, $w8, 0, implicit killed $x8
+ ; CHECK-NEXT: $w0 = KILL renamable $w0, implicit killed $x0
; CHECK-NEXT: RET undef $lr, implicit $w0
%res = atomicrmw xchg ptr %ptr, i8 %rhs monotonic, !pcsections !0
ret i8 %res
@@ -999,20 +999,20 @@ define i16 @atomicrmw_xchg_i16(ptr %ptr, i16 %rhs) {
; CHECK-NEXT: successors: %bb.1(0x80000000)
; CHECK-NEXT: liveins: $w1, $x0
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: renamable $w1 = KILL $w1, implicit-def $x1
+ ; CHECK-NEXT: $x8 = ORRXrs $xzr, $x0, 0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1.atomicrmw.start:
; CHECK-NEXT: successors: %bb.1(0x7c000000), %bb.2(0x04000000)
- ; CHECK-NEXT: liveins: $x0, $x1
+ ; CHECK-NEXT: liveins: $w1, $x8
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: renamable $w8 = LDXRH renamable $x0, implicit-def $x8, pcsections !0 :: (volatile load (s16) from %ir.ptr)
- ; CHECK-NEXT: early-clobber renamable $w9 = STXRH renamable $w1, renamable $x0, pcsections !0 :: (volatile store (s16) into %ir.ptr)
+ ; CHECK-NEXT: renamable $w0 = LDXRH renamable $x8, implicit-def $x0, pcsections !0 :: (volatile load (s16) from %ir.ptr)
+ ; CHECK-NEXT: early-clobber renamable $w9 = STXRH renamable $w1, renamable $x8, pcsections !0 :: (volatile store (s16) into %ir.ptr)
; CHECK-NEXT: CBNZW killed renamable $w9, %bb.1, pcsections !0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2.atomicrmw.end:
- ; CHECK-NEXT: liveins: $x8
+ ; CHECK-NEXT: liveins: $x0
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: $w0 = ORRWrs $wzr, $w8, 0, implicit killed $x8
+ ; CHECK-NEXT: $w0 = KILL renamable $w0, implicit killed $x0
; CHECK-NEXT: RET undef $lr, implicit $w0
%res = atomicrmw xchg ptr %ptr, i16 %rhs monotonic, !pcsections !0
ret i16 %res
@@ -1229,11 +1229,10 @@ define { i8, i1 } @cmpxchg_i8(ptr %ptr, i8 %desired, i8 %new) {
; CHECK-NEXT: liveins: $w1, $w2, $x0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: $x8 = ORRXrs $xzr, $x0, 0
- ; CHECK-NEXT: renamable $w2 = KILL $w2, implicit-def $x2
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1.cmpxchg.start:
; CHECK-NEXT: successors: %bb.2(0x7c000000), %bb.4(0x04000000)
- ; CHECK-NEXT: liveins: $w1, $x2, $x8
+ ; CHECK-NEXT: liveins: $w1, $w2, $x8
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: renamable $w0 = LDXRB renamable $x8, implicit-def $x0, pcsections !0 :: (volatile load (s8) from %ir.ptr)
; CHECK-NEXT: renamable $w9 = ANDWri renamable $w0, 7, pcsections !0
@@ -1242,7 +1241,7 @@ define { i8, i1 } @cmpxchg_i8(ptr %ptr, i8 %desired, i8 %new) {
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2.cmpxchg.trystore:
; CHECK-NEXT: successors: %bb.3(0x04000000), %bb.1(0x7c000000)
- ; CHECK-NEXT: liveins: $w1, $x0, $x2, $x8
+ ; CHECK-NEXT: liveins: $w1, $w2, $x0, $x8
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: early-clobber renamable $w9 = STXRB renamable $w2, renamable $x8, pcsections !0 :: (volatile store (s8) into %ir.ptr)
; CHECK-NEXT: CBNZW killed renamable $w9, %bb.1
@@ -1272,11 +1271,10 @@ define { i16, i1 } @cmpxchg_i16(ptr %ptr, i16 %desired, i16 %new) {
; CHECK-NEXT: liveins: $w1, $w2, $x0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: $x8 = ORRXrs $xzr, $x0, 0
- ; CHECK-NEXT: renamable $w2 = KILL $w2, implicit-def $x2
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1.cmpxchg.start:
; CHECK-NEXT: successors: %bb.2(0x7c000000), %bb.4(0x04000000)
- ; CHECK-NEXT: liveins: $w1, $x2, $x8
+ ; CHECK-NEXT: liveins: $w1, $w2, $x8
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: renamable $w0 = LDXRH renamable $x8, implicit-def $x0, pcsections !0 :: (volatile load (s16) from %ir.ptr)
; CHECK-NEXT: renamable $w9 = ANDWri renamable $w0, 15, pcsections !0
@@ -1285,7 +1283,7 @@ define { i16, i1 } @cmpxchg_i16(ptr %ptr, i16 %desired, i16 %new) {
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2.cmpxchg.trystore:
; CHECK-NEXT: successors: %bb.3(0x04000000), %bb.1(0x7c000000)
- ; CHECK-NEXT: liveins: $w1, $x0, $x2, $x8
+ ; CHECK-NEXT: liveins: $w1, $w2, $x0, $x8
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: early-clobber renamable $w9 = STXRH renamable $w2, renamable $x8, pcsections !0 :: (volatile store (s16) into %ir.ptr)
; CHECK-NEXT: CBNZW killed renamable $w9, %bb.1
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/combine-extract-vec-elt.mir b/llvm/test/CodeGen/AArch64/GlobalISel/combine-extract-vec-elt.mir
index c98dcf6ccb7966..f29fa86123c8c4 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/combine-extract-vec-elt.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/combine-extract-vec-elt.mir
@@ -49,8 +49,8 @@ body: |
; CHECK: liveins: $x0, $x1
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: %arg1:_(s64) = COPY $x0
- ; CHECK-NEXT: [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC %arg1(s64)
- ; CHECK-NEXT: %zext:_(s64) = G_ZEXT [[TRUNC]](s32)
+ ; CHECK-NEXT: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 4294967295
+ ; CHECK-NEXT: %zext:_(s64) = G_AND %arg1, [[C]]
; CHECK-NEXT: $x0 = COPY %zext(s64)
; CHECK-NEXT: RET_ReallyLR implicit $x0
%arg1:_(s64) = COPY $x0
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/combine-select.mir b/llvm/test/CodeGen/AArch64/GlobalISel/combine-select.mir
index 86fa12aa064acb..3e98a5e8e88009 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/combine-select.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/combine-select.mir
@@ -361,10 +361,11 @@ body: |
; CHECK: liveins: $x0, $x1, $x2
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s64) = COPY $x0
- ; CHECK-NEXT: %c:_(s1) = G_TRUNC [[COPY]](s64)
; CHECK-NEXT: %one:_(s8) = G_CONSTANT i8 101
- ; CHECK-NEXT: [[ZEXT:%[0-9]+]]:_(s8) = G_ZEXT %c(s1)
- ; CHECK-NEXT: %sel:_(s8) = G_ADD [[ZEXT]], %one
+ ; CHECK-NEXT: [[C:%[0-9]+]]:_(s8) = G_CONSTANT i8 1
+ ; CHECK-NEXT: [[TRUNC:%[0-9]+]]:_(s8) = G_TRUNC [[COPY]](s64)
+ ; CHECK-NEXT: [[AND:%[0-9]+]]:_(s8) = G_AND [[TRUNC]], [[C]]
+ ; CHECK-NEXT: %sel:_(s8) = G_ADD [[AND]], %one
; CHECK-NEXT: %ext:_(s32) = G_ANYEXT %sel(s8)
; CHECK-NEXT: $w0 = COPY %ext(s32)
%0:_(s64) = COPY $x0
@@ -417,10 +418,11 @@ body: |
; CHECK: liveins: $x0, $x1, $x2
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s64) = COPY $x0
- ; CHECK-NEXT: %c:_(s1) = G_TRUNC [[COPY]](s64)
- ; CHECK-NEXT: [[ZEXT:%[0-9]+]]:_(s8) = G_ZEXT %c(s1)
- ; CHECK-NEXT: [[C:%[0-9]+]]:_(s8) = G_CONSTANT i8 6
- ; CHECK-NEXT: %sel:_(s8) = G_SHL [[ZEXT]], [[C]](s8)
+ ; CHECK-NEXT: [[C:%[0-9]+]]:_(s8) = G_CONSTANT i8 1
+ ; CHECK-NEXT: [[TRUNC:%[0-9]+]]:_(s8) = G_TRUNC [[COPY]](s64)
+ ; CHECK-NEXT: [[AND:%[0-9]+]]:_(s8) = G_AND [[TRUNC]], [[C]]
+ ; CHECK-NEXT: [[C1:%[0-9]+]]:_(s8) = G_CONSTANT i8 6
+ ; CHECK-NEXT: %sel:_(s8) = G_SHL [[AND]...
[truncated]
    // For vectors, CSE the element only for now.
    LLT Ty = Res.getLLTTy(*getMRI());
-   if (Ty.isVector())
+   if (Ty.isFixedVector())
Crash in combine-with-flags.mir. Blame me.
It seems a bit arbitrary that you only check for the masked off bits being zero in the SrcSize == DstSize case, since in all three cases the AND could be avoided if they're known to be zero.
As an alternative, why not remove this code and leave it to a later AND combine to remove the AND if it can prove it is redundant?
Firstly, this was ported from the original combine. I would assume that it is more powerful than the combines below.
Maybe I misunderstand, but for the and to be redundant either the zext or trunc must be a no-op.
I believe the mask is never zero.
For the "SrcSize == DstSize" case, the mask is:
APInt AndValue(APInt::getLowBitsSet(SrcSize, MidSize));
I don't believe that it could be zero.
Maybe I misunderstand, but for the and to be redundant either the zext or trunc must be a no-op.
zext and trunc are never no-ops since they always change the bit width.
I believe the mask is never zero.
I'm not talking about the mask being zero; I'm talking about being able to prove that the masked-off bits of the other value are zero. I.e. (x AND mask) can be combined to x if you can prove (with KnownBits) that every zero bit in mask is also zero in x. That's what the code above is doing for the DstSize == SrcSize case, but there is almost certainly a separate combine that will run afterwards and do it anyway, and do it for all three cases.
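A minimal standalone sketch of the redundancy check described here, assuming the GISelKnownBits interface (KB->getKnownBits) that the rest of this patch already uses; the helper name is invented for illustration and is not part of the PR:

```cpp
#include "llvm/ADT/APInt.h"
#include "llvm/CodeGen/GlobalISel/GISelKnownBits.h"
#include "llvm/CodeGen/Register.h"

using namespace llvm;

// Hypothetical helper: an AND of X with Mask is redundant when every bit the
// mask would clear is already known to be zero in X, which is exactly the
// KnownBits argument made above.
static bool isAndWithMaskRedundant(Register X, const APInt &Mask,
                                   GISelKnownBits &KB) {
  KnownBits Known = KB.getKnownBits(X);
  // ~Mask is the set of bits the AND clears; all of them must be known zero.
  return (~Mask).isSubsetOf(Known.Zero);
}
```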
For modularity, I would prefer to leave it to the other combine.
As I said, the first combine comes from episode I of this combine. The and comes from InstCombine:
// If we're actually extending zero bits, then if ...
Here is the test in episode I of this combine.
We have a redundant_and combine: …
Maybe... but calling computeKnownBits on a G_CONSTANT should be pretty cheap anyway.
Then there should be two versions of the redundant_and combine: … and …
Can I gather some opinions on the result of the combine: …
Ping.

Ping.
arsenm left a comment
I think we should do this, the DAG already does this and I don't see a reason to reinvent the logic here. We can just port the logic over.
This API is a mess. I would expect G_SPLAT_VECTOR to just handle fixed vectors
G_BUILD_VECTOR is for fixed-length vectors. G_SPLAT_VECTOR is for scalable vectors. It takes a register and implicitly broadcasts it over the scalable vector.
Yes, but I don't understand why it would be that way (or why this would be an implementation detail users of the MachineIRBuilder would need to concern themselves with)
The underlying issue is that we never taught buildConstant about scalable vectors/types.
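For context, a hedged illustration (not from the patch) of the call that exercises the CSEMIRBuilder::buildConstant change above; the helper name is hypothetical, and B is assumed to be any CSEMIRBuilder already wired to a MachineFunction:

```cpp
#include "llvm/CodeGen/GlobalISel/CSEMIRBuilder.h"

using namespace llvm;

// Ask the builder for a constant of scalable-vector type. With the hunk
// above, CSEMIRBuilder::buildConstant splats the CSE'd s64 constant via
// buildSplatVector (G_SPLAT_VECTOR); before, a scalable type took the
// fixed-vector buildSplatBuildVector (G_BUILD_VECTOR) path.
static MachineInstrBuilder buildScalableConst42(CSEMIRBuilder &B) {
  LLT NxV2S64 = LLT::scalable_vector(/*MinNumElements=*/2, LLT::scalar(64));
  return B.buildConstant(NxV2S64, 42);
}
```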
This lost all the checks (this is a really bad update_llc_test_checks bug when this happens)
The DAG has the known bits part: … I am not sure that it has the and fold. Thanks for taking a look!
I haven't found it yet either.
This is the reason why I am undecided whether this PR, in this version/configuration, is beneficial or not. And as the author I have zero voting rights.
InstCombine is different. Generally we should refer to what the DAG is doing for prior art.

The zext(trunc) combine I think is straightforwardly desirable, as the DAG does it. I'm confused about the and combine; was that changed in a previous revision?
The previous episode had the known bits part. This episode has some cleanups, and the SrcSize == DstSize case (a & mask) should be beneficial.
llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.ll (review thread outdated, resolved)
The One with the Sonogram at the End

Either replace zext(trunc(x)) with x or, if we're actually extending zero bits:

SrcSize < DstSize: zext(a & mask)
SrcSize == DstSize: a & mask
SrcSize > DstSize: trunc(a) & mask

Credits:
https://reviews.llvm.org/D96031
InstCombinerImpl::visitZExt
LegalizationArtifactCombiner::tryCombineZExt

Test: AMDGPU/GlobalISel/combine-zext-trunc.mir
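As a worked instance of the SrcSize == DstSize case (a minimal standalone sketch, not taken from the patch): for zext i64 (trunc i32 (i64 %x)), MidSize is 32, so the mask is the low 32 bits, matching the G_AND with 4294967295 in the updated combine-extract-vec-elt.mir test above.

```cpp
#include "llvm/ADT/APInt.h"
#include <cassert>
#include <cstdint>

using namespace llvm;

int main() {
  // zext i64 (trunc i32 (i64 %x)): DstSize == SrcSize == 64, MidSize == 32.
  // The combine builds the mask with the low MidSize bits set in SrcSize
  // bits, so the pair of casts becomes a single G_AND with 0xFFFFFFFF.
  const unsigned SrcSize = 64, MidSize = 32;
  APInt Mask = APInt::getLowBitsSet(SrcSize, MidSize);
  assert(Mask == APInt(SrcSize, UINT64_C(0xFFFFFFFF))); // i.e. %x & 4294967295
  (void)Mask;
  return 0;
}
```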
I deactivated two combines. The noise moved from AArch64 to AMDGPU.
The first combine used …
I'm confused. Is there now the zext (trunc) -> and fold implemented somewhere else?
I closed this PR because of the regressions.
Doesn't make sense to me. We still need this at some point. You can just add it and enable it later when the regressions are avoided?
We have an existing zext(trunc) combine, as noted above. With this PR I wanted to do cleanups, i.e., improve the pattern, and investigate whether we can add an …