[GlobalIsel] Combine zext of trunc (episode II) #108305

Conversation
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-backend-aarch64

Author: Thorsten Schütt (tschuett)

Changes

The One with the Sonogram at the End

Either replace zext(trunc(x)) with x or, if we're actually extending zero bits:

SrcSize < DstSize: zext(a & mask)
SrcSize == DstSize: a & mask
SrcSize > DstSize: trunc(a) & mask

Credits: https://reviews.llvm.org/D96031

Test: AMDGPU/GlobalISel/combine-zext-trunc.mir

Patch is 564.62 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/108305.diff

68 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
index 828532dcffb7d3..bf32dcf5f2c85a 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
@@ -387,9 +387,6 @@ class CombinerHelper {
/// Transform anyext(trunc(x)) to x.
bool matchCombineAnyExtTrunc(MachineInstr &MI, Register &Reg);
- /// Transform zext(trunc(x)) to x.
- bool matchCombineZextTrunc(MachineInstr &MI, Register &Reg);
-
/// Transform trunc (shl x, K) to shl (trunc x), K
/// if K < VT.getScalarSizeInBits().
///
@@ -909,6 +906,10 @@ class CombinerHelper {
bool matchCastOfBuildVector(const MachineInstr &CastMI,
const MachineInstr &BVMI, BuildFnTy &MatchInfo);
+ /// Transform zext of truncate to x or and(x, mask).
+ bool matchCombineZextTrunc(const MachineInstr &ZextMI,
+ const MachineInstr &TruncMI, BuildFnTy &MatchInfo);
+
private:
/// Checks for legality of an indexed variant of \p LdSt.
bool isIndexedLoadStoreLegal(GLoadStore &LdSt) const;
diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td b/llvm/include/llvm/Target/GlobalISel/Combine.td
index a595a51d7b01ff..587dbe20e94c35 100644
--- a/llvm/include/llvm/Target/GlobalISel/Combine.td
+++ b/llvm/include/llvm/Target/GlobalISel/Combine.td
@@ -758,15 +758,6 @@ def anyext_trunc_fold: GICombineRule <
(apply [{ Helper.replaceSingleDefInstWithReg(*${root}, ${matchinfo}); }])
>;
-// Fold (zext (trunc x)) -> x if the source type is same as the destination type
-// and truncated bits are known to be zero.
-def zext_trunc_fold: GICombineRule <
- (defs root:$root, register_matchinfo:$matchinfo),
- (match (wip_match_opcode G_ZEXT):$root,
- [{ return Helper.matchCombineZextTrunc(*${root}, ${matchinfo}); }]),
- (apply [{ Helper.replaceSingleDefInstWithReg(*${root}, ${matchinfo}); }])
->;
-
def not_cmp_fold_matchinfo : GIDefMatchData<"SmallVector<Register, 4>">;
def not_cmp_fold : GICombineRule<
(defs root:$d, not_cmp_fold_matchinfo:$info),
@@ -1894,6 +1885,15 @@ class integer_of_opcode<Instruction castOpcode> : GICombineRule <
def integer_of_truncate : integer_of_opcode<G_TRUNC>;
+/// Transform zext of truncate to x or and(x, mask).
+def zext_of_truncate : GICombineRule <
+ (defs root:$root, build_fn_matchinfo:$matchinfo),
+ (match (G_TRUNC $trunc, $src):$TruncMI,
+ (G_ZEXT $root, $trunc):$ZextMI,
+ [{ return Helper.matchCombineZextTrunc(*${ZextMI}, *${TruncMI}, ${matchinfo}); }]),
+ (apply [{ Helper.applyBuildFn(*${ZextMI}, ${matchinfo}); }])>;
+
+
def cast_combines: GICombineGroup<[
truncate_of_zext,
truncate_of_sext,
@@ -1915,7 +1915,8 @@ def cast_combines: GICombineGroup<[
narrow_binop_and,
narrow_binop_or,
narrow_binop_xor,
- integer_of_truncate
+ integer_of_truncate,
+ zext_of_truncate
]>;
@@ -1951,7 +1952,7 @@ def const_combines : GICombineGroup<[constant_fold_fp_ops, const_ptradd_to_i2p,
def known_bits_simplifications : GICombineGroup<[
redundant_and, redundant_sext_inreg, redundant_or, urem_pow2_to_mask,
- zext_trunc_fold, icmp_to_true_false_known_bits, icmp_to_lhs_known_bits,
+ icmp_to_true_false_known_bits, icmp_to_lhs_known_bits,
sext_inreg_to_zext_inreg]>;
def width_reduction_combines : GICombineGroup<[reduce_shl_of_extend,
diff --git a/llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp b/llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp
index 547529bbe699ab..5addf93599085a 100644
--- a/llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp
@@ -333,8 +333,10 @@ MachineInstrBuilder CSEMIRBuilder::buildConstant(const DstOp &Res,
// For vectors, CSE the element only for now.
LLT Ty = Res.getLLTTy(*getMRI());
- if (Ty.isVector())
+ if (Ty.isFixedVector())
return buildSplatBuildVector(Res, buildConstant(Ty.getElementType(), Val));
+ if (Ty.isScalableVector())
+ return buildSplatVector(Res, buildConstant(Ty.getElementType(), Val));
FoldingSetNodeID ID;
GISelInstProfileBuilder ProfBuilder(ID, *getMRI());
diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
index df9c12bc9c97bd..14d4e413456403 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
@@ -2524,20 +2524,6 @@ bool CombinerHelper::matchCombineAnyExtTrunc(MachineInstr &MI, Register &Reg) {
m_GTrunc(m_all_of(m_Reg(Reg), m_SpecificType(DstTy))));
}
-bool CombinerHelper::matchCombineZextTrunc(MachineInstr &MI, Register &Reg) {
- assert(MI.getOpcode() == TargetOpcode::G_ZEXT && "Expected a G_ZEXT");
- Register DstReg = MI.getOperand(0).getReg();
- Register SrcReg = MI.getOperand(1).getReg();
- LLT DstTy = MRI.getType(DstReg);
- if (mi_match(SrcReg, MRI,
- m_GTrunc(m_all_of(m_Reg(Reg), m_SpecificType(DstTy))))) {
- unsigned DstSize = DstTy.getScalarSizeInBits();
- unsigned SrcSize = MRI.getType(SrcReg).getScalarSizeInBits();
- return KB->getKnownBits(Reg).countMinLeadingZeros() >= DstSize - SrcSize;
- }
- return false;
-}
-
static LLT getMidVTForTruncRightShiftCombine(LLT ShiftTy, LLT TruncTy) {
const unsigned ShiftSize = ShiftTy.getScalarSizeInBits();
const unsigned TruncSize = TruncTy.getScalarSizeInBits();
diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp
index 30557e6a2304e6..2171f2f6feb7eb 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelperCasts.cpp
@@ -359,3 +359,94 @@ bool CombinerHelper::matchCastOfInteger(const MachineInstr &CastMI,
return false;
}
}
+
+bool CombinerHelper::matchCombineZextTrunc(const MachineInstr &ZextMI,
+ const MachineInstr &TruncMI,
+ BuildFnTy &MatchInfo) {
+ const GZext *Zext = cast<GZext>(&ZextMI);
+ const GTrunc *Trunc = cast<GTrunc>(&TruncMI);
+
+ Register Dst = Zext->getReg(0);
+ Register Mid = Zext->getSrcReg();
+ Register Src = Trunc->getSrcReg();
+
+ LLT DstTy = MRI.getType(Dst);
+ LLT SrcTy = MRI.getType(Src);
+
+ if (!MRI.hasOneNonDBGUse(Mid))
+ return false;
+
+ unsigned DstSize = DstTy.getScalarSizeInBits();
+ unsigned MidSize = MRI.getType(Mid).getScalarSizeInBits();
+ unsigned SrcSize = SrcTy.getScalarSizeInBits();
+
+ // Are the truncated bits known to be zero?
+ if (DstTy == SrcTy &&
+ (KB->getKnownBits(Src).countMinLeadingZeros() >= DstSize - MidSize)) {
+ MatchInfo = [=](MachineIRBuilder &B) { B.buildCopy(Dst, Src); };
+ return true;
+ }
+
+ // If the sizes are just right we can convert this into a logical
+ // 'and', which will be much cheaper than the pair of casts.
+
+ // If we're actually extending zero bits, then if
+ // SrcSize < DstSize: zext(Src & mask)
+ // SrcSize == DstSize: Src & mask
+ // SrcSize > DstSize: trunc(Src) & mask
+
+ if (DstSize == SrcSize) {
+ // Src & mask.
+
+ if (!isLegalOrBeforeLegalizer({TargetOpcode::G_AND, {DstTy}}) ||
+ !isConstantLegalOrBeforeLegalizer(DstTy))
+ return false;
+
+ // build mask.
+ APInt AndValue(APInt::getLowBitsSet(SrcSize, MidSize));
+
+ MatchInfo = [=](MachineIRBuilder &B) {
+ auto Mask = B.buildConstant(DstTy, AndValue);
+ B.buildAnd(Dst, Src, Mask);
+ };
+ return true;
+ }
+
+ if (SrcSize < DstSize) {
+ // zext(Src & mask).
+
+ if (!isLegalOrBeforeLegalizer({TargetOpcode::G_AND, {SrcTy}}) ||
+ !isConstantLegalOrBeforeLegalizer(SrcTy) ||
+ !isLegalOrBeforeLegalizer({TargetOpcode::G_ZEXT, {DstTy, SrcTy}}))
+ return false;
+
+ APInt AndValue(APInt::getLowBitsSet(SrcSize, MidSize));
+
+ MatchInfo = [=](MachineIRBuilder &B) {
+ auto Mask = B.buildConstant(SrcTy, AndValue);
+ auto And = B.buildAnd(SrcTy, Src, Mask);
+ B.buildZExt(Dst, And);
+ };
+ return true;
+ }
+
+ if (SrcSize > DstSize) {
+ // trunc(Src) & mask.
+
+ if (!isLegalOrBeforeLegalizer({TargetOpcode::G_AND, {DstTy}}) ||
+ !isConstantLegalOrBeforeLegalizer(DstTy) ||
+ !isLegalOrBeforeLegalizer({TargetOpcode::G_TRUNC, {DstTy, SrcTy}}))
+ return false;
+
+ APInt AndValue(APInt::getLowBitsSet(DstSize, MidSize));
+
+ MatchInfo = [=](MachineIRBuilder &B) {
+ auto Mask = B.buildConstant(DstTy, AndValue);
+ auto Trunc = B.buildTrunc(DstTy, Src);
+ B.buildAnd(Dst, Trunc, Mask);
+ };
+ return true;
+ }
+
+ return false;
+}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
index b2a3f9392157d1..25db0e678f49ce 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
@@ -168,6 +168,6 @@ def AMDGPUPostLegalizerCombiner: GICombiner<
def AMDGPURegBankCombiner : GICombiner<
"AMDGPURegBankCombinerImpl",
[unmerge_merge, unmerge_cst, unmerge_undef,
- zext_trunc_fold, int_minmax_to_med3, ptr_add_immed_chain,
+ int_minmax_to_med3, ptr_add_immed_chain,
fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp]> {
}
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll
index de3f323891a36a..ddcc31d23b56d2 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll
@@ -1938,14 +1938,14 @@ define i8 @atomicrmw_add_i8(ptr %ptr, i8 %rhs) {
define i8 @atomicrmw_xchg_i8(ptr %ptr, i8 %rhs) {
; CHECK-NOLSE-O1-LABEL: atomicrmw_xchg_i8:
; CHECK-NOLSE-O1: ; %bb.0:
-; CHECK-NOLSE-O1-NEXT: ; kill: def $w1 killed $w1 def $x1
+; CHECK-NOLSE-O1-NEXT: mov x8, x0
; CHECK-NOLSE-O1-NEXT: LBB28_1: ; %atomicrmw.start
; CHECK-NOLSE-O1-NEXT: ; =>This Inner Loop Header: Depth=1
-; CHECK-NOLSE-O1-NEXT: ldxrb w8, [x0]
-; CHECK-NOLSE-O1-NEXT: stxrb w9, w1, [x0]
+; CHECK-NOLSE-O1-NEXT: ldxrb w0, [x8]
+; CHECK-NOLSE-O1-NEXT: stxrb w9, w1, [x8]
; CHECK-NOLSE-O1-NEXT: cbnz w9, LBB28_1
; CHECK-NOLSE-O1-NEXT: ; %bb.2: ; %atomicrmw.end
-; CHECK-NOLSE-O1-NEXT: mov w0, w8
+; CHECK-NOLSE-O1-NEXT: ; kill: def $w0 killed $w0 killed $x0
; CHECK-NOLSE-O1-NEXT: ret
;
; CHECK-OUTLINE-O1-LABEL: atomicrmw_xchg_i8:
@@ -2993,14 +2993,14 @@ define i16 @atomicrmw_add_i16(ptr %ptr, i16 %rhs) {
define i16 @atomicrmw_xchg_i16(ptr %ptr, i16 %rhs) {
; CHECK-NOLSE-O1-LABEL: atomicrmw_xchg_i16:
; CHECK-NOLSE-O1: ; %bb.0:
-; CHECK-NOLSE-O1-NEXT: ; kill: def $w1 killed $w1 def $x1
+; CHECK-NOLSE-O1-NEXT: mov x8, x0
; CHECK-NOLSE-O1-NEXT: LBB38_1: ; %atomicrmw.start
; CHECK-NOLSE-O1-NEXT: ; =>This Inner Loop Header: Depth=1
-; CHECK-NOLSE-O1-NEXT: ldxrh w8, [x0]
-; CHECK-NOLSE-O1-NEXT: stxrh w9, w1, [x0]
+; CHECK-NOLSE-O1-NEXT: ldxrh w0, [x8]
+; CHECK-NOLSE-O1-NEXT: stxrh w9, w1, [x8]
; CHECK-NOLSE-O1-NEXT: cbnz w9, LBB38_1
; CHECK-NOLSE-O1-NEXT: ; %bb.2: ; %atomicrmw.end
-; CHECK-NOLSE-O1-NEXT: mov w0, w8
+; CHECK-NOLSE-O1-NEXT: ; kill: def $w0 killed $w0 killed $x0
; CHECK-NOLSE-O1-NEXT: ret
;
; CHECK-OUTLINE-O1-LABEL: atomicrmw_xchg_i16:
@@ -5996,7 +5996,6 @@ define { i8, i1 } @cmpxchg_i8(ptr %ptr, i8 %desired, i8 %new) {
; CHECK-NOLSE-O1-LABEL: cmpxchg_i8:
; CHECK-NOLSE-O1: ; %bb.0:
; CHECK-NOLSE-O1-NEXT: mov x8, x0
-; CHECK-NOLSE-O1-NEXT: ; kill: def $w2 killed $w2 def $x2
; CHECK-NOLSE-O1-NEXT: LBB67_1: ; %cmpxchg.start
; CHECK-NOLSE-O1-NEXT: ; =>This Inner Loop Header: Depth=1
; CHECK-NOLSE-O1-NEXT: ldxrb w0, [x8]
@@ -6103,7 +6102,6 @@ define { i16, i1 } @cmpxchg_i16(ptr %ptr, i16 %desired, i16 %new) {
; CHECK-NOLSE-O1-LABEL: cmpxchg_i16:
; CHECK-NOLSE-O1: ; %bb.0:
; CHECK-NOLSE-O1-NEXT: mov x8, x0
-; CHECK-NOLSE-O1-NEXT: ; kill: def $w2 killed $w2 def $x2
; CHECK-NOLSE-O1-NEXT: LBB68_1: ; %cmpxchg.start
; CHECK-NOLSE-O1-NEXT: ; =>This Inner Loop Header: Depth=1
; CHECK-NOLSE-O1-NEXT: ldxrh w0, [x8]
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-pcsections.ll b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-pcsections.ll
index c6819ff39ed33e..c02390c4df12dd 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-pcsections.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-pcsections.ll
@@ -746,20 +746,20 @@ define i8 @atomicrmw_xchg_i8(ptr %ptr, i8 %rhs) {
; CHECK-NEXT: successors: %bb.1(0x80000000)
; CHECK-NEXT: liveins: $w1, $x0
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: renamable $w1 = KILL $w1, implicit-def $x1
+ ; CHECK-NEXT: $x8 = ORRXrs $xzr, $x0, 0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1.atomicrmw.start:
; CHECK-NEXT: successors: %bb.1(0x7c000000), %bb.2(0x04000000)
- ; CHECK-NEXT: liveins: $x0, $x1
+ ; CHECK-NEXT: liveins: $w1, $x8
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: renamable $w8 = LDXRB renamable $x0, implicit-def $x8, pcsections !0 :: (volatile load (s8) from %ir.ptr)
- ; CHECK-NEXT: early-clobber renamable $w9 = STXRB renamable $w1, renamable $x0, pcsections !0 :: (volatile store (s8) into %ir.ptr)
+ ; CHECK-NEXT: renamable $w0 = LDXRB renamable $x8, implicit-def $x0, pcsections !0 :: (volatile load (s8) from %ir.ptr)
+ ; CHECK-NEXT: early-clobber renamable $w9 = STXRB renamable $w1, renamable $x8, pcsections !0 :: (volatile store (s8) into %ir.ptr)
; CHECK-NEXT: CBNZW killed renamable $w9, %bb.1, pcsections !0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2.atomicrmw.end:
- ; CHECK-NEXT: liveins: $x8
+ ; CHECK-NEXT: liveins: $x0
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: $w0 = ORRWrs $wzr, $w8, 0, implicit killed $x8
+ ; CHECK-NEXT: $w0 = KILL renamable $w0, implicit killed $x0
; CHECK-NEXT: RET undef $lr, implicit $w0
%res = atomicrmw xchg ptr %ptr, i8 %rhs monotonic, !pcsections !0
ret i8 %res
@@ -999,20 +999,20 @@ define i16 @atomicrmw_xchg_i16(ptr %ptr, i16 %rhs) {
; CHECK-NEXT: successors: %bb.1(0x80000000)
; CHECK-NEXT: liveins: $w1, $x0
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: renamable $w1 = KILL $w1, implicit-def $x1
+ ; CHECK-NEXT: $x8 = ORRXrs $xzr, $x0, 0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1.atomicrmw.start:
; CHECK-NEXT: successors: %bb.1(0x7c000000), %bb.2(0x04000000)
- ; CHECK-NEXT: liveins: $x0, $x1
+ ; CHECK-NEXT: liveins: $w1, $x8
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: renamable $w8 = LDXRH renamable $x0, implicit-def $x8, pcsections !0 :: (volatile load (s16) from %ir.ptr)
- ; CHECK-NEXT: early-clobber renamable $w9 = STXRH renamable $w1, renamable $x0, pcsections !0 :: (volatile store (s16) into %ir.ptr)
+ ; CHECK-NEXT: renamable $w0 = LDXRH renamable $x8, implicit-def $x0, pcsections !0 :: (volatile load (s16) from %ir.ptr)
+ ; CHECK-NEXT: early-clobber renamable $w9 = STXRH renamable $w1, renamable $x8, pcsections !0 :: (volatile store (s16) into %ir.ptr)
; CHECK-NEXT: CBNZW killed renamable $w9, %bb.1, pcsections !0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2.atomicrmw.end:
- ; CHECK-NEXT: liveins: $x8
+ ; CHECK-NEXT: liveins: $x0
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: $w0 = ORRWrs $wzr, $w8, 0, implicit killed $x8
+ ; CHECK-NEXT: $w0 = KILL renamable $w0, implicit killed $x0
; CHECK-NEXT: RET undef $lr, implicit $w0
%res = atomicrmw xchg ptr %ptr, i16 %rhs monotonic, !pcsections !0
ret i16 %res
@@ -1229,11 +1229,10 @@ define { i8, i1 } @cmpxchg_i8(ptr %ptr, i8 %desired, i8 %new) {
; CHECK-NEXT: liveins: $w1, $w2, $x0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: $x8 = ORRXrs $xzr, $x0, 0
- ; CHECK-NEXT: renamable $w2 = KILL $w2, implicit-def $x2
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1.cmpxchg.start:
; CHECK-NEXT: successors: %bb.2(0x7c000000), %bb.4(0x04000000)
- ; CHECK-NEXT: liveins: $w1, $x2, $x8
+ ; CHECK-NEXT: liveins: $w1, $w2, $x8
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: renamable $w0 = LDXRB renamable $x8, implicit-def $x0, pcsections !0 :: (volatile load (s8) from %ir.ptr)
; CHECK-NEXT: renamable $w9 = ANDWri renamable $w0, 7, pcsections !0
@@ -1242,7 +1241,7 @@ define { i8, i1 } @cmpxchg_i8(ptr %ptr, i8 %desired, i8 %new) {
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2.cmpxchg.trystore:
; CHECK-NEXT: successors: %bb.3(0x04000000), %bb.1(0x7c000000)
- ; CHECK-NEXT: liveins: $w1, $x0, $x2, $x8
+ ; CHECK-NEXT: liveins: $w1, $w2, $x0, $x8
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: early-clobber renamable $w9 = STXRB renamable $w2, renamable $x8, pcsections !0 :: (volatile store (s8) into %ir.ptr)
; CHECK-NEXT: CBNZW killed renamable $w9, %bb.1
@@ -1272,11 +1271,10 @@ define { i16, i1 } @cmpxchg_i16(ptr %ptr, i16 %desired, i16 %new) {
; CHECK-NEXT: liveins: $w1, $w2, $x0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: $x8 = ORRXrs $xzr, $x0, 0
- ; CHECK-NEXT: renamable $w2 = KILL $w2, implicit-def $x2
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1.cmpxchg.start:
; CHECK-NEXT: successors: %bb.2(0x7c000000), %bb.4(0x04000000)
- ; CHECK-NEXT: liveins: $w1, $x2, $x8
+ ; CHECK-NEXT: liveins: $w1, $w2, $x8
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: renamable $w0 = LDXRH renamable $x8, implicit-def $x0, pcsections !0 :: (volatile load (s16) from %ir.ptr)
; CHECK-NEXT: renamable $w9 = ANDWri renamable $w0, 15, pcsections !0
@@ -1285,7 +1283,7 @@ define { i16, i1 } @cmpxchg_i16(ptr %ptr, i16 %desired, i16 %new) {
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2.cmpxchg.trystore:
; CHECK-NEXT: successors: %bb.3(0x04000000), %bb.1(0x7c000000)
- ; CHECK-NEXT: liveins: $w1, $x0, $x2, $x8
+ ; CHECK-NEXT: liveins: $w1, $w2, $x0, $x8
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: early-clobber renamable $w9 = STXRH renamable $w2, renamable $x8, pcsections !0 :: (volatile store (s16) into %ir.ptr)
; CHECK-NEXT: CBNZW killed renamable $w9, %bb.1
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/combine-extract-vec-elt.mir b/llvm/test/CodeGen/AArch64/GlobalISel/combine-extract-vec-elt.mir
index c98dcf6ccb7966..f29fa86123c8c4 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/combine-extract-vec-elt.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/combine-extract-vec-elt.mir
@@ -49,8 +49,8 @@ body: |
; CHECK: liveins: $x0, $x1
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: %arg1:_(s64) = COPY $x0
- ; CHECK-NEXT: [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC %arg1(s64)
- ; CHECK-NEXT: %zext:_(s64) = G_ZEXT [[TRUNC]](s32)
+ ; CHECK-NEXT: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 4294967295
+ ; CHECK-NEXT: %zext:_(s64) = G_AND %arg1, [[C]]
; CHECK-NEXT: $x0 = COPY %zext(s64)
; CHECK-NEXT: RET_ReallyLR implicit $x0
%arg1:_(s64) = COPY $x0
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/combine-select.mir b/llvm/test/CodeGen/AArch64/GlobalISel/combine-select.mir
index 86fa12aa064acb..3e98a5e8e88009 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/combine-select.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/combine-select.mir
@@ -361,10 +361,11 @@ body: |
; CHECK: liveins: $x0, $x1, $x2
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s64) = COPY $x0
- ; CHECK-NEXT: %c:_(s1) = G_TRUNC [[COPY]](s64)
; CHECK-NEXT: %one:_(s8) = G_CONSTANT i8 101
- ; CHECK-NEXT: [[ZEXT:%[0-9]+]]:_(s8) = G_ZEXT %c(s1)
- ; CHECK-NEXT: %sel:_(s8) = G_ADD [[ZEXT]], %one
+ ; CHECK-NEXT: [[C:%[0-9]+]]:_(s8) = G_CONSTANT i8 1
+ ; CHECK-NEXT: [[TRUNC:%[0-9]+]]:_(s8) = G_TRUNC [[COPY]](s64)
+ ; CHECK-NEXT: [[AND:%[0-9]+]]:_(s8) = G_AND [[TRUNC]], [[C]]
+ ; CHECK-NEXT: %sel:_(s8) = G_ADD [[AND]], %one
; CHECK-NEXT: %ext:_(s32) = G_ANYEXT %sel(s8)
; CHECK-NEXT: $w0 = COPY %ext(s32)
%0:_(s64) = COPY $x0
@@ -417,10 +418,11 @@ body: |
; CHECK: liveins: $x0, $x1, $x2
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s64) = COPY $x0
- ; CHECK-NEXT: %c:_(s1) = G_TRUNC [[COPY]](s64)
- ; CHECK-NEXT: [[ZEXT:%[0-9]+]]:_(s8) = G_ZEXT %c(s1)
- ; CHECK-NEXT: [[C:%[0-9]+]]:_(s8) = G_CONSTANT i8 6
- ; CHECK-NEXT: %sel:_(s8) = G_SHL [[ZEXT]], [[C]](s8)
+ ; CHECK-NEXT: [[C:%[0-9]+]]:_(s8) = G_CONSTANT i8 1
+ ; CHECK-NEXT: [[TRUNC:%[0-9]+]]:_(s8) = G_TRUNC [[COPY]](s64)
+ ; CHECK-NEXT: [[AND:%[0-9]+]]:_(s8) = G_AND [[TRUNC]], [[C]]
+ ; CHECK-NEXT: [[C1:%[0-9]+]]:_(s8) = G_CONSTANT i8 6
+ ; CHECK-NEXT: %sel:_(s8) = G_SHL [[AND]...
[truncated]
    // For vectors, CSE the element only for now.
    LLT Ty = Res.getLLTTy(*getMRI());
-   if (Ty.isVector())
+   if (Ty.isFixedVector())
Crash in combine-with-flags.mir. Blame me.
It seems a bit arbitrary that you only check for the masked off bits being zero in the SrcSize == DstSize case, since in all three cases the AND could be avoided if they're known to be zero.
As an alternative, why not remove this code and leave it to a later AND combine to remove the AND if it can prove it is redundant?
Firstly, this was ported from the original combine. I would assume that it is more powerful than the combines below.
Maybe I misunderstand, but for the and to be redundant either the zext or trunc must be a no-op.
I believe the mask is never zero.
For the "SrcSize == DstSize" case, the mask is:
APInt AndValue(APInt::getLowBitsSet(SrcSize, MidSize));
I don't believe that it could be zero.
Maybe I misunderstand, but for the and to be redundant either the zext or trunc must be a no-op.
zext and trunc are never no-ops since they always change the bit width.
I believe the mask is never zero.
I'm not talking about the mask being zero; I'm talking about being able to prove that the masked-off bits of the other value are zero. I.e. (x AND mask) can be combined to x if you can prove (with KnownBits) that every zero bit in mask is also zero in x. That's what the code above is doing for the DstSize == SrcSize case, but there is almost certainly a separate combine that will run afterwards and do it anyway, and do it for all three cases.
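A minimal standalone sketch of the redundancy check described here, assuming the GISelKnownBits interface (KB->getKnownBits) that the rest of this patch already uses; the helper name is invented for illustration and is not part of the PR:

```cpp
#include "llvm/ADT/APInt.h"
#include "llvm/CodeGen/GlobalISel/GISelKnownBits.h"
#include "llvm/CodeGen/Register.h"

using namespace llvm;

// Hypothetical helper: an AND of X with Mask is redundant when every bit the
// mask would clear is already known to be zero in X, which is exactly the
// KnownBits argument made above.
static bool isAndWithMaskRedundant(Register X, const APInt &Mask,
                                   GISelKnownBits &KB) {
  KnownBits Known = KB.getKnownBits(X);
  // ~Mask is the set of bits the AND clears; all of them must be known zero.
  return (~Mask).isSubsetOf(Known.Zero);
}
```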
For modularity, I would prefer to leave it to the other combine.
As I said, the first combine comes from episode I of this combine. The and comes from InstCombine:
// If we're actually extending zero bits, then if ...
Here is the test in episode I of this combine.
We have a redundant_and combine: …
Maybe... but calling computeKnownBits on a G_CONSTANT should be pretty cheap anyway.
Then there should be two versions of the redundant_and combine: … and …
Can I gather some opinions on the result of the combine: …
Ping.

Ping.
arsenm left a comment
I think we should do this, the DAG already does this and I don't see a reason to reinvent the logic here. We can just port the logic over.
This API is a mess. I would expect G_SPLAT_VECTOR to just handle fixed vectors
G_BUILD_VECTOR is for fixed-length vectors. G_SPLAT_VECTOR is for scalable vectors. It takes a register and implicitly broadcasts it over the scalable vector.
Yes, but I don't understand why it would be that way (or why this would be an implementation detail users of the MachineIRBuilder would need to concern themselves with)
The underlying issue is that we never taught buildConstant about scalable vectors/types.
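For context, a hedged illustration (not from the patch) of the call that exercises the CSEMIRBuilder::buildConstant change above; the helper name is hypothetical, and B is assumed to be any CSEMIRBuilder already wired to a MachineFunction:

```cpp
#include "llvm/CodeGen/GlobalISel/CSEMIRBuilder.h"

using namespace llvm;

// Ask the builder for a constant of scalable-vector type. With the hunk
// above, CSEMIRBuilder::buildConstant splats the CSE'd s64 constant via
// buildSplatVector (G_SPLAT_VECTOR); before, a scalable type took the
// fixed-vector buildSplatBuildVector (G_BUILD_VECTOR) path.
static MachineInstrBuilder buildScalableConst42(CSEMIRBuilder &B) {
  LLT NxV2S64 = LLT::scalable_vector(/*MinNumElements=*/2, LLT::scalar(64));
  return B.buildConstant(NxV2S64, 42);
}
```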
This lost all the checks (this is a really bad update_llc_test_checks bug when this happens)
The DAG has the known bits part: … I am not sure that it has the and fold. Thanks for taking a look!
I haven't found it yet either.
This is the reason why I am undecided whether this PR, in this version/configuration, is beneficial or not. And as the author I have zero voting rights.
InstCombine is different. Generally we should refer to what the DAG is doing for prior art.

The zext(trunc) combine I think is straightforwardly desirable, as the DAG does it. I'm confused about the and combine; was that changed in a previous revision?
The previous episode had the known bits part. This episode has some cleanups, and the SrcSize == DstSize case (a & mask) should be beneficial.
llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.ll (review thread outdated, resolved)
The One with the Sonogram at the End

Either replace zext(trunc(x)) with x or, if we're actually extending zero bits:

SrcSize < DstSize: zext(a & mask)
SrcSize == DstSize: a & mask
SrcSize > DstSize: trunc(a) & mask

Credits:
https://reviews.llvm.org/D96031
InstCombinerImpl::visitZExt
LegalizationArtifactCombiner::tryCombineZExt

Test: AMDGPU/GlobalISel/combine-zext-trunc.mir
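As a worked instance of the SrcSize == DstSize case (a minimal standalone sketch, not taken from the patch): for zext i64 (trunc i32 (i64 %x)), MidSize is 32, so the mask is the low 32 bits, matching the G_AND with 4294967295 in the updated combine-extract-vec-elt.mir test above.

```cpp
#include "llvm/ADT/APInt.h"
#include <cassert>
#include <cstdint>

using namespace llvm;

int main() {
  // zext i64 (trunc i32 (i64 %x)): DstSize == SrcSize == 64, MidSize == 32.
  // The combine builds the mask with the low MidSize bits set in SrcSize
  // bits, so the pair of casts becomes a single G_AND with 0xFFFFFFFF.
  const unsigned SrcSize = 64, MidSize = 32;
  APInt Mask = APInt::getLowBitsSet(SrcSize, MidSize);
  assert(Mask == APInt(SrcSize, UINT64_C(0xFFFFFFFF))); // i.e. %x & 4294967295
  (void)Mask;
  return 0;
}
```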
I deactivated two combines. The noise moved from AArch64 to AMDGPU.
The first combine used …
I'm confused. Is there now the zext (trunc) -> and fold implemented somewhere else?
I closed this PR because of the regressions.
Doesn't make sense to me. We still need this at some point. You can just add it and enable it later when the regressions are avoided?
We have an existing zext(trunc) combine, as noted above. With this PR I wanted to do cleanups, i.e., improve the pattern, and investigate whether we can add an …