Skip to content

Conversation

@Aethezz
Copy link
Contributor

@Aethezz Aethezz commented Aug 13, 2025

Fixes an issue where bits next to the sign bit were not constant-folded when squaring a sign- or zero-extended small integer. Added logic to detect when both operands of a multiplication are the same extended value, allowing InstCombine to mark bits above the maximum possible square as known zero. This enables correct folding of (x * x) & (1 << N) to 0 when N is out of range.

Proof: https://alive2.llvm.org/ce/z/YGou44

Fixes #152061

@Aethezz Aethezz requested a review from nikic as a code owner August 13, 2025 20:14
@llvmbot llvmbot added the llvm:analysis Includes value tracking, cost tables and constant folding label Aug 13, 2025
@llvmbot
Copy link
Member

llvmbot commented Aug 13, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: None (Aethezz)

Changes

Fixes an issue where bits next to the sign bit were not constant-folded when squaring a sign- or zero-extended small integer. Added logic to detect when both operands of a multiplication are the same extended value, allowing InstCombine to mark bits above the maximum possible square as known zero. This enables correct folding of (x * x) & (1 << N) to 0 when N is out of range.

Fixes #152061


Full diff: https://github.com/llvm/llvm-project/pull/153484.diff

2 Files Affected:

  • (modified) llvm/lib/Analysis/ValueTracking.cpp (+43)
  • (modified) llvm/test/Analysis/ValueTracking/known-bits.ll (+33)
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index af85ce4077ec8..7a973140f6075 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -423,6 +423,49 @@ static void computeKnownBitsMul(const Value *Op0, const Value *Op1, bool NSW,
     Known.makeNonNegative();
   else if (isKnownNegative && !Known.isNonNegative())
     Known.makeNegative();
+
+  // Additional logic: If both operands are the same sign- or zero-extended
+  // value from a small integer, and the multiplication is (sext x) * (sext x)
+  // or (zext x) * (zext x), then the result cannot set bits above the maximum
+  // possible square. This allows InstCombine and other passes to fold (x * x) &
+  // (1 << N) to 0 when N is out of range.
+  using namespace PatternMatch;
+  const Value *A = nullptr;
+  // Only handle the case where both operands are the same extension of the same
+  // value.
+  if ((match(Op0, m_SExt(m_Value(A))) && match(Op1, m_SExt(m_Specific(A)))) ||
+      (match(Op0, m_ZExt(m_Value(A))) && match(Op1, m_ZExt(m_Specific(A))))) {
+    Type *FromTy = A->getType();
+    Type *ToTy = Op0->getType();
+    if (FromTy->isIntegerTy() && ToTy->isIntegerTy() &&
+        FromTy->getScalarSizeInBits() < ToTy->getScalarSizeInBits()) {
+      unsigned FromBits = FromTy->getScalarSizeInBits();
+      unsigned ToBits = ToTy->getScalarSizeInBits();
+      // For both signed and unsigned, the maximum absolute value is max(|min|,
+      // |max|)
+      APInt minVal(FromBits, 0), maxVal(FromBits, 0);
+      bool isSigned = isa<SExtInst>(Op0);
+      if (isSigned) {
+        minVal = APInt::getSignedMinValue(FromBits);
+        maxVal = APInt::getSignedMaxValue(FromBits);
+      } else {
+        minVal = APInt::getMinValue(FromBits);
+        maxVal = APInt::getMaxValue(FromBits);
+      }
+      APInt absMin = minVal.abs();
+      APInt absMax = maxVal.abs();
+      APInt maxAbs = absMin.ugt(absMax) ? absMin : absMax;
+      APInt maxSquare = maxAbs.zext(ToBits);
+      maxSquare = maxSquare * maxSquare;
+      // All bits above the highest set bit in maxSquare are known zero.
+      unsigned MaxBit = maxSquare.isZero() ? 0 : maxSquare.logBase2();
+      if (MaxBit + 1 < ToBits) {
+        APInt KnownZeroMask =
+            APInt::getHighBitsSet(ToBits, ToBits - (MaxBit + 1));
+        Known.Zero |= KnownZeroMask;
+      }
+    }
+  }
 }
 
 void llvm::computeKnownBitsFromRangeMetadata(const MDNode &Ranges,
diff --git a/llvm/test/Analysis/ValueTracking/known-bits.ll b/llvm/test/Analysis/ValueTracking/known-bits.ll
index 5b71402a96f0d..d9f119bd0d146 100644
--- a/llvm/test/Analysis/ValueTracking/known-bits.ll
+++ b/llvm/test/Analysis/ValueTracking/known-bits.ll
@@ -49,3 +49,36 @@ define i1 @vec_reverse_known_bits_demanded_fail(<4 x i8> %xx) {
   %r = icmp slt i8 %ele, 0
   ret i1 %r
 }
+
+; Test known bits for (sext i8 x) * (sext i8 x)
+; RUN: opt -passes=instcombine < %s -S | FileCheck %s --check-prefix=SEXT_SQUARE
+
+define i1 @sext_square_bit31(i8 %x) {
+; SEXT_SQUARE-LABEL: @sext_square_bit31(
+; SEXT_SQUARE-NEXT:    ret i1 false
+  %sx = sext i8 %x to i32
+  %mul = mul nsw i32 %sx, %sx
+  %and = and i32 %mul, 2147483648 ; 1 << 31
+  %cmp = icmp ne i32 %and, 0
+  ret i1 %cmp
+}
+
+define i1 @sext_square_bit30(i8 %x) {
+; SEXT_SQUARE-LABEL: @sext_square_bit30(
+; SEXT_SQUARE-NEXT:    ret i1 false
+  %sx = sext i8 %x to i32
+  %mul = mul nsw i32 %sx, %sx
+  %and = and i32 %mul, 1073741824 ; 1 << 30
+  %cmp = icmp ne i32 %and, 0
+  ret i1 %cmp
+}
+
+define i1 @sext_square_bit14(i8 %x) {
+; SEXT_SQUARE-LABEL: @sext_square_bit14(
+; SEXT_SQUARE-NOT: ret i1 false
+  %sx = sext i8 %x to i32
+  %mul = mul nsw i32 %sx, %sx
+  %and = and i32 %mul, 16384 ; 1 << 14
+  %cmp = icmp ne i32 %and, 0
+  ret i1 %cmp
+}

@Aethezz Aethezz changed the title [InstCombine][missed-optimizations] Fold out-of-range bits for squaring signed integers [InstCombine] Fold out-of-range bits for squaring signed integers Aug 13, 2025
// Only handle the case where both operands are the same extension of the same
// value.
if ((match(Op0, m_SExt(m_Value(A))) && match(Op1, m_SExt(m_Specific(A)))) ||
(match(Op0, m_ZExt(m_Value(A))) && match(Op1, m_ZExt(m_Specific(A))))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the zext handling here is useful. This will be handled by the generic code.

For the sext case, we know that the result is non-negative (due to self-multiply) and that we have a certain number of sign bits (due to multiply of sext), so together we know that the sign bits are actually zero bits.

I think the principled thing to do here would be, for self-multiplies, to call ComputeNumSignBits() and then set all those bits to zero.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, the zext is redundant. I’ve updated the code so that for self-multiplies using sext, we now call ComputeNumSignBits() to determine the number of sign bits and mark them as known zero.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, after reviewing the previous commit, how should we call ComputeNumSignBits() and set the corresponding bits to zero? In this function, we only track known bits and don’t explicitly compute the product, so it’s unclear how to determine the exact number of sign bits.

I’ve made another commit that reverts to the previous approach using max/min value boundaries and removed the zext handling for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the same logic as ComputeNumSignBits:

unsigned OutValidBits =
(TyBits - SignBitsOp0 + 1) + (TyBits - SignBitsOp1 + 1);
return OutValidBits > TyBits ? 1 : TyBits - OutValidBits + 1;

Adjusted for the case where the sign bits are the same for both operands:

      unsigned OutValidBits = 2 * (TyBits - SignBits + 1);
      unsigned OutSignBits = OutValidBits > TyBits ? 1 : TyBits - OutValidBits + 1;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok thanks, i have added this into last commit. One question: currently my code uses match while other parts of this function use Op0 == Op1. Should we only handle the explicit self-multiply case (x * x), or also consider cases where both operands are sign-extensions of the same value?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not necessary to handle sign extensions of the same value, as CSE will convert this into one sign extension used in both operands. So we should use Op0 == Op1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok, i moved it into the selfmultiply handling instead which uses Op0 == Op1

@nikic nikic requested a review from dtcxzyw August 17, 2025 14:49
@github-actions
Copy link

github-actions bot commented Aug 19, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

}

; Test known bits for (sext i8 x) * (sext i8 x)
; RUN: opt -passes=instcombine < %s -S | FileCheck %s --check-prefix=SEXT_SQUARE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not add an extra run line to this test. If this does not fold through -passes=instsimplify, then this should be tested inside llvm/test/Transforms/InstCombine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't fold with instsimplify so I moved it into llvm/test/Transforms/InstCombine

@nikic
Copy link
Contributor

nikic commented Aug 19, 2025

@llvmbot llvmbot added llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms labels Aug 19, 2025
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same for both cases. I think it would be cleaner if you just added an extra if (SelfMultiply) {} after the current KnownBits::mul() call.

Copy link
Contributor Author

@Aethezz Aethezz Aug 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After testing, I found that adding the extra if (SelfMultiply) {} causes the update_test_checks.py run to produce unfolded IR. I think it has to do with the previous if (SelfMultiply) where SelfMultiply &= isGuaranteedNotToBeUndef(Op0, Q.AC, Q.CxtI, Q.DT, Depth + 1); returns false. This causes our logic to not run since SelfMultiply is now false.

Edit: I printed out SelfMultiply and it indeed becomes false after that line, however the IR is folded with -O2 still

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (SelfMultiply) where SelfMultiply &= isGuaranteedNotToBeUndef(Op0, Q.AC, Q.CxtI, Q.DT, Depth + 1); returns false.

Can you add noundef to the parameter %x?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, adding noundef to %x resolved the issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this one is incorrect: https://alive2.llvm.org/ce/z/y8D-Yg

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though I don't really get why this case is folding. We should have TyBits = 32, SignBits = 9, OutValidBits = 16, OutSignBits = 17. So I would not expect bit 1<<14 to be set to zero.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, I didn't notice that this one is actually using CHECK-NOT, not CHECK. So everything is correct.

…tion to drop ternary and moved test to llvm/test/Transforms/InstCombine since wasn't folded with instsimplifiy
@Aethezz Aethezz force-pushed the Fix-#152061 branch 2 times, most recently from c85b6a4 to 98c0200 Compare August 20, 2025 18:19
@Aethezz Aethezz requested a review from nikic August 26, 2025 20:27
@Aethezz
Copy link
Contributor Author

Aethezz commented Sep 3, 2025

@nikic

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nikic nikic merged commit ef1539c into llvm:main Sep 5, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

llvm:analysis Includes value tracking, cost tables and constant folding llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Testing bit next to signbit does not constant fold when squaring signed integers

4 participants