[LAA] Be more careful when evaluating AddRecs at symbolic max BTC. #128061

fhahn · 2025-02-20T20:29:05Z

Evaluating AR at the symbolic max BTC may wrap and create an expression that is less than the start of the AddRec due to wrapping (for example consider MaxBTC = -2).

If that's the case, set ScEnd to -(EltSize + 1). ScEnd will get incremented by EltSize before returning, so this effectively sets ScEnd to unsigned max. Note that LAA separately checks that accesses cannot wrap (52ded67, #127543), so unsigned max represents an upper bound.
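
The arithmetic behind the -(EltSize + 1) value can be checked with plain unsigned integers. This is a minimal standalone sketch, assuming a 32-bit index type as in the tests; `sentinelEnd` and `finalEnd` are hypothetical helper names used only for illustration, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the value stored in ScEnd when the evaluation may
// have wrapped: -(EltSize + 1), computed modulo 2^32.
constexpr uint32_t sentinelEnd(uint32_t eltSize) {
  return 0u - (eltSize + 1u);
}

// Models the unconditional "ScEnd += EltSize" that happens before returning.
constexpr uint32_t finalEnd(uint32_t scEnd, uint32_t eltSize) {
  return scEnd + eltSize;
}

// For any element size the final bound is the unsigned maximum.
static_assert(finalEnd(sentinelEnd(4), 4) == UINT32_MAX, "i32-sized access");
static_assert(finalEnd(sentinelEnd(1), 1) == UINT32_MAX, "i8-sized access");
```

In the updated CHECK lines this unsigned maximum is printed as `High: -1`.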

When there is a computable backedge-taken count, we are guaranteed to execute that number of iterations, and if any pointer would wrap it would be UB (or the access will never be executed, so it cannot alias). The patch includes new tests from the previous discussion that show a case where evaluating at the BTC wraps, but this is UB because the pointer past the end of the object wraps (in evaluate-at-backedge-taken-count-wrapping.ll).

Note that an earlier version of the patch was shared as #106530, but I accidentally deleted the branch and now I cannot figure out how to reopen that PR.

llvmbot · 2025-02-20T20:29:42Z

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: Florian Hahn (fhahn)

Changes


Full diff: https://github.com/llvm/llvm-project/pull/128061.diff

5 Files Affected:

(modified) llvm/include/llvm/Analysis/LoopAccessAnalysis.h (+1-1)
(modified) llvm/lib/Analysis/Loads.cpp (+4-1)
(modified) llvm/lib/Analysis/LoopAccessAnalysis.cpp (+32-10)
(added) llvm/test/Analysis/LoopAccessAnalysis/evaluate-at-backedge-taken-count-wrapping.ll (+92)
(modified) llvm/test/Analysis/LoopAccessAnalysis/evaluate-at-symbolic-max-backedge-taken-count-may-wrap.ll (+2-4)

diff --git a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
index cb6f47e3a76be..91802cc4361ae 100644
--- a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
+++ b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
@@ -872,7 +872,7 @@ bool isConsecutiveAccess(Value *A, Value *B, const DataLayout &DL,
 /// NoConflict = (P2.Start >= P1.End) || (P1.Start >= P2.End)
 std::pair<const SCEV *, const SCEV *> getStartAndEndForAccess(
     const Loop *Lp, const SCEV *PtrExpr, Type *AccessTy, const SCEV *MaxBECount,
-    ScalarEvolution *SE,
+    const SCEV *SymbolicMaxBECount, ScalarEvolution *SE,
     DenseMap<std::pair<const SCEV *, Type *>,
              std::pair<const SCEV *, const SCEV *>> *PointerBounds);
 
diff --git a/llvm/lib/Analysis/Loads.cpp b/llvm/lib/Analysis/Loads.cpp
index b461c41d29e84..5a8eedfa261d2 100644
--- a/llvm/lib/Analysis/Loads.cpp
+++ b/llvm/lib/Analysis/Loads.cpp
@@ -319,11 +319,14 @@ bool llvm::isDereferenceableAndAlignedInLoop(
   const SCEV *MaxBECount =
       Predicates ? SE.getPredicatedConstantMaxBackedgeTakenCount(L, *Predicates)
                  : SE.getConstantMaxBackedgeTakenCount(L);
+  const SCEV *SymbolicMaxBECount =
+      Predicates ? SE.getPredicatedConstantMaxBackedgeTakenCount(L, *Predicates)
+                 : SE.getConstantMaxBackedgeTakenCount(L);
   if (isa<SCEVCouldNotCompute>(MaxBECount))
     return false;
 
   const auto &[AccessStart, AccessEnd] = getStartAndEndForAccess(
-      L, PtrScev, LI->getType(), MaxBECount, &SE, nullptr);
+      L, PtrScev, LI->getType(), MaxBECount, SymbolicMaxBECount, &SE, nullptr);
   if (isa<SCEVCouldNotCompute>(AccessStart) ||
       isa<SCEVCouldNotCompute>(AccessEnd))
     return false;
diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index a1d91de3bb788..cdce1f1941c2f 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -190,7 +190,7 @@ RuntimeCheckingPtrGroup::RuntimeCheckingPtrGroup(
 
 std::pair<const SCEV *, const SCEV *> llvm::getStartAndEndForAccess(
     const Loop *Lp, const SCEV *PtrExpr, Type *AccessTy, const SCEV *MaxBECount,
-    ScalarEvolution *SE,
+    const SCEV *SymbolicMaxBECount, ScalarEvolution *SE,
     DenseMap<std::pair<const SCEV *, Type *>,
              std::pair<const SCEV *, const SCEV *>> *PointerBounds) {
   std::pair<const SCEV *, const SCEV *> *PtrBoundsPair;
@@ -206,11 +206,31 @@ std::pair<const SCEV *, const SCEV *> llvm::getStartAndEndForAccess(
   const SCEV *ScStart;
   const SCEV *ScEnd;
 
+  auto &DL = Lp->getHeader()->getDataLayout();
+  Type *IdxTy = DL.getIndexType(PtrExpr->getType());
+  const SCEV *EltSizeSCEV = SE->getStoreSizeOfExpr(IdxTy, AccessTy);
   if (SE->isLoopInvariant(PtrExpr, Lp)) {
     ScStart = ScEnd = PtrExpr;
   } else if (auto *AR = dyn_cast<SCEVAddRecExpr>(PtrExpr)) {
     ScStart = AR->getStart();
-    ScEnd = AR->evaluateAtIteration(MaxBECount, *SE);
+    if (!isa<SCEVCouldNotCompute>(MaxBECount))
+      // Evaluating AR at an exact BTC is safe: LAA separately checks that
+      // accesses cannot wrap in the loop. If evaluating AR at BTC wraps, then
+      // the loop either triggers UB when executing a memory access with a
+      // poison pointer or the wrapping/poisoned pointer is not used.
+      ScEnd = AR->evaluateAtIteration(MaxBECount, *SE);
+    else {
+      // Evaluating AR at MaxBTC may wrap and create an expression that is less
+      // than the start of the AddRec due to wrapping (for example consider
+      // MaxBTC = -2). If that's the case, set ScEnd to -(EltSize + 1). ScEnd
+      // will get incremented by EltSize before returning, so this effectively
+      // sets ScEnd to unsigned max. Note that LAA separately checks that
+      // accesses cannot not wrap, so unsigned max represents an upper bound.
+      ScEnd = AR->evaluateAtIteration(SymbolicMaxBECount, *SE);
+      if (!SE->isKnownNonNegative(SE->getMinusSCEV(ScEnd, ScStart)))
+        ScEnd = SE->getNegativeSCEV(
+            SE->getAddExpr(EltSizeSCEV, SE->getOne(EltSizeSCEV->getType())));
+    }
     const SCEV *Step = AR->getStepRecurrence(*SE);
 
     // For expressions with negative step, the upper bound is ScStart and the
@@ -232,9 +252,6 @@ std::pair<const SCEV *, const SCEV *> llvm::getStartAndEndForAccess(
   assert(SE->isLoopInvariant(ScEnd, Lp) && "ScEnd needs to be invariant");
 
   // Add the size of the pointed element to ScEnd.
-  auto &DL = Lp->getHeader()->getDataLayout();
-  Type *IdxTy = DL.getIndexType(PtrExpr->getType());
-  const SCEV *EltSizeSCEV = SE->getStoreSizeOfExpr(IdxTy, AccessTy);
   ScEnd = SE->getAddExpr(ScEnd, EltSizeSCEV);
 
   std::pair<const SCEV *, const SCEV *> Res = {ScStart, ScEnd};
@@ -250,9 +267,11 @@ void RuntimePointerChecking::insert(Loop *Lp, Value *Ptr, const SCEV *PtrExpr,
                                     unsigned DepSetId, unsigned ASId,
                                     PredicatedScalarEvolution &PSE,
                                     bool NeedsFreeze) {
-  const SCEV *MaxBECount = PSE.getSymbolicMaxBackedgeTakenCount();
+  const SCEV *SymbolicMaxBECount = PSE.getSymbolicMaxBackedgeTakenCount();
+  const SCEV *MaxBECount = PSE.getBackedgeTakenCount();
   const auto &[ScStart, ScEnd] = getStartAndEndForAccess(
-      Lp, PtrExpr, AccessTy, MaxBECount, PSE.getSE(), &DC.getPointerBounds());
+      Lp, PtrExpr, AccessTy, MaxBECount, SymbolicMaxBECount, PSE.getSE(),
+      &DC.getPointerBounds());
   assert(!isa<SCEVCouldNotCompute>(ScStart) &&
          !isa<SCEVCouldNotCompute>(ScEnd) &&
          "must be able to compute both start and end expressions");
@@ -1933,11 +1952,14 @@ MemoryDepChecker::getDependenceDistanceStrideAndSize(
   // required for correctness.
   if (SE.isLoopInvariant(Src, InnermostLoop) ||
       SE.isLoopInvariant(Sink, InnermostLoop)) {
-    const SCEV *MaxBECount = PSE.getSymbolicMaxBackedgeTakenCount();
+    const SCEV *MaxBECount = PSE.getBackedgeTakenCount();
+    const SCEV *SymbolicMaxBECount = PSE.getSymbolicMaxBackedgeTakenCount();
     const auto &[SrcStart_, SrcEnd_] = getStartAndEndForAccess(
-        InnermostLoop, Src, ATy, MaxBECount, PSE.getSE(), &PointerBounds);
+        InnermostLoop, Src, ATy, MaxBECount, SymbolicMaxBECount, PSE.getSE(),
+        &PointerBounds);
     const auto &[SinkStart_, SinkEnd_] = getStartAndEndForAccess(
-        InnermostLoop, Sink, BTy, MaxBECount, PSE.getSE(), &PointerBounds);
+        InnermostLoop, Sink, BTy, MaxBECount, SymbolicMaxBECount, PSE.getSE(),
+        &PointerBounds);
     if (!isa<SCEVCouldNotCompute>(SrcStart_) &&
         !isa<SCEVCouldNotCompute>(SrcEnd_) &&
         !isa<SCEVCouldNotCompute>(SinkStart_) &&
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/evaluate-at-backedge-taken-count-wrapping.ll b/llvm/test/Analysis/LoopAccessAnalysis/evaluate-at-backedge-taken-count-wrapping.ll
new file mode 100644
index 0000000000000..d58dd38d9fef8
--- /dev/null
+++ b/llvm/test/Analysis/LoopAccessAnalysis/evaluate-at-backedge-taken-count-wrapping.ll
@@ -0,0 +1,92 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -passes='print<access-info>' -disable-output %s 2>&1 | FileCheck %s
+
+target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
+
+; Note: The datalayout for the test specifies a 32 bit index type.
+
+; No UB: accessing last valid byte, pointer after the object
+; doesnt wrap (%p + 2147483647).
+define void @pointer_after_object_does_not_wrap(i32 %y, ptr %s, ptr %p) {
+; CHECK-LABEL: 'pointer_after_object_does_not_wrap'
+; CHECK-NEXT:    loop:
+; CHECK-NEXT:      Memory dependences are safe with run-time checks
+; CHECK-NEXT:      Dependences:
+; CHECK-NEXT:      Run-time memory checks:
+; CHECK-NEXT:      Check 0:
+; CHECK-NEXT:        Comparing group ([[GRP1:0x[0-9a-f]+]]):
+; CHECK-NEXT:          %gep2.iv = getelementptr inbounds i8, ptr %p, i32 %iv
+; CHECK-NEXT:        Against group ([[GRP2:0x[0-9a-f]+]]):
+; CHECK-NEXT:          %gep1.iv = getelementptr inbounds i8, ptr %s, i32 %iv
+; CHECK-NEXT:      Grouped accesses:
+; CHECK-NEXT:        Group [[GRP1]]:
+; CHECK-NEXT:          (Low: (%y + %p) High: (2147483647 + %p))
+; CHECK-NEXT:            Member: {(%y + %p),+,1}<nw><%loop>
+; CHECK-NEXT:        Group [[GRP2]]:
+; CHECK-NEXT:          (Low: (%y + %s) High: (2147483647 + %s))
+; CHECK-NEXT:            Member: {(%y + %s),+,1}<nw><%loop>
+; CHECK-EMPTY:
+; CHECK-NEXT:      Non vectorizable stores to invariant address were not found in loop.
+; CHECK-NEXT:      SCEV assumptions:
+; CHECK-EMPTY:
+; CHECK-NEXT:      Expressions re-written:
+;
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [ %y, %entry ], [ %iv.next, %loop ]
+  %gep1.iv = getelementptr inbounds i8 , ptr %s, i32 %iv
+  %load = load i8, ptr %gep1.iv, align 4
+  %gep2.iv = getelementptr inbounds i8, ptr %p, i32 %iv
+  store i8 %load, ptr %gep2.iv, align 4
+  %iv.next = add nsw i32 %iv, 1
+  %c.2 = icmp slt i32 %iv.next, 2147483647
+  br i1 %c.2, label %loop, label %exit
+
+exit:
+  ret void
+}
+
+; UB: accessing %p + 2147483646 and p + 2147483647.
+; Pointer the past the object would wrap in signed.
+define void @pointer_after_object_would_wrap(i32 %y, ptr %s, ptr %p) {
+; CHECK-LABEL: 'pointer_after_object_would_wrap'
+; CHECK-NEXT:    loop:
+; CHECK-NEXT:      Memory dependences are safe with run-time checks
+; CHECK-NEXT:      Dependences:
+; CHECK-NEXT:      Run-time memory checks:
+; CHECK-NEXT:      Check 0:
+; CHECK-NEXT:        Comparing group ([[GRP3:0x[0-9a-f]+]]):
+; CHECK-NEXT:          %gep2.iv = getelementptr inbounds i8, ptr %p, i32 %iv
+; CHECK-NEXT:        Against group ([[GRP4:0x[0-9a-f]+]]):
+; CHECK-NEXT:          %gep1.iv = getelementptr inbounds i8, ptr %s, i32 %iv
+; CHECK-NEXT:      Grouped accesses:
+; CHECK-NEXT:        Group [[GRP3]]:
+; CHECK-NEXT:          (Low: (%y + %p) High: (-2147483648 + %p))
+; CHECK-NEXT:            Member: {(%y + %p),+,1}<nw><%loop>
+; CHECK-NEXT:        Group [[GRP4]]:
+; CHECK-NEXT:          (Low: (%y + %s) High: (-2147483648 + %s))
+; CHECK-NEXT:            Member: {(%y + %s),+,1}<nw><%loop>
+; CHECK-EMPTY:
+; CHECK-NEXT:      Non vectorizable stores to invariant address were not found in loop.
+; CHECK-NEXT:      SCEV assumptions:
+; CHECK-EMPTY:
+; CHECK-NEXT:      Expressions re-written:
+;
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [ %y, %entry ], [ %iv.next, %loop ]
+  %gep1.iv = getelementptr inbounds i8 , ptr %s, i32 %iv
+  %load = load i16, ptr %gep1.iv, align 4
+  %gep2.iv = getelementptr inbounds i8, ptr %p, i32 %iv
+  store i16 %load, ptr %gep2.iv, align 4
+  %iv.next = add nsw i32 %iv, 1
+  %c.2 = icmp slt i32 %iv.next, 2147483647
+  br i1 %c.2, label %loop, label %exit
+
+exit:
+  ret void
+}
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/evaluate-at-symbolic-max-backedge-taken-count-may-wrap.ll b/llvm/test/Analysis/LoopAccessAnalysis/evaluate-at-symbolic-max-backedge-taken-count-may-wrap.ll
index dd06cab26d095..0aa74c7b6442b 100644
--- a/llvm/test/Analysis/LoopAccessAnalysis/evaluate-at-symbolic-max-backedge-taken-count-may-wrap.ll
+++ b/llvm/test/Analysis/LoopAccessAnalysis/evaluate-at-symbolic-max-backedge-taken-count-may-wrap.ll
@@ -3,7 +3,6 @@
 
 target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
 
-; FIXME: Start == End for access group with AddRec.
 define void @runtime_checks_with_symbolic_max_btc_neg_1(ptr %P, ptr %S, i32 %x, i32 %y) {
 ; CHECK-LABEL: 'runtime_checks_with_symbolic_max_btc_neg_1'
 ; CHECK-NEXT:    loop:
@@ -17,7 +16,7 @@ define void @runtime_checks_with_symbolic_max_btc_neg_1(ptr %P, ptr %S, i32 %x,
 ; CHECK-NEXT:        ptr %S
 ; CHECK-NEXT:      Grouped accesses:
 ; CHECK-NEXT:        Group [[GRP1]]:
-; CHECK-NEXT:          (Low: ((4 * %y) + %P) High: ((4 * %y) + %P))
+; CHECK-NEXT:          (Low: ((4 * %y) + %P) High: -1)
 ; CHECK-NEXT:            Member: {((4 * %y) + %P),+,4}<%loop>
 ; CHECK-NEXT:        Group [[GRP2]]:
 ; CHECK-NEXT:          (Low: %S High: (4 + %S))
@@ -44,7 +43,6 @@ exit:
   ret void
 }
 
-; FIXME: Start > End for access group with AddRec.
 define void @runtime_check_with_symbolic_max_btc_neg_2(ptr %P, ptr %S, i32 %x, i32 %y) {
 ; CHECK-LABEL: 'runtime_check_with_symbolic_max_btc_neg_2'
 ; CHECK-NEXT:    loop:
@@ -58,7 +56,7 @@ define void @runtime_check_with_symbolic_max_btc_neg_2(ptr %P, ptr %S, i32 %x, i
 ; CHECK-NEXT:        ptr %S
 ; CHECK-NEXT:      Grouped accesses:
 ; CHECK-NEXT:        Group [[GRP3]]:
-; CHECK-NEXT:          (Low: ((4 * %y) + %P) High: (-4 + (4 * %y) + %P))
+; CHECK-NEXT:          (Low: ((4 * %y) + %P) High: -1)
 ; CHECK-NEXT:            Member: {((4 * %y) + %P),+,4}<%loop>
 ; CHECK-NEXT:        Group [[GRP4]]:
 ; CHECK-NEXT:          (Low: %S High: (4 + %S))

fhahn · 2025-02-20T20:32:22Z

I am not sure if there's a better way to check if evaluateAtIteration may wrap. It also currently always passes both BTC and SymbolicMaxBTC as there is one caller where PSE directly isn't available. This could probably also be improved once we agree on how to best prevent wrapping.

github-actions · 2025-02-20T20:34:53Z

✅ With the latest revision this PR passed the C/C++ code formatter.

fhahn · 2025-02-20T20:45:29Z

llvm/test/Analysis/LoopAccessAnalysis/evaluate-at-backedge-taken-count-wrapping.ll

+
+target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
+
+; Note: The datalayout for the test specifies a 32 bit index type.


Alive proofs for the test cases showing the last accessed address doesn't have UB (@src1/@tgt1) and has UB (@src2/@tgt2): https://alive2.llvm.org/ce/z/EJVZep

Just a side-note, but the tests might be a little easier to understand if we use an 8-bit index type.

artagnon

Very confused about the code, although the tests seem to check out. BackedgeTakenInfo has an IsComplete indicating whether SCEVCouldNotCompute will be returned.

artagnon · 2025-02-20T21:18:08Z

llvm/lib/Analysis/Loads.cpp

      Predicates ? SE.getPredicatedConstantMaxBackedgeTakenCount(L, *Predicates)
                 : SE.getConstantMaxBackedgeTakenCount(L);
+  const SCEV *SymbolicMaxBECount =
+      Predicates ? SE.getPredicatedConstantMaxBackedgeTakenCount(L, *Predicates)


s/Constant/Symbolic/? Perhaps pass ExitKind to getPredicatedBackedgeTakenCount?

Yep this should be constant, will fix, thanks

But this isn't the symbolic maximum, right? The documentation says:

  /// A constant which provides an upper bound on the exact trip count.
  ConstantMaximum,
  /// An expression which provides an upper bound on the exact trip count.
  SymbolicMaximum,

Surely, it should be called ConstMaxBECount?

artagnon

Sorry, I got confused in the previous review. This evaluateAtIteration wrapping is troubling me, and I'm not sure how it wraps: perhaps @nikic can chime in?

artagnon · 2025-02-20T21:43:11Z

llvm/lib/Analysis/LoopAccessAnalysis.cpp

  } else if (auto *AR = dyn_cast<SCEVAddRecExpr>(PtrExpr)) {
    ScStart = AR->getStart();
-    ScEnd = AR->evaluateAtIteration(MaxBECount, *SE);
+    if (!isa<SCEVCouldNotCompute>(BTC))


Not sure why we are passing the exact BTC, and handling the case where it is a could-not-compute. Why not just pass the symbolic max as before, and have the logic below?

If we can compute the backedge-taken count, we are guaranteed to execute exactly that number of iterations.

The symbolic max backedge-taken count is an upper bound, and the loop may exit at any earlier iteration (e.g. because it has an uncountable exit).

As per the comment, a computable BTC means we should be able to rely on the fact that the pointers cannot wrap in any iteration. If we instead only have the symbolic max BTC, we may execute fewer iterations than the max, and then only those iterations are guaranteed not to wrap in general, so evaluating at the symbolic max may wrap.

One case to consider is when the symbolic max BTC is a SCEVUnknown: we will form a SCEVMulExpr for which we cannot determine whether it wraps or not (vs. the case when the symbolic max BTC is a constant).

Thanks for the explanation. My confusion is the following: if we have a computable BTC, isn't Exact = SymbolicMax? If we don't have a computable BTC, Exact = SCEVCouldNotCompute and SymbolicMax could be a SCEVConstant, general SCEV expression, SCEVUnknown, or SCEVCouldNotCompute, in the worst case. If my reasoning is correct, there is no additional information in Exact over the SymbolicMax, and we shouldn't have to pass Exact. In the test cases you have added, isn't SymbolicMax a SCEVConstant = INT_MAX? What does evaluating an AddRec at the INT_MAX iteration wrap to? Not -(EltSize + 1), or evaluating the AddRec at INT_MIN? Perhaps worth adding some SCEV tests for this evaluation, as a separate patch that we can verify?

When SymbolicMax is a SCEVUnknown, it means that the iteration is bounded by some function argument IR value, right? In this case, Exact will also be the same SCEVUnknown, and if we pass INT_MAX when calling the function, the evaluation will wrap, and this is UB anyway?

What happens when SymbolicMax is a SCEVCouldNotCompute? I think this will result in a crash with the current code.

Okay, just thinking out loud here: for simplicity, let AR = {0, +, 1} and let SymbolicMax BTC = INT_MAX. Then, we compute AddExpr(0, MulExpr(1, INT_MAX)). I don't think this overflows. Now, let AR = {0, +, 2}. Then, we compute AddExpr(0, MulExpr(2, BinomialCoefficient(INT_MAX, 2))), where the binomial coefficient evaluates to INT_MAX * (INT_MAX - 1) / 2. Naively doing this would overflow even for BTC equal to sqrt(INT_MAX), but it looks like BinomialCoefficient is written carefully, although the final result is truncated (?). In conclusion, it looks like the problem is not that evaluateAtIteration wraps, but rather that it truncates the result?

Thanks for digging into this! One clarification is that we evaluate at BTC = UNSIGNED_MAX. So {0, +, 1} won't wrap, but adding 1 will (getStartAndEndForAccess will compute the first address after the last access).

When we have strides larger than 1, the last accessed address will be something like %start + stride * UNSIGNED_MAX, which should wrap to something like %start - %stride. I am not entirely sure if there may be other wrapping issues with how evaluateAtIteration internally computes the result, but the original end point computed for runtime_checks_with_symbolic_max_btc_neg_1 should illustrate that: start == end due to adding %stride to the result of evaluateAtIteration.
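
To make the wrapping concrete, here is a small sketch that models an affine AddRec {start,+,step} with 32-bit modular arithmetic. It is only an integer model under the assumption of a 32-bit index type and a stride equal to the element size; `evalAffine` and `oldEnd` are hypothetical names, not the SCEV implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: value of {start,+,step} at iteration n, modulo 2^32.
constexpr uint32_t evalAffine(uint32_t start, uint32_t step, uint32_t n) {
  return start + step * n;
}

// End bound as computed before the patch: the address of the last access
// plus the element size, with no wrap detection.
constexpr uint32_t oldEnd(uint32_t start, uint32_t step, uint32_t maxBTC,
                          uint32_t eltSize) {
  return evalAffine(start, step, maxBTC) + eltSize;
}

// MaxBTC = -1 (unsigned max): start + 4 * -1 + 4 == start, so Low == High.
static_assert(oldEnd(1000, 4, UINT32_MAX, 4) == 1000, "start == end");
// MaxBTC = -2: start + 4 * -2 + 4 == start - 4, so High < Low.
static_assert(oldEnd(1000, 4, UINT32_MAX - 1, 4) == 996, "end < start");
```

These two cases correspond to the old CHECK lines `High: ((4 * %y) + %P)` and `High: (-4 + (4 * %y) + %P)` in the existing test file.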

artagnon · 2025-02-20T21:45:05Z

llvm/lib/Analysis/LoopAccessAnalysis.cpp

+  const SCEV *SymbolicMaxBTC = PSE.getSymbolicMaxBackedgeTakenCount();
+  const SCEV *BTC = PSE.getBackedgeTakenCount();
+  const auto &[ScStart, ScEnd] =
+      getStartAndEndForAccess(Lp, PtrExpr, AccessTy, BTC, SymbolicMaxBTC,
+                              PSE.getSE(), &DC.getPointerBounds());


Are we changing this because the exact BTC gives better results in some cases?

It's changed to differentiate the cases where we can and cannot compute the BTC exactly (there may not be a computable BTC for loops with early exits)
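
The distinction can be sketched on concrete integers. The following is a simplified model of the logic in this revision of the patch, assuming a 32-bit index type and an affine AddRec; `modelEnd` is a hypothetical stand-in, an empty `exactBTC` plays the role of SCEVCouldNotCompute, and the signed-difference test stands in for the `isKnownNonNegative` query, which in the real analysis is a conservative static check.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical integer model of the end-bound computation in this revision.
constexpr uint32_t modelEnd(uint32_t start, uint32_t step,
                            std::optional<uint32_t> exactBTC,
                            uint32_t symbolicMaxBTC, uint32_t eltSize) {
  uint32_t end = 0;
  if (exactBTC) {
    // Exact BTC: every iteration executes, so a wrap here would be UB.
    end = start + step * *exactBTC;
  } else {
    // Only an upper bound is known: the evaluation itself may wrap.
    end = start + step * symbolicMaxBTC;
    // (end - start) is negative when read as a signed 32-bit value.
    if (end - start > 0x7FFFFFFFu)
      end = 0u - (eltSize + 1u); // becomes unsigned max after adding eltSize
  }
  return end + eltSize;
}

static_assert(modelEnd(1000, 4, 9u, 9u, 4) == 1040, "exact BTC");
static_assert(modelEnd(1000, 4, std::nullopt, UINT32_MAX - 1, 4) == UINT32_MAX,
              "wrapping symbolic max is clamped");
```

With a small symbolic max that does not wrap (for example 99), the model returns the ordinary bound, 1400 in this setup.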

artagnon · 2025-02-20T21:50:18Z

llvm/lib/Analysis/LoopAccessAnalysis.cpp

+      ScEnd = AR->evaluateAtIteration(SymbolicMaxBTC, *SE);
+      if (!SE->isKnownNonNegative(SE->getMinusSCEV(ScEnd, ScStart)))
+        ScEnd = SE->getNegativeSCEV(
+            SE->getAddExpr(EltSizeSCEV, SE->getOne(EltSizeSCEV->getType())));


Not sure how evaluateAtIteration overflows:

Result = SE.getAddExpr(Result, SE.getMulExpr(Operands[i], Coeff));

david-arm · 2025-02-21T09:34:59Z

llvm/lib/Analysis/Loads.cpp

      Predicates ? SE.getPredicatedConstantMaxBackedgeTakenCount(L, *Predicates)
                 : SE.getConstantMaxBackedgeTakenCount(L);
+  const SCEV *SymbolicMaxBECount =
+      Predicates ? SE.getPredicatedConstantMaxBackedgeTakenCount(L, *Predicates)


But this isn't the symbolic maximum, right? The documentation says:

  /// A constant which provides an upper bound on the exact trip count.
  ConstantMaximum,
  /// An expression which provides an upper bound on the exact trip count.
  SymbolicMaximum,

Surely, it should be called ConstMaxBECount?

david-arm · 2025-02-21T09:36:20Z

llvm/lib/Analysis/Loads.cpp

  const SCEV *MaxBECount =
      Predicates ? SE.getPredicatedConstantMaxBackedgeTakenCount(L, *Predicates)
                 : SE.getConstantMaxBackedgeTakenCount(L);
+  const SCEV *SymbolicMaxBECount =


This value is identical to MaxBECount so why not just pass MaxBECount in as both arguments to getStartAndEndForAccess?

Updated the naming to clarify the names, thanks

artagnon · 2025-02-21T16:06:24Z

It also currently always passes both BTC and SymbolicMaxBTC as there is one caller where PSE directly isn't available. This could probably also be improved once we agree on how to best prevent wrapping.

Not sure I understand the problem: the Loads caller has Predicates which we can pass, making it equivalent to a PSE call?

nikic · 2025-02-25T13:39:02Z

llvm/lib/Analysis/LoopAccessAnalysis.cpp

+      // TODO: Use additional information to determine no-wrap including
+      // size/dereferencability info from the accessed object.
+      ScEnd = AR->evaluateAtIteration(MaxBTC, *SE);
+      if (!SE->isKnownNonNegative(SE->getMinusSCEV(ScEnd, ScStart)))


Why is it sufficient to check that the difference is non-negative? Can't it happen that the addrec wraps but still ends up at a value > ScStart?

Yes, I also added a test to that effect now, this was an initial attempt to avoid regressions, which turned out to be a bit tricky to fix.

For now, I tried to check the object size if known to see if the maximum value of the add-rec will be inside the object in evaluateAddRecAtMaxBTCWillNotWrap
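
A concrete instance of the case raised above, where the evaluation wraps but still lands above the start, can be built with 32-bit modular arithmetic. This is an illustrative sketch with made-up values; `evalWrapped`, `evalWide` and `wrapped` are hypothetical helpers, not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit modular evaluation of {start,+,step} at iteration n.
constexpr uint32_t evalWrapped(uint32_t start, uint32_t step, uint32_t n) {
  return start + step * n;
}

// The same evaluation in 64 bits, where 32-bit inputs cannot overflow.
constexpr uint64_t evalWide(uint32_t start, uint32_t step, uint32_t n) {
  return static_cast<uint64_t>(start) + static_cast<uint64_t>(step) * n;
}

// True if the 32-bit result differs from the exact result, i.e. it wrapped.
constexpr bool wrapped(uint32_t start, uint32_t step, uint32_t n) {
  return evalWide(start, step, n) != evalWrapped(start, step, n);
}

// With step 4 and n = 2^30 + 1, 4 * n == 2^32 + 4, which wraps to 4.
// The result is start + 4, so "end - start" is non-negative despite the wrap.
static_assert(evalWrapped(1000, 4, 1073741825u) == 1004, "end > start");
static_assert(wrapped(1000, 4, 1073741825u), "but the evaluation wrapped");
```

A check that only looks at the sign of the difference accepts this case, which is why additional information such as the object size is needed.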

fhahn

Rebased, the latest version also fixes incorrectly determining that accesses in loops are dereferenceable (see dereferenceable-info-from-assumption-variable-size.ll)

fhahn · 2025-02-25T13:45:48Z

llvm/lib/Analysis/LoopAccessAnalysis.cpp

  } else if (auto *AR = dyn_cast<SCEVAddRecExpr>(PtrExpr)) {
    ScStart = AR->getStart();
-    ScEnd = AR->evaluateAtIteration(MaxBECount, *SE);
+    if (!isa<SCEVCouldNotCompute>(BTC))


Thanks for digging into this! One clarification is that we evaluate at BTC = UNSIGNED_MAX. So {0, +, 1} won't wrap, but adding 1 will (getStartAndEndForAccess will compute the first address after the last access).

When we have strides larger than 1, the last accessed address will be something like %start + stride * UNSIGNED_MAX, which should wrap to something like %start - %stride. I am not entirely sure if there may be other wrapping issues with how evaluateAtIteration internally computes the result, but the original end point computed for runtime_checks_with_symbolic_max_btc_neg_1 should illustrate that: start == end due to adding %stride to the result of evaluateAtIteration.

fhahn · 2025-02-25T13:47:31Z

llvm/lib/Analysis/LoopAccessAnalysis.cpp

+  const SCEV *SymbolicMaxBTC = PSE.getSymbolicMaxBackedgeTakenCount();
+  const SCEV *BTC = PSE.getBackedgeTakenCount();
+  const auto &[ScStart, ScEnd] =
+      getStartAndEndForAccess(Lp, PtrExpr, AccessTy, BTC, SymbolicMaxBTC,
+                              PSE.getSE(), &DC.getPointerBounds());


It's changed to differentiate the cases where we can and cannot compute the BTC exactly (there may not be a computable BTC for loops with early exits)

fhahn · 2025-02-25T14:01:42Z

llvm/lib/Analysis/Loads.cpp

  const SCEV *MaxBECount =
      Predicates ? SE.getPredicatedConstantMaxBackedgeTakenCount(L, *Predicates)
                 : SE.getConstantMaxBackedgeTakenCount(L);
+  const SCEV *SymbolicMaxBECount =


Updated the naming to clarify the names, thanks

david-arm · 2025-02-26T14:52:22Z

llvm/test/Transforms/LoopVectorize/single_early_exit_live_outs.ll

 ; CHECK-NEXT:    [[P2:%.*]] = alloca [1024 x i8], align 1
 ; CHECK-NEXT:    call void @init_mem(ptr [[P1]], i64 1024)
 ; CHECK-NEXT:    call void @init_mem(ptr [[P2]], i64 1024)
-; CHECK-NEXT:    br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]


This looks like a regression to me. Are you suggesting that we simply cannot ever check for dereferenceability in reverse loops for some fundamental reason or that the test itself was invalid? If it's the latter I'm happy to rewrite the test. :) It would be a shame to lose this functionality.

The functionality for supporting testing whether loads could be dereferenced in reverse loops was added in #96752 specifically to support such cases.

Yep the original version wasn't handling this, but should be fixed in the latest version.

Extend test coverage for #128061.


Use dereferenceable attribute instead of assumption to make the tests independent of #128061.


david-arm

LGTM.

david-arm · 2025-06-20T09:50:59Z

llvm/test/Analysis/LoopAccessAnalysis/early-exit-runtime-checks.ll

 ; CHECK-NEXT:      Grouped accesses:
 ; CHECK-NEXT:        Group GRP0:
-; CHECK-NEXT:          (Low: %B High: (4004 + %B))
+; CHECK-NEXT:          (Low: %B High: inttoptr (i64 -1 to ptr))


This needs a TODO because we shouldn't be doing this for dereferenceable pointers, right? Surely if it's guaranteed to be dereferenceable that implies it should not wrap? For example, I'm struggling to see how a C++ object passed by reference to a function could be allocated across a wrapped address space and be legal. I would expect any attempt to actually use the object triggers undefined behaviour in the C++ specification. I realise there's more to life than C++ - this is just one example of course.

Also, if I've understood correctly it will significantly impact @huntergr-arm's work to enable vectorisation of early exit loops with loads and stores from dereferenceable memory.

Again, I'm happy to accept the patch as is, just saying that we should improve this in future if we ever want to make progress with early exit vectorisation.

That test was accessing one past the dereferenceable range (see the original bound of 4004 + %B), so I think using -1 is the best we can do here. The name of the test was confusing, as it actually executes 1001 iterations.

There's now a variant that actually executes at most 1000 iterations (for which we don't pessimize the bounds) and this test has been renamed to 1001 iterations.
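
The difference between the two test variants comes down to simple arithmetic. The sketch below assumes 4-byte elements (consistent with the original bound of 4004 for 1001 iterations) and a dereferenceable size of 4000 bytes, which is an assumption made for illustration; `endOffset` and `staysInBounds` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Exclusive end offset, in bytes, of `iterations` consecutive accesses of
// eltSize bytes each, starting at offset 0.
constexpr uint64_t endOffset(uint64_t iterations, uint64_t eltSize) {
  return iterations * eltSize;
}

// True if all accesses fall inside the first derefBytes bytes of the object.
constexpr bool staysInBounds(uint64_t iterations, uint64_t eltSize,
                             uint64_t derefBytes) {
  return endOffset(iterations, eltSize) <= derefBytes;
}

static_assert(endOffset(1001, 4) == 4004, "bound seen in the original test");
static_assert(staysInBounds(1000, 4, 4000), "1000 iterations fit");
static_assert(!staysInBounds(1001, 4, 4000), "1001 iterations do not");
```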

Ah ok, that's great. I think the name of the test did confuse me and made me quite worried. However, I think there's still work to do here in the long term once we support using first-faulting loads to vectorise loops with loads and stores using arbitrary pointers. CC @huntergr-arm. We can revisit this later.

david-arm · 2025-06-20T09:53:01Z

llvm/test/Analysis/LoopAccessAnalysis/early-exit-runtime-checks.ll

I think I understand. It sounds like you're saying that without the early exit we don't really care if the runtime checks are nonsense or not because the entire loop is UB anyway? Whereas for early exit loops the loop may or may not be UB and so we do care about getting the right runtime checks.


…max BTC. (#128061)

Evaluating AR at the symbolic max BTC may wrap and create an expression that is less than the start of the AddRec due to wrapping (for example consider MaxBTC = -2).

If that's the case, set ScEnd to -(EltSize + 1). ScEnd will get incremented by EltSize before returning, so this effectively sets ScEnd to unsigned max. Note that LAA separately checks that accesses cannot wrap (52ded67, llvm/llvm-project#127543), so unsigned max represents an upper bound.

When there is a computable backedge-taken count, we are guaranteed to execute that number of iterations, and if any pointer would wrap it would be UB (or the access will never be executed, so it cannot alias). It includes new tests from the previous discussion that show a case where evaluating at the BTC wraps, but this is UB because the pointer past the end of the object wraps (in `evaluate-at-backedge-taken-count-wrapping.ll`).

When we have only a maximum backedge-taken count, we instead try to use dereferenceability information to determine if the pointer access must be in bounds for the maximum backedge-taken count.

PR: llvm/llvm-project#128061

nikic

(nits)

nikic · 2025-06-23T20:46:06Z

llvm/lib/Analysis/LoopAccessAnalysis.cpp


+/// Returns \p A + \p B, if it is guaranteed not to unsigned wrap. Otherwise
+/// return nullptr. \p A and \p B must have the same type.
+static const SCEV *addSCEVOverflow(const SCEV *A, const SCEV *B,


I think addSCEVNoOverflow / mulSCEVNoOverflow would be a better name for these.

Yeah updated in b876910, thanks

nikic · 2025-06-23T20:46:43Z

llvm/lib/Analysis/LoopAccessAnalysis.cpp

+/// return nullptr. \p A and \p B must have the same type.
+static const SCEV *addSCEVOverflow(const SCEV *A, const SCEV *B,
+                                   ScalarEvolution &SE) {
+  if (!SE.willNotOverflow(Instruction::Add, false, A, B))


Add /*IsSigned=*/

Done thanks

nikic · 2025-06-23T20:53:37Z

llvm/lib/Analysis/LoopAccessAnalysis.cpp

+      SE.getMinusSCEV(AR->getStart(), StartPtr), WiderTy);
+
+  const SCEV *OffsetAtLastIter =
+      mulSCEVOverflow(MaxBTC, SE.getAbsExpr(Step, false), SE);


/*IsNSW=*/

Done thanks
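For readers unfamiliar with the pattern behind `addSCEVNoOverflow` / `mulSCEVNoOverflow`, the following is a rough analogue on plain `uint64_t` values, returning `std::nullopt` where the SCEV helpers return `nullptr`. All names here are hypothetical, and adding the element size in `endOffset` is an assumption about how the offset is used, not something shown in the quoted hunks.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Returns A + B if the sum does not wrap as unsigned, otherwise nullopt.
std::optional<uint64_t> addNoOverflow(uint64_t A, uint64_t B) {
  if (A > std::numeric_limits<uint64_t>::max() - B)
    return std::nullopt;
  return A + B;
}

// Returns A * B if the product does not wrap as unsigned, otherwise nullopt.
std::optional<uint64_t> mulNoOverflow(uint64_t A, uint64_t B) {
  if (A != 0 && B > std::numeric_limits<uint64_t>::max() / A)
    return std::nullopt;
  return A * B;
}

// Assumed shape of the computation: offset of the last iteration
// (MaxBTC * |Step|) plus the element size, bailing out on any overflow.
std::optional<uint64_t> endOffset(uint64_t MaxBTC, uint64_t AbsStep,
                                  uint64_t EltSize) {
  std::optional<uint64_t> OffsetAtLastIter = mulNoOverflow(MaxBTC, AbsStep);
  if (!OffsetAtLastIter)
    return std::nullopt;
  return addNoOverflow(*OffsetAtLastIter, EltSize);
}

// Small usage check: 9 iterations of 4 bytes plus one 4-byte element.
void checkExamples() {
  assert(endOffset(9, 4, 4).value() == 40);
  assert(!endOffset(std::numeric_limits<uint64_t>::max(), 4, 4).has_value());
}
```

Propagating "no result" instead of a wrapped value forces every caller to handle the overflow case explicitly, which is presumably why the `NoOverflow` naming was preferred.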

Adjust naming and add argument comments as suggested.

This patch extends the logic added in llvm#128061 to support dereferenceability information from assumptions as well. Unfortunately both assumption cache and the dominator tree need to be threaded through multiple layers to make them available where needed.

annamthomas · 2025-07-20T18:55:32Z

@fhahn, to clarify my understanding here: this patch is to prevent incorrect runtime checks from being generated when the last access address may overflow. If the last access overflows, this is in fact UB. However, for early-exit loops, in such cases, we may not have UB in the original program (since we may exit early), but we end up generating incorrect runtime checks when vectorizing.
For regular loops, such a case (last access overflows) is UB, so it does not matter that we are generating an incorrect runtime check.

If this is only to prevent incorrect runtime checks, we technically need to check this only for loops that need runtime checks?

Code form for llvm#128061 (comment).

fhahn · 2025-07-21T18:20:56Z

> @fhahn, to clarify my understanding here: this patch is to prevent incorrect runtime checks from being generated when the last access address may overflow. If the last access overflows, this is in fact UB. However, for early-exit loops, in such cases, we may not have UB in the original program (since we may exit early), but we end up generating incorrect runtime checks when vectorizing. For regular loops, such a case (last access overflows) is UB, so it does not matter that we are generating an incorrect runtime check.
>
> If this is only to prevent incorrect runtime checks, we technically need to check this only for loops that need runtime checks?

Hmm, I think the check is needed for any user of getStartAndEndForAccess; otherwise it's not guaranteed that Start < End and that computing End does not wrap. If that happened, then using the Start/End SCEVs is probably not safe?
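One way to see why a wrapped End is a problem for any consumer is the usual interval-overlap test, sketched below on 32-bit addresses. The `AccessRange` type and both helpers are hypothetical illustrations, not LAA code; the clamp to UINT32_MAX mirrors the "unsigned max" fallback from the patch description.

```cpp
#include <cassert>
#include <cstdint>

// Half-open address range [Start, End) touched by one pointer in the loop.
struct AccessRange {
  uint32_t Start;
  uint32_t End;
};

// Classic overlap test; it is only sound when Start <= End for both ranges.
bool rangesMayOverlap(const AccessRange &A, const AccessRange &B) {
  return A.Start < B.End && B.Start < A.End;
}

// Build a range covering Bytes bytes from Start. If the end wraps and
// ClampOnWrap is set, conservatively use UINT32_MAX as the end instead.
AccessRange makeRange(uint32_t Start, uint32_t Bytes, bool ClampOnWrap) {
  assert(Bytes != 0 && "empty ranges are not modelled");
  uint32_t End = Start + Bytes; // may wrap around zero
  if (End < Start && ClampOnWrap)
    End = UINT32_MAX;
  return {Start, End};
}
```

Take A starting at 0xFFFFFF00 and covering 0x200 bytes, and B = [0xFFFFFF80, 0xFFFFFFC0), which lies inside A. Without clamping, A's end wraps to 0x100 and `rangesMayOverlap` reports no overlap; with clamping, the overlap is detected.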

fhahn requested review from Meinersbur, artagnon, david-arm, nikic and preames February 20, 2025 20:29

llvmbot added the llvm:analysis Includes value tracking, cost tables and constant folding label Feb 20, 2025

fhahn force-pushed the laa-symbolic-max-btc-overflow branch from b985f31 to 19fe791 Compare February 20, 2025 20:31

fhahn mentioned this pull request Feb 20, 2025

[LAA] Be more careful when evaluating AddRecs at symbolic max BTC. #106530

Closed

fhahn commented Feb 20, 2025

artagnon reviewed Feb 20, 2025

david-arm reviewed Feb 21, 2025

fhahn mentioned this pull request Feb 24, 2025

[Loads] Support dereferenceable assumption with variable size. #128436

Merged

fhahn force-pushed the laa-symbolic-max-btc-overflow branch from d63d103 to a5b5a13 Compare February 25, 2025 13:33

llvmbot added the llvm:transforms label Feb 25, 2025

nikic reviewed Feb 25, 2025

fhahn commented Feb 25, 2025

david-arm reviewed Feb 26, 2025

fhahn added a commit that referenced this pull request Mar 13, 2025

[LAA] Add extra tests for #128061.

dfb661c

Extend test coverage for #128061.

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Mar 13, 2025

Automerge: [LAA] Add extra tests for #128061.

31756c1

Extend test coverage for llvm/llvm-project#128061.

fhahn added a commit that referenced this pull request Mar 18, 2025

[LV] Update test to use dereferenceable attribute instead of assumption.

1442fe0

Use dereferenceable attribute instead of assumption to make the tests independent of #128061.

fhahn force-pushed the laa-symbolic-max-btc-overflow branch from a5b5a13 to 86016b2 Compare March 18, 2025 22:25

fhahn added 6 commits June 19, 2025 16:28

!fix formatting

4c5d9ed

!fixup fix isDereferenceableAndAlignedInLoop.

198b3b1

!fixup check object size to determine no-wrap.

92d8bd6

!fixup update after rebase, extend offset as well.

2636025

!fixup address comments, thanks

41f58ca

!fixup address comments, thanks

fc91973

fhahn force-pushed the laa-symbolic-max-btc-overflow branch from 0989341 to fc91973 Compare June 19, 2025 16:12

david-arm approved these changes Jun 20, 2025

Merge remote-tracking branch 'origin/main' into laa-symbolic-max-btc-overflow

3764f69

fhahn merged commit 5d01697 into llvm:main Jun 23, 2025
6 of 7 checks passed

fhahn deleted the laa-symbolic-max-btc-overflow branch June 23, 2025 19:23

nikic reviewed Jun 23, 2025

fhahn added a commit that referenced this pull request Jun 24, 2025

[LAA] Address follow-up suggestions for #128061.

b876910

Adjust naming and add argument comments as suggested.

DrSergei pushed a commit to DrSergei/llvm-project that referenced this pull request Jun 24, 2025

[LAA] Address follow-up suggestions for llvm#128061.

ab393f3

Adjust naming and add argument comments as suggested.

anthonyhatran pushed a commit to anthonyhatran/llvm-project that referenced this pull request Jun 26, 2025

[LAA] Address follow-up suggestions for llvm#128061.

d05f728

Adjust naming and add argument comments as suggested.

fhahn mentioned this pull request Jul 4, 2025

[LAA] Support assumptions in evaluatePtrAddRecAtMaxBTCWillNotWrap #147047

Merged

annamthomas added a commit to annamthomas/llvm-project that referenced this pull request Jul 21, 2025

Wrapping test need to be done only if runtime checks present

93736a3

Code form for llvm#128061 (comment).

annamthomas mentioned this pull request Jul 21, 2025

Wrapping test need to be done only if runtime checks present #149856

Closed


		target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"

		; Note: The datalayout for the test specifies a 32 bit index type.
