[LAA] Prepare to handle diff type sizes (NFC) #122318

artagnon · 2025-01-09T17:13:38Z

As depend_diff_types shows, there are several places where the HasSameSize check can be relaxed for higher analysis precision. As a first step, return both the source size and the sink size from getDependenceDistanceStrideAndSize, along with a HasSameSize boolean for the moment.

llvmbot · 2025-01-09T17:14:11Z

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Author: Ramkumar Ramachandra (artagnon)

Changes

The HasSameSize checks, which are triggered on different store sizes, in MemDepChecker::isDependent are ad-hoc and imprecise, leading to spurious dependencies and runtime-checks. Identify that the exact scenario in which to bail out is unequal store sizes when dependence distance is zero, and check precisely this condition in
MemDepChecker::getDependenceDistanceAndSize, eliminating all the ad-hoc checks in isDependent and making LoopAccessAnalysis more precise.

Full diff: https://github.com/llvm/llvm-project/pull/122318.diff

4 Files Affected:

(modified) llvm/lib/Analysis/LoopAccessAnalysis.cpp (+19-26)
(modified) llvm/test/Analysis/LoopAccessAnalysis/forward-loop-carried.ll (-4)
(modified) llvm/test/Analysis/LoopAccessAnalysis/positive-dependence-distance-different-access-sizes.ll (+9-22)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/interleave-allocsize-not-equal-typesize.ll (+33-8)

diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index 38e9145826c08e..90a9de8bf1a601 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -1990,10 +1990,17 @@ MemoryDepChecker::getDependenceDistanceStrideAndSize(
   }
 
   uint64_t TypeByteSize = DL.getTypeAllocSize(ATy);
-  bool HasSameSize =
-      DL.getTypeStoreSizeInBits(ATy) == DL.getTypeStoreSizeInBits(BTy);
-  if (!HasSameSize)
-    TypeByteSize = 0;
+
+  // When the distance is zero, we're reading/writing the same memory location:
+  // check that the store sizes are equal. Otherwise, fail with an unknown
+  // dependence for which we should not generate runtime checks.
+  TypeSize AStoreSz = DL.getTypeStoreSizeInBits(ATy),
+           BStoreSz = DL.getTypeStoreSizeInBits(BTy);
+  if (Dist->isZero() && AStoreSz != BStoreSz) {
+    LLVM_DEBUG(
+        dbgs() << "LAA: zero dependence distance but different type sizes\n");
+    return Dependence::Unknown;
+  }
 
   StrideAPtrInt = std::abs(StrideAPtrInt);
   StrideBPtrInt = std::abs(StrideBPtrInt);
@@ -2029,7 +2036,6 @@ MemoryDepChecker::isDependent(const MemAccessInfo &A, unsigned AIdx,
   auto &[Dist, MaxStride, CommonStride, ShouldRetryWithRuntimeCheck,
          TypeByteSize, AIsWrite, BIsWrite] =
       std::get<DepDistanceStrideAndSizeInfo>(Res);
-  bool HasSameSize = TypeByteSize > 0;
 
   if (isa<SCEVCouldNotCompute>(Dist)) {
     // TODO: Relax requirement that there is a common unscaled stride to retry
@@ -2047,9 +2053,9 @@ MemoryDepChecker::isDependent(const MemAccessInfo &A, unsigned AIdx,
   // upper bound of the number of iterations), the accesses are independet, i.e.
   // they are far enough appart that accesses won't access the same location
   // across all loop ierations.
-  if (HasSameSize && isSafeDependenceDistance(
-                         DL, SE, *(PSE.getSymbolicMaxBackedgeTakenCount()),
-                         *Dist, MaxStride, TypeByteSize))
+  if (isSafeDependenceDistance(DL, SE,
+                               *(PSE.getSymbolicMaxBackedgeTakenCount()), *Dist,
+                               MaxStride, TypeByteSize))
     return Dependence::NoDep;
 
   const SCEVConstant *ConstDist = dyn_cast<SCEVConstant>(Dist);
@@ -2060,7 +2066,7 @@ MemoryDepChecker::isDependent(const MemAccessInfo &A, unsigned AIdx,
 
     // If the distance between accesses and their strides are known constants,
     // check whether the accesses interlace each other.
-    if (Distance > 0 && CommonStride && CommonStride > 1 && HasSameSize &&
+    if (Distance > 0 && CommonStride && CommonStride > 1 &&
         areStridedAccessesIndependent(Distance, *CommonStride, TypeByteSize)) {
       LLVM_DEBUG(dbgs() << "LAA: Strided accesses are independent\n");
       return Dependence::NoDep;
@@ -2074,15 +2080,9 @@ MemoryDepChecker::isDependent(const MemAccessInfo &A, unsigned AIdx,
 
   // Negative distances are not plausible dependencies.
   if (SE.isKnownNonPositive(Dist)) {
-    if (SE.isKnownNonNegative(Dist)) {
-      if (HasSameSize) {
-        // Write to the same location with the same size.
-        return Dependence::Forward;
-      }
-      LLVM_DEBUG(dbgs() << "LAA: possibly zero dependence difference but "
-                           "different type sizes\n");
-      return Dependence::Unknown;
-    }
+    if (SE.isKnownNonNegative(Dist))
+      // Write to the same location with the same size.
+      return Dependence::Forward;
 
     bool IsTrueDataDependence = (AIsWrite && !BIsWrite);
     // Check if the first access writes to a location that is read in a later
@@ -2102,8 +2102,7 @@ MemoryDepChecker::isDependent(const MemAccessInfo &A, unsigned AIdx,
         FoundNonConstantDistanceDependence |= ShouldRetryWithRuntimeCheck;
         return Dependence::Unknown;
       }
-      if (!HasSameSize ||
-          couldPreventStoreLoadForward(
+      if (couldPreventStoreLoadForward(
               ConstDist->getAPInt().abs().getZExtValue(), TypeByteSize)) {
         LLVM_DEBUG(
             dbgs() << "LAA: Forward but may prevent st->ld forwarding\n");
@@ -2134,12 +2133,6 @@ MemoryDepChecker::isDependent(const MemAccessInfo &A, unsigned AIdx,
     FoundNonConstantDistanceDependence |= ShouldRetryWithRuntimeCheck;
   }
 
-  if (!HasSameSize) {
-    LLVM_DEBUG(dbgs() << "LAA: ReadWrite-Write positive dependency with "
-                         "different type sizes\n");
-    return Dependence::Unknown;
-  }
-
   if (!CommonStride)
     return Dependence::Unknown;
 
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/forward-loop-carried.ll b/llvm/test/Analysis/LoopAccessAnalysis/forward-loop-carried.ll
index adfd19923e921c..7837c20f003e24 100644
--- a/llvm/test/Analysis/LoopAccessAnalysis/forward-loop-carried.ll
+++ b/llvm/test/Analysis/LoopAccessAnalysis/forward-loop-carried.ll
@@ -70,10 +70,6 @@ define void @forward_different_access_sizes(ptr readnone %end, ptr %start) {
 ; CHECK-NEXT:            store i32 0, ptr %gep.2, align 4 ->
 ; CHECK-NEXT:            %l = load i24, ptr %gep.1, align 1
 ; CHECK-EMPTY:
-; CHECK-NEXT:        Forward:
-; CHECK-NEXT:            store i32 0, ptr %gep.2, align 4 ->
-; CHECK-NEXT:            store i24 %l, ptr %ptr.iv, align 1
-; CHECK-EMPTY:
 ; CHECK-NEXT:      Run-time memory checks:
 ; CHECK-NEXT:      Grouped accesses:
 ; CHECK-EMPTY:
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/positive-dependence-distance-different-access-sizes.ll b/llvm/test/Analysis/LoopAccessAnalysis/positive-dependence-distance-different-access-sizes.ll
index 08e0bae7f05bac..c43b072b30a8ea 100644
--- a/llvm/test/Analysis/LoopAccessAnalysis/positive-dependence-distance-different-access-sizes.ll
+++ b/llvm/test/Analysis/LoopAccessAnalysis/positive-dependence-distance-different-access-sizes.ll
@@ -3,26 +3,13 @@
 
 target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
 
-; TODO: No runtime checks should be needed, as the distance between accesses
-; is large enough to need runtime checks.
 define void @test_distance_positive_independent_via_trip_count(ptr %A) {
 ; CHECK-LABEL: 'test_distance_positive_independent_via_trip_count'
 ; CHECK-NEXT:    loop:
-; CHECK-NEXT:      Memory dependences are safe with run-time checks
+; CHECK-NEXT:      Memory dependences are safe
 ; CHECK-NEXT:      Dependences:
 ; CHECK-NEXT:      Run-time memory checks:
-; CHECK-NEXT:      Check 0:
-; CHECK-NEXT:        Comparing group ([[GRP1:0x[0-9a-f]+]]):
-; CHECK-NEXT:          %gep.A.400 = getelementptr inbounds i32, ptr %A.400, i64 %iv
-; CHECK-NEXT:        Against group ([[GRP2:0x[0-9a-f]+]]):
-; CHECK-NEXT:          %gep.A = getelementptr inbounds i8, ptr %A, i64 %iv
 ; CHECK-NEXT:      Grouped accesses:
-; CHECK-NEXT:        Group [[GRP1]]:
-; CHECK-NEXT:          (Low: (400 + %A)<nuw> High: (804 + %A))
-; CHECK-NEXT:            Member: {(400 + %A)<nuw>,+,4}<nuw><%loop>
-; CHECK-NEXT:        Group [[GRP2]]:
-; CHECK-NEXT:          (Low: %A High: (101 + %A))
-; CHECK-NEXT:            Member: {%A,+,1}<nuw><%loop>
 ; CHECK-EMPTY:
 ; CHECK-NEXT:      Non vectorizable stores to invariant address were not found in loop.
 ; CHECK-NEXT:      SCEV assumptions:
@@ -57,15 +44,15 @@ define void @test_distance_positive_backwards(ptr %A) {
 ; CHECK-NEXT:      Dependences:
 ; CHECK-NEXT:      Run-time memory checks:
 ; CHECK-NEXT:      Check 0:
-; CHECK-NEXT:        Comparing group ([[GRP3:0x[0-9a-f]+]]):
+; CHECK-NEXT:        Comparing group ([[GRP1:0x[0-9a-f]+]]):
 ; CHECK-NEXT:          %gep.A.400 = getelementptr inbounds i32, ptr %A.1, i64 %iv
-; CHECK-NEXT:        Against group ([[GRP4:0x[0-9a-f]+]]):
+; CHECK-NEXT:        Against group ([[GRP2:0x[0-9a-f]+]]):
 ; CHECK-NEXT:          %gep.A = getelementptr inbounds i8, ptr %A, i64 %iv
 ; CHECK-NEXT:      Grouped accesses:
-; CHECK-NEXT:        Group [[GRP3]]:
+; CHECK-NEXT:        Group [[GRP1]]:
 ; CHECK-NEXT:          (Low: (1 + %A)<nuw> High: (405 + %A))
 ; CHECK-NEXT:            Member: {(1 + %A)<nuw>,+,4}<nuw><%loop>
-; CHECK-NEXT:        Group [[GRP4]]:
+; CHECK-NEXT:        Group [[GRP2]]:
 ; CHECK-NEXT:          (Low: %A High: (101 + %A))
 ; CHECK-NEXT:            Member: {%A,+,1}<nuw><%loop>
 ; CHECK-EMPTY:
@@ -100,15 +87,15 @@ define void @test_distance_positive_via_assume(ptr %A, i64 %off) {
 ; CHECK-NEXT:      Dependences:
 ; CHECK-NEXT:      Run-time memory checks:
 ; CHECK-NEXT:      Check 0:
-; CHECK-NEXT:        Comparing group ([[GRP5:0x[0-9a-f]+]]):
+; CHECK-NEXT:        Comparing group ([[GRP3:0x[0-9a-f]+]]):
 ; CHECK-NEXT:          %gep.A.400 = getelementptr inbounds i32, ptr %A.off, i64 %iv
-; CHECK-NEXT:        Against group ([[GRP6:0x[0-9a-f]+]]):
+; CHECK-NEXT:        Against group ([[GRP4:0x[0-9a-f]+]]):
 ; CHECK-NEXT:          %gep.A = getelementptr inbounds i8, ptr %A, i64 %iv
 ; CHECK-NEXT:      Grouped accesses:
-; CHECK-NEXT:        Group [[GRP5]]:
+; CHECK-NEXT:        Group [[GRP3]]:
 ; CHECK-NEXT:          (Low: (%off + %A) High: (404 + %off + %A))
 ; CHECK-NEXT:            Member: {(%off + %A),+,4}<nw><%loop>
-; CHECK-NEXT:        Group [[GRP6]]:
+; CHECK-NEXT:        Group [[GRP4]]:
 ; CHECK-NEXT:          (Low: %A High: (101 + %A))
 ; CHECK-NEXT:            Member: {%A,+,1}<nuw><%loop>
 ; CHECK-EMPTY:
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/interleave-allocsize-not-equal-typesize.ll b/llvm/test/Transforms/LoopVectorize/AArch64/interleave-allocsize-not-equal-typesize.ll
index bd77f9779b680d..6451aa96b04da4 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/interleave-allocsize-not-equal-typesize.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/interleave-allocsize-not-equal-typesize.ll
@@ -35,10 +35,10 @@ define void @pr58722_load_interleave_group(ptr %src, ptr %dst) {
 ; CHECK-NEXT:    [[TMP10:%.*]] = getelementptr inbounds i32, ptr [[TMP5]], i64 1
 ; CHECK-NEXT:    [[TMP11:%.*]] = getelementptr inbounds i32, ptr [[TMP6]], i64 1
 ; CHECK-NEXT:    [[TMP12:%.*]] = getelementptr inbounds i32, ptr [[TMP7]], i64 1
-; CHECK-NEXT:    [[TMP13:%.*]] = load i24, ptr [[TMP9]], align 4, !alias.scope !0
-; CHECK-NEXT:    [[TMP14:%.*]] = load i24, ptr [[TMP10]], align 4, !alias.scope !0
-; CHECK-NEXT:    [[TMP15:%.*]] = load i24, ptr [[TMP11]], align 4, !alias.scope !0
-; CHECK-NEXT:    [[TMP16:%.*]] = load i24, ptr [[TMP12]], align 4, !alias.scope !0
+; CHECK-NEXT:    [[TMP13:%.*]] = load i24, ptr [[TMP9]], align 4, !alias.scope [[META0:![0-9]+]]
+; CHECK-NEXT:    [[TMP14:%.*]] = load i24, ptr [[TMP10]], align 4, !alias.scope [[META0]]
+; CHECK-NEXT:    [[TMP15:%.*]] = load i24, ptr [[TMP11]], align 4, !alias.scope [[META0]]
+; CHECK-NEXT:    [[TMP16:%.*]] = load i24, ptr [[TMP12]], align 4, !alias.scope [[META0]]
 ; CHECK-NEXT:    [[TMP17:%.*]] = insertelement <4 x i24> poison, i24 [[TMP13]], i32 0
 ; CHECK-NEXT:    [[TMP18:%.*]] = insertelement <4 x i24> [[TMP17]], i24 [[TMP14]], i32 1
 ; CHECK-NEXT:    [[TMP19:%.*]] = insertelement <4 x i24> [[TMP18]], i24 [[TMP15]], i32 2
@@ -47,7 +47,7 @@ define void @pr58722_load_interleave_group(ptr %src, ptr %dst) {
 ; CHECK-NEXT:    [[TMP22:%.*]] = add <4 x i32> [[STRIDED_VEC]], [[TMP21]]
 ; CHECK-NEXT:    [[TMP23:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 [[TMP0]]
 ; CHECK-NEXT:    [[TMP24:%.*]] = getelementptr inbounds i32, ptr [[TMP23]], i32 0
-; CHECK-NEXT:    store <4 x i32> [[TMP22]], ptr [[TMP24]], align 4, !alias.scope !3, !noalias !0
+; CHECK-NEXT:    store <4 x i32> [[TMP22]], ptr [[TMP24]], align 4, !alias.scope [[META3:![0-9]+]], !noalias [[META0]]
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
 ; CHECK-NEXT:    [[TMP25:%.*]] = icmp eq i64 [[INDEX_NEXT]], 10000
 ; CHECK-NEXT:    br i1 [[TMP25]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
@@ -96,17 +96,42 @@ exit:
 define void @pr58722_store_interleave_group(ptr %src, ptr %dst) {
 ; CHECK-LABEL: @pr58722_store_interleave_group(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK:       vector.ph:
+; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
+; CHECK:       vector.body:
+; CHECK-NEXT:    [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[OFFSET_IDX:%.*]] = mul i32 [[INDEX]], 2
+; CHECK-NEXT:    [[TMP0:%.*]] = add i32 [[OFFSET_IDX]], 0
+; CHECK-NEXT:    [[TMP1:%.*]] = add i32 [[OFFSET_IDX]], 2
+; CHECK-NEXT:    [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[SRC:%.*]], i32 [[TMP0]]
+; CHECK-NEXT:    [[TMP3:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i32 [[TMP1]]
+; CHECK-NEXT:    store i32 [[TMP0]], ptr [[TMP2]], align 4
+; CHECK-NEXT:    store i32 [[TMP1]], ptr [[TMP3]], align 4
+; CHECK-NEXT:    [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[TMP2]], i64 1
+; CHECK-NEXT:    [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[TMP3]], i64 1
+; CHECK-NEXT:    [[TMP6:%.*]] = trunc i32 [[TMP0]] to i24
+; CHECK-NEXT:    [[TMP7:%.*]] = trunc i32 [[TMP1]] to i24
+; CHECK-NEXT:    store i24 [[TMP6]], ptr [[TMP4]], align 4
+; CHECK-NEXT:    store i24 [[TMP7]], ptr [[TMP5]], align 4
+; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2
+; CHECK-NEXT:    [[TMP8:%.*]] = icmp eq i32 [[INDEX_NEXT]], 5000
+; CHECK-NEXT:    br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]
+; CHECK:       middle.block:
+; CHECK-NEXT:    br i1 false, label [[EXIT:%.*]], label [[SCALAR_PH]]
+; CHECK:       scalar.ph:
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ 10000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
-; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[GEP_IV:%.*]] = getelementptr inbounds i64, ptr [[SRC:%.*]], i32 [[IV]]
+; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[GEP_IV:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i32 [[IV]]
 ; CHECK-NEXT:    store i32 [[IV]], ptr [[GEP_IV]], align 4
 ; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds i64, ptr [[GEP_IV]], i64 1
 ; CHECK-NEXT:    [[TRUNC_IV:%.*]] = trunc i32 [[IV]] to i24
 ; CHECK-NEXT:    store i24 [[TRUNC_IV]], ptr [[GEP]], align 4
 ; CHECK-NEXT:    [[IV_NEXT]] = add i32 [[IV]], 2
 ; CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[IV]], 10000
-; CHECK-NEXT:    br i1 [[CMP]], label [[EXIT:%.*]], label [[LOOP]]
+; CHECK-NEXT:    br i1 [[CMP]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP10:![0-9]+]]
 ; CHECK:       exit:
 ; CHECK-NEXT:    ret void
 ;

llvm/lib/Analysis/LoopAccessAnalysis.cpp

artagnon · 2025-01-16T12:30:29Z

Gentle ping.

artagnon · 2025-07-07T12:57:43Z

With all preparatory work being merged, this patch is looking much better, and is now ready for review.

artagnon · 2025-07-15T11:37:38Z

Gentle ping.

llvm/lib/Analysis/LoopAccessAnalysis.cpp

llvm/test/Transforms/LoopVectorize/reuse-lcssa-phi-scev-expansion.ll

llvm/test/Transforms/LoopVectorize/X86/interleaved-accesses-use-after-free.ll

llvm/lib/Analysis/LoopAccessAnalysis.cpp

igogo-x86 · 2025-07-21T15:53:26Z

I think I found a problematic case:

define void @one(ptr %dst) {
entry:
  %gep.10 = getelementptr nuw i8, ptr %dst, i64 10
  br label %loop

loop:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
  %gep.iv = getelementptr i8, ptr %dst, i64 %iv
  store i32 0, ptr %gep.iv, align 2
  %gep.10.iv = getelementptr i8, ptr %gep.10, i64 %iv
  store i32 1, ptr %gep.10.iv, align 1
  %iv.next = add i64 %iv, 8
  %ec = icmp eq i64 %iv.next, 64
  br i1 %ec, label %exit, label %loop

exit:                            ; preds = %loop
  ret void
}

Here both stores have the same size and this formula is correct and give 12:

uint64_t MinDistanceNeeded = MaxStride * (MinNumIter - 1) + TypeByteSize;

We have these accesses each iteration with VF == 2:

; dst[i+0..3,8..11] - first store
; dst[i+10..13,18..21] - second store

And LAA correctly determines that they overlap:

LAA: Failure because of positive minimum distance 10 but MinDistanceNeeded is 12

if we use both store type of i16, the code is vectorisable with VF=2:

LAA: No unsafe dependent memory operations in loop.  We don't need runtime memory checks.

However, if we take first store size as i32 (or i64) and second is i16, there should be a conflict, because ranges overlap:

define void @two(ptr %dst) {
entry:
  %gep.10 = getelementptr nuw i8, ptr %dst, i64 10
  br label %loop

loop:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
  %gep.iv = getelementptr i8, ptr %dst, i64 %iv
  store i32 0, ptr %gep.iv, align 2
  %gep.10.iv = getelementptr i8, ptr %gep.10, i64 %iv
  store i16 1, ptr %gep.10.iv, align 1
  %iv.next = add i64 %iv, 8
  %ec = icmp eq i64 %iv.next, 64
  br i1 %ec, label %exit, label %loop

exit:                            ; preds = %loop
  ret void
}

Accessed elements:

; dst[i+0..3,8..11]
; dst[i+10..11,18..19]

However we get this output:

LAA: No unsafe dependent memory operations in loop.  We don't need runtime memory checks.

fhahn · 2025-07-21T16:48:03Z

I think I found a problematic case:

define void @one(ptr %dst) {
entry:
  %gep.10 = getelementptr nuw i8, ptr %dst, i64 10
  br label %loop

loop:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
  %gep.iv = getelementptr i8, ptr %dst, i64 %iv
  store i32 0, ptr %gep.iv, align 2
  %gep.10.iv = getelementptr i8, ptr %gep.10, i64 %iv
  store i32 1, ptr %gep.10.iv, align 1
  %iv.next = add i64 %iv, 8
  %ec = icmp eq i64 %iv.next, 64
  br i1 %ec, label %exit, label %loop

exit:                            ; preds = %loop
  ret void
}

Here both stores have the same size and this formula is correct and give 12:

uint64_t MinDistanceNeeded = MaxStride * (MinNumIter - 1) + TypeByteSize;

We have these accesses each iteration with VF == 2:

; dst[i+0..3,8..11] - first store
; dst[i+10..13,18..21] - second store

And LAA correctly determines that they overlap:

LAA: Failure because of positive minimum distance 10 but MinDistanceNeeded is 12

if we use both store type of i16, the code is vectorisable with VF=2:

LAA: No unsafe dependent memory operations in loop.  We don't need runtime memory checks.

However, if we take first store size as i32 (or i64) and second is i16, there should be a conflict, because ranges overlap:

define void @two(ptr %dst) {
entry:
  %gep.10 = getelementptr nuw i8, ptr %dst, i64 10
  br label %loop

loop:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
  %gep.iv = getelementptr i8, ptr %dst, i64 %iv
  store i32 0, ptr %gep.iv, align 2
  %gep.10.iv = getelementptr i8, ptr %gep.10, i64 %iv
  store i16 1, ptr %gep.10.iv, align 1
  %iv.next = add i64 %iv, 8
  %ec = icmp eq i64 %iv.next, 64
  br i1 %ec, label %exit, label %loop

exit:                            ; preds = %loop
  ret void
}

Accessed elements:

; dst[i+0..3,8..11]
; dst[i+10..11,18..19]

However we get this output:

LAA: No unsafe dependent memory operations in loop.  We don't need runtime memory checks.

Yeah, this is probably related to #122318 (comment), as TypeByteSize will be zero in case they don't match. We definitely need to use a size. I've not checked, but probably using the max size should be a good upper bound

llvm/lib/Analysis/LoopAccessAnalysis.cpp

artagnon · 2025-07-21T17:26:22Z

I think I found a problematic case:

Thanks! This is an important test-case, and shows that we need to consider the size of the source as well.

llvm/lib/Analysis/LoopAccessAnalysis.cpp

igogo-x86

LGTM, but give 24 hours for @fhahn to have a final word for my review approvals.

llvm/lib/Analysis/LoopAccessAnalysis.cpp

llvm/include/llvm/Analysis/LoopAccessAnalysis.h

artagnon · 2025-08-26T12:16:57Z

Gentle ping.

artagnon · 2025-09-01T08:23:52Z

Gentle ping.

artagnon · 2025-09-08T08:16:44Z

Gentle ping.

artagnon · 2025-09-16T10:29:56Z

I would be inclined to merge this, since the promised follow-ups are real improvements. @fhahn: if you have any objections, could you please raise them?

fhahn · 2025-09-16T10:32:52Z

Ah yes, I'll run some final testing today

llvm/lib/Analysis/LoopAccessAnalysis.cpp

As depend_diff_types shows, there are several places where the HasSameSize check can be relaxed for higher analysis precision. As a first step, return both the source size and the sink size from getDependenceDistanceStrideAndSize, along with a HasSameSize boolean for the moment.

fhahn

LGTM, thanks

llvm-ci · 2025-09-18T09:12:11Z

LLVM Buildbot has detected a new failure on builder ppc64le-mlir-rhel-clang running on ppc64le-mlir-rhel-test while building llvm at step 6 "test-build-check-mlir-build-only-check-mlir".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/129/builds/29755

Here is the relevant piece of the build log for the reference

Step 6 (test-build-check-mlir-build-only-check-mlir) failure: 1200 seconds without output running [b'ninja', b'check-mlir'], attempting to kill
5.390 [0/1/0] Running the MLIR regression tests
command timed out: 1200 seconds without output running [b'ninja', b'check-mlir'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=1205.876029

KennethHilmersson · 2025-09-24T13:32:52Z

We see a regression with this patch when we compile with -fno-strict-aliasing.
There is a runtime memory check block added and the scalar loop remains with the patch:
fno-strict-aliasing example

mikaelholmen · 2025-09-24T13:39:24Z

We see a regression with this patch when we compile with -fno-strict-aliasing. There is a runtime memory check block added and the scalar loop remains with the patch: fno-strict-aliasing example

Here's an opt/LAA-only reproducer for the same problem, showing that LAA now is different (I've verifed that I see a similar difference if I just revert this patch on trunk, so I don't think it's due to other changes on trunk compared to 21.1.0):
godbolt

artagnon · 2025-09-24T13:59:39Z

We see a regression with this patch when we compile with -fno-strict-aliasing. There is a runtime memory check block added and the scalar loop remains with the patch: fno-strict-aliasing example

Here's an opt/LAA-only reproducer for the same problem, showing that LAA now is different (I've verifed that I see a similar difference if I just revert this patch on trunk, so I don't think it's due to other changes on trunk compared to 21.1.0): godbolt

Hm, very strange. The distance is indeed possibly zero:

LAA: Distance for   %0 = load i32, ptr %arrayidx, align 1 to   store i16 %conv, ptr %arrayidx2, align 1: {0,+,-2}<%for.body>

The question is: how was the distance determined non-zero before this patch?

artagnon · 2025-09-24T14:07:11Z

Ah, it is known non-positive, but not known non-negative. This is not equivalent to !(known non-zero). I would, however, argue that !(known non-zero) is the correct check, and given the current limitations, we actually should generate RT checks?

KennethHilmersson · 2025-09-24T17:33:07Z

Attaching debug printouts when compiling with the patch and with the patch reverted. Note again it is from the previous godbolt example with -O3 -fno-strict-aliasing for both cases now for our out of tree target (byte size 16 bits and big-endian).
reverted-laa-nfc-patch.log
laa-nfc-patch.log

artagnon · 2025-09-24T18:04:08Z

@KennethHilmersson Thanks, your example seems to be quite similar to the one from @mikaelholmen -- the only difference is that the distance is {0, +, -1} instead of {0, +, -2}. The question is no longer about what causes the regression, but rather, if what we've done in this patch technically fixes the behavior.

artagnon · 2025-09-25T12:44:19Z

After some thought, I've put up #160701 to fix this. Thanks, both, for the regression report!

KennethHilmersson · 2025-09-26T08:08:29Z

Thanks for the fast response! The fix solved the regression.

artagnon requested review from david-arm, fhahn and huntergr-arm January 9, 2025 17:13

llvmbot added llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Jan 9, 2025

fhahn reviewed Jan 9, 2025

View reviewed changes

llvm/lib/Analysis/LoopAccessAnalysis.cpp Outdated Show resolved Hide resolved

artagnon force-pushed the laa-typesz-precise branch from 423c690 to 2103c89 Compare January 13, 2025 13:29

artagnon mentioned this pull request Jan 31, 2025

[LAA] Scale strides using type-size (NFC) #124529

Merged

artagnon changed the title ~~LAA: be more precise on different store sizes~~ [LAA] Be more precise on different store sizes Mar 27, 2025

artagnon mentioned this pull request Mar 27, 2025

[LAA] Hoist setting condition for RT-checks #128045

Merged

artagnon force-pushed the laa-typesz-precise branch from 2103c89 to ac63260 Compare July 7, 2025 12:56

artagnon requested review from Meinersbur and paulwalker-arm July 7, 2025 12:58

artagnon force-pushed the laa-typesz-precise branch from ac63260 to 6b8c322 Compare July 15, 2025 11:37

fhahn reviewed Jul 15, 2025

View reviewed changes

llvm/lib/Analysis/LoopAccessAnalysis.cpp Outdated Show resolved Hide resolved

llvm/test/Transforms/LoopVectorize/reuse-lcssa-phi-scev-expansion.ll Outdated Show resolved Hide resolved

llvm/test/Transforms/LoopVectorize/X86/interleaved-accesses-use-after-free.ll Outdated Show resolved Hide resolved

artagnon force-pushed the laa-typesz-precise branch 2 times, most recently from b847945 to 14024bd Compare July 16, 2025 15:22

artagnon requested a review from igogo-x86 July 16, 2025 15:27

igogo-x86 reviewed Jul 17, 2025

View reviewed changes

llvm/lib/Analysis/LoopAccessAnalysis.cpp Outdated Show resolved Hide resolved

igogo-x86 reviewed Jul 21, 2025

View reviewed changes

llvm/lib/Analysis/LoopAccessAnalysis.cpp Outdated Show resolved Hide resolved

fhahn reviewed Jul 21, 2025

View reviewed changes

llvm/lib/Analysis/LoopAccessAnalysis.cpp Outdated Show resolved Hide resolved

llvm/lib/Analysis/LoopAccessAnalysis.cpp Outdated Show resolved Hide resolved

llvm/lib/Analysis/LoopAccessAnalysis.cpp Outdated Show resolved Hide resolved

fhahn reviewed Jul 21, 2025

View reviewed changes

llvm/lib/Analysis/LoopAccessAnalysis.cpp Outdated Show resolved Hide resolved

llvm/lib/Analysis/LoopAccessAnalysis.cpp Outdated Show resolved Hide resolved

artagnon force-pushed the laa-typesz-precise branch from 00b7db8 to 048293b Compare July 21, 2025 17:32

igogo-x86 reviewed Aug 19, 2025

View reviewed changes

llvm/lib/Analysis/LoopAccessAnalysis.cpp Outdated Show resolved Hide resolved

igogo-x86 approved these changes Aug 19, 2025

View reviewed changes

fhahn reviewed Aug 20, 2025

View reviewed changes

llvm/lib/Analysis/LoopAccessAnalysis.cpp Outdated Show resolved Hide resolved

llvm/include/llvm/Analysis/LoopAccessAnalysis.h Outdated Show resolved Hide resolved

artagnon force-pushed the laa-typesz-precise branch 2 times, most recently from bcbd8f1 to f4d46b5 Compare August 26, 2025 12:16

artagnon force-pushed the laa-typesz-precise branch from f4d46b5 to e415c8b Compare September 4, 2025 10:23

artagnon force-pushed the laa-typesz-precise branch from e415c8b to 3a70169 Compare September 16, 2025 10:45

fhahn reviewed Sep 17, 2025

View reviewed changes

llvm/lib/Analysis/LoopAccessAnalysis.cpp Outdated Show resolved Hide resolved

llvm/lib/Analysis/LoopAccessAnalysis.cpp Outdated Show resolved Hide resolved

artagnon added 2 commits September 17, 2025 11:59

[LAA] Address review

e92715a

artagnon force-pushed the laa-typesz-precise branch from 3a70169 to e92715a Compare September 17, 2025 11:20

fhahn approved these changes Sep 18, 2025

View reviewed changes

artagnon merged commit 1aded51 into llvm:main Sep 18, 2025
9 checks passed

artagnon deleted the laa-typesz-precise branch September 18, 2025 08:30

[LAA] Prepare to handle diff type sizes (NFC) #122318

[LAA] Prepare to handle diff type sizes (NFC) #122318

Uh oh!

Conversation

artagnon commented Jan 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Jan 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

artagnon commented Jan 16, 2025

Uh oh!

artagnon commented Jul 7, 2025

Uh oh!

artagnon commented Jul 15, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

igogo-x86 commented Jul 21, 2025

Uh oh!

fhahn commented Jul 21, 2025

Uh oh!

Uh oh!

Uh oh!

artagnon commented Jul 21, 2025

Uh oh!

Uh oh!

igogo-x86 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

artagnon commented Aug 26, 2025

Uh oh!

artagnon commented Sep 1, 2025

Uh oh!

artagnon commented Sep 8, 2025

Uh oh!

artagnon commented Sep 16, 2025

Uh oh!

fhahn commented Sep 16, 2025

Uh oh!

Uh oh!

Uh oh!

fhahn left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

llvm-ci commented Sep 18, 2025

Uh oh!

KennethHilmersson commented Sep 24, 2025

Uh oh!

mikaelholmen commented Sep 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

artagnon commented Sep 24, 2025

Uh oh!

artagnon commented Sep 24, 2025

Uh oh!

KennethHilmersson commented Sep 24, 2025

Uh oh!

artagnon commented Sep 24, 2025

Uh oh!

artagnon commented Sep 25, 2025

Uh oh!

KennethHilmersson commented Sep 26, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

artagnon commented Jan 9, 2025 •

edited

Loading

llvmbot commented Jan 9, 2025 •

edited

Loading

mikaelholmen commented Sep 24, 2025 •

edited

Loading