[DA] Check monotonicity for subscripts #154527

kasuga-fj · 2025-08-20T12:48:29Z

The memory access wrap hasn't been properly handled in DA. As a first step, this patch introduces the concept of "monotonicity" and applies it to validate the result of delinearization.

Related: #151326 (comment)

kasuga-fj · 2025-08-20T13:06:59Z

@Meinersbur I'd like to get your thoughts on whether this approach seems reasonable. What do you think about this?

Also, I believe similar checks are needed in other parts of the DA as well, for example this function.

Meinersbur

Some thoughts:

This seems to only support AddRecExpr directly, I was thinking about something recursive that may contain an AddRecExpr, such as an SignExtend/ZeroExtend/Division/... of an SCEVAddRecExpr. The most common operations (mul/add with invariant) may indeed usually be folded into the SCEVAddRecExpr (always?). SCEVTruncate maybe illustrates that the "monotonic" property not only applies to SCEVAddRecExpr.

trunc 0x101 to i8 -> 0x01
trunc 0x203 to i8 -> 0x03
trunc 0x302 to i8 -> 0x02

Say the values 0x101, 0x203, 0x302 are the iteration values a loop¹. The maximum of the trunc expression is 0x03, which is neither the value of the initial AddRecExpr loop iteration, nor its last. That is, the range of indices in an array subscipt expression A[(char)i] is not 0x01 to 0x02.

I would have thougth of getting the min/max of an expression as a separate operation. Ideally, the simplication of SMin/SMax expression could be done by ScalarEvolution itself through canonicalization/folding.

Not sure how it could be encoded as an SCEVAddRecExpr or combination of SCEVs, but the point is that it monotonically increasing ↩

Meinersbur · 2025-08-22T10:06:08Z

llvm/include/llvm/Analysis/DependenceAnalysis.h

+  /// is monotonic. The term "monotonic" means that all AddRec Exprs in \p Expr
+  /// doesn't wrap in signed sense. When it is monotonic, the minimum and
+  /// maximum values of \p Expr are stored in \p Min and \p Max, respectively.
+  bool isMonotonicSCEV(const SCEV *Expr, const SCEV *&Min, const SCEV *&Max,


Document parameters?

Consider adding "signed" to the name. Currently DA assumes everything is a signed SCEV, but maybe we change that one day.

Separate the implementation, and added "signed" to the name.

Meinersbur · 2025-08-22T10:07:12Z

llvm/lib/Analysis/DependenceAnalysis.cpp

+                                     const SCEV *&Max, ScalarEvolution *SE,
+                                     const Loop *OutermostLoop, IntegerType *Ty,
+                                     const Value *Ptr) const {
+  const SCEVAddRecExpr *AddRec = dyn_cast<SCEVAddRecExpr>(Expr);


Consider using the SCEVVisitor pattern. It was made for such uses.

Thanks, it completely slipped my mind.

github-actions · 2025-08-22T16:34:10Z

✅ With the latest revision this PR passed the C/C++ code formatter.

kasuga-fj · 2025-08-22T16:53:37Z

Thanks for the feedback!

This seems to only support AddRecExpr directly, I was thinking about something recursive that may contain an AddRecExpr, such as an SignExtend/ZeroExtend/Division/... of an SCEVAddRecExpr.

Changed the implementation to use the SCEVVisitor class, and now handled more cases.

SCEVTruncate maybe illustrates that the "monotonic" property not only applies to SCEVAddRecExpr.
trunc 0x101 to i8 -> 0x01
trunc 0x203 to i8 -> 0x03
trunc 0x302 to i8 -> 0x02
Say the values 0x101, 0x203, 0x302 are the iteration values a loop1. The maximum of the trunc expression is 0x03, which is neither the value of the initial AddRecExpr loop iteration, nor its last. That is, the range of indices in an array subscipt expression A[(char)i] is not 0x01 to 0x02.

It actually seems monotonic, but I'm not sure the correct way to handle such cases... Let me think about it for a moment.

I would have thougth of getting the min/max of an expression as a separate operation. Ideally, the simplication of SMin/SMax expression could be done by ScalarEvolution itself through canonicalization/folding.

Splitted it in another class SCEVMinMaxCalculator. Do you mean that this should be defined as an API of ScalarEvolution?

So, I believe the overall approach is not too far off, so for now I'll check the tests and clean up the code.

I’d like to ask one question: I feel like these processes (monotonic checks, validation for delinearization, etc.) should be tested separately from other parts of DA. What do you think? Would that be a bit excessive?

sushgokh · 2025-08-25T05:49:34Z

llvm/lib/Analysis/DependenceAnalysis.cpp

+        return MonotonicityType::MaySignedWrap;
+      Result = MonotonicityType::Constant;
+      break;
+    }


I dont understand one thing here. If the entire SCEV is NSW, why do we need to check if its NSW for individual operands? Do you have specific case in mind?

Also, I am trying to understand what MonotonicityType::Monotonic really means. Just going by the mathematical definition of monotonicity,

why Monotonic + Monotonic should be MaySignedWrap? Shouldnt it be Monotonic?

why Monotonic + Constant should be Constant? Shouldnt it be Monotonic?

First of all, probably the name is misleading. MaySignedWrap should be renamed to something like Unknown.

If the entire SCEV is NSW, why do we need to check if its NSW for individual operands? Do you have specific case in mind?

I was imagining an example like {0,+,%m}<%loop> + {0,+,%n}<%loop>, but I'm not sure if such a representation can actually exist. If ScalarEvolution guarantees that this form is always folded into something like {0,+,(%m+%n)}<%loop>, then maybe this is unnecessary. But if not, I'm not confident it's safe when each operand can potentially overflow (although DA doesn't support this kind of format)

Just going by the mathematical definition of monotonicity

DA breaks exactly due to the gap between mathematical theory and LLVM IR semantics.

why Monotonic + Monotonic should be MaySignedWrap? Shouldnt it be Monotonic?

To clearly distinguish between Constant and Monotonic. I was considering a case like {0,+,1}<%loop> + {0,+,-1}<%loop>. This always seems to evaluate to 0 (i.e., a Constant), but again, I'm not sure if such a representation can exist in SCEV.

why Monotonic + Constant should be Constant? Shouldnt it be Monotonic?

I think this is just a simple implementation mistake. Thanks for pointing it out.

First of all, probably the name is misleading. MaySignedWrap should be renamed to something like Unknown

ok, please change to Unknown or CouldNotCompute. MaySignedWrap is confusing

example like {0,+,%m}<%loop> + {0,+,%n}<%loop>

If the entire expression is nuw/nsw then individual SCEVs must follow the same pattern but vice-versa cant be true(this may wrap).

that this form is always folded into something like {0,+,(%m+%n)}<%loop>

this is not true because when its split form, each AddRed can have different values . But with (%m+%n) , every itr is multiple of (%m+%n)

{0,+,1}<%loop> + {0,+,-1}<%loop>

this expr can have values 0(=0+0), -1(=0-1), 1(=1+0), 0(=1-1). This is definitely not a constant. So, this should be Unknown

What I'm not entirely sure about is whether, given the following IR, the SCEV corresponding to %mn_i is guaranteed to be {0,+,(%m + %n)}<%loop> rather than {0,+,%m}<%loop> + {0,+,%n}<%loop>

loop: %i = phi i64 [ 0, %entry ], [ %i.inc, %loop ] %m_i = mul nsw i64 %m, %i %n_i = mul nsw i64 %n, %i %mn_i = add nsw i64 %m_i, %n_i ...

If not, then I don't think we can say "Monotonic + Monotonic = Monotonic", since %m could be equal to -1 * %n, in which case %mn_i would always be zero. I don't know whether a constant value is considered to "monotonic" in general mathematical theory, but it is a corner case in the context of DA.

if m is invariant in current loop then it should be {0,+,(%m + %n)}<%loop> and vice-versa if n comes from outer loop

Okay, then I think the logic can be simplified.

@kasuga-fj Expressions like {0,+,%m}<%loop> + {0,+,%n}<%loop> are possible if SCEV either hits the arithmetic depth limit, or the huge expression limit.

@nikic I see, thanks for letting me know.

sushgokh · 2025-08-25T06:00:51Z

llvm/lib/Analysis/DependenceAnalysis.cpp

+  if (StepRes != MonotonicityType::Constant || !SE->isKnownNonZero(Step))
+    return MonotonicityType::MaySignedWrap;
+
+  bool IsNSW = [&] {


should this be IsNoWrap ?

sushgokh · 2025-08-25T06:02:23Z

llvm/lib/Analysis/DependenceAnalysis.cpp

+    return MonotonicityType::MaySignedWrap;
+
+  bool IsNSW = [&] {
+    if (Expr->hasNoSignedWrap())


shouldnt you check for NoWrap? The expression, if unsigned, may not fit into the signed range

This is intentional. DA currently mixes signed and unsigned interpretations, which can lead to incorrect results. One of the main goals of this PR (and future ones) is to unify all integer interpretations in DA under a signed one.

if it only has nuw, the analysis would go wrong.

Yes, and the expected behavior is to detect such expressions and bail out of the analysis. The subsequent checks attempt to prove properties similar to nsw when it's not explicitly attached, although I'm not sure they're truly necessary (I've not tested enough yet).

sushgokh · 2025-08-25T06:31:15Z

llvm/lib/Analysis/DependenceAnalysis.cpp

+SCEVSignedMonotonicityChecker::visitUnknown(const SCEVUnknown *Expr) {
+  return SE->isLoopInvariant(Expr, OutermostLoop)
+             ? MonotonicityType::Constant
+             : MonotonicityType::MaySignedWrap;


this is SCEVUnknown in the context of current loop. Why do we need to evaluate in the scope of outermost loop?

Just to align with the current behavior.

llvm-project/llvm/lib/Analysis/DependenceAnalysis.cpp

Lines 854 to 867 in db02476

// Returns true if Expression is loop invariant in LoopNest.

bool DependenceInfo::isLoopInvariant(const SCEV *Expression,

const Loop *LoopNest) const {

// Unlike ScalarEvolution::isLoopInvariant() we consider an access outside of

// any loop as invariant, because we only consier expression evaluation at a

// specific position (where the array access takes place), and not across the

// entire function.

if (!LoopNest)

return true;

// If the expression is invariant in the outermost loop of the loop nest, it

// is invariant anywhere in the loop nest.

return SE->isLoopInvariant(Expression, LoopNest->getOutermostLoop());

}

But I'm not sure whether checking invariance in the current loop is sufficient or not.

think conversely. If this was MonotonicityType::Constant, would this function be ever called? It should still be MonotonicityType::MaySignedWrap

sushgokh · 2025-08-25T06:36:43Z

llvm/lib/Analysis/DependenceAnalysis.cpp

+
+      MonotonicityType SrcMonotonicity =
+          SCEVSignedMonotonicityChecker(SE, OutermostLoop, SrcPtr)
+              .visit(SrcSubscripts[I]);


Same comment. Why evaluate only in the scope of Outermost loop? Shouldnt this be in context of current loop?

sushgokh · 2025-08-25T07:04:12Z

I was wondering that isnt checking only geps for wrapping (i.e. nusw) sufficient ?

Also, is MinMax calculator really needed? For cases like #149501, we just need delta and bounds, right? Are there any other cases where min/max are needed seperately?

kasuga-fj · 2025-08-25T09:09:11Z

I was wondering that isnt checking only geps for wrapping (i.e. nusw) sufficient ?

It's insufficient. Please refer to relevant test cases, such as https://github.com/llvm/llvm-project/blob/296163f85dfc6a7f85972f5385ff85e67738a956/llvm/test/Analysis/DependenceAnalysis/DADelin.ll. For example, in @t1, %arrayidx is computed as getelementptr inbounds i32, ptr %A, i32 %add11. But %add11 is delienealized into multiple subscripts, effectively something like %A[i][j][k]. DA assumes monotonicity for all subscripts, but nusw only guarantees the no-wrap for %add11 * 4 and its addition to %A.

Also, is MinMax calculator really needed?

Yes, it is necessary to validate the delinearization result. Please note that this PR aims to address the broader issue that "DA doesn't account for wrapping accesses", not just the specific case I gave in #149501.

Meinersbur · 2025-08-25T11:32:15Z

llvm/lib/Analysis/DependenceAnalysis.cpp

+  Constant,      ///< The expression is constant. If a SCEV is classified as
+                 ///< Constant, it also implies that it doesn't contain any
+                 ///< arithmetic operations that may cause signed wrap.


What is the utility of Constant? isa<SCEVConstant>(...) should be sufficient. Did you mean "invariant" to a specific loop?

Yes, "invariant" seems more reasonable here.

Meinersbur · 2025-08-25T11:39:54Z

llvm/lib/Analysis/DependenceAnalysis.cpp

+  Monotonic, ///< The expression is monotonically increasing or decreasing. This
+             ///< is exclusive of Constant. That is, we say an SCEV is Monotonic
+             ///< iff it contains at least one AddRec where its step reccurence
+             ///< value is non-zero.


I think the description should mention it is relative to a specific loop. If it does not contain an SCEVAddRec, it could still contain a SCEVUnkown that is non-invariant in that loop.

It becomes interesting if you have a SCEVAddRec of a nested loop in there. How do you think those should be handled? E.g. the specific loop is counting up but the nested loop is counting down.

for (int i = n; i >= 0; --i) { int j = 0; for (; j < m; ++j) A[i + j] = ...; B[i + j] = ...; }

Is the SCEV for A[i + j] monotonic? I think it for B[i + j] because at that point j==m is invariant, so the min is 0 + m and the max is n + m.

I'm not entirely sure whether the term "monotonic" is appropriate here, but I believe A[i + j] in the example is "valid" in the context of DA. I think what I'm trying to conceptualize is something like how multilinear relates to linear -- perhaps something like 'multimonotonic' for monotonic?

So, I intended these properties to be considered with respect to the outermost loop, rather than any specific loop within the loop nest. In any case, I'm now feeling strongly that I should add some tests to illustrate the expected behavior.

Added some cases to monotonic.ll to illustrate my thoughts. I used debug outputs for now, but I'd prefer to avoid if it possible...

Meinersbur · 2025-08-25T11:40:53Z

llvm/lib/Analysis/DependenceAnalysis.cpp

+
+private:
+  ScalarEvolution *SE;
+  const Loop *OutermostLoop;


Possibly remove the "Outermost", does not necessarily need to be an outermost loop.

I feel it's safer to make this outermost. Otherwise, unexpected behavior might occur in cases like below

llvm-project/llvm/test/Analysis/DependenceAnalysis/monotonic.ll

Lines 336 to 379 in b5ca793

; The value of step reccurence is not invariant with respect to the outer most

; loop (the i-loop).

;

; offset_i = 0;

; for (int i = 0; i < 100; i++) {

; for (int j = 0; j < 100; j++)

; a[offset_i + j] = 0;

; offset_i += (i % 2 == 0) ? 0 : 3;

; }

;

define void @step_is_variant(ptr %a) {

; CHECK-LABEL: 'step_is_variant'

; CHECK: Failed to prove monotonicity for: %offset.i

; CHECK: Failed to prove monotonicity for: {%offset.i,+,1}<nuw><nsw><%loop.j>

; CHECK: Monotonicity: Unknown expr: {%offset.i,+,1}<nuw><nsw><%loop.j>

;

entry:

br label %loop.i.header

loop.i.header:

%i = phi i64 [ 0, %entry ], [ %i.inc, %loop.i.latch ]

%offset.i = phi i64 [ 0, %entry ], [ %offset.i.next, %loop.i.latch ]

%step.i.0 = phi i64 [ 0, %entry ], [ %step.i.1, %loop.i.latch ]

%step.i.1 = phi i64 [ 3, %entry ], [ %step.i.0, %loop.i.latch ]

br label %loop.j

loop.j:

%j = phi i64 [ 0, %loop.i.header ], [ %j.inc, %loop.j ]

%offset = add nsw i64 %offset.i, %j

%idx = getelementptr inbounds i8, ptr %a, i64 %offset

store i8 0, ptr %idx

%j.inc = add nsw i64 %j, 1

%exitcond.j = icmp eq i64 %j.inc, 100

br i1 %exitcond.j, label %loop.i.latch, label %loop.j

loop.i.latch:

%i.inc = add nsw i64 %i, 1

%offset.i.next = add nsw i64 %offset.i, %step.i.0

%exitcond.i = icmp eq i64 %i.inc, 100

br i1 %exitcond.i, label %exit, label %loop.i.header

exit:

ret void

}

sebpop · 2025-08-25T14:35:25Z

Two points:

monotonicity checker belongs to ScalarEvolution.[h|cpp] (there's nothing specific to DA.)
monotonic checker is redundant with (could be built on top of) SCEV's existing range analysis.

Instead of a separate monotonicity checker, enhance SCEV's existing getRange() methods to detect wrapping more accurately and integrate this into the existing DA wrapping checks.

sushgokh · 2025-08-25T16:57:24Z

I was wondering that isnt checking only geps for wrapping (i.e. nusw) sufficient ?

It's insufficient. Please refer to relevant test cases, such as https://github.com/llvm/llvm-project/blob/296163f85dfc6a7f85972f5385ff85e67738a956/llvm/test/Analysis/DependenceAnalysis/DADelin.ll. For example, in @t1, %arrayidx is computed as getelementptr inbounds i32, ptr %A, i32 %add11. But %add11 is delienealized into multiple subscripts, effectively something like %A[i][j][k]. DA assumes monotonicity for all subscripts, but nusw only guarantees the no-wrap for %add11 * 4 and its addition to %A.

Lets suppose that nusw on %add11 does not guarantee nusw on individual subscripts. But then you should be able to get back %add11 using combination of wrapped/unwrapped subscripts. This is definitely not possible because
(anything + UB)=UB != %add11.
Do you agree here ?

kasuga-fj · 2025-08-26T08:29:04Z

Two points:

monotonicity checker belongs to ScalarEvolution.[h|cpp] (there's nothing specific to DA.)

monotonic checker is redundant with (could be built on top of) SCEV's existing range analysis.

Instead of a separate monotonicity checker, enhance SCEV's existing getRange() methods to detect wrapping more accurately and integrate this into the existing DA wrapping checks.

I think these checks include some logic specific to DA, such as inferring properties from nusw on GEPs, similar to what LoopAccessAnalysis does. Moreover, even if there are no DA specific checks, I think getRange is not appropriate in this case, since we need to recursively check the monotonicity of the AddRec for each loop.

Lets suppose that nusw on %add11 does not guarantee nusw on individual subscripts. But then you should be able to get back %add11 using combination of wrapped/unwrapped subscripts. This is definitely not possible because (anything + UB)=UB != %add11. Do you agree here ?

No. I don't think that's true in DA. Delienarization can decompose the original offset into multiple subscripts that are not "equivalent" to how the offset actually computed, which makes the problem complicated. Consider the following case, which I raised in #152566:

; void f(char *a, unsigned long long d) {
;   if (d == UINT64_MAX)
;     for (unsigned long long i = 0; i != d; i++)
;       a[i * (d + 1)] = 42;
; }
define void @f(ptr %a, i64 %d) {
entry:
  %guard = icmp eq i64 %d, -1
  br i1 %guard, label %loop.preheader, label %exit

loop.preheader:
  %stride = add nsw i64 %d, 1  ; since %d is -1, %stride is 0
  br label %loop

loop:
  %i = phi i64 [ 0, %loop.preheader ], [ %i.next, %loop ]
  %offset = phi i64 [ 0, %loop.preheader ], [ %offset.next, %loop ]
  %idx = getelementptr inbounds i8, ptr %a, i64 %offset
  store i8 42, ptr %idx
  %i.next = add nuw i64 %i, 1
  %offset.next = add nsw nuw i64 %offset, %stride
  %cond = icmp eq i64 %i.next, %d
  br i1 %cond, label %exit, label %loop

exit:
  ret void
}

The %offset in the above case is delinearized into:

AccessFunction: {0,+,(1 + %d)}<nuw><nsw><%loop>
Base offset: %a
ArrayDecl[UnknownSize][%d] with elements of 1 bytes.
ArrayRef[{0,+,1}<nuw><nsw><%loop>][{0,+,1}<nuw><nsw><%loop>]

Yeah, the nsw on each subscript is incorrect. This is a bug in SCEVDivision, and I'm currently working on it. That said, it also implies that even if the GEP has nusw, each subscript can still wrap without introducing UB.

sushgokh · 2025-08-26T10:43:35Z

Two points:

monotonicity checker belongs to ScalarEvolution.[h|cpp] (there's nothing specific to DA.)

monotonic checker is redundant with (could be built on top of) SCEV's existing range analysis.

Instead of a separate monotonicity checker, enhance SCEV's existing getRange() methods to detect wrapping more accurately and integrate this into the existing DA wrapping checks.

I think these checks include some logic specific to DA, such as inferring properties from nusw on GEPs, similar to what LoopAccessAnalysis does. Moreover, even if there are no DA specific checks, I think getRange is not appropriate in this case, since we need to recursively check the monotonicity of the AddRec for each loop.

Lets suppose that nusw on %add11 does not guarantee nusw on individual subscripts. But then you should be able to get back %add11 using combination of wrapped/unwrapped subscripts. This is definitely not possible because (anything + UB)=UB != %add11. Do you agree here ?

No. I don't think that's true in DA. Delienarization can decompose the original offset into multiple subscripts that are not "equivalent" to how the offset actually computed, which makes the problem complicated. Consider the following case, which I raised in #152566:
; void f(char *a, unsigned long long d) {
;   if (d == UINT64_MAX)
;     for (unsigned long long i = 0; i != d; i++)
;       a[i * (d + 1)] = 42;
; }
define void @f(ptr %a, i64 %d) {
entry:
  %guard = icmp eq i64 %d, -1
  br i1 %guard, label %loop.preheader, label %exit

loop.preheader:
  %stride = add nsw i64 %d, 1  ; since %d is -1, %stride is 0
  br label %loop

loop:
  %i = phi i64 [ 0, %loop.preheader ], [ %i.next, %loop ]
  %offset = phi i64 [ 0, %loop.preheader ], [ %offset.next, %loop ]
  %idx = getelementptr inbounds i8, ptr %a, i64 %offset
  store i8 42, ptr %idx
  %i.next = add nuw i64 %i, 1
  %offset.next = add nsw nuw i64 %offset, %stride
  %cond = icmp eq i64 %i.next, %d
  br i1 %cond, label %exit, label %loop

exit:
  ret void
}
The %offset in the above case is delinearized into:
AccessFunction: {0,+,(1 + %d)}<nuw><nsw><%loop>
Base offset: %a
ArrayDecl[UnknownSize][%d] with elements of 1 bytes.
ArrayRef[{0,+,1}<nuw><nsw><%loop>][{0,+,1}<nuw><nsw><%loop>]
Yeah, the nsw on each subscript is incorrect. This is a bug in SCEVDivision, and I'm currently working on it. That said, it also implies that even if the GEP has nusw, each subscript can still wrap without introducing UB.

Firstly, this entire program is UB since d+1 wraps around. So, I am not much concerned about this specific example.

Secondly, the array dimensions should be A[d+1][d]. It has deduced the delinearized dimensions as
ArrayRef[{0,+,1}<nuw><nsw><%loop>][{0,+,1}<nuw><nsw><%loop>]
Three points here:

We shouldnt have AddRec in the first dimension i.e. {0,+,1}<nuw><nsw><%loop>. It should be constant d+1
Addrec in the second dimension is correct. I would just expect nuw by looking at the program but need to check how d is being compared with -1
GEP doesnt have any wrap flags. I dont know how nuw/nsw was derived on AccessFunction: {0,+,(1 + %d)}<nuw><nsw><%loop>. But basically this says that even gep access value may wrap.

So, I think this is not the right example that says if you have nuw/nsw flags on GEP, you cant guarantee nuw/nsw flags on individual subscripts

kasuga-fj · 2025-08-26T10:54:31Z

Firstly, this entire program is UB since d+1 wraps around. So, I am not much concerned about this specific example.

I don't think that's correct. In the C language, wrapping behavior for unsigned integers is well-defined. In LLVM IR, unsigned wrapping is also permitted unless the add is marked as nuw (see LangRef).

EDIT: Even if the add has nuw, unsigned wrapping doesn't mean immediate UB (see LangRef).

Secondly, the array dimensions should be A[d+1][d]. It has deduced the delinearized dimensions as
ArrayRef[{0,+,1}<nuw><nsw><%loop>][{0,+,1}<nuw><nsw><%loop>]

The deduced dimensions are ArrayDecl. The ArrayRef shows inferred subscripts.

3. GEP doesnt have any wrap flags. I dont know how nuw/nsw was derived on AccessFunction: {0,+,(1 + %d)}<nuw><nsw><%loop>. But basically this says that even gep access value may wrap

The inbounds implies nusw (again, see LangRef).

sushgokh · 2025-08-26T12:48:31Z

I don't think that's correct. In the C language, wrapping behavior for unsigned integers is well-defined. In LLVM IR, unsigned wrapping is also permitted unless the add is marked as nuw (see LangRef).

Ah, my bad about wrapping behavior for unsigned .

The inbounds implies nusw

Does it always need to imply nsw ? I am slightly skeptical. Consider this. Its already accessing UINT64_MAX which is beyond signed representation. Or am I misinterpreting something here?

Anyway, coming back to the example, array dimensions can be expressed as A[d+1][d].
d+1 is nsw from %stride = add nsw i64 %d, 1 .
d is nuw from %i.next = add nuw i64 %i, 1 I assume.
But GEP has nusw. And now I understand why GEP and individual subscripts may have different wrap behavior.
Just for curiosity:

How did it derive nusw on the %offset.next = add nsw nuw i64 %offset, %stride and on GEP? Basically, compiler has some information about the values?
Didnt debug this, but any idea why did it delinearize to 2 dimensions?

kasuga-fj · 2025-08-27T11:33:59Z

I'll separate the SCEVMinMaxCalculator part into another PR.

UPDATE: Removed

kasuga-fj · 2025-08-27T12:02:19Z

@sushgokh

Does it always need to imply nsw ? I am slightly skeptical. Consider this. Its already accessing UINT64_MAX which is beyond signed representation. Or am I misinterpreting something here?

Sorry, I didn't catch what you meant. As far as I can tell, LLVM IR uses two's complement representation for integers. UINT64_MAX is 2^64 - 1 when interpreted as unsigned, and -1 when interpreted as signed. Also, I don't know whether the array access like a[UINT64_MAX] is well-defined in the C language.

How did it derive nusw on the %offset.next = add nsw nuw i64 %offset, %stride and on GEP? Basically, compiler has some information about the values?

The IR I presented was handwritten. I don't know whether the actual compiler infers those flags.

Didnt debug this, but any idea why did it delinearize to 2 dimensions?

Please refer to the delinearization function. Just to clarify, this is a heuristic based on the input IR and does not reflect how the original program accesses the array.

Meinersbur · 2025-08-28T15:15:57Z

monotonic checker is redundant with (could be built on top of) SCEV's existing range analysis.

SCEV's range analysis only returns ConstantRange. If you expect to have not just INT_MAX/INT_MIN returned, and contains an SCEVAddRecExpr, the min/max could be any of the values described by the SCEVAddRecExpr. It is impossible to enumerate all values if the trip count is not a constant. On the other side, if SCEVAddRecExpr does not wrap and the expression is on that SCEVAddRecExpr is monotonic, you can assume that the entire range falls between the evalution of the SCEVAddRecExpr's start value, and the loop's last iteration.

But I found related functionality in SE which is ScalarEvolution::getRangeForAffineNoSelfWrappingAR

kasuga-fj · 2025-08-28T16:02:22Z

But I found related functionality in SE which is ScalarEvolution::getRangeForAffineNoSelfWrappingAR

It seems to be a private function, and at a glance, we have to set UseExpensiveRangeSharpening to true in order to run it, which is false by default. Well, we might be able to allow controlling it through an argument, though.

kasuga-fj · 2025-08-29T12:10:45Z

I found a case where getRange is insufficient even if the backedge-taken count is constant. Consider the following:

; for (int i = 0; i < 100; i++)
;   a[i & 1] = 0;
define void @f(ptr %a) {
entry:
  br label %loop

loop:
  %i = phi i64 [ 0, %entry ], [ %i.inc, %loop ]
  %i.1 = phi i1 [ false, %entry ], [ %i.1.inc, %loop ]
  %offset = sext i1 %i.1 to i64
  %idx = getelementptr i8, ptr %a, i64 %offset
  store i8 0, ptr %idx
  %i.inc = add i64 %i, 1
  %i.1.inc = add i1 %i.1, true
  %exitcond = icmp eq i64 %i.inc, 100
  br i1 %exitcond, label %exit, label %loop

exit:
  ret void
}

As I tested locally, ScalarEvolution::getSignedRange returned [-1, 1) for (sext i1 {false,+,true}<%loop> to i64) (which is the SCEV expression for %offset), and ConstantRange::isWrappedSignedSet returned false.

kasuga-fj

Most of the changes in test results are due to a failure to detect the lack of dependencies in patterns like the ones below:

for (int i = 0; i < n; i++)
  for (int j = 0; j < m; j++)
    *B++ = ...;

At a glance, handling this case seems to require non-trivial effort, so if I do address it, I'd prefer to do it in a separate PR.

I've commented on the other cases individually.

kasuga-fj · 2025-09-01T12:26:44Z

llvm/test/Analysis/DependenceAnalysis/DADelin.ll

-; CHECK-NEXT:    da analyze - none!
+; CHECK-NEXT:    da analyze - confused!
 ; CHECK-NEXT:  Src: store i8 42, ptr %idx.0, align 1 --> Dst: store i8 42, ptr %idx.1, align 1
-; CHECK-NEXT:    da analyze - output [*|<]!
+; CHECK-NEXT:    da analyze - confused!
 ; CHECK-NEXT:  Src: store i8 42, ptr %idx.1, align 1 --> Dst: store i8 42, ptr %idx.1, align 1
-; CHECK-NEXT:    da analyze - none!


I believe this is a case where the test results were correctly updated. We cannot prove %k is non-zero from the IR.

kasuga-fj · 2025-09-01T12:29:36Z

llvm/test/Analysis/DependenceAnalysis/DADelin.ll

-; CHECK-NEXT:    da analyze - none!
+; CHECK-NEXT:    da analyze - confused!
 ; CHECK-NEXT:  Src: store i8 42, ptr %idx.0, align 1 --> Dst: store i8 42, ptr %idx.1, align 1
-; CHECK-NEXT:    da analyze - output [*|<]!
+; CHECK-NEXT:    da analyze - confused!
 ; CHECK-NEXT:  Src: store i8 42, ptr %idx.1, align 1 --> Dst: store i8 42, ptr %idx.1, align 1
-; CHECK-NEXT:    da analyze - none!


Regarding this case, I think the original results are also correct. But to reason correctly, we need to infer that %k is non-zero, which seems to require a fair amount of effort.

kasuga-fj · 2025-09-01T12:34:20Z

llvm/test/Analysis/DependenceAnalysis/BasePtrBug.ll

-; CHECK-NEXT:    da analyze - input [*|<]!
+; CHECK-NEXT:    da analyze - confused!
 ; CHECK-NEXT:  Src: %0 = load i32, ptr %gep.0, align 4 --> Dst: store i32 %add, ptr %gep.B, align 4
 ; CHECK-NEXT:    da analyze - confused!
 ; CHECK-NEXT:  Src: %1 = load i32, ptr %gep.1, align 4 --> Dst: %1 = load i32, ptr %gep.1, align 4
-; CHECK-NEXT:    da analyze - none!


Adding inbounds to the GEPs avoids these changes, but I'm not sure if they are intentionally omitted.

kasuga-fj · 2025-09-01T12:36:52Z

llvm/lib/Analysis/DependenceAnalysis.cpp

+
+private:
+  ScalarEvolution *SE;
+  const Loop *OutermostLoop;


I feel it's safer to make this outermost. Otherwise, unexpected behavior might occur in cases like below

llvm-project/llvm/test/Analysis/DependenceAnalysis/monotonic.ll

Lines 336 to 379 in b5ca793

; The value of step reccurence is not invariant with respect to the outer most

; loop (the i-loop).

;

; offset_i = 0;

; for (int i = 0; i < 100; i++) {

; for (int j = 0; j < 100; j++)

; a[offset_i + j] = 0;

; offset_i += (i % 2 == 0) ? 0 : 3;

; }

;

define void @step_is_variant(ptr %a) {

; CHECK-LABEL: 'step_is_variant'

; CHECK: Failed to prove monotonicity for: %offset.i

; CHECK: Failed to prove monotonicity for: {%offset.i,+,1}<nuw><nsw><%loop.j>

; CHECK: Monotonicity: Unknown expr: {%offset.i,+,1}<nuw><nsw><%loop.j>

;

entry:

br label %loop.i.header

loop.i.header:

%i = phi i64 [ 0, %entry ], [ %i.inc, %loop.i.latch ]

%offset.i = phi i64 [ 0, %entry ], [ %offset.i.next, %loop.i.latch ]

%step.i.0 = phi i64 [ 0, %entry ], [ %step.i.1, %loop.i.latch ]

%step.i.1 = phi i64 [ 3, %entry ], [ %step.i.0, %loop.i.latch ]

br label %loop.j

loop.j:

%j = phi i64 [ 0, %loop.i.header ], [ %j.inc, %loop.j ]

%offset = add nsw i64 %offset.i, %j

%idx = getelementptr inbounds i8, ptr %a, i64 %offset

store i8 0, ptr %idx

%j.inc = add nsw i64 %j, 1

%exitcond.j = icmp eq i64 %j.inc, 100

br i1 %exitcond.j, label %loop.i.latch, label %loop.j

loop.i.latch:

%i.inc = add nsw i64 %i, 1

%offset.i.next = add nsw i64 %offset.i, %step.i.0

%exitcond.i = icmp eq i64 %i.inc, 100

br i1 %exitcond.i, label %exit, label %loop.i.header

exit:

ret void

}

kasuga-fj · 2025-09-01T12:38:48Z

llvm/test/Analysis/DependenceAnalysis/PR21585.ll

-; CHECK-NEXT:    da analyze - input [*]!
+; CHECK-NEXT:    da analyze - confused!
 ; CHECK-NEXT:  Src: %0 = load i64, ptr %idx, align 4 --> Dst: store i64 %1, ptr %idx, align 4
-; CHECK-NEXT:    da analyze - anti [*|<]!
+; CHECK-NEXT:    da analyze - confused!
 ; CHECK-NEXT:  Src: store i64 %1, ptr %idx, align 4 --> Dst: store i64 %1, ptr %idx, align 4
-; CHECK-NEXT:    da analyze - output [*]!


These changes seems reasonable.

kasuga-fj · 2025-09-01T12:42:45Z

llvm/test/Analysis/DependenceAnalysis/GCD.ll

-; CHECK-NEXT:    da analyze - output [* *]!
+; CHECK-NEXT:    da analyze - confused!
 ; CHECK-NEXT:  Src: store i32 %7, ptr %arrayidx6, align 4 --> Dst: %11 = load i32, ptr %arrayidx12, align 4
-; CHECK-NEXT:    da analyze - flow [* *|<]!
+; CHECK-NEXT:    da analyze - confused!
 ; CHECK-NEXT:  Src: store i32 %7, ptr %arrayidx6, align 4 --> Dst: store i32 %11, ptr %B.addr.12, align 4
 ; CHECK-NEXT:    da analyze - confused!
 ; CHECK-NEXT:  Src: %11 = load i32, ptr %arrayidx12, align 4 --> Dst: %11 = load i32, ptr %arrayidx12, align 4
-; CHECK-NEXT:    da analyze - input [* *]!


"I'm not entirely sure whether these changes are reasonable. Several sext instructions have been inserted, which seems to complicate things.

kasuga-fj · 2025-09-01T12:55:29Z

llvm/test/Analysis/DependenceAnalysis/monotonic.ll

+; `step` can be zero, so `step*j` can be either Invariant or Monotonic. This
+; propagates to `i + step*j`.
+;
+; for (int i = 0; i < n; i++)
+;   for (int j = 0; j < m; j++)
+;     a[i + step*j] = 0;
+;


I'm unsure how to handle cases like this, i.e., we cannot prove the value of the step recurrence being non-zero. I'd prefer to bail out early in such situations, but it might be better to proceed with the analysis and account for the possibility that the coefficient is zero in each dependence test.

kasuga-fj · 2025-09-01T13:07:03Z

~~The test failure in LoopInterchange seems to the omission of inbound to GEPs. I'll submit another PR to address this.~~
EDIT: The failed tests were fixed now.

Since all DA tests are passing, let me mark this as ready for now.

llvmbot · 2025-09-01T13:07:39Z

@llvm/pr-subscribers-llvm-analysis

Author: Ryotaro Kasuga (kasuga-fj)

Changes

The memory access wrap hasn't been properly handled in DA. As a first step, this patch introduces the concept of "monotonicity" and applies it to validate the result of delinearization.

Related: #151326 (comment)

Patch is 66.24 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/154527.diff

17 Files Affected:

(modified) llvm/lib/Analysis/DependenceAnalysis.cpp (+331)
(modified) llvm/test/Analysis/DependenceAnalysis/Banerjee.ll (+15-15)
(modified) llvm/test/Analysis/DependenceAnalysis/BasePtrBug.ll (+2-2)
(modified) llvm/test/Analysis/DependenceAnalysis/DADelin.ll (+6-6)
(modified) llvm/test/Analysis/DependenceAnalysis/ExactRDIV.ll (+4-4)
(modified) llvm/test/Analysis/DependenceAnalysis/GCD.ll (+12-12)
(modified) llvm/test/Analysis/DependenceAnalysis/PR21585.ll (+3-3)
(modified) llvm/test/Analysis/DependenceAnalysis/Preliminary.ll (+4-4)
(modified) llvm/test/Analysis/DependenceAnalysis/Propagating.ll (+10-10)
(modified) llvm/test/Analysis/DependenceAnalysis/Separability.ll (+4-4)
(modified) llvm/test/Analysis/DependenceAnalysis/StrongSIV.ll (+2-2)
(modified) llvm/test/Analysis/DependenceAnalysis/SymbolicRDIV.ll (+6-6)
(modified) llvm/test/Analysis/DependenceAnalysis/SymbolicSIV.ll (+9-9)
(modified) llvm/test/Analysis/DependenceAnalysis/WeakCrossingSIV.ll (+3-3)
(modified) llvm/test/Analysis/DependenceAnalysis/WeakZeroDstSIV.ll (+2-2)
(modified) llvm/test/Analysis/DependenceAnalysis/WeakZeroSrcSIV.ll (+2-2)
(added) llvm/test/Analysis/DependenceAnalysis/monotonic.ll (+379)

diff --git a/llvm/lib/Analysis/DependenceAnalysis.cpp b/llvm/lib/Analysis/DependenceAnalysis.cpp
index f33e04e804e3d..75a59c9553e18 100644
--- a/llvm/lib/Analysis/DependenceAnalysis.cpp
+++ b/llvm/lib/Analysis/DependenceAnalysis.cpp
@@ -3308,6 +3308,300 @@ void DependenceInfo::updateDirection(Dependence::DVEntry &Level,
     llvm_unreachable("constraint has unexpected kind");
 }
 
+namespace {
+
+/// The type of signed monotonicity of a SCEV expression. This property is
+/// defined with respect to the outermost loop that DA is analyzing. Invariant
+/// and MultiMonotonic mutually exclusive, and both imply NoSignedWrap.
+///
+/// This is designed to classify the behavior of AddRec expressions, and does
+/// not care about other SCEVs. For example, given the two loop invariants `A`
+/// and `B`, `A + B` is treated as Invariant even if the addition may wrap. On
+/// the other hand, if either `A` or `B` is an AddRec and we cannot prove the
+/// addition doesn't wrap, the result is classified as Unknown.
+enum class MonotonicityType {
+  Unknown, ///< The expression contains some non loop-invariant SCEVUnknown or
+           ///< arithmetic operation that has some AddRec as its subexpression
+           ///< and may cause signed wrap.
+  NoSignedWrap, ///< The expression doesn't contain any AddRecs that may wrap.
+                ///< This is a weaker property than Invariant or MultiMonotonic.
+                ///< Invariant and MultiMonotonic imply NoSignedWrap.
+  Invariant,    ///< The expression is a loop-invariant.
+  MultiMonotonic, ///< The expression is monotonically increasing or decreasing
+                  ///< with respect to each loop. This is exclusive of
+                  ///< Invariant. That is, we say an SCEV is MultiMonotonic only
+                  ///< if it contains at least one AddRec where its step
+                  ///< reccurence value is non-zero. Monotonicity is checked
+                  ///< independently for each loop. It is allowed to contain
+                  ///< both increasing and decreasing AddRecs.
+};
+
+/// A visitor that checks the signed monotonicity of SCEVs.
+struct SCEVSignedMonotonicityChecker
+    : public SCEVVisitor<SCEVSignedMonotonicityChecker, MonotonicityType> {
+
+  /// \p Ptr is the pointer that the SCEV is associated with, if any. It may be
+  /// used for the inferrence.
+  static MonotonicityType checkMonotonicity(ScalarEvolution *SE,
+                                            const SCEV *Expr,
+                                            const Loop *OutermostLoop,
+                                            const Value *Ptr = nullptr);
+
+  MonotonicityType visitAddRecExpr(const SCEVAddRecExpr *Expr);
+  MonotonicityType visitUnknown(const SCEVUnknown *Expr);
+  MonotonicityType visitZeroExtendExpr(const SCEVZeroExtendExpr *Expr);
+  MonotonicityType visitSignExtendExpr(const SCEVSignExtendExpr *Expr);
+
+  MonotonicityType visitAddExpr(const SCEVAddExpr *Expr) {
+    return visitNAryHelper(Expr);
+  }
+  MonotonicityType visitMulExpr(const SCEVMulExpr *Expr) {
+    return visitNAryHelper(Expr);
+  }
+
+  MonotonicityType visitConstant(const SCEVConstant *) {
+    return MonotonicityType::Invariant;
+  }
+  MonotonicityType visitVScale(const SCEVVScale *) {
+    return MonotonicityType::Invariant;
+  }
+
+  // TODO: Handle more cases.
+  MonotonicityType visitPtrToIntExpr(const SCEVPtrToIntExpr *Expr) {
+    return unknownMonotonicity(Expr);
+  }
+  MonotonicityType visitTruncateExpr(const SCEVTruncateExpr *Expr) {
+    return unknownMonotonicity(Expr);
+  }
+  MonotonicityType visitUDivExpr(const SCEVUDivExpr *Expr) {
+    return unknownMonotonicity(Expr);
+  }
+  MonotonicityType visitSMaxExpr(const SCEVSMaxExpr *Expr) {
+    return unknownMonotonicity(Expr);
+  }
+  MonotonicityType visitUMaxExpr(const SCEVUMaxExpr *Expr) {
+    return unknownMonotonicity(Expr);
+  }
+  MonotonicityType visitSMinExpr(const SCEVSMinExpr *Expr) {
+    return unknownMonotonicity(Expr);
+  }
+  MonotonicityType visitUMinExpr(const SCEVUMinExpr *Expr) {
+    return unknownMonotonicity(Expr);
+  }
+  MonotonicityType visitSequentialUMinExpr(const SCEVSequentialUMinExpr *Expr) {
+    return unknownMonotonicity(Expr);
+  }
+  MonotonicityType visitCouldNotCompute(const SCEVCouldNotCompute *Expr) {
+    return unknownMonotonicity(Expr);
+  }
+
+private:
+  ScalarEvolution *SE;
+  const Loop *OutermostLoop;
+  bool NoWrapFromGEP = false;
+
+  SCEVSignedMonotonicityChecker(ScalarEvolution *SE, const Loop *OutermostLoop,
+                                const Value *Ptr);
+
+  MonotonicityType visitNAryHelper(const SCEVNAryExpr *Expr);
+  MonotonicityType unknownMonotonicity(const SCEV *Expr);
+  bool isLoopInvariant(const SCEV *Expr) const;
+};
+
+} // anonymous namespace
+
+SCEVSignedMonotonicityChecker::SCEVSignedMonotonicityChecker(
+    ScalarEvolution *SE, const Loop *OutermostLoop, const Value *Ptr)
+    : SE(SE), OutermostLoop(OutermostLoop) {
+  if (Ptr) {
+    // Perform reasoning similar to LoopAccessAnalysis. If an AddRec would wrap
+    // and the GEP would have nusw, the wrapped memory location would become
+    // like as follows (in the mathmatical sense, assuming the step recurrence
+    // is positive):
+    //
+    //   (previously accessed location) + (step recurrence) - 2^N
+    //
+    // where N is the size of the pointer index type. Since the value of step
+    // recurrence is less than 2^(N-1), the distance between the previously
+    // accessed location and the wrapped location will be greater than 2^(N-1),
+    // which is larger than half the pointer index type space. The size of
+    // allocated object must not exceed the largest signed integer that fits
+    // into the index type, so the GEP value would be poison and any memory
+    // access using it would be immediate UB when executed.
+    //
+    // TODO: We don't check if the result of the GEP is always used. Maybe we
+    // should check the reachability from the GEP to the target instruction.
+    // E.g., in the following case, no-wrap would not trigger immediate UB:
+    //
+    //  entry:
+    //    ...
+    //    %gep = getelementptr inbounds i32, ptr %ptr, i32 %addrec
+    //    br i1 %cond, label %store, label %sink
+    //
+    //   store:
+    //     store i32 42, ptr %ptr
+    //     br label %sink
+    //
+    //   sink:
+    //     ...
+    //
+    auto *GEP = dyn_cast<GetElementPtrInst>(Ptr);
+    if (GEP && GEP->hasNoUnsignedSignedWrap())
+      NoWrapFromGEP = true;
+  }
+}
+
+MonotonicityType SCEVSignedMonotonicityChecker::checkMonotonicity(
+    ScalarEvolution *SE, const SCEV *Expr, const Loop *OutermostLoop,
+    const Value *Ptr) {
+  SCEVSignedMonotonicityChecker Checker(SE, OutermostLoop, Ptr);
+  MonotonicityType MT = Checker.visit(Expr);
+
+#ifndef NDEBUG
+  switch (MT) {
+  case MonotonicityType::Unknown:
+    LLVM_DEBUG(dbgs() << "Monotonicity: Unknown expr: " << *Expr << "\n");
+    break;
+  case MonotonicityType::NoSignedWrap:
+    LLVM_DEBUG(dbgs() << "Monotonicity: No signed wrap expr: " << *Expr
+                      << "\n");
+    break;
+  case MonotonicityType::Invariant:
+    LLVM_DEBUG(dbgs() << "Monotonicity: Invariant expr: " << *Expr << "\n");
+    break;
+  case MonotonicityType::MultiMonotonic:
+    LLVM_DEBUG(dbgs() << "Monotonicity: Monotonic expr: " << *Expr << "\n");
+    break;
+  }
+#endif
+  return MT;
+}
+
+MonotonicityType
+SCEVSignedMonotonicityChecker::visitNAryHelper(const SCEVNAryExpr *Expr) {
+  assert((isa<SCEVAddExpr>(Expr) || isa<SCEVMulExpr>(Expr)) &&
+         "Unexpected SCEV");
+
+  if (isLoopInvariant(Expr))
+    return MonotonicityType::Invariant;
+
+  MonotonicityType Result = MonotonicityType::Invariant;
+  for (const SCEV *Op : Expr->operands()) {
+    assert(Result != MonotonicityType::Unknown && "Unexpected state");
+    switch (visit(Op)) {
+    case MonotonicityType::Unknown:
+      return unknownMonotonicity(Expr);
+    case MonotonicityType::NoSignedWrap:
+      Result = MonotonicityType::NoSignedWrap;
+      break;
+    case MonotonicityType::Invariant:
+      break;
+    case MonotonicityType::MultiMonotonic: {
+      switch (Result) {
+      case MonotonicityType::Unknown:
+        llvm_unreachable("should have been handled above");
+      case MonotonicityType::NoSignedWrap:
+        break;
+      case MonotonicityType::Invariant:
+        if (!Expr->hasNoSignedWrap())
+          return unknownMonotonicity(Expr);
+        Result = MonotonicityType::MultiMonotonic;
+        break;
+      case MonotonicityType::MultiMonotonic:
+        if (!Expr->hasNoSignedWrap())
+          return unknownMonotonicity(Expr);
+        if (!isa<SCEVAddExpr>(Expr))
+          return unknownMonotonicity(Expr);
+        // Monotonic + Monotonic might be a loop invariant, e.g., the following
+        // SCEV:
+        //
+        //   {0,+,1}<%loop> + {0,+,-1}<%loop>
+        //
+        // In that case, relax the property to NoSignedWrap.
+        Result = MonotonicityType::NoSignedWrap;
+        break;
+      }
+    } break;
+    }
+  }
+  return Result;
+}
+
+MonotonicityType
+SCEVSignedMonotonicityChecker::unknownMonotonicity(const SCEV *Expr) {
+  LLVM_DEBUG(dbgs() << "Failed to prove monotonicity for: " << *Expr << "\n");
+  return MonotonicityType::Unknown;
+}
+
+bool SCEVSignedMonotonicityChecker::isLoopInvariant(const SCEV *Expr) const {
+  return !OutermostLoop || SE->isLoopInvariant(Expr, OutermostLoop);
+}
+
+MonotonicityType
+SCEVSignedMonotonicityChecker::visitAddRecExpr(const SCEVAddRecExpr *Expr) {
+  if (!Expr->isAffine())
+    return unknownMonotonicity(Expr);
+
+  const SCEV *Start = Expr->getStart();
+  const SCEV *Step = Expr->getStepRecurrence(*SE);
+
+  MonotonicityType StartRes = visit(Start);
+  if (StartRes == MonotonicityType::Unknown)
+    return unknownMonotonicity(Expr);
+
+  MonotonicityType StepRes = visit(Step);
+  if (StepRes != MonotonicityType::Invariant)
+    return unknownMonotonicity(Expr);
+
+  // TODO: Enhance the inference here.
+  if (!Expr->hasNoSignedWrap() && !NoWrapFromGEP) {
+    if (!SE->isKnownNegative(Step))
+      // If the coefficient can be positive value, ensure that the AddRec is
+      // monotonically increasing.
+      if (!SE->isKnownOnEveryIteration(ICmpInst::ICMP_SGE, Expr, Start))
+        return unknownMonotonicity(Expr);
+
+    if (!SE->isKnownPositive(Step))
+      // If the coefficient can be positive value, ensure that the AddRec is
+      // monotonically decreasing.
+      if (!SE->isKnownOnEveryIteration(ICmpInst::ICMP_SLE, Expr, Start))
+        return unknownMonotonicity(Expr);
+  }
+
+  bool IsKnownNonZero = SE->isKnownNonZero(Step);
+  switch (StartRes) {
+  case MonotonicityType::Unknown:
+    llvm_unreachable("should have been handled above");
+  case MonotonicityType::NoSignedWrap:
+    return MonotonicityType::NoSignedWrap;
+  case MonotonicityType::Invariant:
+    return IsKnownNonZero ? MonotonicityType::MultiMonotonic
+                          : MonotonicityType::NoSignedWrap;
+  case MonotonicityType::MultiMonotonic:
+    // TODO: Should handle SCEV like `{{0,+,-1}<%loop>,+,1}<%loop>`?
+    return IsKnownNonZero ? MonotonicityType::MultiMonotonic
+                          : MonotonicityType::NoSignedWrap;
+  }
+  llvm_unreachable("unhandled MonotonicityType");
+}
+
+MonotonicityType SCEVSignedMonotonicityChecker::visitZeroExtendExpr(
+    const SCEVZeroExtendExpr *Expr) {
+  return visit(Expr->getOperand());
+}
+
+MonotonicityType SCEVSignedMonotonicityChecker::visitSignExtendExpr(
+    const SCEVSignExtendExpr *Expr) {
+  return visit(Expr->getOperand());
+}
+
+MonotonicityType
+SCEVSignedMonotonicityChecker::visitUnknown(const SCEVUnknown *Expr) {
+  if (!isLoopInvariant(Expr))
+    return unknownMonotonicity(Expr);
+  return MonotonicityType::Invariant;
+}
+
 /// Check if we can delinearize the subscripts. If the SCEVs representing the
 /// source and destination array references are recurrences on a nested loop,
 /// this function flattens the nested recurrences into separate recurrences
@@ -3500,12 +3794,37 @@ bool DependenceInfo::tryDelinearizeParametricSize(
   // to the dependency checks.
   if (!DisableDelinearizationChecks)
     for (size_t I = 1; I < Size; ++I) {
+      const Loop *OutermostLoop =
+          LI->getLoopFor(Src->getParent())->getOutermostLoop();
+
+      // TODO: In general, reasoning about monotonicity of a subscript from the
+      // base pointer would lead incorrect result. Probably we need to check
+      // the loops associated with this subscript are disjoint from those
+      // associated with the other subscripts. The validation would be
+      // something like:
+      //
+      //   LoopsI = collectCommonLoops(SrcSubscripts[I])
+      //   LoopsOthers = collectCommonLoops(SrcSCEV - SrcSubscripts[I])
+      //   CanUsePtr = (LoopsI intersect LoopsOthers) is empty.
+      //
+      MonotonicityType SrcMonotonicity =
+          SCEVSignedMonotonicityChecker::checkMonotonicity(
+              SE, SrcSubscripts[I], OutermostLoop, SrcPtr);
+      if (SrcMonotonicity == MonotonicityType::Unknown)
+        return false;
+
       if (!isKnownNonNegative(SrcSubscripts[I], SrcPtr))
         return false;
 
       if (!isKnownLessThan(SrcSubscripts[I], Sizes[I - 1]))
         return false;
 
+      MonotonicityType DstMonotonicity =
+          SCEVSignedMonotonicityChecker::checkMonotonicity(
+              SE, DstSubscripts[I], OutermostLoop, DstPtr);
+      if (DstMonotonicity == MonotonicityType::Unknown)
+        return false;
+
       if (!isKnownNonNegative(DstSubscripts[I], DstPtr))
         return false;
 
@@ -3679,11 +3998,23 @@ DependenceInfo::depends(Instruction *Src, Instruction *Dst,
   Pair[0].Src = SrcSCEV;
   Pair[0].Dst = DstSCEV;
 
+  const Loop *OutermostLoop = SrcLoop ? SrcLoop->getOutermostLoop() : nullptr;
+  if (SCEVSignedMonotonicityChecker::checkMonotonicity(
+          SE, SrcEv, OutermostLoop, SrcPtr) == MonotonicityType::Unknown)
+    return std::make_unique<Dependence>(Src, Dst,
+                                        SCEVUnionPredicate(Assume, *SE));
+  if (SCEVSignedMonotonicityChecker::checkMonotonicity(
+          SE, DstEv, OutermostLoop, DstPtr) == MonotonicityType::Unknown)
+    return std::make_unique<Dependence>(Src, Dst,
+                                        SCEVUnionPredicate(Assume, *SE));
+
   if (Delinearize) {
     if (tryDelinearize(Src, Dst, Pair)) {
       LLVM_DEBUG(dbgs() << "    delinearized\n");
       Pairs = Pair.size();
     }
+    // TODO: Check that the original offsets are monotonic when delinearization
+    // fails.
   }
 
   for (unsigned P = 0; P < Pairs; ++P) {
diff --git a/llvm/test/Analysis/DependenceAnalysis/Banerjee.ll b/llvm/test/Analysis/DependenceAnalysis/Banerjee.ll
index e0def901d1759..c96f0dfa1f5d5 100644
--- a/llvm/test/Analysis/DependenceAnalysis/Banerjee.ll
+++ b/llvm/test/Analysis/DependenceAnalysis/Banerjee.ll
@@ -28,7 +28,7 @@ define void @banerjee0(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
 ; CHECK-NEXT:  Src: %0 = load i64, ptr %arrayidx6, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
 ; CHECK-NEXT:    da analyze - confused!
 ; CHECK-NEXT:  Src: store i64 %0, ptr %B.addr.11, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
-; CHECK-NEXT:    da analyze - none!
+; CHECK-NEXT:    da analyze - confused!
 ;
 ; NORMALIZE-LABEL: 'banerjee0'
 ; NORMALIZE-NEXT:  Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 0, ptr %arrayidx, align 8
@@ -42,7 +42,7 @@ define void @banerjee0(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
 ; NORMALIZE-NEXT:  Src: %0 = load i64, ptr %arrayidx6, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
 ; NORMALIZE-NEXT:    da analyze - confused!
 ; NORMALIZE-NEXT:  Src: store i64 %0, ptr %B.addr.11, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
-; NORMALIZE-NEXT:    da analyze - none!
+; NORMALIZE-NEXT:    da analyze - confused!
 ;
 ; DELIN-LABEL: 'banerjee0'
 ; DELIN-NEXT:  Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 0, ptr %arrayidx, align 8
@@ -56,7 +56,7 @@ define void @banerjee0(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
 ; DELIN-NEXT:  Src: %0 = load i64, ptr %arrayidx6, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
 ; DELIN-NEXT:    da analyze - confused!
 ; DELIN-NEXT:  Src: store i64 %0, ptr %B.addr.11, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
-; DELIN-NEXT:    da analyze - none!
+; DELIN-NEXT:    da analyze - confused!
 ;
 entry:
   br label %for.cond1.preheader
@@ -810,7 +810,7 @@ define void @banerjee9(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
 ; CHECK-NEXT:  Src: %1 = load i64, ptr %arrayidx7, align 8 --> Dst: store i64 %1, ptr %B.addr.11, align 8
 ; CHECK-NEXT:    da analyze - confused!
 ; CHECK-NEXT:  Src: store i64 %1, ptr %B.addr.11, align 8 --> Dst: store i64 %1, ptr %B.addr.11, align 8
-; CHECK-NEXT:    da analyze - none!
+; CHECK-NEXT:    da analyze - confused!
 ;
 ; NORMALIZE-LABEL: 'banerjee9'
 ; NORMALIZE-NEXT:  Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 0, ptr %arrayidx, align 8
@@ -824,7 +824,7 @@ define void @banerjee9(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
 ; NORMALIZE-NEXT:  Src: %1 = load i64, ptr %arrayidx7, align 8 --> Dst: store i64 %1, ptr %B.addr.11, align 8
 ; NORMALIZE-NEXT:    da analyze - confused!
 ; NORMALIZE-NEXT:  Src: store i64 %1, ptr %B.addr.11, align 8 --> Dst: store i64 %1, ptr %B.addr.11, align 8
-; NORMALIZE-NEXT:    da analyze - none!
+; NORMALIZE-NEXT:    da analyze - confused!
 ;
 ; DELIN-LABEL: 'banerjee9'
 ; DELIN-NEXT:  Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 0, ptr %arrayidx, align 8
@@ -838,7 +838,7 @@ define void @banerjee9(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
 ; DELIN-NEXT:  Src: %1 = load i64, ptr %arrayidx7, align 8 --> Dst: store i64 %1, ptr %B.addr.11, align 8
 ; DELIN-NEXT:    da analyze - confused!
 ; DELIN-NEXT:  Src: store i64 %1, ptr %B.addr.11, align 8 --> Dst: store i64 %1, ptr %B.addr.11, align 8
-; DELIN-NEXT:    da analyze - none!
+; DELIN-NEXT:    da analyze - confused!
 ;
 entry:
   br label %for.cond1.preheader
@@ -896,7 +896,7 @@ define void @banerjee10(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
 ; CHECK-NEXT:  Src: %1 = load i64, ptr %arrayidx6, align 8 --> Dst: store i64 %1, ptr %B.addr.11, align 8
 ; CHECK-NEXT:    da analyze - confused!
 ; CHECK-NEXT:  Src: store i64 %1, ptr %B.addr.11, align 8 --> Dst: store i64 %1, ptr %B.addr.11, align 8
-; CHECK-NEXT:    da analyze - none!
+; CHECK-NEXT:    da analyze - confused!
 ;
 ; NORMALIZE-LABEL: 'banerjee10'
 ; NORMALIZE-NEXT:  Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 0, ptr %arrayidx, align 8
@@ -910,7 +910,7 @@ define void @banerjee10(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
 ; NORMALIZE-NEXT:  Src: %1 = load i64, ptr %arrayidx6, align 8 --> Dst: store i64 %1, ptr %B.addr.11, align 8
 ; NORMALIZE-NEXT:    da analyze - confused!
 ; NORMALIZE-NEXT:  Src: store i64 %1, ptr %B.addr.11, align 8 --> Dst: store i64 %1, ptr %B.addr.11, align 8
-; NORMALIZE-NEXT:    da analyze - none!
+; NORMALIZE-NEXT:    da analyze - confused!
 ;
 ; DELIN-LABEL: 'banerjee10'
 ; DELIN-NEXT:  Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 0, ptr %arrayidx, align 8
@@ -924,7 +924,7 @@ define void @banerjee10(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
 ; DELIN-NEXT:  Src: %1 = load i64, ptr %arrayidx6, align 8 --> Dst: store i64 %1, ptr %B.addr.11, align 8
 ; DELIN-NEXT:    da analyze - confused!
 ; DELIN-NEXT:  Src: store i64 %1, ptr %B.addr.11, align 8 --> Dst: store i64 %1, ptr %B.addr.11, align 8
-; DELIN-NEXT:    da analyze - none!
+; DELIN-NEXT:    da analyze - confused!
 ;
 entry:
   br label %for.cond1.preheader
@@ -981,7 +981,7 @@ define void @banerjee11(ptr %A, ptr %B, i64 %m, i64 %n) nounwind uwtable ssp {
 ; CHECK-NEXT:  Src: %0 = load i64, ptr %arrayidx6, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
 ; CHECK-NEXT:    da analyze - confused!
 ; CHECK-NEXT:  Src: store i64 %0, ptr %B.addr.11, align 8 --> Dst: store i64 %0, ptr %B.addr.11, align 8
-; CHECK-NEXT:    da analyze - none!
+; CHECK-NEXT:    da analyze - confused!
 ;
 ; NORMALIZE-LABEL: 'banerjee11'
 ; NORMALIZE-NEXT:  Src: store i64 0, ptr %arrayidx, align 8 --> Dst: store i64 0, ptr %arrayidx, align 8
@@ -995,7 +995,7 @@ define void @banerjee11(ptr %A, ptr %B...
[truncated]

kasuga-fj · 2025-09-01T17:39:37Z

llvm/test/Analysis/DependenceAnalysis/Propagating.ll

(Totally not related to this PR)
When I was debugging, I found that no "constraint propagation" happens in this test...

kasuga-fj · 2025-09-08T10:11:11Z

llvm/lib/Analysis/DependenceAnalysis.cpp

+  MonotonicityType visitAddExpr(const SCEVAddExpr *Expr) {
+    return checkInvarianceOnly(Expr);
+  }
+  MonotonicityType visitMulExpr(const SCEVMulExpr *Expr) {
+    return checkInvarianceOnly(Expr);
+  }


Removed support for SCEVAddExpr and SCEVMulExpr in e83fdb8. The rationale is as follows:

This logic can conflict with inference based GEP attributes, which is the reused logic from LoopAccessAnalysis. For instance, if the given SCEV is %m * {0,+,%n}<%loop>, I think this reasoning is not valid (but if it is {0,+,(%m * %n)}<%loop>, it seems valid). Additionally, I found leveraging GEP attributes is beneficial -- removing this logic caused some tests to fail, which seems non-trivial at a glance. At least, as a first step, I feel it's reasonable to remove these handlings for add/mul and leave the reasoning to GEP attributes.

Even without the GEP based inference, I think handling SCEVAddExpr and SCEVMulExpr is non-trivial. For example, in the case of {0,+,1}<%loop> + {0,+,-1}<%loop> (apparently, this could happen), it is unclear whether this should be considered Monotonic or Invariant.

Removing the support caused some test regressions, but all failures were of the form like [* *] becoming confused. I believe the semantic impact of these changes is essentially negligible.

kasuga-fj · 2025-09-10T15:15:03Z

llvm/lib/Analysis/DependenceAnalysis.cpp

  // to the dependency checks.
  if (!DisableDelinearizationChecks)
    for (size_t I = 1; I < Size; ++I) {
+      const Loop *OutermostLoop =


Found a more serious issue (ref: #157859). To keep this PR simpler, I now think it is better to ignore the delinearized subscripts at the moment, and check monotonicity only when delinearization fails...

kasuga-fj · 2025-10-09T11:12:41Z

Since things were getting messy, I decided to split the PR. The first one is #162280

The dependence testing functions in DA assume that the analyzed AddRec does not wrap over the entire iteration space. For AddRecs that may wrap, DA should conservatively return unknown dependence. However, no validation is currently performed to ensure that this condition holds, which can lead to incorrect results in some cases. This patch introduces the notion of *monotonicity* and a validation logic to check whether a SCEV is monotonic. The monotonicity check classifies the SCEV into one of the following categories: - Unknown: Nothing is known about the monotonicity of the SCEV. - Invariant: The SCEV is loop-invariant. - MultivariateSignedMonotonic: The SCEV doesn't wrap in a signed sense for any iteration of the loops in the loop nest. The current validation logic basically searches an affine AddRec recursively and checks whether the `nsw` flag is present. Notably, it is still unclear whether we should also have a category for unsigned monotonicity. The monotonicity check is still under development and disabled by default for now. Since such a check is necessary to make DA sound, it should be enabled by default once the functionality is sufficient. Split off from #154527.

* [flang] Fix standalone build regression from llvm#161179 (llvm#164309) Fix incorrect linking and dependencies introduced in llvm#161179 that break standalone builds of Flang. Signed-off-by: Michał Górny <[email protected]> * [AMDGPU] Remove magic constants from V_PK_ADD_F32 pattern. NFC (llvm#164335) * [AMDGPU] Update code sequence for CU-mode Release Fences in GFX10+ (llvm#161638) They were previously optimized to not emit any waitcnt, which is technically correct because there is no reordering of operations at workgroup scope in CU mode for GFX10+. This breaks transitivity however, for example if we have the following sequence of events in one thread: - some stores - store atomic release syncscope("workgroup") - barrier then another thread follows with - barrier - load atomic acquire - store atomic release syncscope("agent") It does not work because, while the other thread sees the stores, it cannot release them at the wider scope. Our release fences aren't strong enough to "wait" on stores from other waves. We also cannot strengthen our release fences any further to allow for releasing other wave's stores because only GFX12 can do that with `global_wb`. GFX10-11 do not have the writeback instruction. It'd also add yet another level of complexity to code sequences, with both acquire/release having CU-mode only alternatives. Lastly, acq/rel are always used together. The price for synchronization has to be paid either at the acq, or the rel. Strengthening the releases would just make the memory model more complex but wouldn't help performance. So the choice here is to streamline the code sequences by making CU and WGP mode emit almost identical (vL0 inv is not needed in CU mode) code for release (or stronger) atomic ordering. This also removes the `vm_vsrc(0)` wait before barriers. Now that the release fence in CU mode is strong enough, it is no longer needed. Supersedes llvm#160501 Solves SC1-6454 * [InstSimplify] Support ptrtoaddr in simplifyGEPInst() (llvm#164262) This adds support for ptrtoaddr in the `ptradd p, ptrtoaddr(p2) - ptrtoaddr(p) -> p2` fold. This fold requires that p and p2 have the same underlying object (otherwise the provenance may not be the same). The argument I would like to make here is that because the underlying objects are the same (and the pointers in the same address space), the non-address bits of the pointer must be the same. Looking at some specific cases of underlying object relationship: * phi/select: Trivially true. * getelementptr: Only modifies address bits, non-address bits must remain the same. * addrspacecast round-trip cast: Must preserve all bits because we optimize such round-trip casts away. * non-interposable global alias: I'm a bit unsure about this one, but I guess the alias and the aliasee must have the same non-address bits? * various intrinsics like launder.invariant.group, ptrmask. I think these all either preserve all pointer bits (like the invariant.group ones) or at least the non-address bits (like ptrmask). There are some interesting cases like amdgcn.make.buffer.rsrc, but those are cross address-space. ----- There is a second `gep (gep p, C), (sub 0, ptrtoint(p)) -> C` transform in this function, which I am not extending to handle ptrtoaddr, adding negative tests instead. This transform is overall dubious for provenance reasons, but especially dubious with ptrtoaddr, as then we don't have the guarantee that provenance of `p` has been exposed. * [Hexagon] Add REQUIRES: asserts to test This test uses -debug-only, so needs an assertion-enabled build. * [AArch64] Combing scalar_to_reg into DUP if the DUP already exists (llvm#160499) If we already have a dup(x) as part of the DAG along with a scalar_to_vec(x), we can re-use the result of the dup to the scalar_to_vec(x). * [CAS] OnDiskGraphDB - fix MSVC "not all control paths return a value" warnings. NFC. (llvm#164369) * Reapply "[libc++] Optimize __hash_table::erase(iterator, iterator)" (llvm#162850) This reapplication fixes the use after free caused by not properly updating the bucket list in one case. Original commit message: Instead of just calling the single element `erase` on every element of the range, we can combine some of the operations in a custom implementation. Specifically, we don't need to search for the previous node or re-link the list every iteration. Removing this unnecessary work results in some nice performance improvements: ``` ----------------------------------------------------------------------------------------------------------------------- Benchmark old new ----------------------------------------------------------------------------------------------------------------------- std::unordered_set<int>::erase(iterator, iterator) (erase half the container)/0 457 ns 459 ns std::unordered_set<int>::erase(iterator, iterator) (erase half the container)/32 995 ns 626 ns std::unordered_set<int>::erase(iterator, iterator) (erase half the container)/1024 18196 ns 7995 ns std::unordered_set<int>::erase(iterator, iterator) (erase half the container)/8192 124722 ns 70125 ns std::unordered_set<std::string>::erase(iterator, iterator) (erase half the container)/0 456 ns 461 ns std::unordered_set<std::string>::erase(iterator, iterator) (erase half the container)/32 1183 ns 769 ns std::unordered_set<std::string>::erase(iterator, iterator) (erase half the container)/1024 27827 ns 18614 ns std::unordered_set<std::string>::erase(iterator, iterator) (erase half the container)/8192 266681 ns 226107 ns std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/0 455 ns 462 ns std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/32 996 ns 659 ns std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/1024 15963 ns 8108 ns std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/8192 136493 ns 71848 ns std::unordered_multiset<int>::erase(iterator, iterator) (erase half the container)/0 454 ns 455 ns std::unordered_multiset<int>::erase(iterator, iterator) (erase half the container)/32 985 ns 703 ns std::unordered_multiset<int>::erase(iterator, iterator) (erase half the container)/1024 16277 ns 9085 ns std::unordered_multiset<int>::erase(iterator, iterator) (erase half the container)/8192 125736 ns 82710 ns std::unordered_multimap<int, int>::erase(iterator, iterator) (erase half the container)/0 457 ns 454 ns std::unordered_multimap<int, int>::erase(iterator, iterator) (erase half the container)/32 1091 ns 646 ns std::unordered_multimap<int, int>::erase(iterator, iterator) (erase half the container)/1024 17784 ns 7664 ns std::unordered_multimap<int, int>::erase(iterator, iterator) (erase half the container)/8192 127098 ns 72806 ns ``` This reverts commit acc3a62. * [TableGen] List the indices of sub-operands (llvm#163723) Some instances of the `Operand` class used in Tablegen instruction definitions expand to a cluster of multiple operands at the MC layer, such as complex addressing modes involving base + offset + shift, or clusters of operands describing conditional Arm instructions or predicated MVE instructions. There's currently no convenient way for C++ code to know the offset of one of those sub-operands from the start of the cluster: instead it just hard-codes magic numbers like `index+2`, which is hard to read and fragile. This patch adds an extra piece of output to `InstrInfoEmitter` to define those instruction offsets, based on the name of the `Operand` class instance in Tablegen, and the names assigned to the sub-operands in the `MIOperandInfo` field. For example, if target Foo were to define def Bar : Operand { let MIOperandInfo = (ops GPR:$first, i32imm:$second); // ... } then the new constants would be `Foo::SUBOP_Bar_first` and `Foo::SUBOP_Bar_second`, defined as 0 and 1 respectively. As an example, I've converted some magic numbers related to the MVE predication operand types (`vpred_n` and its superset `vpred_r`) to use the new named constants in place of the integer literals they previously used. This is more verbose, but also clearer, because it explains why the integer is chosen instead of what its value is. * [lldb] Add bidirectional packetLog to gdbclientutils.py (llvm#162176) While debugging the tests for llvm#155000 I found it helpful to have both sides of the simulated gdb-rsp traffic rather than just the responses so I've extended the packetLog in MockGDBServerResponder to record traffic in both directions. Tests have been updated accordingly * [MLIR] [Vector] Added canonicalizer for folding from_elements + transpose (llvm#161841) ## Description Adds a new canonicalizer that folds `vector.from_elements(vector.transpose))` => `vector.from_elements`. This canonicalization reorders the input elements for `vector.from_elements`, adjusts the output shape to match the effect of the transpose op and eliminating its need. ## Testing Added a 2D vector lit test that verifies the working of the rewrite. --------- Signed-off-by: Keshav Vinayak Jha <[email protected]> * [DA] Add initial support for monotonicity check (llvm#162280) The dependence testing functions in DA assume that the analyzed AddRec does not wrap over the entire iteration space. For AddRecs that may wrap, DA should conservatively return unknown dependence. However, no validation is currently performed to ensure that this condition holds, which can lead to incorrect results in some cases. This patch introduces the notion of *monotonicity* and a validation logic to check whether a SCEV is monotonic. The monotonicity check classifies the SCEV into one of the following categories: - Unknown: Nothing is known about the monotonicity of the SCEV. - Invariant: The SCEV is loop-invariant. - MultivariateSignedMonotonic: The SCEV doesn't wrap in a signed sense for any iteration of the loops in the loop nest. The current validation logic basically searches an affine AddRec recursively and checks whether the `nsw` flag is present. Notably, it is still unclear whether we should also have a category for unsigned monotonicity. The monotonicity check is still under development and disabled by default for now. Since such a check is necessary to make DA sound, it should be enabled by default once the functionality is sufficient. Split off from llvm#154527. * [VPlan] Use VPlan::getRegion to shorten code (NFC) (llvm#164287) * [VPlan] Improve code using m_APInt (NFC) (llvm#161683) * [SystemZ] Avoid trunc(add(X,X)) patterns (llvm#164378) Replace with trunc(add(X,Y)) to avoid premature folding in upcoming patch llvm#164227 * [clang][CodeGen] Emit `llvm.tbaa.errno` metadata during module creation Let Clang emit `llvm.tbaa.errno` metadata in order to let LLVM carry out optimizations around errno-writing libcalls to, as long as it is proved the involved memory location does not alias `errno`. Previous discussion: https://discourse.llvm.org/t/rfc-modelling-errno-memory-effects/82972. * [LV][NFC] Remove undef from phi incoming values (llvm#163762) Split off from PR llvm#163525, this standalone patch replaces use of undef as incoming PHI values with zero, in order to reduce the likelihood of contributors hitting the `undef deprecator` warning in github. * [DA] Add option to enable specific dependence test only (llvm#164245) PR llvm#157084 added an option `da-run-siv-routines-only` to run only SIV routines in DA. This PR replaces that option with a more fine-grained one that allows to select other than SIV routines as well. This option is useful for regression testing of individual DA routines. This patch also reorganizes regression tests that use `da-run-siv-routines-only`. * [libcxx] Optimize `std::generate_n` for segmented iterators (llvm#164266) Part of llvm#102817. This is a natural follow-up to llvm#163006. We are forwarding `std::generate_n` to `std::__for_each_n` (`std::for_each_n` needs c++17), resulting in improved performance for segmented iterators. before: ``` std::generate_n(deque<int>)/32 17.5 ns 17.3 ns 40727273 std::generate_n(deque<int>)/50 25.7 ns 25.5 ns 26352941 std::generate_n(deque<int>)/1024 490 ns 487 ns 1445161 std::generate_n(deque<int>)/8192 3908 ns 3924 ns 179200 ``` after: ``` std::generate_n(deque<int>)/32 11.1 ns 11.0 ns 64000000 std::generate_n(deque<int>)/50 16.1 ns 16.0 ns 44800000 std::generate_n(deque<int>)/1024 291 ns 292 ns 2357895 std::generate_n(deque<int>)/8192 2269 ns 2250 ns 298667 ``` * [BOLT] Check entry point address is not in constant island (llvm#163418) There are cases where `addEntryPointAtOffset` is called with a given `Offset` that points to an address within a constant island. This triggers `assert(!isInConstantIsland(EntryPointAddress)` and causes BOLT to crash. This patch adds a check which ignores functions that would add such entry points and warns the user. * [llvm][dwarfdump] Pretty-print DW_AT_language_version (llvm#164222) In both verbose and non-verbose mode we will now use the `llvm::dwarf::LanguageDescription` to turn the version into a human readable string. In verbose mode we also display the raw version code (similar to how we display addresses in verbose mode). To make the version code and prettified easier to distinguish, we print the prettified name in colour (if available), which is consistent with how `DW_AT_language` is printed in colour. Before: ``` 0x0000000c: DW_TAG_compile_unit DW_AT_language_name (DW_LNAME_C) DW_AT_language_version (201112) ``` After: ``` 0x0000000c: DW_TAG_compile_unit DW_AT_language_name (DW_LNAME_C) DW_AT_language_version (201112 C11) ``` --------- Signed-off-by: Michał Górny <[email protected]> Signed-off-by: Keshav Vinayak Jha <[email protected]> Co-authored-by: Michał Górny <[email protected]> Co-authored-by: Stanislav Mekhanoshin <[email protected]> Co-authored-by: Pierre van Houtryve <[email protected]> Co-authored-by: Nikita Popov <[email protected]> Co-authored-by: David Green <[email protected]> Co-authored-by: Simon Pilgrim <[email protected]> Co-authored-by: Nikolas Klauser <[email protected]> Co-authored-by: Simon Tatham <[email protected]> Co-authored-by: Daniel Sanders <[email protected]> Co-authored-by: Keshav Vinayak Jha <[email protected]> Co-authored-by: Ryotaro Kasuga <[email protected]> Co-authored-by: Ramkumar Ramachandra <[email protected]> Co-authored-by: Antonio Frighetto <[email protected]> Co-authored-by: David Sherwood <[email protected]> Co-authored-by: Connector Switch <[email protected]> Co-authored-by: Asher Dobrescu <[email protected]> Co-authored-by: Michael Buch <[email protected]>

[DA] Check monotonicity for subscripts

a0d05ec

kasuga-fj requested a review from Meinersbur August 20, 2025 13:07

Meinersbur reviewed Aug 22, 2025

View reviewed changes

Address review comments

75639b9

Fix format

5063efc

kasuga-fj force-pushed the da-monotonic branch from ac9bbc6 to 5063efc Compare August 22, 2025 16:38

This was referenced Aug 24, 2025

[DependenceAnalysis] Fix incorrect analysis of wrapping AddRec expressions #154982

Open

[DA] Does not account for wrapping accesses #149501

Open

sushgokh reviewed Aug 25, 2025

View reviewed changes

Meinersbur reviewed Aug 25, 2025

View reviewed changes

kasuga-fj added 2 commits August 27, 2025 07:44

Merge branch 'main' into feature/da-monotonic

4dad091

Revise impl and add test

8986cd4

remove minmax calculator

dcb3b19

Merge branch 'main' into feature/da-monotonic

e93be2e

kasuga-fj commented Sep 1, 2025

View reviewed changes

kasuga-fj marked this pull request as ready for review September 1, 2025 13:07

llvmbot added the llvm:analysis Includes value tracking, cost tables and constant folding label Sep 1, 2025

kasuga-fj commented Sep 1, 2025

View reviewed changes

kasuga-fj requested a review from Meinersbur September 2, 2025 02:28

kasuga-fj added 4 commits September 2, 2025 10:41

Merge branch 'main' into feature/da-monotonic

65ccc61

Merge branch 'main' into da-monotonic

d113571

fix logic for SCEVNArayExpr

137c193

adjust comments

f9b2a4e

kasuga-fj force-pushed the da-monotonic branch from cb6bdf5 to f9b2a4e Compare September 6, 2025 20:14

kasuga-fj added 2 commits September 8, 2025 09:35

remove handlings for add/mul

e83fdb8

rename function name and add comments

a098743

kasuga-fj commented Sep 8, 2025

View reviewed changes

kasuga-fj added 3 commits September 8, 2025 13:17

improve test mechanism

b16cc64

fix handling for zero-extened value

98a5de4

delete SignedMonotonicityType::NoSignedWrap

8c4f97b

kasuga-fj force-pushed the da-monotonic branch from 85dd4ae to 8c4f97b Compare September 8, 2025 19:26

kasuga-fj commented Sep 10, 2025

View reviewed changes

kasuga-fj added 4 commits September 11, 2025 02:23

Merge branch 'main' into da-monotonic

c563a8f

format

55df30a

don't perform monotonicity check for delinearized subscripts

85129c6

add test cases to demonstrate incorrect inference based on nusw on GEP

e15715e

kasuga-fj mentioned this pull request Sep 19, 2025

[DependenceAnalysis] Extending SIV to handle fusable loops #128782

Merged

This was referenced Oct 6, 2025

[DA] Incorrect DA results likely due to mishandling of signed/unsigned comparisons #159846

Open

[DA] Add initial support for monotonicity check #162280

Merged

	// Returns true if Expression is loop invariant in LoopNest.
	bool DependenceInfo::isLoopInvariant(const SCEV *Expression,
	const Loop *LoopNest) const {
	// Unlike ScalarEvolution::isLoopInvariant() we consider an access outside of
	// any loop as invariant, because we only consier expression evaluation at a
	// specific position (where the array access takes place), and not across the
	// entire function.
	if (!LoopNest)
	return true;

	// If the expression is invariant in the outermost loop of the loop nest, it
	// is invariant anywhere in the loop nest.
	return SE->isLoopInvariant(Expression, LoopNest->getOutermostLoop());
	}

	; The value of step reccurence is not invariant with respect to the outer most
	; loop (the i-loop).
	;
	; offset_i = 0;
	; for (int i = 0; i < 100; i++) {
	; for (int j = 0; j < 100; j++)
	; a[offset_i + j] = 0;
	; offset_i += (i % 2 == 0) ? 0 : 3;
	; }
	;
	define void @step_is_variant(ptr %a) {
	; CHECK-LABEL: 'step_is_variant'
	; CHECK: Failed to prove monotonicity for: %offset.i
	; CHECK: Failed to prove monotonicity for: {%offset.i,+,1}<nuw><nsw><%loop.j>
	; CHECK: Monotonicity: Unknown expr: {%offset.i,+,1}<nuw><nsw><%loop.j>
	;
	entry:
	br label %loop.i.header

	loop.i.header:
	%i = phi i64 [ 0, %entry ], [ %i.inc, %loop.i.latch ]
	%offset.i = phi i64 [ 0, %entry ], [ %offset.i.next, %loop.i.latch ]
	%step.i.0 = phi i64 [ 0, %entry ], [ %step.i.1, %loop.i.latch ]
	%step.i.1 = phi i64 [ 3, %entry ], [ %step.i.0, %loop.i.latch ]
	br label %loop.j

	loop.j:
	%j = phi i64 [ 0, %loop.i.header ], [ %j.inc, %loop.j ]
	%offset = add nsw i64 %offset.i, %j
	%idx = getelementptr inbounds i8, ptr %a, i64 %offset
	store i8 0, ptr %idx
	%j.inc = add nsw i64 %j, 1
	%exitcond.j = icmp eq i64 %j.inc, 100
	br i1 %exitcond.j, label %loop.i.latch, label %loop.j

	loop.i.latch:
	%i.inc = add nsw i64 %i, 1
	%offset.i.next = add nsw i64 %offset.i, %step.i.0
	%exitcond.i = icmp eq i64 %i.inc, 100
	br i1 %exitcond.i, label %exit, label %loop.i.header

	exit:
	ret void
	}

[DA] Check monotonicity for subscripts #154527

Are you sure you want to change the base?

[DA] Check monotonicity for subscripts #154527

Uh oh!

Conversation

kasuga-fj commented Aug 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kasuga-fj commented Aug 20, 2025

Uh oh!

Meinersbur left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Footnotes

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Aug 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kasuga-fj commented Aug 22, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kasuga-fj Aug 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kasuga-fj Aug 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sushgokh commented Aug 25, 2025

Uh oh!

kasuga-fj commented Aug 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

kasuga-fj commented Aug 20, 2025 •

edited

Loading

Meinersbur left a comment •

edited

Loading

github-actions bot commented Aug 22, 2025 •

edited

Loading

kasuga-fj Aug 25, 2025 •

edited

Loading

kasuga-fj Aug 25, 2025 •

edited

Loading

kasuga-fj commented Aug 25, 2025 •

edited

Loading

Meinersbur Aug 25, 2025 •

edited

Loading

sebpop commented Aug 25, 2025 •

edited

Loading

kasuga-fj commented Aug 26, 2025 •

edited

Loading

kasuga-fj commented Aug 26, 2025 •

edited

Loading

kasuga-fj commented Aug 27, 2025 •

edited

Loading

Meinersbur commented Aug 28, 2025 •

edited

Loading