[LoopInterchange] Update the direction of undistributed loop to EQ #78951

ShivaChen · 2024-01-22T09:09:50Z

This commit enables loop-interchange for the case in #71519.
With the loop-interchange, the case can be vectorized.

  for (int nl = 0; nl < 10000000/256; nl++) // Level 1
    for (int i = 0; i < 256; ++i)           // Level 2
      for (int j = 1; j < 256; j++)         // Level 3
        aa[j][i] = aa[j - 1][i] + bb[j][i];

The case can't be interchanged without normalizaion.
normalizaion didn't occur because the direction of level 1 loop dependence between aa[j][i] and aa[j - 1][i] is default value '*'.

By scanning SCEV form of the pointer of aa[j][i] and aa[j - 1][i], the pass can determine the IV of loop 1(nl) didn't affect the value of aa[j][i] and aa[j - 1][i]. And then updating the direction of loop 1 to '=' to enable the normalization.

llvmbot · 2024-01-22T09:10:20Z

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Author: None (ShivaChen)

Changes

It could be an alternative way of #77719 to enable interchange for the following case.

for (int nl = 0; nl &lt; 10000000/256; nl++) // Level 1
  for (int i = 0; i &lt; 256; ++i)                      // Level 2
    for (int j = 1; j &lt; 256; j++)                    // Level 3
      aa[j][i] = aa[j - 1][i] + bb[j][i];

isDirectionNegative() is the function to detect > occur and enable normalize ().
It bail out due to the DV for Level 1 is a scalar instead of =.
This commit increments the check to next level as = for scalar direction.

Full diff: https://github.com/llvm/llvm-project/pull/78951.diff

2 Files Affected:

(modified) llvm/lib/Analysis/DependenceAnalysis.cpp (+1-1)
(added) llvm/test/Transforms/LoopInterchange/pr71519.ll (+64)

diff --git a/llvm/lib/Analysis/DependenceAnalysis.cpp b/llvm/lib/Analysis/DependenceAnalysis.cpp
index 1bce9aae09bb26..7d25049e03692f 100644
--- a/llvm/lib/Analysis/DependenceAnalysis.cpp
+++ b/llvm/lib/Analysis/DependenceAnalysis.cpp
@@ -290,7 +290,7 @@ FullDependence::FullDependence(Instruction *Source, Instruction *Destination,
 bool FullDependence::isDirectionNegative() const {
   for (unsigned Level = 1; Level <= Levels; ++Level) {
     unsigned char Direction = DV[Level - 1].Direction;
-    if (Direction == Dependence::DVEntry::EQ)
+    if (Direction == Dependence::DVEntry::EQ || isScalar(Level))
       continue;
     if (Direction == Dependence::DVEntry::GT ||
         Direction == Dependence::DVEntry::GE)
diff --git a/llvm/test/Transforms/LoopInterchange/pr71519.ll b/llvm/test/Transforms/LoopInterchange/pr71519.ll
new file mode 100644
index 00000000000000..c364d89873a5dc
--- /dev/null
+++ b/llvm/test/Transforms/LoopInterchange/pr71519.ll
@@ -0,0 +1,64 @@
+; REQUIRES: asserts
+; RUN: opt < %s -passes=loop-interchange -cache-line-size=64 -verify-dom-info -verify-loop-info \
+; RUN:     -S -debug 2>&1 | FileCheck %s
+
+@aa = global [256 x [256 x float]] zeroinitializer, align 64
+@bb = global [256 x [256 x float]] zeroinitializer, align 64
+
+;;  for (int nl = 0; nl < 10000000/256; nl++)
+;;    for (int i = 0; i < 256; ++i)
+;;      for (int j = 1; j < 256; j++)
+;;        aa[j][i] = aa[j - 1][i] + bb[j][i];
+
+; CHECK: Before normalizing negative direction vectors:
+; CHECK: consistent anti [S 0 -1]!
+; CHECK: After normalizing negative direction vectors:
+; CHECK: consistent flow [S 0 1]!
+; CHECK: Negative dependence vector normalized.
+; CHECK: Found flow dependency between Src and Dst
+; CHECK:  Src:  %1 = load float, ptr %arrayidx10, align 4
+; CHECK:  Dst:  store float %add, ptr %arrayidx18, align 4
+; CHECK: Processing InnerLoopId = 2 and OuterLoopId = 1
+; CHECK: Loops interchanged.
+
+define float @s231() {
+entry:
+  br label %for.cond1.preheader
+
+; Loop:
+for.cond1.preheader:                              ; preds = %entry, %for.cond.cleanup3
+  %nl.036 = phi i32 [ 0, %entry ], [ %inc23, %for.cond.cleanup3 ]
+  br label %for.cond5.preheader
+
+for.cond.cleanup3:                                ; preds = %for.cond.cleanup7
+  %inc23 = add nuw nsw i32 %nl.036, 1
+  %exitcond41 = icmp ne i32 %inc23, 39062
+  br i1 %exitcond41, label %for.cond1.preheader, label %for.cond.cleanup
+
+for.cond.cleanup7:                                ; preds = %for.body8
+  %indvars.iv.next39 = add nuw nsw i64 %indvars.iv38, 1
+  %exitcond40 = icmp ne i64 %indvars.iv.next39, 256
+  br i1 %exitcond40, label %for.cond5.preheader, label %for.cond.cleanup3
+
+for.body8:                                        ; preds = %for.cond5.preheader, %for.body8
+  %indvars.iv = phi i64 [ 1, %for.cond5.preheader ], [ %indvars.iv.next, %for.body8 ]
+  %0 = add nsw i64 %indvars.iv, -1
+  %arrayidx10 = getelementptr inbounds [256 x [256 x float]], ptr @aa, i64 0, i64 %0, i64 %indvars.iv38
+  %1 = load float, ptr %arrayidx10, align 4
+  %arrayidx14 = getelementptr inbounds [256 x [256 x float]], ptr @bb, i64 0, i64 %indvars.iv, i64 %indvars.iv38
+  %2 = load float, ptr %arrayidx14, align 4
+  %add = fadd fast float %2, %1
+  %arrayidx18 = getelementptr inbounds [256 x [256 x float]], ptr @aa, i64 0, i64 %indvars.iv, i64 %indvars.iv38
+  store float %add, ptr %arrayidx18, align 4
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %exitcond = icmp ne i64 %indvars.iv.next, 256
+  br i1 %exitcond, label %for.body8, label %for.cond.cleanup7
+
+for.cond5.preheader:                              ; preds = %for.cond1.preheader, %for.cond.cleanup7
+  %indvars.iv38 = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next39, %for.cond.cleanup7 ]
+  br label %for.body8
+
+; Exit blocks
+for.cond.cleanup:                                 ; preds = %for.cond.cleanup3
+  ret float undef
+}

Meinersbur · 2024-01-22T20:54:23Z

I would need a more convincing argument than "it makes my case work".

Suppose the scalar dependency is an implicit < dependencies.

for (int i = 0; i < n; ++i)
  a=a/i;

This example would not be reversed because there is no inner level but consider this to be the "either way goes" case.
Reversing the scalar dependence would make a >-dependence out of it, which is definitely wrong because it would make it depend on the future (which normalize() is supposed to normalize).

Possibly, normalize() would also need to reverse the direction of scalar dependencies, but I would need some more thoughts on why

bmahjour

Skipping the scalar direction is incorrect. Scalar dependencies get carried from every iteration to every iteration, so they must be treated more conservatively than a = dependence.

I think this problem should be solved in the loop interchange pass itself where if the candidate pair of loops is nested inside other loops, and those outer loop IVs don't contribute to the index calculations in the pair of loops, then the dependence entries corresponding to the outer loops should be ignored.

ShivaChen · 2024-01-24T06:34:36Z

I would need a more convincing argument than "it makes my case work".

Suppose the scalar dependency is an implicit < dependencies.
for (int i = 0; i < n; ++i)
  a=a/i;
This example would not be reversed because there is no inner level but consider this to be the "either way goes" case. Reversing the scalar dependence would make a >-dependence out of it, which is definitely wrong because it would make it depend on the future (which normalize() is supposed to normalize).

Possibly, normalize() would also need to reverse the direction of scalar dependencies, but I would need some more thoughts on why

Thanks for the case. That make sense to me.

ShivaChen · 2024-01-24T06:38:29Z

Skipping the scalar direction is incorrect. Scalar dependencies get carried from every iteration to every iteration, so they must be treated more conservatively than a = dependence.

I think this problem should be solved in the loop interchange pass itself where if the candidate pair of loops is nested inside other loops, and those outer loop IVs don't contribute to the index calculations in the pair of loops, then the dependence entries corresponding to the outer loops should be ignored.

Thanks for pointing the direction. :-) Could you help me to check does the latest commit as you expected?

bmahjour · 2024-01-24T18:35:52Z

Skipping the scalar direction is incorrect. Scalar dependencies get carried from every iteration to every iteration, so they must be treated more conservatively than a = dependence.
I think this problem should be solved in the loop interchange pass itself where if the candidate pair of loops is nested inside other loops, and those outer loop IVs don't contribute to the index calculations in the pair of loops, then the dependence entries corresponding to the outer loops should be ignored.

Thanks for pointing the direction. :-) Could you help me to check does the latest commit as you expected?

I'm saying let's not modify DependenceAnalysis. The custom handling should be done in Loop Interchange to ignore dependencies at outer levels when they don't matter to the legality of interchange.

ShivaChen · 2024-01-25T04:29:45Z

Skipping the scalar direction is incorrect. Scalar dependencies get carried from every iteration to every iteration, so they must be treated more conservatively than a = dependence.
I think this problem should be solved in the loop interchange pass itself where if the candidate pair of loops is nested inside other loops, and those outer loop IVs don't contribute to the index calculations in the pair of loops, then the dependence entries corresponding to the outer loops should be ignored.

Thanks for pointing the direction. :-) Could you help me to check does the latest commit as you expected?

I'm saying let's not modify DependenceAnalysis. The custom handling should be done in Loop Interchange to ignore dependencies at outer levels when they don't matter to the legality of interchange.

I try to implement without modifying DependenceAnalysis and figure out that LoopInterchange rely on normalize() to enable few optimization opportunities. normalize() didn't happen because the outer loop is the initial value DVEntry::ALL.
Currently, normalize() is part of DependenceAnalysis which allow the function to update private DV variable.

Setting non-constributed(IV didn't affect instructions) loops direction to NONE for each dependency by scanning the SCEV form of the pointers of Src and Dst instructions instead of initial value ALL seems generally correct for DependenceAnalysis.

Or LoopInterchange should explore the way to legalize interchange without normalize? Thanks.

bmahjour · 2024-01-25T21:17:28Z

Skipping the scalar direction is incorrect. Scalar dependencies get carried from every iteration to every iteration, so they must be treated more conservatively than a = dependence.
I think this problem should be solved in the loop interchange pass itself where if the candidate pair of loops is nested inside other loops, and those outer loop IVs don't contribute to the index calculations in the pair of loops, then the dependence entries corresponding to the outer loops should be ignored.

Thanks for pointing the direction. :-) Could you help me to check does the latest commit as you expected?

I'm saying let's not modify DependenceAnalysis. The custom handling should be done in Loop Interchange to ignore dependencies at outer levels when they don't matter to the legality of interchange.

I try to implement without modifying DependenceAnalysis and figure out that LoopInterchange rely on normalize() to enable few optimization opportunities. normalize() didn't happen because the outer loop is the initial value DVEntry::ALL. Currently, normalize() is part of DependenceAnalysis which allow the function to update private DV variable.

Setting non-constributed(IV didn't affect instructions) loops direction to NONE for each dependency by scanning the SCEV form of the pointers of Src and Dst instructions instead of initial value ALL seems generally correct for DependenceAnalysis.

Or LoopInterchange should explore the way to legalize interchange without normalize? Thanks.

What would the semantics of DVEntry::NONE be? It cannot be treated less pessimistically than ALL because the same memory locations get modified in every iteration of the outer loop! It's important for DenepdenceAnalysis to remain conservatively correct in the results in produces, regardless of the consumer of that analysis.

IMHO, how loop interchange deals with the "non-contributing" outer loops is an implementation issue inside loop-interchange itself. One solution might be to have its own normalize function, so you construct the dependence matrix, prune it (eg. by replacing the * with = in that entry), then normalize it again.

github-actions · 2024-01-26T09:56:16Z

✅ With the latest revision this PR passed the C/C++ code formatter.

ShivaChen · 2024-01-26T10:07:22Z

Skipping the scalar direction is incorrect. Scalar dependencies get carried from every iteration to every iteration, so they must be treated more conservatively than a = dependence.
I think this problem should be solved in the loop interchange pass itself where if the candidate pair of loops is nested inside other loops, and those outer loop IVs don't contribute to the index calculations in the pair of loops, then the dependence entries corresponding to the outer loops should be ignored.

Thanks for pointing the direction. :-) Could you help me to check does the latest commit as you expected?

I'm saying let's not modify DependenceAnalysis. The custom handling should be done in Loop Interchange to ignore dependencies at outer levels when they don't matter to the legality of interchange.

I try to implement without modifying DependenceAnalysis and figure out that LoopInterchange rely on normalize() to enable few optimization opportunities. normalize() didn't happen because the outer loop is the initial value DVEntry::ALL. Currently, normalize() is part of DependenceAnalysis which allow the function to update private DV variable.
Setting non-constributed(IV didn't affect instructions) loops direction to NONE for each dependency by scanning the SCEV form of the pointers of Src and Dst instructions instead of initial value ALL seems generally correct for DependenceAnalysis.
Or LoopInterchange should explore the way to legalize interchange without normalize? Thanks.

What would the semantics of DVEntry::NONE be? It cannot be treated less pessimistically than ALL because the same memory locations get modified in every iteration of the outer loop! It's important for DenepdenceAnalysis to remain conservatively correct in the results in produces, regardless of the consumer of that analysis.

IMHO, how loop interchange deals with the "non-contributing" outer loops is an implementation issue inside loop-interchange itself. One solution might be to have its own normalize function, so you construct the dependence matrix, prune it (eg. by replacing the * with = in that entry), then normalize it again.

Have created normalize function in interchange. Thanks for the advice. :-)

The motivation is to introduce the custom functions for LoopInterchange.

This commit enables loop-interchange for the case in llvm#71519. With the loop-interchange, the case can be vectorized. for (int nl = 0; nl < 10000000/256; nl++) // Level 1 for (int i = 0; i < 256; ++i) // Level 2 for (int j = 1; j < 256; j++) // Level 3 aa[j][i] = aa[j - 1][i] + bb[j][i]; The case can't be interchanged without normalizaion. normalizaion didn't occur because the direction of level 1 loop dependence between aa[j][i] and aa[j - 1][i] is default value '*'. By scanning SCEV form of the pointer of aa[j][i] and aa[j - 1][i], the pass and determine the IV of loop 1(nl) didn't affect the value of aa[j][i] and aa[j - 1][i]. And then updating the direction of loop 1 to '=' to enable the normalization.

ShivaChen · 2024-01-26T10:19:10Z

Fix clang-format and rebase.

madhur13490 · 2024-09-17T15:02:54Z

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

+    }
+}
+
+static bool normalize(std::vector<Dependence::DVEntry> &DV, ScalarEvolution *SE,


I am learning this transformation and came across your patch.

This function is sort of duplicate of FullDependece::normalize() but does not update the distance vector. Do you think we should avoid duplication? and updating distance vector is not necessary?

madhur13490 · 2024-09-17T15:03:51Z

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

+static void dumpDirection(raw_ostream &OS,
+                          std::vector<Dependence::DVEntry> &DV) {
+  OS << " [";
+  for (unsigned II = 1; II <= DV.size(); ++II) {


I feel that we can start from 0 and avoid subtracting 1 each time we access DV but I understand that this is practice in DA.

ShivaChen requested review from Meinersbur and bmahjour January 22, 2024 09:09

llvmbot added llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Jan 22, 2024

ShivaChen requested a review from CongzheUalberta January 23, 2024 03:49

bmahjour requested changes Jan 23, 2024

View reviewed changes

ShivaChen force-pushed the loop-interchange-pr71519 branch from 33e5c6f to 05a8a81 Compare January 26, 2024 09:53

ShivaChen changed the title ~~[LoopInterchange] Increment isDirectionNegative check to next level for scalar direction~~ [LoopInterchange] Update the direction of undistributed loop to EQ Jan 26, 2024

ShivaChen added 4 commits January 26, 2024 10:17

Add pr71519.ll

4924203

[LoopInterchange] Add normalize function from DependenceInfo

cdc9378

The motivation is to introduce the custom functions for LoopInterchange.

Fix clang-format

1e6fd79

ShivaChen force-pushed the loop-interchange-pr71519 branch from 05a8a81 to 1e6fd79 Compare January 26, 2024 10:18

ShivaChen added 2 commits January 29, 2024 03:22

Add missing LLVM_DEBUG

514bd66

Replace Levels parameter by DV.size()

661c26f

ShivaChen mentioned this pull request Feb 6, 2024

[LoopInterchange] Fix depends() check parameters #77719

Closed

madhur13490 reviewed Sep 17, 2024

View reviewed changes

sebpop mentioned this pull request Sep 18, 2024

[LoopInterchange] vectorisation opportunity (tsvc, s231) #71519

Open

ShivaChen closed this May 5, 2025

[LoopInterchange] Update the direction of undistributed loop to EQ #78951

[LoopInterchange] Update the direction of undistributed loop to EQ #78951

Uh oh!

Conversation

ShivaChen commented Jan 22, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Jan 22, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Meinersbur commented Jan 22, 2024

Uh oh!

bmahjour left a comment

Choose a reason for hiding this comment

Uh oh!

ShivaChen commented Jan 24, 2024

Uh oh!

ShivaChen commented Jan 24, 2024

Uh oh!

bmahjour commented Jan 24, 2024

Uh oh!

ShivaChen commented Jan 25, 2024

Uh oh!

bmahjour commented Jan 25, 2024

Uh oh!

github-actions bot commented Jan 26, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ShivaChen commented Jan 26, 2024

Uh oh!

ShivaChen commented Jan 26, 2024

Uh oh!

madhur13490 Sep 17, 2024

Choose a reason for hiding this comment

Uh oh!

madhur13490 Sep 17, 2024

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

ShivaChen commented Jan 22, 2024 •

edited

Loading

llvmbot commented Jan 22, 2024 •

edited

Loading

github-actions bot commented Jan 26, 2024 •

edited

Loading