Skip to content

Conversation

@ShivaChen
Copy link
Collaborator

@ShivaChen ShivaChen commented Jan 22, 2024

This commit enables loop-interchange for the case in #71519.
With the loop-interchange, the case can be vectorized.

  for (int nl = 0; nl < 10000000/256; nl++) // Level 1
    for (int i = 0; i < 256; ++i)           // Level 2
      for (int j = 1; j < 256; j++)         // Level 3
        aa[j][i] = aa[j - 1][i] + bb[j][i];

The case can't be interchanged without normalizaion.
normalizaion didn't occur because the direction of level 1 loop dependence between aa[j][i] and aa[j - 1][i] is default value '*'.

By scanning SCEV form of the pointer of aa[j][i] and aa[j - 1][i], the pass can determine the IV of loop 1(nl) didn't affect the value of aa[j][i] and aa[j - 1][i]. And then updating the direction of loop 1 to '=' to enable the normalization.

@llvmbot llvmbot added llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Jan 22, 2024
@llvmbot
Copy link
Member

llvmbot commented Jan 22, 2024

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Author: None (ShivaChen)

Changes

It could be an alternative way of #77719 to enable interchange for the following case.

for (int nl = 0; nl &lt; 10000000/256; nl++) // Level 1
  for (int i = 0; i &lt; 256; ++i)                      // Level 2
    for (int j = 1; j &lt; 256; j++)                    // Level 3
      aa[j][i] = aa[j - 1][i] + bb[j][i];

isDirectionNegative() is the function to detect &gt; occur and enable normalize ().
It bail out due to the DV for Level 1 is a scalar instead of =.
This commit increments the check to next level as = for scalar direction.


Full diff: https://github.com/llvm/llvm-project/pull/78951.diff

2 Files Affected:

  • (modified) llvm/lib/Analysis/DependenceAnalysis.cpp (+1-1)
  • (added) llvm/test/Transforms/LoopInterchange/pr71519.ll (+64)
diff --git a/llvm/lib/Analysis/DependenceAnalysis.cpp b/llvm/lib/Analysis/DependenceAnalysis.cpp
index 1bce9aae09bb26..7d25049e03692f 100644
--- a/llvm/lib/Analysis/DependenceAnalysis.cpp
+++ b/llvm/lib/Analysis/DependenceAnalysis.cpp
@@ -290,7 +290,7 @@ FullDependence::FullDependence(Instruction *Source, Instruction *Destination,
 bool FullDependence::isDirectionNegative() const {
   for (unsigned Level = 1; Level <= Levels; ++Level) {
     unsigned char Direction = DV[Level - 1].Direction;
-    if (Direction == Dependence::DVEntry::EQ)
+    if (Direction == Dependence::DVEntry::EQ || isScalar(Level))
       continue;
     if (Direction == Dependence::DVEntry::GT ||
         Direction == Dependence::DVEntry::GE)
diff --git a/llvm/test/Transforms/LoopInterchange/pr71519.ll b/llvm/test/Transforms/LoopInterchange/pr71519.ll
new file mode 100644
index 00000000000000..c364d89873a5dc
--- /dev/null
+++ b/llvm/test/Transforms/LoopInterchange/pr71519.ll
@@ -0,0 +1,64 @@
+; REQUIRES: asserts
+; RUN: opt < %s -passes=loop-interchange -cache-line-size=64 -verify-dom-info -verify-loop-info \
+; RUN:     -S -debug 2>&1 | FileCheck %s
+
+@aa = global [256 x [256 x float]] zeroinitializer, align 64
+@bb = global [256 x [256 x float]] zeroinitializer, align 64
+
+;;  for (int nl = 0; nl < 10000000/256; nl++)
+;;    for (int i = 0; i < 256; ++i)
+;;      for (int j = 1; j < 256; j++)
+;;        aa[j][i] = aa[j - 1][i] + bb[j][i];
+
+; CHECK: Before normalizing negative direction vectors:
+; CHECK: consistent anti [S 0 -1]!
+; CHECK: After normalizing negative direction vectors:
+; CHECK: consistent flow [S 0 1]!
+; CHECK: Negative dependence vector normalized.
+; CHECK: Found flow dependency between Src and Dst
+; CHECK:  Src:  %1 = load float, ptr %arrayidx10, align 4
+; CHECK:  Dst:  store float %add, ptr %arrayidx18, align 4
+; CHECK: Processing InnerLoopId = 2 and OuterLoopId = 1
+; CHECK: Loops interchanged.
+
+define float @s231() {
+entry:
+  br label %for.cond1.preheader
+
+; Loop:
+for.cond1.preheader:                              ; preds = %entry, %for.cond.cleanup3
+  %nl.036 = phi i32 [ 0, %entry ], [ %inc23, %for.cond.cleanup3 ]
+  br label %for.cond5.preheader
+
+for.cond.cleanup3:                                ; preds = %for.cond.cleanup7
+  %inc23 = add nuw nsw i32 %nl.036, 1
+  %exitcond41 = icmp ne i32 %inc23, 39062
+  br i1 %exitcond41, label %for.cond1.preheader, label %for.cond.cleanup
+
+for.cond.cleanup7:                                ; preds = %for.body8
+  %indvars.iv.next39 = add nuw nsw i64 %indvars.iv38, 1
+  %exitcond40 = icmp ne i64 %indvars.iv.next39, 256
+  br i1 %exitcond40, label %for.cond5.preheader, label %for.cond.cleanup3
+
+for.body8:                                        ; preds = %for.cond5.preheader, %for.body8
+  %indvars.iv = phi i64 [ 1, %for.cond5.preheader ], [ %indvars.iv.next, %for.body8 ]
+  %0 = add nsw i64 %indvars.iv, -1
+  %arrayidx10 = getelementptr inbounds [256 x [256 x float]], ptr @aa, i64 0, i64 %0, i64 %indvars.iv38
+  %1 = load float, ptr %arrayidx10, align 4
+  %arrayidx14 = getelementptr inbounds [256 x [256 x float]], ptr @bb, i64 0, i64 %indvars.iv, i64 %indvars.iv38
+  %2 = load float, ptr %arrayidx14, align 4
+  %add = fadd fast float %2, %1
+  %arrayidx18 = getelementptr inbounds [256 x [256 x float]], ptr @aa, i64 0, i64 %indvars.iv, i64 %indvars.iv38
+  store float %add, ptr %arrayidx18, align 4
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %exitcond = icmp ne i64 %indvars.iv.next, 256
+  br i1 %exitcond, label %for.body8, label %for.cond.cleanup7
+
+for.cond5.preheader:                              ; preds = %for.cond1.preheader, %for.cond.cleanup7
+  %indvars.iv38 = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next39, %for.cond.cleanup7 ]
+  br label %for.body8
+
+; Exit blocks
+for.cond.cleanup:                                 ; preds = %for.cond.cleanup3
+  ret float undef
+}

@Meinersbur
Copy link
Member

I would need a more convincing argument than "it makes my case work".

Suppose the scalar dependency is an implicit < dependencies.

for (int i = 0; i < n; ++i)
  a=a/i;

This example would not be reversed because there is no inner level but consider this to be the "either way goes" case.
Reversing the scalar dependence would make a >-dependence out of it, which is definitely wrong because it would make it depend on the future (which normalize() is supposed to normalize).

Possibly, normalize() would also need to reverse the direction of scalar dependencies, but I would need some more thoughts on why

Copy link
Collaborator

@bmahjour bmahjour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skipping the scalar direction is incorrect. Scalar dependencies get carried from every iteration to every iteration, so they must be treated more conservatively than a = dependence.

I think this problem should be solved in the loop interchange pass itself where if the candidate pair of loops is nested inside other loops, and those outer loop IVs don't contribute to the index calculations in the pair of loops, then the dependence entries corresponding to the outer loops should be ignored.

@ShivaChen
Copy link
Collaborator Author

I would need a more convincing argument than "it makes my case work".

Suppose the scalar dependency is an implicit < dependencies.

for (int i = 0; i < n; ++i)
  a=a/i;

This example would not be reversed because there is no inner level but consider this to be the "either way goes" case. Reversing the scalar dependence would make a >-dependence out of it, which is definitely wrong because it would make it depend on the future (which normalize() is supposed to normalize).

Possibly, normalize() would also need to reverse the direction of scalar dependencies, but I would need some more thoughts on why

Thanks for the case. That make sense to me.

@ShivaChen
Copy link
Collaborator Author

Skipping the scalar direction is incorrect. Scalar dependencies get carried from every iteration to every iteration, so they must be treated more conservatively than a = dependence.

I think this problem should be solved in the loop interchange pass itself where if the candidate pair of loops is nested inside other loops, and those outer loop IVs don't contribute to the index calculations in the pair of loops, then the dependence entries corresponding to the outer loops should be ignored.

Thanks for pointing the direction. :-) Could you help me to check does the latest commit as you expected?

@bmahjour
Copy link
Collaborator

Skipping the scalar direction is incorrect. Scalar dependencies get carried from every iteration to every iteration, so they must be treated more conservatively than a = dependence.
I think this problem should be solved in the loop interchange pass itself where if the candidate pair of loops is nested inside other loops, and those outer loop IVs don't contribute to the index calculations in the pair of loops, then the dependence entries corresponding to the outer loops should be ignored.

Thanks for pointing the direction. :-) Could you help me to check does the latest commit as you expected?

I'm saying let's not modify DependenceAnalysis. The custom handling should be done in Loop Interchange to ignore dependencies at outer levels when they don't matter to the legality of interchange.

@ShivaChen
Copy link
Collaborator Author

Skipping the scalar direction is incorrect. Scalar dependencies get carried from every iteration to every iteration, so they must be treated more conservatively than a = dependence.
I think this problem should be solved in the loop interchange pass itself where if the candidate pair of loops is nested inside other loops, and those outer loop IVs don't contribute to the index calculations in the pair of loops, then the dependence entries corresponding to the outer loops should be ignored.

Thanks for pointing the direction. :-) Could you help me to check does the latest commit as you expected?

I'm saying let's not modify DependenceAnalysis. The custom handling should be done in Loop Interchange to ignore dependencies at outer levels when they don't matter to the legality of interchange.

I try to implement without modifying DependenceAnalysis and figure out that LoopInterchange rely on normalize() to enable few optimization opportunities. normalize() didn't happen because the outer loop is the initial value DVEntry::ALL.
Currently, normalize() is part of DependenceAnalysis which allow the function to update private DV variable.

Setting non-constributed(IV didn't affect instructions) loops direction to NONE for each dependency by scanning the SCEV form of the pointers of Src and Dst instructions instead of initial value ALL seems generally correct for DependenceAnalysis.

Or LoopInterchange should explore the way to legalize interchange without normalize? Thanks.

@bmahjour
Copy link
Collaborator

Skipping the scalar direction is incorrect. Scalar dependencies get carried from every iteration to every iteration, so they must be treated more conservatively than a = dependence.
I think this problem should be solved in the loop interchange pass itself where if the candidate pair of loops is nested inside other loops, and those outer loop IVs don't contribute to the index calculations in the pair of loops, then the dependence entries corresponding to the outer loops should be ignored.

Thanks for pointing the direction. :-) Could you help me to check does the latest commit as you expected?

I'm saying let's not modify DependenceAnalysis. The custom handling should be done in Loop Interchange to ignore dependencies at outer levels when they don't matter to the legality of interchange.

I try to implement without modifying DependenceAnalysis and figure out that LoopInterchange rely on normalize() to enable few optimization opportunities. normalize() didn't happen because the outer loop is the initial value DVEntry::ALL. Currently, normalize() is part of DependenceAnalysis which allow the function to update private DV variable.

Setting non-constributed(IV didn't affect instructions) loops direction to NONE for each dependency by scanning the SCEV form of the pointers of Src and Dst instructions instead of initial value ALL seems generally correct for DependenceAnalysis.

Or LoopInterchange should explore the way to legalize interchange without normalize? Thanks.

What would the semantics of DVEntry::NONE be? It cannot be treated less pessimistically than ALL because the same memory locations get modified in every iteration of the outer loop! It's important for DenepdenceAnalysis to remain conservatively correct in the results in produces, regardless of the consumer of that analysis.

IMHO, how loop interchange deals with the "non-contributing" outer loops is an implementation issue inside loop-interchange itself. One solution might be to have its own normalize function, so you construct the dependence matrix, prune it (eg. by replacing the * with = in that entry), then normalize it again.

@ShivaChen ShivaChen force-pushed the loop-interchange-pr71519 branch from 33e5c6f to 05a8a81 Compare January 26, 2024 09:53
@github-actions
Copy link

github-actions bot commented Jan 26, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@ShivaChen ShivaChen changed the title [LoopInterchange] Increment isDirectionNegative check to next level for scalar direction [LoopInterchange] Update the direction of undistributed loop to EQ Jan 26, 2024
@ShivaChen
Copy link
Collaborator Author

Skipping the scalar direction is incorrect. Scalar dependencies get carried from every iteration to every iteration, so they must be treated more conservatively than a = dependence.
I think this problem should be solved in the loop interchange pass itself where if the candidate pair of loops is nested inside other loops, and those outer loop IVs don't contribute to the index calculations in the pair of loops, then the dependence entries corresponding to the outer loops should be ignored.

Thanks for pointing the direction. :-) Could you help me to check does the latest commit as you expected?

I'm saying let's not modify DependenceAnalysis. The custom handling should be done in Loop Interchange to ignore dependencies at outer levels when they don't matter to the legality of interchange.

I try to implement without modifying DependenceAnalysis and figure out that LoopInterchange rely on normalize() to enable few optimization opportunities. normalize() didn't happen because the outer loop is the initial value DVEntry::ALL. Currently, normalize() is part of DependenceAnalysis which allow the function to update private DV variable.
Setting non-constributed(IV didn't affect instructions) loops direction to NONE for each dependency by scanning the SCEV form of the pointers of Src and Dst instructions instead of initial value ALL seems generally correct for DependenceAnalysis.
Or LoopInterchange should explore the way to legalize interchange without normalize? Thanks.

What would the semantics of DVEntry::NONE be? It cannot be treated less pessimistically than ALL because the same memory locations get modified in every iteration of the outer loop! It's important for DenepdenceAnalysis to remain conservatively correct in the results in produces, regardless of the consumer of that analysis.

IMHO, how loop interchange deals with the "non-contributing" outer loops is an implementation issue inside loop-interchange itself. One solution might be to have its own normalize function, so you construct the dependence matrix, prune it (eg. by replacing the * with = in that entry), then normalize it again.

Have created normalize function in interchange. Thanks for the advice. :-)

The motivation is to introduce the custom functions for LoopInterchange.
This commit enables loop-interchange for the case in llvm#71519.
With the loop-interchange, the case can be vectorized.

  for (int nl = 0; nl < 10000000/256; nl++) // Level 1
    for (int i = 0; i < 256; ++i)           // Level 2
      for (int j = 1; j < 256; j++)         // Level 3
        aa[j][i] = aa[j - 1][i] + bb[j][i];

The case can't be interchanged without normalizaion.
normalizaion didn't occur because the direction of level 1 loop dependence
between aa[j][i] and aa[j - 1][i] is default value '*'.

By scanning SCEV form of the pointer of aa[j][i] and aa[j - 1][i], the pass
and determine the IV of loop 1(nl) didn't affect the value of aa[j][i] and
aa[j - 1][i]. And then updating the direction of loop 1 to '=' to enable
the normalization.
@ShivaChen ShivaChen force-pushed the loop-interchange-pr71519 branch from 05a8a81 to 1e6fd79 Compare January 26, 2024 10:18
@ShivaChen
Copy link
Collaborator Author

Fix clang-format and rebase.

}
}

static bool normalize(std::vector<Dependence::DVEntry> &DV, ScalarEvolution *SE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am learning this transformation and came across your patch.

This function is sort of duplicate of FullDependece::normalize() but does not update the distance vector. Do you think we should avoid duplication? and updating distance vector is not necessary?

static void dumpDirection(raw_ostream &OS,
std::vector<Dependence::DVEntry> &DV) {
OS << " [";
for (unsigned II = 1; II <= DV.size(); ++II) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that we can start from 0 and avoid subtracting 1 each time we access DV but I understand that this is practice in DA.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants