@kasuga-fj
Contributor

Profitability decisions based on CacheCost sometimes gave strange results when the cache line size was zero. This patch stops LoopInterchange from using CacheCost when the cache line size is zero, because the cost model does not make sense in that case. It also skips computing CacheCost in that case, which may reduce compilation time.

When I tried this patch on llvm-test-suite, the following loops were no longer interchanged. I checked every case and think it is reasonable not to interchange them, except for the first two. Those two are subtle cases, and I am not sure whether they should be interchanged.

  • llvm-test-suite/MultiSource/Applications/JM/ldecod/block.c:935:5
  • llvm-test-suite/MultiSource/Applications/JM/ldecod/macroblock.c:2594:5
  • llvm-test-suite/MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/spatscal.c:298:5
  • llvm-test-suite/MultiSource/Benchmarks/nbench/nbench1.c:1906:1
  • llvm-test-suite/MultiSource/Benchmarks/nbench/nbench1.c:1907:2
  • llvm-test-suite/MultiSource/Benchmarks/nbench/nbench1.c:283:2
  • llvm-test-suite/SingleSource/Benchmarks/Polybench/datamining/correlation/correlation.c:103:5
  • llvm-test-suite/SingleSource/Benchmarks/Polybench/datamining/covariance/covariance.c:81:5
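
To make the last two entries concrete, here is a minimal C++ sketch of the loop nest extracted from correlation.c (the same body as in the test added by this patch), written in both loop orders. The function names `originalOrder`, `interchangedOrder`, and `flatIndex` are hypothetical and exist only for illustration; the arrays are flattened in row-major order, as C lays out `double data[N][M]`. The sketch only illustrates the access strides and is not a statement about how CacheCost scores them.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t N = 100;
constexpr std::size_t M = 100;

// Row-major position of data[i][j] in a flattened N x M array.
// (Hypothetical helper, used only in this sketch.)
constexpr std::size_t flatIndex(std::size_t i, std::size_t j) {
  return i * M + j;
}

// Original order: the inner loop runs over j, so consecutive inner
// iterations touch adjacent elements of `data` (stride of 1 element).
void originalOrder(std::vector<double> &data, const std::vector<double> &mean,
                   const std::vector<double> &stddev) {
  for (std::size_t i = 0; i < N; ++i) {
    for (std::size_t j = 0; j < M; ++j) {
      data[flatIndex(i, j)] -= mean[j];
      data[flatIndex(i, j)] /= stddev[j];
    }
  }
}

// Interchanged order: the inner loop runs over i, so consecutive inner
// iterations touch elements of `data` that are M elements apart.
void interchangedOrder(std::vector<double> &data,
                       const std::vector<double> &mean,
                       const std::vector<double> &stddev) {
  for (std::size_t j = 0; j < M; ++j) {
    for (std::size_t i = 0; i < N; ++i) {
      data[flatIndex(i, j)] -= mean[j];
      data[flatIndex(i, j)] /= stddev[j];
    }
  }
}
```

Both orders compute the same values, but interchanging replaces the unit-stride walk over `data` with a stride of `M` elements in the inner loop, which is consistent with the judgment above that leaving these loops alone is reasonable.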

@llvmbot
Member

llvmbot commented Feb 6, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Ryotaro Kasuga (kasuga-fj)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/126021.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/LoopInterchange.cpp (+12-2)
  • (added) llvm/test/Transforms/LoopInterchange/cache-line-size-zero.ll (+59)
diff --git a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
index d88fdf41db7a8e9..adefad9285e42d9 100644
--- a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
@@ -1130,6 +1130,12 @@ std::optional<bool>
 LoopInterchangeProfitability::isProfitablePerLoopCacheAnalysis(
     const DenseMap<const Loop *, unsigned> &CostMap,
     std::unique_ptr<CacheCost> &CC) {
+  // The `CacheCost` is not calculated if it is not considered worthwhile to use
+  // it. In this case we leave the profitability decision to the subsequent
+  // processes.
+  if (CC == nullptr)
+    return std::nullopt;
+
   // This is the new cost model returned from loop cache analysis.
   // A smaller index means the loop should be placed an outer loop, and vice
   // versa.
@@ -1773,8 +1779,12 @@ PreservedAnalyses LoopInterchangePass::run(LoopNest &LN,
   });
 
   DependenceInfo DI(&F, &AR.AA, &AR.SE, &AR.LI);
-  std::unique_ptr<CacheCost> CC =
-      CacheCost::getCacheCost(LN.getOutermostLoop(), AR, DI);
+
+  std::unique_ptr<CacheCost> CC;
+  // If the cache line size is set to zero, it doesn't make sense to use
+  // `CacheCost` for profitability decisions. Avoid computing it in this case.
+  if (AR.TTI.getCacheLineSize() != 0)
+    CC = CacheCost::getCacheCost(LN.getOutermostLoop(), AR, DI);
 
   if (!LoopInterchange(&AR.SE, &AR.LI, &DI, &AR.DT, CC, &ORE).run(LN))
     return PreservedAnalyses::all();
diff --git a/llvm/test/Transforms/LoopInterchange/cache-line-size-zero.ll b/llvm/test/Transforms/LoopInterchange/cache-line-size-zero.ll
new file mode 100644
index 000000000000000..bce47bce5232535
--- /dev/null
+++ b/llvm/test/Transforms/LoopInterchange/cache-line-size-zero.ll
@@ -0,0 +1,59 @@
+; RUN: opt %s -passes=loop-interchange -cache-line-size=0 -pass-remarks-output=%t -verify-dom-info -verify-loop-info \
+; RUN:     -pass-remarks=loop-interchange -pass-remarks-missed=loop-interchange -disable-output
+; RUN: FileCheck -input-file %t %s
+
+;; In the following code, interchanging is unprofitable even if the cache line
+;; size is set to zero. There are cases where the default cache line size is
+;; zero, e.g., the target processor is not specified.
+;;
+;; #define N 100
+;; #define M 100
+;; 
+;; // Extracted from SingleSource/Benchmarks/Polybench/datamining/correlation/correlation.c
+;; // in llvm-test-suite
+;; void f(double data[N][M], double mean[M], double stddev[M]) {
+;;   for (int i = 0; i < N; i++) {
+;;     for (int j = 0; j < M; j++) {
+;;       data[i][j] -= mean[j];
+;;       data[i][j] /= stddev[j];
+;;     }
+;;   }
+;; }
+
+; CHECK:      --- !Missed
+; CHECK-NEXT: Pass:            loop-interchange
+; CHECK:      Name:            InterchangeNotProfitable
+; CHECK-NEXT: Function:        f
+
+define void @f(ptr noundef captures(none) %data, ptr noundef readonly captures(none) %mean, ptr noundef readonly captures(none) %stddev) {
+entry:
+  br label %for.cond1.preheader
+
+for.cond1.preheader:
+  %indvars.iv30 = phi i64 [ 0, %entry ], [ %indvars.iv.next31, %for.cond.cleanup3 ]
+  br label %for.body4
+
+for.cond.cleanup:
+  ret void
+
+for.cond.cleanup3:
+  %indvars.iv.next31 = add nuw nsw i64 %indvars.iv30, 1
+  %exitcond33 = icmp ne i64 %indvars.iv.next31, 100
+  br i1 %exitcond33, label %for.cond1.preheader, label %for.cond.cleanup
+
+for.body4:
+  %indvars.iv = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next, %for.body4 ]
+  %arrayidx = getelementptr inbounds nuw double, ptr %mean, i64 %indvars.iv
+  %0 = load double, ptr %arrayidx, align 8
+  %arrayidx8 = getelementptr inbounds nuw [100 x double], ptr %data, i64 %indvars.iv30, i64 %indvars.iv
+  %1 = load double, ptr %arrayidx8, align 8
+  %sub = fsub double %1, %0
+  store double %sub, ptr %arrayidx8, align 8
+  %arrayidx10 = getelementptr inbounds nuw double, ptr %stddev, i64 %indvars.iv
+  %2 = load double, ptr %arrayidx10, align 8
+  %div = fdiv double %sub, %2
+  store double %div, ptr %arrayidx8, align 8
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %exitcond = icmp ne i64 %indvars.iv.next, 100
+  br i1 %exitcond, label %for.body4, label %for.cond.cleanup3
+}
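
The control flow this diff introduces can be sketched outside LLVM as follows: the expensive analysis is skipped when its input is not usable, the cost-model check reports "no opinion" via `std::nullopt`, and the caller falls through to a later heuristic. Everything in this sketch is hypothetical (`CostModel`, `computeCostIfUseful`, `profitableByCostModel`, `profitableByFallback`, and the placeholder cost numbers); it is not LLVM's `CacheCost` API, only an illustration of the same pattern under those assumptions.

```cpp
#include <memory>
#include <optional>

// Hypothetical stand-in for an expensive analysis result.
struct CostModel {
  int currentOrderCost;
  int interchangedOrderCost;
};

// Skip the analysis entirely when the cache line size is zero, mirroring the
// guard around the cost computation in the diff. The costs are placeholders.
std::unique_ptr<CostModel> computeCostIfUseful(unsigned cacheLineSize) {
  if (cacheLineSize == 0)
    return nullptr;
  return std::make_unique<CostModel>(CostModel{20, 10});
}

// Returns std::nullopt when no cost model is available, so the caller can
// defer to the next heuristic instead of treating the answer as "false".
std::optional<bool>
profitableByCostModel(const std::unique_ptr<CostModel> &cm) {
  if (cm == nullptr)
    return std::nullopt;
  return cm->interchangedOrderCost < cm->currentOrderCost;
}

// Hypothetical later heuristic; here it conservatively declines.
bool profitableByFallback() { return false; }

bool isProfitable(unsigned cacheLineSize) {
  std::unique_ptr<CostModel> cm = computeCostIfUseful(cacheLineSize);
  if (std::optional<bool> decision = profitableByCostModel(cm))
    return *decision;
  return profitableByFallback();
}
```

Returning `std::optional<bool>` rather than `bool` keeps "the cost model says no" distinct from "the cost model was not consulted", which is what lets the later checks take over.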

@kasuga-fj
Contributor Author

I also found that the cache line size is not defined for the Neoverse family, so the default value of zero is used.

@kasuga-fj
Contributor Author

@madhur13490 Could you please measure the compilation time impact? A quick test on my local machine showed improvements in several cases.

@kasuga-fj
Contributor Author

I had misunderstood the logic of CacheCost. It still makes sense even when the cache line size is 0.

@kasuga-fj kasuga-fj closed this Feb 26, 2025
@kasuga-fj kasuga-fj deleted the interchange-cachelinesize-zero branch September 2, 2025 11:17