[LoopInterchange] Avoid using CacheCost if cache line size is zero #126021

kasuga-fj · 2025-02-06T07:42:42Z

Profitability decisions with CacheCost sometimes gave strange results when the cache line size was zero. This patch prevents CacheCost from being used when the cache line size is zero, because it doesn't make sense. This patch also prevents the CacheCost from being calculated in this case, which may reduce compilation time.

As I tried in llvm-test-suite, the following loops are no longer interchanged. I have checked all the cases and think it is reasonable that they are not interchanged, except for the first two. The first two are subtle cases, and I am not sure if they should be interchanged.

llvm-test-suite/MultiSource/Applications/JM/ldecod/block.c:935:5
llvm-test-suite/MultiSource/Applications/JM/ldecod/macroblock.c:2594:5
llvm-test-suite/MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/spatscal.c:298:5
llvm-test-suite/MultiSource/Benchmarks/nbench/nbench1.c:1906:1
llvm-test-suite/MultiSource/Benchmarks/nbench/nbench1.c:1907:2
llvm-test-suite/MultiSource/Benchmarks/nbench/nbench1.c:283:2
llvm-test-suite/SingleSource/Benchmarks/Polybench/datamining/correlation/correlation.c:103:5
llvm-test-suite/SingleSource/Benchmarks/Polybench/datamining/covariance/covariance.c:81:5

Profitability decisions with `CacheCost` sometimes gave strange results when the cache line size was zero. This patch prevents `CacheCost` from being used when the cache line size is zero, because it doesn't make sense. This patch also prevents the `CacheCost` from being calculated in this case, which may reduce compilation time.

llvmbot · 2025-02-06T07:44:17Z

@llvm/pr-subscribers-llvm-transforms

Author: Ryotaro Kasuga (kasuga-fj)

Changes

Profitability decisions with CacheCost sometimes gave strange results when the cache line size was zero. This patch prevents CacheCost from being used when the cache line size is zero, because it doesn't make sense. This patch also prevents the CacheCost from being calculated in this case, which may reduce compilation time.

As I tried in llvm-test-suite, the following loops are no longer interchanged. I have checked all the cases and think it is reasonable that they are not interchanged, except for the first two. The first two are subtle cases, and I am not sure if they should be interchanged.

llvm-test-suite/MultiSource/Applications/JM/ldecod/block.c:935:5
llvm-test-suite/MultiSource/Applications/JM/ldecod/macroblock.c:2594:5
llvm-test-suite/MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/spatscal.c:298:5
llvm-test-suite/MultiSource/Benchmarks/nbench/nbench1.c:1906:1
llvm-test-suite/MultiSource/Benchmarks/nbench/nbench1.c:1907:2
llvm-test-suite/MultiSource/Benchmarks/nbench/nbench1.c:283:2
llvm-test-suite/SingleSource/Benchmarks/Polybench/datamining/correlation/correlation.c:103:5
llvm-test-suite/SingleSource/Benchmarks/Polybench/datamining/covariance/covariance.c:81:5

Full diff: https://github.com/llvm/llvm-project/pull/126021.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Scalar/LoopInterchange.cpp (+12-2)
(added) llvm/test/Transforms/LoopInterchange/cache-line-size-zero.ll (+59)

diff --git a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
index d88fdf41db7a8e9..adefad9285e42d9 100644
--- a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
@@ -1130,6 +1130,12 @@ std::optional<bool>
 LoopInterchangeProfitability::isProfitablePerLoopCacheAnalysis(
     const DenseMap<const Loop *, unsigned> &CostMap,
     std::unique_ptr<CacheCost> &CC) {
+  // The `CacheCost` is not calculated if it is not considered worthwhile to use
+  // it. In this case we leave the profitability decision to the subsequent
+  // processes.
+  if (CC == nullptr)
+    return std::nullopt;
+
   // This is the new cost model returned from loop cache analysis.
   // A smaller index means the loop should be placed an outer loop, and vice
   // versa.
@@ -1773,8 +1779,12 @@ PreservedAnalyses LoopInterchangePass::run(LoopNest &LN,
   });
 
   DependenceInfo DI(&F, &AR.AA, &AR.SE, &AR.LI);
-  std::unique_ptr<CacheCost> CC =
-      CacheCost::getCacheCost(LN.getOutermostLoop(), AR, DI);
+
+  std::unique_ptr<CacheCost> CC;
+  // If the cache line size is set to zero, it doesn't make sense to use
+  // `CacheCost` for profitability decisions. Avoid computing it in this case.
+  if (AR.TTI.getCacheLineSize() != 0)
+    CC = CacheCost::getCacheCost(LN.getOutermostLoop(), AR, DI);
 
   if (!LoopInterchange(&AR.SE, &AR.LI, &DI, &AR.DT, CC, &ORE).run(LN))
     return PreservedAnalyses::all();
diff --git a/llvm/test/Transforms/LoopInterchange/cache-line-size-zero.ll b/llvm/test/Transforms/LoopInterchange/cache-line-size-zero.ll
new file mode 100644
index 000000000000000..bce47bce5232535
--- /dev/null
+++ b/llvm/test/Transforms/LoopInterchange/cache-line-size-zero.ll
@@ -0,0 +1,59 @@
+; RUN: opt %s -passes=loop-interchange -cache-line-size=0 -pass-remarks-output=%t -verify-dom-info -verify-loop-info \
+; RUN:     -pass-remarks=loop-interchange -pass-remarks-missed=loop-interchange -disable-output
+; RUN: FileCheck -input-file %t %s
+
+;; In the following code, interchanging is unprofitable even if the cache line
+;; size is set to zero. There are cases where the default cache line size is
+;; zero, e.g., the target processor is not specified.
+;;
+;; #define N 100
+;; #define M 100
+;; 
+;; // Extracted from SingleSource/Benchmarks/Polybench/datamining/correlation/correlation.c
+;; // in llvm-test-suite
+;; void f(double data[N][M], double mean[M], double stddev[M]) {
+;;   for (int i = 0; i < N; i++) {
+;;     for (int j = 0; j < M; j++) {
+;;       data[i][j] -= mean[j];
+;;       data[i][j] /= stddev[j];
+;;     }
+;;   }
+;; }
+
+; CHECK:      --- !Missed
+; CHECK-NEXT: Pass:            loop-interchange
+; CHECK:      Name:            InterchangeNotProfitable
+; CHECK-NEXT: Function:        f
+
+define void @f(ptr noundef captures(none) %data, ptr noundef readonly captures(none) %mean, ptr noundef readonly captures(none) %stddev) {
+entry:
+  br label %for.cond1.preheader
+
+for.cond1.preheader:
+  %indvars.iv30 = phi i64 [ 0, %entry ], [ %indvars.iv.next31, %for.cond.cleanup3 ]
+  br label %for.body4
+
+for.cond.cleanup:
+  ret void
+
+for.cond.cleanup3:
+  %indvars.iv.next31 = add nuw nsw i64 %indvars.iv30, 1
+  %exitcond33 = icmp ne i64 %indvars.iv.next31, 100
+  br i1 %exitcond33, label %for.cond1.preheader, label %for.cond.cleanup
+
+for.body4:
+  %indvars.iv = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next, %for.body4 ]
+  %arrayidx = getelementptr inbounds nuw double, ptr %mean, i64 %indvars.iv
+  %0 = load double, ptr %arrayidx, align 8
+  %arrayidx8 = getelementptr inbounds nuw [100 x double], ptr %data, i64 %indvars.iv30, i64 %indvars.iv
+  %1 = load double, ptr %arrayidx8, align 8
+  %sub = fsub double %1, %0
+  store double %sub, ptr %arrayidx8, align 8
+  %arrayidx10 = getelementptr inbounds nuw double, ptr %stddev, i64 %indvars.iv
+  %2 = load double, ptr %arrayidx10, align 8
+  %div = fdiv double %sub, %2
+  store double %div, ptr %arrayidx8, align 8
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %exitcond = icmp ne i64 %indvars.iv.next, 100
+  br i1 %exitcond, label %for.body4, label %for.cond.cleanup3
+}

kasuga-fj · 2025-02-06T07:46:40Z

I also found that the cache line size is not defined for the Neoverse family, so the default value of zero is used.

kasuga-fj · 2025-02-06T07:50:09Z

@madhur13490 Could you please measure the compilation time impact? A quick test on my local showed improvements in several cases.

kasuga-fj · 2025-02-26T06:36:15Z

I have misunderstood the logic of the CacheCost. It still makes sense even if the cache line size is 0.

llvmbot added the llvm:transforms label Feb 6, 2025

kasuga-fj requested review from madhur13490 and sjoerdmeijer February 6, 2025 07:46

kasuga-fj closed this Feb 26, 2025

kasuga-fj deleted the interchange-cachelinesize-zero branch September 2, 2025 11:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[LoopInterchange] Avoid using CacheCost if cache line size is zero #126021

[LoopInterchange] Avoid using CacheCost if cache line size is zero #126021

Uh oh!

kasuga-fj commented Feb 6, 2025

Uh oh!

llvmbot commented Feb 6, 2025

Uh oh!

kasuga-fj commented Feb 6, 2025

Uh oh!

kasuga-fj commented Feb 6, 2025

Uh oh!

kasuga-fj commented Feb 26, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

[LoopInterchange] Avoid using CacheCost if cache line size is zero #126021

[LoopInterchange] Avoid using CacheCost if cache line size is zero #126021

Uh oh!

Conversation

kasuga-fj commented Feb 6, 2025

Uh oh!

llvmbot commented Feb 6, 2025

Uh oh!

kasuga-fj commented Feb 6, 2025

Uh oh!

kasuga-fj commented Feb 6, 2025

Uh oh!

kasuga-fj commented Feb 26, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants