Skip to content

Conversation

sebpop
Copy link
Contributor

@sebpop sebpop commented Jun 3, 2025

This patch adds the flag -fexperimental-loop-fuse to the clang and flang drivers. This is primarily useful for experiments as we envision to enable the pass one day.

The options are based on the same principles and reason on which we have floop-interchange.

@sebpop sebpop requested a review from sjoerdmeijer June 3, 2025 22:49
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen IR generation bugs: mangling, exceptions, etc. flang:driver flang Flang issues not falling into any other category labels Jun 3, 2025
@llvmbot
Copy link
Member

llvmbot commented Jun 3, 2025

@llvm/pr-subscribers-flang-driver
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Sebastian Pop (sebpop)

Changes

This adds the flag -floop-fuse to the clang and flang drivers.


Full diff: https://github.com/llvm/llvm-project/pull/142686.diff

15 Files Affected:

  • (modified) clang/include/clang/Basic/CodeGenOptions.def (+1)
  • (modified) clang/include/clang/Driver/Options.td (+4)
  • (modified) clang/lib/CodeGen/BackendUtil.cpp (+2)
  • (modified) clang/lib/Driver/ToolChains/Clang.cpp (+1)
  • (modified) clang/lib/Driver/ToolChains/CommonArgs.cpp (+1-1)
  • (modified) clang/lib/Driver/ToolChains/Flang.cpp (+2)
  • (modified) clang/lib/Frontend/CompilerInvocation.cpp (+7)
  • (modified) clang/test/Driver/clang_f_opts.c (+7)
  • (modified) flang/docs/ReleaseNotes.md (+1)
  • (modified) flang/include/flang/Frontend/CodeGenOptions.def (+1)
  • (modified) flang/lib/Frontend/CompilerInvocation.cpp (+3)
  • (modified) flang/lib/Frontend/FrontendActions.cpp (+1)
  • (added) flang/test/Driver/loop-fuse.f90 (+17)
  • (modified) llvm/include/llvm/Passes/PassBuilder.h (+3)
  • (modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+12-1)
diff --git a/clang/include/clang/Basic/CodeGenOptions.def b/clang/include/clang/Basic/CodeGenOptions.def
index aad4e107cbeb3..de40564bc280f 100644
--- a/clang/include/clang/Basic/CodeGenOptions.def
+++ b/clang/include/clang/Basic/CodeGenOptions.def
@@ -326,6 +326,7 @@ CODEGENOPT(TimeTrace         , 1, 0) ///< Set when -ftime-trace is enabled.
 VALUE_CODEGENOPT(TimeTraceGranularity, 32, 500) ///< Minimum time granularity (in microseconds),
                                                ///< traced by time profiler
 CODEGENOPT(InterchangeLoops  , 1, 0) ///< Run loop-interchange.
+CODEGENOPT(FuseLoops         , 1, 0) ///< Run loop-fuse.
 CODEGENOPT(UnrollLoops       , 1, 0) ///< Control whether loops are unrolled.
 CODEGENOPT(RerollLoops       , 1, 0) ///< Control whether loops are rerolled.
 CODEGENOPT(NoUseJumpTables   , 1, 0) ///< Set when -fno-jump-tables is enabled.
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 5ca31c253ed8f..e5e2c0035e03f 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4189,6 +4189,10 @@ def floop_interchange : Flag<["-"], "floop-interchange">, Group<f_Group>,
   HelpText<"Enable the loop interchange pass">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
 def fno_loop_interchange: Flag<["-"], "fno-loop-interchange">, Group<f_Group>,
   HelpText<"Disable the loop interchange pass">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
+def floop_fuse : Flag<["-"], "floop-fuse">, Group<f_Group>,
+  HelpText<"Enable the loop fuse pass">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
+def fno_loop_fuse: Flag<["-"], "fno-loop-fuse">, Group<f_Group>,
+  HelpText<"Disable the loop fuse pass">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
 def funroll_loops : Flag<["-"], "funroll-loops">, Group<f_Group>,
   HelpText<"Turn on loop unroller">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
 def fno_unroll_loops : Flag<["-"], "fno-unroll-loops">, Group<f_Group>,
diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp
index cd5fc48c4a22b..539e413e8c86e 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -898,6 +898,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
   PipelineTuningOptions PTO;
   PTO.LoopUnrolling = CodeGenOpts.UnrollLoops;
   PTO.LoopInterchange = CodeGenOpts.InterchangeLoops;
+  PTO.LoopFuse = CodeGenOpts.FuseLoops;
   // For historical reasons, loop interleaving is set to mirror setting for loop
   // unrolling.
   PTO.LoopInterleaving = CodeGenOpts.UnrollLoops;
@@ -1339,6 +1340,7 @@ runThinLTOBackend(CompilerInstance &CI, ModuleSummaryIndex *CombinedIndex,
   Conf.SampleProfile = std::move(SampleProfile);
   Conf.PTO.LoopUnrolling = CGOpts.UnrollLoops;
   Conf.PTO.LoopInterchange = CGOpts.InterchangeLoops;
+  Conf.PTO.LoopFuse = CGOpts.FuseLoops;
   // For historical reasons, loop interleaving is set to mirror setting for loop
   // unrolling.
   Conf.PTO.LoopInterleaving = CGOpts.UnrollLoops;
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 13842b8cc2870..f3713e12cb5ef 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -7030,6 +7030,7 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
                   options::OPT_fno_unroll_loops);
   Args.AddLastArg(CmdArgs, options::OPT_floop_interchange,
                   options::OPT_fno_loop_interchange);
+  Args.AddLastArg(CmdArgs, options::OPT_floop_fuse, options::OPT_fno_loop_fuse);
 
   Args.AddLastArg(CmdArgs, options::OPT_fstrict_flex_arrays_EQ);
 
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 937ee09cac7cc..4217b48a56b05 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -3157,7 +3157,7 @@ void tools::handleVectorizeSLPArgs(const ArgList &Args,
 
 void tools::handleInterchangeLoopsArgs(const ArgList &Args,
                                        ArgStringList &CmdArgs) {
-  // FIXME: instead of relying on shouldEnableVectorizerAtOLevel, we may want to
+  // FIXME: Instead of relying on shouldEnableVectorizerAtOLevel, we may want to
   // implement a separate function to infer loop interchange from opt level.
   // For now, enable loop-interchange at the same opt levels as loop-vectorize.
   bool EnableInterchange = shouldEnableVectorizerAtOLevel(Args, false);
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index dcc46469df3e9..f34a58cab3ff3 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -151,6 +151,8 @@ void Flang::addCodegenOptions(const ArgList &Args,
       !stackArrays->getOption().matches(options::OPT_fno_stack_arrays))
     CmdArgs.push_back("-fstack-arrays");
 
+  Args.AddLastArg(CmdArgs, options::OPT_floop_fuse, options::OPT_fno_loop_fuse);
+
   handleInterchangeLoopsArgs(Args, CmdArgs);
   handleVectorizeLoopsArgs(Args, CmdArgs);
   handleVectorizeSLPArgs(Args, CmdArgs);
diff --git a/clang/lib/Frontend/CompilerInvocation.cpp b/clang/lib/Frontend/CompilerInvocation.cpp
index 2c02719121c73..5506e11f51491 100644
--- a/clang/lib/Frontend/CompilerInvocation.cpp
+++ b/clang/lib/Frontend/CompilerInvocation.cpp
@@ -1648,6 +1648,11 @@ void CompilerInvocationBase::GenerateCodeGenArgs(const CodeGenOptions &Opts,
   else
     GenerateArg(Consumer, OPT_fno_loop_interchange);
 
+  if (Opts.FuseLoops)
+    GenerateArg(Consumer, OPT_floop_fuse);
+  else
+    GenerateArg(Consumer, OPT_fno_loop_fuse);
+
   if (!Opts.BinutilsVersion.empty())
     GenerateArg(Consumer, OPT_fbinutils_version_EQ, Opts.BinutilsVersion);
 
@@ -1963,6 +1968,8 @@ bool CompilerInvocation::ParseCodeGenArgs(CodeGenOptions &Opts, ArgList &Args,
                    (Opts.OptimizationLevel > 1));
   Opts.InterchangeLoops =
       Args.hasFlag(OPT_floop_interchange, OPT_fno_loop_interchange, false);
+  Opts.FuseLoops =
+      Args.hasFlag(OPT_floop_fuse, OPT_fno_loop_fuse, false);
   Opts.BinutilsVersion =
       std::string(Args.getLastArgValue(OPT_fbinutils_version_EQ));
 
diff --git a/clang/test/Driver/clang_f_opts.c b/clang/test/Driver/clang_f_opts.c
index ee7ded265769b..204d5e338190f 100644
--- a/clang/test/Driver/clang_f_opts.c
+++ b/clang/test/Driver/clang_f_opts.c
@@ -52,6 +52,13 @@
 // CHECK-INTERCHANGE-LOOPS: "-floop-interchange"
 // CHECK-NO-INTERCHANGE-LOOPS: "-fno-loop-interchange"
 
+// RUN: %clang -### -S -floop-fuse %s 2>&1 | FileCheck -check-prefix=CHECK-FUSE-LOOPS %s
+// RUN: %clang -### -S -fno-loop-fuse %s 2>&1 | FileCheck -check-prefix=CHECK-NO-FUSE-LOOPS %s
+// RUN: %clang -### -S -fno-loop-fuse -floop-fuse %s 2>&1 | FileCheck -check-prefix=CHECK-FUSE-LOOPS %s
+// RUN: %clang -### -S -floop-fuse -fno-loop-fuse %s 2>&1 | FileCheck -check-prefix=CHECK-NO-FUSE-LOOPS %s
+// CHECK-FUSE-LOOPS: "-floop-fuse"
+// CHECK-NO-FUSE-LOOPS: "-fno-loop-fuse"
+
 // RUN: %clang -### -S -fprofile-sample-accurate %s 2>&1 | FileCheck -check-prefix=CHECK-PROFILE-SAMPLE-ACCURATE %s
 // CHECK-PROFILE-SAMPLE-ACCURATE: "-fprofile-sample-accurate"
 
diff --git a/flang/docs/ReleaseNotes.md b/flang/docs/ReleaseNotes.md
index 36be369595ffd..33795b7d1a55a 100644
--- a/flang/docs/ReleaseNotes.md
+++ b/flang/docs/ReleaseNotes.md
@@ -34,6 +34,7 @@ page](https://llvm.org/releases/).
 
 * -floop-interchange is now recognized by flang.
 * -floop-interchange is enabled by default at -O2 and above.
+* -floop-fuse is now recognized by flang.
 
 ## Windows Support
 
diff --git a/flang/include/flang/Frontend/CodeGenOptions.def b/flang/include/flang/Frontend/CodeGenOptions.def
index a697872836569..85bec620b54ba 100644
--- a/flang/include/flang/Frontend/CodeGenOptions.def
+++ b/flang/include/flang/Frontend/CodeGenOptions.def
@@ -36,6 +36,7 @@ CODEGENOPT(StackArrays, 1, 0) ///< -fstack-arrays (enable the stack-arrays pass)
 CODEGENOPT(VectorizeLoop, 1, 0) ///< Enable loop vectorization.
 CODEGENOPT(VectorizeSLP, 1, 0) ///< Enable SLP vectorization.
 CODEGENOPT(InterchangeLoops, 1, 0) ///< Enable loop interchange.
+CODEGENOPT(FuseLoops, 1, 0)        ///< Enable loop fuse.
 CODEGENOPT(LoopVersioning, 1, 0) ///< Enable loop versioning.
 CODEGENOPT(UnrollLoops, 1, 0) ///< Enable loop unrolling
 CODEGENOPT(AliasAnalysis, 1, 0) ///< Enable alias analysis pass
diff --git a/flang/lib/Frontend/CompilerInvocation.cpp b/flang/lib/Frontend/CompilerInvocation.cpp
index 90a002929eff0..c2b63819e9f77 100644
--- a/flang/lib/Frontend/CompilerInvocation.cpp
+++ b/flang/lib/Frontend/CompilerInvocation.cpp
@@ -273,6 +273,9 @@ static void parseCodeGenArgs(Fortran::frontend::CodeGenOptions &opts,
   if (args.getLastArg(clang::driver::options::OPT_floop_interchange))
     opts.InterchangeLoops = 1;
 
+  if (args.getLastArg(clang::driver::options::OPT_floop_fuse))
+    opts.FuseLoops = 1;
+
   if (args.getLastArg(clang::driver::options::OPT_vectorize_loops))
     opts.VectorizeLoop = 1;
 
diff --git a/flang/lib/Frontend/FrontendActions.cpp b/flang/lib/Frontend/FrontendActions.cpp
index 012d0fdfe645f..dc56cbe5c85bc 100644
--- a/flang/lib/Frontend/FrontendActions.cpp
+++ b/flang/lib/Frontend/FrontendActions.cpp
@@ -925,6 +925,7 @@ void CodeGenAction::runOptimizationPipeline(llvm::raw_pwrite_stream &os) {
     si.getTimePasses().setOutStream(ci.getTimingStreamLLVM());
   pto.LoopUnrolling = opts.UnrollLoops;
   pto.LoopInterchange = opts.InterchangeLoops;
+  pto.LoopFuse = opts.FuseLoops;
   pto.LoopInterleaving = opts.UnrollLoops;
   pto.LoopVectorization = opts.VectorizeLoop;
   pto.SLPVectorization = opts.VectorizeSLP;
diff --git a/flang/test/Driver/loop-fuse.f90 b/flang/test/Driver/loop-fuse.f90
new file mode 100644
index 0000000000000..240d00fdb62d7
--- /dev/null
+++ b/flang/test/Driver/loop-fuse.f90
@@ -0,0 +1,17 @@
+! RUN: %flang -### -S -floop-fuse %s 2>&1 | FileCheck -check-prefix=CHECK-LOOP-FUSE %s
+! RUN: %flang -### -S -fno-loop-fuse %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
+! RUN: %flang -### -S -O0 %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
+! RUN: %flang -### -S -O1 %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
+! RUN: %flang -### -S -O2 %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
+! RUN: %flang -### -S -O3 %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
+! RUN: %flang -### -S -Os %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
+! RUN: %flang -### -S -Oz %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
+! CHECK-LOOP-FUSE: "-floop-fuse"
+! CHECK-NO-LOOP-FUSE-NOT: "-floop-fuse"
+! RUN: %flang_fc1 -emit-llvm -O2 -floop-fuse -mllvm -print-pipeline-passes -o /dev/null %s 2>&1 | FileCheck -check-prefix=CHECK-LOOP-FUSE-PASS %s
+! RUN: %flang_fc1 -emit-llvm -O2 -fno-loop-fuse -mllvm -print-pipeline-passes -o /dev/null %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE-PASS %s
+! CHECK-LOOP-FUSE-PASS: loop-fusion
+! CHECK-NO-LOOP-FUSE-PASS-NOT: loop-fusion
+
+program test
+end program
diff --git a/llvm/include/llvm/Passes/PassBuilder.h b/llvm/include/llvm/Passes/PassBuilder.h
index 51ccaa53447d7..287a57c7d4a7d 100644
--- a/llvm/include/llvm/Passes/PassBuilder.h
+++ b/llvm/include/llvm/Passes/PassBuilder.h
@@ -64,6 +64,9 @@ class PipelineTuningOptions {
   /// false.
   bool LoopInterchange;
 
+  /// Tuning option to enable/disable loop fuse. Its default value is false.
+  bool LoopFuse;
+
   /// Tuning option to forget all SCEV loops in LoopUnroll. Its default value
   /// is that of the flag: `-forget-scev-loop-unroll`.
   bool ForgetAllSCEVInLoopUnroll;
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index f3654600c5abb..aaf43a9b535e9 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -104,6 +104,7 @@
 #include "llvm/Transforms/Scalar/LoopDeletion.h"
 #include "llvm/Transforms/Scalar/LoopDistribute.h"
 #include "llvm/Transforms/Scalar/LoopFlatten.h"
+#include "llvm/Transforms/Scalar/LoopFuse.h"
 #include "llvm/Transforms/Scalar/LoopIdiomRecognize.h"
 #include "llvm/Transforms/Scalar/LoopInstSimplify.h"
 #include "llvm/Transforms/Scalar/LoopInterchange.h"
@@ -205,6 +206,10 @@ static cl::opt<bool>
     EnableLoopInterchange("enable-loopinterchange", cl::init(false), cl::Hidden,
                           cl::desc("Enable the LoopInterchange Pass"));
 
+static cl::opt<bool> EnableLoopFuse("enable-loopfuse", cl::init(false),
+                                    cl::Hidden,
+                                    cl::desc("Enable the LoopFuse Pass"));
+
 static cl::opt<bool> EnableUnrollAndJam("enable-unroll-and-jam",
                                         cl::init(false), cl::Hidden,
                                         cl::desc("Enable Unroll And Jam Pass"));
@@ -314,6 +319,7 @@ PipelineTuningOptions::PipelineTuningOptions() {
   SLPVectorization = false;
   LoopUnrolling = true;
   LoopInterchange = EnableLoopInterchange;
+  LoopFuse = EnableLoopFuse;
   ForgetAllSCEVInLoopUnroll = ForgetSCEVInLoopUnroll;
   LicmMssaOptCap = SetLicmMssaOptCap;
   LicmMssaNoAccForPromotionCap = SetLicmMssaNoAccForPromotionCap;
@@ -518,6 +524,9 @@ PassBuilder::buildO1FunctionSimplificationPipeline(OptimizationLevel Level,
 
   invokeLoopOptimizerEndEPCallbacks(LPM2, Level);
 
+  if (PTO.LoopFuse)
+    FPM.addPass(LoopFusePass());
+
   FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM1),
                                               /*UseMemorySSA=*/true,
                                               /*UseBlockFrequencyInfo=*/true));
@@ -709,6 +718,9 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
 
   invokeLoopOptimizerEndEPCallbacks(LPM2, Level);
 
+  if (PTO.LoopFuse)
+    FPM.addPass(LoopFusePass());
+
   FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM1),
                                               /*UseMemorySSA=*/true,
                                               /*UseBlockFrequencyInfo=*/true));
@@ -2112,7 +2124,6 @@ PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,
     LPM.addPass(LoopFlattenPass());
   LPM.addPass(IndVarSimplifyPass());
   LPM.addPass(LoopDeletionPass());
-  // FIXME: Add loop interchange.
 
   // Unroll small loops and perform peeling.
   LPM.addPass(LoopFullUnrollPass(Level.getSpeedupLevel(),

@llvmbot
Copy link
Member

llvmbot commented Jun 3, 2025

@llvm/pr-subscribers-clang-driver

Author: Sebastian Pop (sebpop)

Changes

This adds the flag -floop-fuse to the clang and flang drivers.


Full diff: https://github.com/llvm/llvm-project/pull/142686.diff

15 Files Affected:

  • (modified) clang/include/clang/Basic/CodeGenOptions.def (+1)
  • (modified) clang/include/clang/Driver/Options.td (+4)
  • (modified) clang/lib/CodeGen/BackendUtil.cpp (+2)
  • (modified) clang/lib/Driver/ToolChains/Clang.cpp (+1)
  • (modified) clang/lib/Driver/ToolChains/CommonArgs.cpp (+1-1)
  • (modified) clang/lib/Driver/ToolChains/Flang.cpp (+2)
  • (modified) clang/lib/Frontend/CompilerInvocation.cpp (+7)
  • (modified) clang/test/Driver/clang_f_opts.c (+7)
  • (modified) flang/docs/ReleaseNotes.md (+1)
  • (modified) flang/include/flang/Frontend/CodeGenOptions.def (+1)
  • (modified) flang/lib/Frontend/CompilerInvocation.cpp (+3)
  • (modified) flang/lib/Frontend/FrontendActions.cpp (+1)
  • (added) flang/test/Driver/loop-fuse.f90 (+17)
  • (modified) llvm/include/llvm/Passes/PassBuilder.h (+3)
  • (modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+12-1)
diff --git a/clang/include/clang/Basic/CodeGenOptions.def b/clang/include/clang/Basic/CodeGenOptions.def
index aad4e107cbeb3..de40564bc280f 100644
--- a/clang/include/clang/Basic/CodeGenOptions.def
+++ b/clang/include/clang/Basic/CodeGenOptions.def
@@ -326,6 +326,7 @@ CODEGENOPT(TimeTrace         , 1, 0) ///< Set when -ftime-trace is enabled.
 VALUE_CODEGENOPT(TimeTraceGranularity, 32, 500) ///< Minimum time granularity (in microseconds),
                                                ///< traced by time profiler
 CODEGENOPT(InterchangeLoops  , 1, 0) ///< Run loop-interchange.
+CODEGENOPT(FuseLoops         , 1, 0) ///< Run loop-fuse.
 CODEGENOPT(UnrollLoops       , 1, 0) ///< Control whether loops are unrolled.
 CODEGENOPT(RerollLoops       , 1, 0) ///< Control whether loops are rerolled.
 CODEGENOPT(NoUseJumpTables   , 1, 0) ///< Set when -fno-jump-tables is enabled.
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 5ca31c253ed8f..e5e2c0035e03f 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4189,6 +4189,10 @@ def floop_interchange : Flag<["-"], "floop-interchange">, Group<f_Group>,
   HelpText<"Enable the loop interchange pass">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
 def fno_loop_interchange: Flag<["-"], "fno-loop-interchange">, Group<f_Group>,
   HelpText<"Disable the loop interchange pass">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
+def floop_fuse : Flag<["-"], "floop-fuse">, Group<f_Group>,
+  HelpText<"Enable the loop fuse pass">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
+def fno_loop_fuse: Flag<["-"], "fno-loop-fuse">, Group<f_Group>,
+  HelpText<"Disable the loop fuse pass">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
 def funroll_loops : Flag<["-"], "funroll-loops">, Group<f_Group>,
   HelpText<"Turn on loop unroller">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
 def fno_unroll_loops : Flag<["-"], "fno-unroll-loops">, Group<f_Group>,
diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp
index cd5fc48c4a22b..539e413e8c86e 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -898,6 +898,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
   PipelineTuningOptions PTO;
   PTO.LoopUnrolling = CodeGenOpts.UnrollLoops;
   PTO.LoopInterchange = CodeGenOpts.InterchangeLoops;
+  PTO.LoopFuse = CodeGenOpts.FuseLoops;
   // For historical reasons, loop interleaving is set to mirror setting for loop
   // unrolling.
   PTO.LoopInterleaving = CodeGenOpts.UnrollLoops;
@@ -1339,6 +1340,7 @@ runThinLTOBackend(CompilerInstance &CI, ModuleSummaryIndex *CombinedIndex,
   Conf.SampleProfile = std::move(SampleProfile);
   Conf.PTO.LoopUnrolling = CGOpts.UnrollLoops;
   Conf.PTO.LoopInterchange = CGOpts.InterchangeLoops;
+  Conf.PTO.LoopFuse = CGOpts.FuseLoops;
   // For historical reasons, loop interleaving is set to mirror setting for loop
   // unrolling.
   Conf.PTO.LoopInterleaving = CGOpts.UnrollLoops;
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 13842b8cc2870..f3713e12cb5ef 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -7030,6 +7030,7 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
                   options::OPT_fno_unroll_loops);
   Args.AddLastArg(CmdArgs, options::OPT_floop_interchange,
                   options::OPT_fno_loop_interchange);
+  Args.AddLastArg(CmdArgs, options::OPT_floop_fuse, options::OPT_fno_loop_fuse);
 
   Args.AddLastArg(CmdArgs, options::OPT_fstrict_flex_arrays_EQ);
 
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 937ee09cac7cc..4217b48a56b05 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -3157,7 +3157,7 @@ void tools::handleVectorizeSLPArgs(const ArgList &Args,
 
 void tools::handleInterchangeLoopsArgs(const ArgList &Args,
                                        ArgStringList &CmdArgs) {
-  // FIXME: instead of relying on shouldEnableVectorizerAtOLevel, we may want to
+  // FIXME: Instead of relying on shouldEnableVectorizerAtOLevel, we may want to
   // implement a separate function to infer loop interchange from opt level.
   // For now, enable loop-interchange at the same opt levels as loop-vectorize.
   bool EnableInterchange = shouldEnableVectorizerAtOLevel(Args, false);
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index dcc46469df3e9..f34a58cab3ff3 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -151,6 +151,8 @@ void Flang::addCodegenOptions(const ArgList &Args,
       !stackArrays->getOption().matches(options::OPT_fno_stack_arrays))
     CmdArgs.push_back("-fstack-arrays");
 
+  Args.AddLastArg(CmdArgs, options::OPT_floop_fuse, options::OPT_fno_loop_fuse);
+
   handleInterchangeLoopsArgs(Args, CmdArgs);
   handleVectorizeLoopsArgs(Args, CmdArgs);
   handleVectorizeSLPArgs(Args, CmdArgs);
diff --git a/clang/lib/Frontend/CompilerInvocation.cpp b/clang/lib/Frontend/CompilerInvocation.cpp
index 2c02719121c73..5506e11f51491 100644
--- a/clang/lib/Frontend/CompilerInvocation.cpp
+++ b/clang/lib/Frontend/CompilerInvocation.cpp
@@ -1648,6 +1648,11 @@ void CompilerInvocationBase::GenerateCodeGenArgs(const CodeGenOptions &Opts,
   else
     GenerateArg(Consumer, OPT_fno_loop_interchange);
 
+  if (Opts.FuseLoops)
+    GenerateArg(Consumer, OPT_floop_fuse);
+  else
+    GenerateArg(Consumer, OPT_fno_loop_fuse);
+
   if (!Opts.BinutilsVersion.empty())
     GenerateArg(Consumer, OPT_fbinutils_version_EQ, Opts.BinutilsVersion);
 
@@ -1963,6 +1968,8 @@ bool CompilerInvocation::ParseCodeGenArgs(CodeGenOptions &Opts, ArgList &Args,
                    (Opts.OptimizationLevel > 1));
   Opts.InterchangeLoops =
       Args.hasFlag(OPT_floop_interchange, OPT_fno_loop_interchange, false);
+  Opts.FuseLoops =
+      Args.hasFlag(OPT_floop_fuse, OPT_fno_loop_fuse, false);
   Opts.BinutilsVersion =
       std::string(Args.getLastArgValue(OPT_fbinutils_version_EQ));
 
diff --git a/clang/test/Driver/clang_f_opts.c b/clang/test/Driver/clang_f_opts.c
index ee7ded265769b..204d5e338190f 100644
--- a/clang/test/Driver/clang_f_opts.c
+++ b/clang/test/Driver/clang_f_opts.c
@@ -52,6 +52,13 @@
 // CHECK-INTERCHANGE-LOOPS: "-floop-interchange"
 // CHECK-NO-INTERCHANGE-LOOPS: "-fno-loop-interchange"
 
+// RUN: %clang -### -S -floop-fuse %s 2>&1 | FileCheck -check-prefix=CHECK-FUSE-LOOPS %s
+// RUN: %clang -### -S -fno-loop-fuse %s 2>&1 | FileCheck -check-prefix=CHECK-NO-FUSE-LOOPS %s
+// RUN: %clang -### -S -fno-loop-fuse -floop-fuse %s 2>&1 | FileCheck -check-prefix=CHECK-FUSE-LOOPS %s
+// RUN: %clang -### -S -floop-fuse -fno-loop-fuse %s 2>&1 | FileCheck -check-prefix=CHECK-NO-FUSE-LOOPS %s
+// CHECK-FUSE-LOOPS: "-floop-fuse"
+// CHECK-NO-FUSE-LOOPS: "-fno-loop-fuse"
+
 // RUN: %clang -### -S -fprofile-sample-accurate %s 2>&1 | FileCheck -check-prefix=CHECK-PROFILE-SAMPLE-ACCURATE %s
 // CHECK-PROFILE-SAMPLE-ACCURATE: "-fprofile-sample-accurate"
 
diff --git a/flang/docs/ReleaseNotes.md b/flang/docs/ReleaseNotes.md
index 36be369595ffd..33795b7d1a55a 100644
--- a/flang/docs/ReleaseNotes.md
+++ b/flang/docs/ReleaseNotes.md
@@ -34,6 +34,7 @@ page](https://llvm.org/releases/).
 
 * -floop-interchange is now recognized by flang.
 * -floop-interchange is enabled by default at -O2 and above.
+* -floop-fuse is now recognized by flang.
 
 ## Windows Support
 
diff --git a/flang/include/flang/Frontend/CodeGenOptions.def b/flang/include/flang/Frontend/CodeGenOptions.def
index a697872836569..85bec620b54ba 100644
--- a/flang/include/flang/Frontend/CodeGenOptions.def
+++ b/flang/include/flang/Frontend/CodeGenOptions.def
@@ -36,6 +36,7 @@ CODEGENOPT(StackArrays, 1, 0) ///< -fstack-arrays (enable the stack-arrays pass)
 CODEGENOPT(VectorizeLoop, 1, 0) ///< Enable loop vectorization.
 CODEGENOPT(VectorizeSLP, 1, 0) ///< Enable SLP vectorization.
 CODEGENOPT(InterchangeLoops, 1, 0) ///< Enable loop interchange.
+CODEGENOPT(FuseLoops, 1, 0)        ///< Enable loop fuse.
 CODEGENOPT(LoopVersioning, 1, 0) ///< Enable loop versioning.
 CODEGENOPT(UnrollLoops, 1, 0) ///< Enable loop unrolling
 CODEGENOPT(AliasAnalysis, 1, 0) ///< Enable alias analysis pass
diff --git a/flang/lib/Frontend/CompilerInvocation.cpp b/flang/lib/Frontend/CompilerInvocation.cpp
index 90a002929eff0..c2b63819e9f77 100644
--- a/flang/lib/Frontend/CompilerInvocation.cpp
+++ b/flang/lib/Frontend/CompilerInvocation.cpp
@@ -273,6 +273,9 @@ static void parseCodeGenArgs(Fortran::frontend::CodeGenOptions &opts,
   if (args.getLastArg(clang::driver::options::OPT_floop_interchange))
     opts.InterchangeLoops = 1;
 
+  if (args.getLastArg(clang::driver::options::OPT_floop_fuse))
+    opts.FuseLoops = 1;
+
   if (args.getLastArg(clang::driver::options::OPT_vectorize_loops))
     opts.VectorizeLoop = 1;
 
diff --git a/flang/lib/Frontend/FrontendActions.cpp b/flang/lib/Frontend/FrontendActions.cpp
index 012d0fdfe645f..dc56cbe5c85bc 100644
--- a/flang/lib/Frontend/FrontendActions.cpp
+++ b/flang/lib/Frontend/FrontendActions.cpp
@@ -925,6 +925,7 @@ void CodeGenAction::runOptimizationPipeline(llvm::raw_pwrite_stream &os) {
     si.getTimePasses().setOutStream(ci.getTimingStreamLLVM());
   pto.LoopUnrolling = opts.UnrollLoops;
   pto.LoopInterchange = opts.InterchangeLoops;
+  pto.LoopFuse = opts.FuseLoops;
   pto.LoopInterleaving = opts.UnrollLoops;
   pto.LoopVectorization = opts.VectorizeLoop;
   pto.SLPVectorization = opts.VectorizeSLP;
diff --git a/flang/test/Driver/loop-fuse.f90 b/flang/test/Driver/loop-fuse.f90
new file mode 100644
index 0000000000000..240d00fdb62d7
--- /dev/null
+++ b/flang/test/Driver/loop-fuse.f90
@@ -0,0 +1,17 @@
+! RUN: %flang -### -S -floop-fuse %s 2>&1 | FileCheck -check-prefix=CHECK-LOOP-FUSE %s
+! RUN: %flang -### -S -fno-loop-fuse %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
+! RUN: %flang -### -S -O0 %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
+! RUN: %flang -### -S -O1 %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
+! RUN: %flang -### -S -O2 %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
+! RUN: %flang -### -S -O3 %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
+! RUN: %flang -### -S -Os %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
+! RUN: %flang -### -S -Oz %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
+! CHECK-LOOP-FUSE: "-floop-fuse"
+! CHECK-NO-LOOP-FUSE-NOT: "-floop-fuse"
+! RUN: %flang_fc1 -emit-llvm -O2 -floop-fuse -mllvm -print-pipeline-passes -o /dev/null %s 2>&1 | FileCheck -check-prefix=CHECK-LOOP-FUSE-PASS %s
+! RUN: %flang_fc1 -emit-llvm -O2 -fno-loop-fuse -mllvm -print-pipeline-passes -o /dev/null %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE-PASS %s
+! CHECK-LOOP-FUSE-PASS: loop-fusion
+! CHECK-NO-LOOP-FUSE-PASS-NOT: loop-fusion
+
+program test
+end program
diff --git a/llvm/include/llvm/Passes/PassBuilder.h b/llvm/include/llvm/Passes/PassBuilder.h
index 51ccaa53447d7..287a57c7d4a7d 100644
--- a/llvm/include/llvm/Passes/PassBuilder.h
+++ b/llvm/include/llvm/Passes/PassBuilder.h
@@ -64,6 +64,9 @@ class PipelineTuningOptions {
   /// false.
   bool LoopInterchange;
 
+  /// Tuning option to enable/disable loop fuse. Its default value is false.
+  bool LoopFuse;
+
   /// Tuning option to forget all SCEV loops in LoopUnroll. Its default value
   /// is that of the flag: `-forget-scev-loop-unroll`.
   bool ForgetAllSCEVInLoopUnroll;
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index f3654600c5abb..aaf43a9b535e9 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -104,6 +104,7 @@
 #include "llvm/Transforms/Scalar/LoopDeletion.h"
 #include "llvm/Transforms/Scalar/LoopDistribute.h"
 #include "llvm/Transforms/Scalar/LoopFlatten.h"
+#include "llvm/Transforms/Scalar/LoopFuse.h"
 #include "llvm/Transforms/Scalar/LoopIdiomRecognize.h"
 #include "llvm/Transforms/Scalar/LoopInstSimplify.h"
 #include "llvm/Transforms/Scalar/LoopInterchange.h"
@@ -205,6 +206,10 @@ static cl::opt<bool>
     EnableLoopInterchange("enable-loopinterchange", cl::init(false), cl::Hidden,
                           cl::desc("Enable the LoopInterchange Pass"));
 
+static cl::opt<bool> EnableLoopFuse("enable-loopfuse", cl::init(false),
+                                    cl::Hidden,
+                                    cl::desc("Enable the LoopFuse Pass"));
+
 static cl::opt<bool> EnableUnrollAndJam("enable-unroll-and-jam",
                                         cl::init(false), cl::Hidden,
                                         cl::desc("Enable Unroll And Jam Pass"));
@@ -314,6 +319,7 @@ PipelineTuningOptions::PipelineTuningOptions() {
   SLPVectorization = false;
   LoopUnrolling = true;
   LoopInterchange = EnableLoopInterchange;
+  LoopFuse = EnableLoopFuse;
   ForgetAllSCEVInLoopUnroll = ForgetSCEVInLoopUnroll;
   LicmMssaOptCap = SetLicmMssaOptCap;
   LicmMssaNoAccForPromotionCap = SetLicmMssaNoAccForPromotionCap;
@@ -518,6 +524,9 @@ PassBuilder::buildO1FunctionSimplificationPipeline(OptimizationLevel Level,
 
   invokeLoopOptimizerEndEPCallbacks(LPM2, Level);
 
+  if (PTO.LoopFuse)
+    FPM.addPass(LoopFusePass());
+
   FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM1),
                                               /*UseMemorySSA=*/true,
                                               /*UseBlockFrequencyInfo=*/true));
@@ -709,6 +718,9 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
 
   invokeLoopOptimizerEndEPCallbacks(LPM2, Level);
 
+  if (PTO.LoopFuse)
+    FPM.addPass(LoopFusePass());
+
   FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM1),
                                               /*UseMemorySSA=*/true,
                                               /*UseBlockFrequencyInfo=*/true));
@@ -2112,7 +2124,6 @@ PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,
     LPM.addPass(LoopFlattenPass());
   LPM.addPass(IndVarSimplifyPass());
   LPM.addPass(LoopDeletionPass());
-  // FIXME: Add loop interchange.
 
   // Unroll small loops and perform peeling.
   LPM.addPass(LoopFullUnrollPass(Level.getSpeedupLevel(),

Copy link

github-actions bot commented Jun 3, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@efriedma-quic
Copy link
Collaborator

If the goal here is gcc compatibility, I'd suggest not hooking up the flag to anything; the existing LoopFusePass isn't used by anything and hasn't been touched in years, so it's very likely to have issues.

@sebpop
Copy link
Contributor Author

sebpop commented Jun 10, 2025

The goal is to make it easier to turn on loop fusion and to collect and fix bug reports against loop fusion. Once we are happy with the stability of the pass the goal is to enable by default in flang at -O2 and above.

GCC does not have an equivalent for -floop-fuse.

Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flang changes LGTM.

nit: I am no clang expert but I wonder if you could use MarshallingInfoFlag in Options.td to avoid having to set CGOpts.FuseLoops manually for clang (this is not supported for flang unfortunately). If people more familiar with clang are happy with it how it is then that is fine by me.

Please wait for somebody else to review this from a clang/ and llvm/ perspective.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should expose clang driver options for passes that are known to have significant issues.

If you want to add a cl::opt flag to allow scheduling this in the pipeline and accessible for early testing via -mllvm, that would be a different matter.

If we really want a clang driver option, then it should be named -fexperimental-fuse-loops.

@tarunprabhu
Copy link
Contributor

If we really want a clang driver option, then it should be named -fexperimental-fuse-loops.

I think this is reasonable even for flang.


* -floop-interchange is now recognized by flang.
* -floop-interchange is enabled by default at -O2 and above.
* -fexperimental-fuse-loops is now recognized by flang.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make a similar entry in clang/docs/ReleaseNotes.md.

Copy link
Collaborator

@sjoerdmeijer sjoerdmeijer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm that this is useful for testing purposes.
If folks are happy with the 'experimental' prefix, let's do that, LGTM.

@kasuga-fj
Copy link
Contributor

So, why wouldn't adding a cl::opt flag work in the end? In fact, you've added enable-loopfuse flag in this patch. If it's enough, I personally think it might be better to avoid introducing additional flags unnecessarily.

LPM.addPass(LoopFlattenPass());
LPM.addPass(IndVarSimplifyPass());
LPM.addPass(LoopDeletionPass());
// FIXME: Add loop interchange.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this patch resolves this comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 527 to 529
if (PTO.LoopFuse)
FPM.addPass(LoopFusePass());

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an appropriate place? I think this is an optimization pass rather than simplification one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have moved it to the function simplification pipeline. LoopFusion is a function pass. I am open to change the place if this doesn't feel alright.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LoopFusion is a function pass

I don't think this implies that the pass should be part of the function simplification pipeline. Its position should be determined by what the pass actually does. The appropriate placement likely depends on what you're trying to achieve with it.

void tools::handleInterchangeLoopsArgs(const ArgList &Args,
ArgStringList &CmdArgs) {
// FIXME: instead of relying on shouldEnableVectorizerAtOLevel, we may want to
// FIXME: Instead of relying on shouldEnableVectorizerAtOLevel, we may want to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unrelated change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@madhur13490
Copy link
Contributor

I am taking over this change. Trying to address review comments.

@kasuga-fj
Copy link
Contributor

You probably made a mistake with the rebase...

@madhur13490 madhur13490 merged commit 895cda7 into llvm:main Sep 15, 2025
12 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Sep 15, 2025

LLVM Buildbot has detected a new failure on builder clang-m68k-linux-cross running on suse-gary-m68k-cross while building clang,flang,llvm at step 5 "ninja check 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/27/builds/16071

Here is the relevant piece of the build log for the reference
Step 5 (ninja check 1) failure: stage 1 checked (failure)
...
[121/400] Building CXX object tools/clang/tools/extra/include-cleaner/unittests/CMakeFiles/ClangIncludeCleanerTests.dir/FindHeadersTest.cpp.o
[122/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Lex/PPCallbacksTest.cpp.o
[123/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Parse/ParseHLSLRootSignatureTest.cpp.o
[124/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Driver/ToolChainTest.cpp.o
[125/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/CloneDetectionTest.cpp.o
[126/400] Building CXX object tools/clang/tools/extra/unittests/clang-tidy/CMakeFiles/ClangTidyTests.dir/ReadabilityModuleTest.cpp.o
[127/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/MacroExpansionContextTest.cpp.o
[128/400] Building CXX object tools/clang/tools/extra/unittests/clang-tidy/CMakeFiles/ClangTidyTests.dir/TransformerClangTidyCheckTest.cpp.o
[129/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/StaticAnalyzer/APSIntTypeTest.cpp.o
[130/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/FlowSensitive/MultiVarConstantPropagationTest.cpp.o
FAILED: tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/FlowSensitive/MultiVarConstantPropagationTest.cpp.o 
/usr/bin/c++ -DGTEST_HAS_RTTI=0 -DLLVM_BUILD_STATIC -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GLIBCXX_USE_CXX11_ABI=1 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/tools/clang/unittests -I/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/clang/unittests -I/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/clang/include -I/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/tools/clang/include -I/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/include -I/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/llvm/include -I/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/clang/unittests/Tooling -I/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/third-party/unittest/googletest/include -I/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/third-party/unittest/googlemock/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-uninitialized -Wno-nonnull -Wno-class-memaccess -Wno-dangling-reference -Wno-redundant-move -Wno-pessimizing-move -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wno-misleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -O3 -DNDEBUG -std=c++17  -Wno-variadic-macros -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -Wno-suggest-override -MD -MT tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/FlowSensitive/MultiVarConstantPropagationTest.cpp.o -MF tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/FlowSensitive/MultiVarConstantPropagationTest.cpp.o.d -o tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/FlowSensitive/MultiVarConstantPropagationTest.cpp.o -c /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/clang/unittests/Analysis/FlowSensitive/MultiVarConstantPropagationTest.cpp
c++: fatal error: Killed signal terminated program cc1plus
compilation terminated.
[131/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/StaticAnalyzer/BlockEntranceCallbackTest.cpp.o
[132/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/FlowSensitive/DeterminismTest.cpp.o
[133/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/StaticAnalyzer/CallEventTest.cpp.o
[134/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/FlowSensitive/DebugSupportTest.cpp.o
[135/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/CFGTest.cpp.o
[136/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/FlowSensitive/TransferBranchTest.cpp.o
[137/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/FlowSensitive/CFGMatchSwitchTest.cpp.o
[138/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/CFGDominatorTree.cpp.o
[139/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/FlowSensitive/ChromiumCheckModelTest.cpp.o
[140/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/FlowSensitive/CachedConstAccessorsLatticeTest.cpp.o
[141/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/StaticAnalyzer/ConflictingEvalCallsTest.cpp.o
[142/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/FlowSensitive/DataflowEnvironmentTest.cpp.o
[143/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/FlowSensitive/SimplifyConstraintsTest.cpp.o
[144/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/FlowSensitive/SmartPointerAccessorCachingTest.cpp.o
[145/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/LifetimeSafetyTest.cpp.o
[146/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/StaticAnalyzer/BugReportInterestingnessTest.cpp.o
[147/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/FlowSensitive/LoggerTest.cpp.o
[148/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/FlowSensitive/ASTOpsTest.cpp.o
[149/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/FlowSensitive/WatchedLiteralsSolverTest.cpp.o
[150/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/FlowSensitive/MatchSwitchTest.cpp.o
[151/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/FlowSensitive/TestingSupportTest.cpp.o
[152/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/FlowSensitive/TestingSupport.cpp.o
[153/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/FlowSensitive/RecordOpsTest.cpp.o
[154/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/FlowSensitive/SignAnalysisTest.cpp.o
[155/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/IntervalPartitionTest.cpp.o
[156/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp.o
[157/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/FlowSensitive/SingleVarConstantPropagationTest.cpp.o
[158/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/StaticAnalyzer/CallDescriptionTest.cpp.o
[159/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/ExprMutationAnalyzerTest.cpp.o
[160/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp.o
[161/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/StaticAnalyzer/ExprEngineVisitTest.cpp.o
[162/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/StaticAnalyzer/FalsePositiveRefutationBRVisitorTest.cpp.o
[163/400] Building CXX object tools/clang/unittests/CMakeFiles/AllClangUnitTests.dir/Analysis/FlowSensitive/TransferTest.cpp.o
ninja: build stopped: subcommand failed.

bool LoopInterchange;

/// Tuning option to enable/disable loop fusion. Its default value is false.
bool LoopFusion;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, this default is never actually set anywhere. There is no assignment in the PipelineTuningOptions ctor. (Probably that ctor should be replaced with direct struct member initialization...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ThinLTOCodeGenerator.cpp, I need to do PTO.LoopFusion = False. Such change is also needed in mlir/lib/ExecutionEngine/OptUtils.cpp.

This is required at each site which is using PipelineTuningOptions. For LoopInterchange and others, initialization occurs in PassBuilderPipelines.cpp, but for LoopFusion, I can't do this because the definition of the LoopFusion variable is in NewPMDriver.cpp, as suggested by @kasuga-fj.

I am inclined to follow the way LoopInterchange is initialized (and thus move the definition of LoopFusion from NewPMDriver.cpp to PassBuilderPipelines.cpp)

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required at each site which is using PipelineTuningOptions. For LoopInterchange and others, initialization occurs in PassBuilderPipelines.cpp, but for LoopFusion, I can't do this because the definition of the LoopFusion variable is in NewPMDriver.cpp, as suggested by @kasuga-fj.

Why can't you do the initialization to false in PassBuilderPipelines? Isn't this just the default value that later gets overwritten by the code in NewPMDriver?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is another option. #158844. Can you please have a quick look?

@nikic
Copy link
Contributor

nikic commented Sep 15, 2025

This change had significant impact on ThinLTO compile-time: https://llvm-compile-time-tracker.com/compare.php?from=e0a33cb599f8614b3d897ad2bd2f9fa8e1acbac5&to=895cda70a95529fd22aac05eee7c34f7624996af&stat=instructions:u I suspect this ended up accidentally enabling loop fuse by default when ThinLTO is used.

@vitalybuka
Copy link
Collaborator

vitalybuka added a commit that referenced this pull request Sep 16, 2025
…" (#158764)

This reverts commit 895cda7.
And fix attempt: 06f671e.

Performance regressions and broken sanitizers, see #142686.
@vitalybuka
Copy link
Collaborator

reverted here #158764

madhur13490 added a commit that referenced this pull request Sep 16, 2025
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Sep 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:codegen IR generation bugs: mangling, exceptions, etc. clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category flang:driver flang Flang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.