
Conversation

lukel97
Contributor

@lukel97 lukel97 commented Sep 18, 2025

In https://reviews.llvm.org/D129599, non-trivial unswitching was disabled for cold loops in the interest of code size. This added a dependency on BlockFrequencyInfo when PGO is in use, but in loop passes BFI is only available on a lossy basis: see https://reviews.llvm.org/D86156

LICM has since moved away from BFI, so as of today SimpleLoopUnswitch is the only remaining loop pass that uses BFI, and it does so solely to prevent code size increases in PGO builds. It doesn't use BFI if no profile summary is available.

After some investigation on llvm-test-suite, it turns out that the lossy BFI causes very significant deviations in block frequencies: when loops are deleted or created while the loop pass manager runs, BFI can return frequencies for different loops altogether.
This results in unswitchable loops being mistakenly skipped because they are thought to be cold.

This patch removes the use of BFI from SimpleLoopUnswitch and thus the last remaining use of BFI in a loop pass.

To recover the original intent of not unswitching cold code, PGOForceFunctionAttrs can be used to annotate cold functions as optimizable for code size, since SimpleLoopUnswitch already respects OptSize: https://reviews.llvm.org/D94559

This isn't 100% the same behaviour, since coldness was previously checked at the loop level and is now checked at the function level. We could expand PGOForceFunctionAttrs to be more granular at the loop level; #159595 tracks this idea.
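To make the behavioural difference concrete, the sketch below models the old and new gates as a small standalone C++ program (no LLVM headers; the struct and function names are invented for illustration and are not the actual SimpleLoopUnswitch code):

// Standalone model of the decision, not the upstream implementation.
#include <iostream>

struct LoopProfile {
  bool FunctionHasOptSize; // e.g. added by PGOForceFunctionAttrs on cold functions
  bool HasProfileSummary;  // PGO profile summary available
  bool LoopLooksCold;      // what the (possibly lossy) BFI/PSI query reported
};

// Old behaviour: skip non-trivial unswitching if the function is optsize
// or if the loop itself is considered cold under the profile.
bool skipNonTrivialUnswitchOld(const LoopProfile &P) {
  if (P.FunctionHasOptSize)
    return true;
  return P.HasProfileSummary && P.LoopLooksCold;
}

// Behaviour after this patch: only the function-level optsize attribute
// (which PGOForceFunctionAttrs adds to cold functions) gates unswitching.
bool skipNonTrivialUnswitchNew(const LoopProfile &P) {
  return P.FunctionHasOptSize;
}

int main() {
  // A loop that looks cold (possibly only because of lossy BFI) inside a hot,
  // non-optsize function: previously skipped, now unswitched.
  LoopProfile ColdLoopHotFunction{false, true, true};
  // A cold function annotated optsize by PGOForceFunctionAttrs: still skipped.
  LoopProfile ColdFunction{true, true, true};

  std::cout << skipNonTrivialUnswitchOld(ColdLoopHotFunction) << ' '
            << skipNonTrivialUnswitchNew(ColdLoopHotFunction) << '\n'; // 1 0
  std::cout << skipNonTrivialUnswitchOld(ColdFunction) << ' '
            << skipNonTrivialUnswitchNew(ColdFunction) << '\n';        // 1 1
  return 0;
}

The first case is where behaviour changes: a cold (or mistakenly-cold) loop in a hot function is no longer skipped, which is the trade-off discussed in the review below.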

@llvmbot added the llvm:analysis and llvm:transforms labels Sep 18, 2025
@llvmbot
Member

llvmbot commented Sep 18, 2025

@llvm/pr-subscribers-llvm-analysis

Author: Luke Lau (lukel97)

Changes

Stacked on #159516

In https://reviews.llvm.org/D129599, non-trivial unswitching was disabled for cold loops in the interest of code size. This added a dependency on BlockFrequencyInfo, but in loop passes this is only available on a lossy basis: see https://reviews.llvm.org/D86156

LICM has since moved away from BFI, and as of today SimpleLoopUnswitch is the only remaining loop pass that uses BFI, solely to prevent code size increases in PGO builds. It doesn't use BFI if no profile summary is available.

However, just before the BFI check, we also check whether the function is marked OptSize: https://reviews.llvm.org/D94559

Coincidentally, sometime after the initial BFI patch landed, PGOForceFunctionAttrsPass was added, which automatically annotates cold functions with OptSize: https://reviews.llvm.org/D149800

I think using PGOForceFunctionAttrs to determine whether we should optimize for code size is probably a more accurate and generalized way to prevent unwanted code size increases. So this patch proposes to remove the BFI check in SimpleLoopUnswitch.

This isn't 100% the same behaviour, since coldness was previously checked at the loop level and is now checked at the function level, but I believe the benefits outweigh this:

  • It allows us to remove BFI from the function-to-loop pass adaptor, where it was only enabled for certain stages in the LTO pipeline
  • We no longer have to worry about lossy analysis results
  • Which in turn means the decision to avoid non-trivial unswitching will be more accurate
  • It brings the behaviour in line with other passes that respect OptSize
  • It respects the -pgo-cold-func-opt flag so users can control the behaviour
  • It avoids the need to run OuterAnalysisManagerProxy as often, which is probably good for compile time

Patch is 31.76 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/159522.diff

16 Files Affected:

  • (modified) llvm/include/llvm/Analysis/LoopAnalysisManager.h (-4)
  • (modified) llvm/include/llvm/Transforms/Scalar/LoopPassManager.h (+6-16)
  • (modified) llvm/lib/Passes/PassBuilder.cpp (+14-22)
  • (modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+14-20)
  • (modified) llvm/lib/Transforms/Scalar/LoopPassManager.cpp (-11)
  • (modified) llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp (+2-38)
  • (modified) llvm/test/Other/loop-pm-invalidation.ll (-30)
  • (modified) llvm/test/Other/new-pm-defaults.ll (-1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-defaults.ll (-1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll (-1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll (-1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-defaults.ll (-1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll (-1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll (-1)
  • (renamed) llvm/test/Transforms/PhaseOrdering/unswitch-cold-func.ll (+3-2)
  • (modified) llvm/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch-markloopasdeleted.ll (-1)
diff --git a/llvm/include/llvm/Analysis/LoopAnalysisManager.h b/llvm/include/llvm/Analysis/LoopAnalysisManager.h
index a825ada05df11..1755257fe6c89 100644
--- a/llvm/include/llvm/Analysis/LoopAnalysisManager.h
+++ b/llvm/include/llvm/Analysis/LoopAnalysisManager.h
@@ -36,8 +36,6 @@ namespace llvm {
 
 class AAResults;
 class AssumptionCache;
-class BlockFrequencyInfo;
-class BranchProbabilityInfo;
 class DominatorTree;
 class Function;
 class Loop;
@@ -59,8 +57,6 @@ struct LoopStandardAnalysisResults {
   ScalarEvolution &SE;
   TargetLibraryInfo &TLI;
   TargetTransformInfo &TTI;
-  BlockFrequencyInfo *BFI;
-  BranchProbabilityInfo *BPI;
   MemorySSA *MSSA;
 };
 
diff --git a/llvm/include/llvm/Transforms/Scalar/LoopPassManager.h b/llvm/include/llvm/Transforms/Scalar/LoopPassManager.h
index 79ccd63fd834c..1842d2dc5f05a 100644
--- a/llvm/include/llvm/Transforms/Scalar/LoopPassManager.h
+++ b/llvm/include/llvm/Transforms/Scalar/LoopPassManager.h
@@ -404,12 +404,8 @@ class FunctionToLoopPassAdaptor
 
   explicit FunctionToLoopPassAdaptor(std::unique_ptr<PassConceptT> Pass,
                                      bool UseMemorySSA = false,
-                                     bool UseBlockFrequencyInfo = false,
-                                     bool UseBranchProbabilityInfo = false,
                                      bool LoopNestMode = false)
       : Pass(std::move(Pass)), UseMemorySSA(UseMemorySSA),
-        UseBlockFrequencyInfo(UseBlockFrequencyInfo),
-        UseBranchProbabilityInfo(UseBranchProbabilityInfo),
         LoopNestMode(LoopNestMode) {
     LoopCanonicalizationFPM.addPass(LoopSimplifyPass());
     LoopCanonicalizationFPM.addPass(LCSSAPass());
@@ -431,8 +427,6 @@ class FunctionToLoopPassAdaptor
   FunctionPassManager LoopCanonicalizationFPM;
 
   bool UseMemorySSA = false;
-  bool UseBlockFrequencyInfo = false;
-  bool UseBranchProbabilityInfo = false;
   const bool LoopNestMode;
 };
 
@@ -445,9 +439,7 @@ class FunctionToLoopPassAdaptor
 /// \c LoopPassManager and the returned adaptor will be in loop-nest mode.
 template <typename LoopPassT>
 inline FunctionToLoopPassAdaptor
-createFunctionToLoopPassAdaptor(LoopPassT &&Pass, bool UseMemorySSA = false,
-                                bool UseBlockFrequencyInfo = false,
-                                bool UseBranchProbabilityInfo = false) {
+createFunctionToLoopPassAdaptor(LoopPassT &&Pass, bool UseMemorySSA = false) {
   if constexpr (is_detected<HasRunOnLoopT, LoopPassT>::value) {
     using PassModelT =
         detail::PassModel<Loop, LoopPassT, LoopAnalysisManager,
@@ -457,7 +449,7 @@ createFunctionToLoopPassAdaptor(LoopPassT &&Pass, bool UseMemorySSA = false,
     return FunctionToLoopPassAdaptor(
         std::unique_ptr<FunctionToLoopPassAdaptor::PassConceptT>(
             new PassModelT(std::forward<LoopPassT>(Pass))),
-        UseMemorySSA, UseBlockFrequencyInfo, UseBranchProbabilityInfo, false);
+        UseMemorySSA, false);
   } else {
     LoopPassManager LPM;
     LPM.addPass(std::forward<LoopPassT>(Pass));
@@ -469,7 +461,7 @@ createFunctionToLoopPassAdaptor(LoopPassT &&Pass, bool UseMemorySSA = false,
     return FunctionToLoopPassAdaptor(
         std::unique_ptr<FunctionToLoopPassAdaptor::PassConceptT>(
             new PassModelT(std::move(LPM))),
-        UseMemorySSA, UseBlockFrequencyInfo, UseBranchProbabilityInfo, true);
+        UseMemorySSA, true);
   }
 }
 
@@ -477,9 +469,8 @@ createFunctionToLoopPassAdaptor(LoopPassT &&Pass, bool UseMemorySSA = false,
 /// be in loop-nest mode if the pass manager contains only loop-nest passes.
 template <>
 inline FunctionToLoopPassAdaptor
-createFunctionToLoopPassAdaptor<LoopPassManager>(
-    LoopPassManager &&LPM, bool UseMemorySSA, bool UseBlockFrequencyInfo,
-    bool UseBranchProbabilityInfo) {
+createFunctionToLoopPassAdaptor<LoopPassManager>(LoopPassManager &&LPM,
+                                                 bool UseMemorySSA) {
   // Check if LPM contains any loop pass and if it does not, returns an adaptor
   // in loop-nest mode.
   using PassModelT =
@@ -491,8 +482,7 @@ createFunctionToLoopPassAdaptor<LoopPassManager>(
   return FunctionToLoopPassAdaptor(
       std::unique_ptr<FunctionToLoopPassAdaptor::PassConceptT>(
           new PassModelT(std::move(LPM))),
-      UseMemorySSA, UseBlockFrequencyInfo, UseBranchProbabilityInfo,
-      LoopNestMode);
+      UseMemorySSA, LoopNestMode);
 }
 
 /// Pass for printing a loop's contents as textual IR.
diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index 8cf277657a54a..d5b44a013d4a7 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -1931,13 +1931,13 @@ Error PassBuilder::parseModulePass(ModulePassManager &MPM,
 #define LOOPNEST_PASS(NAME, CREATE_PASS)                                       \
   if (Name == NAME) {                                                          \
     MPM.addPass(createModuleToFunctionPassAdaptor(                             \
-        createFunctionToLoopPassAdaptor(CREATE_PASS, false, false)));          \
+        createFunctionToLoopPassAdaptor(CREATE_PASS, false)));                 \
     return Error::success();                                                   \
   }
 #define LOOP_PASS(NAME, CREATE_PASS)                                           \
   if (Name == NAME) {                                                          \
     MPM.addPass(createModuleToFunctionPassAdaptor(                             \
-        createFunctionToLoopPassAdaptor(CREATE_PASS, false, false)));          \
+        createFunctionToLoopPassAdaptor(CREATE_PASS, false)));                 \
     return Error::success();                                                   \
   }
 #define LOOP_PASS_WITH_PARAMS(NAME, CLASS, CREATE_PASS, PARSER, PARAMS)        \
@@ -1945,9 +1945,8 @@ Error PassBuilder::parseModulePass(ModulePassManager &MPM,
     auto Params = parsePassParameters(PARSER, Name, NAME);                     \
     if (!Params)                                                               \
       return Params.takeError();                                               \
-    MPM.addPass(                                                               \
-        createModuleToFunctionPassAdaptor(createFunctionToLoopPassAdaptor(     \
-            CREATE_PASS(Params.get()), false, false)));                        \
+    MPM.addPass(createModuleToFunctionPassAdaptor(                             \
+        createFunctionToLoopPassAdaptor(CREATE_PASS(Params.get()), false)));   \
     return Error::success();                                                   \
   }
 #include "PassRegistry.def"
@@ -2046,13 +2045,13 @@ Error PassBuilder::parseCGSCCPass(CGSCCPassManager &CGPM,
 #define LOOPNEST_PASS(NAME, CREATE_PASS)                                       \
   if (Name == NAME) {                                                          \
     CGPM.addPass(createCGSCCToFunctionPassAdaptor(                             \
-        createFunctionToLoopPassAdaptor(CREATE_PASS, false, false)));          \
+        createFunctionToLoopPassAdaptor(CREATE_PASS, false)));                 \
     return Error::success();                                                   \
   }
 #define LOOP_PASS(NAME, CREATE_PASS)                                           \
   if (Name == NAME) {                                                          \
     CGPM.addPass(createCGSCCToFunctionPassAdaptor(                             \
-        createFunctionToLoopPassAdaptor(CREATE_PASS, false, false)));          \
+        createFunctionToLoopPassAdaptor(CREATE_PASS, false)));                 \
     return Error::success();                                                   \
   }
 #define LOOP_PASS_WITH_PARAMS(NAME, CLASS, CREATE_PASS, PARSER, PARAMS)        \
@@ -2060,9 +2059,8 @@ Error PassBuilder::parseCGSCCPass(CGSCCPassManager &CGPM,
     auto Params = parsePassParameters(PARSER, Name, NAME);                     \
     if (!Params)                                                               \
       return Params.takeError();                                               \
-    CGPM.addPass(                                                              \
-        createCGSCCToFunctionPassAdaptor(createFunctionToLoopPassAdaptor(      \
-            CREATE_PASS(Params.get()), false, false)));                        \
+    CGPM.addPass(createCGSCCToFunctionPassAdaptor(                             \
+        createFunctionToLoopPassAdaptor(CREATE_PASS(Params.get()), false)));   \
     return Error::success();                                                   \
   }
 #include "PassRegistry.def"
@@ -2095,14 +2093,8 @@ Error PassBuilder::parseFunctionPass(FunctionPassManager &FPM,
         return Err;
       // Add the nested pass manager with the appropriate adaptor.
       bool UseMemorySSA = (Name == "loop-mssa");
-      bool UseBFI = llvm::any_of(InnerPipeline, [](auto Pipeline) {
-        return Pipeline.Name.contains("simple-loop-unswitch");
-      });
-      bool UseBPI = llvm::any_of(InnerPipeline, [](auto Pipeline) {
-        return Pipeline.Name == "loop-predication";
-      });
-      FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM), UseMemorySSA,
-                                                  UseBFI, UseBPI));
+      FPM.addPass(
+          createFunctionToLoopPassAdaptor(std::move(LPM), UseMemorySSA));
       return Error::success();
     }
     if (Name == "machine-function") {
@@ -2155,12 +2147,12 @@ Error PassBuilder::parseFunctionPass(FunctionPassManager &FPM,
 //        The risk is that it may become obsolete if we're not careful.
 #define LOOPNEST_PASS(NAME, CREATE_PASS)                                       \
   if (Name == NAME) {                                                          \
-    FPM.addPass(createFunctionToLoopPassAdaptor(CREATE_PASS, false, false));   \
+    FPM.addPass(createFunctionToLoopPassAdaptor(CREATE_PASS, false));          \
     return Error::success();                                                   \
   }
 #define LOOP_PASS(NAME, CREATE_PASS)                                           \
   if (Name == NAME) {                                                          \
-    FPM.addPass(createFunctionToLoopPassAdaptor(CREATE_PASS, false, false));   \
+    FPM.addPass(createFunctionToLoopPassAdaptor(CREATE_PASS, false));          \
     return Error::success();                                                   \
   }
 #define LOOP_PASS_WITH_PARAMS(NAME, CLASS, CREATE_PASS, PARSER, PARAMS)        \
@@ -2168,8 +2160,8 @@ Error PassBuilder::parseFunctionPass(FunctionPassManager &FPM,
     auto Params = parsePassParameters(PARSER, Name, NAME);                     \
     if (!Params)                                                               \
       return Params.takeError();                                               \
-    FPM.addPass(createFunctionToLoopPassAdaptor(CREATE_PASS(Params.get()),     \
-                                                false, false));                \
+    FPM.addPass(                                                               \
+        createFunctionToLoopPassAdaptor(CREATE_PASS(Params.get()), false));    \
     return Error::success();                                                   \
   }
 #include "PassRegistry.def"
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index c3f35f0f5e7fa..3dfe9cf51865c 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -517,16 +517,14 @@ PassBuilder::buildO1FunctionSimplificationPipeline(OptimizationLevel Level,
   invokeLoopOptimizerEndEPCallbacks(LPM2, Level);
 
   FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM1),
-                                              /*UseMemorySSA=*/true,
-                                              /*UseBlockFrequencyInfo=*/true));
+                                              /*UseMemorySSA=*/true));
   FPM.addPass(
       SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));
   FPM.addPass(InstCombinePass());
   // The loop passes in LPM2 (LoopFullUnrollPass) do not preserve MemorySSA.
   // *All* loop passes must preserve it, in order to be able to use it.
   FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM2),
-                                              /*UseMemorySSA=*/false,
-                                              /*UseBlockFrequencyInfo=*/false));
+                                              /*UseMemorySSA=*/false));
 
   // Delete small array after loop unroll.
   FPM.addPass(SROAPass(SROAOptions::ModifyCFG));
@@ -706,8 +704,7 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
   invokeLoopOptimizerEndEPCallbacks(LPM2, Level);
 
   FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM1),
-                                              /*UseMemorySSA=*/true,
-                                              /*UseBlockFrequencyInfo=*/true));
+                                              /*UseMemorySSA=*/true));
   FPM.addPass(
       SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));
   FPM.addPass(InstCombinePass());
@@ -715,8 +712,7 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
   // LoopDeletionPass and LoopFullUnrollPass) do not preserve MemorySSA.
   // *All* loop passes must preserve it, in order to be able to use it.
   FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM2),
-                                              /*UseMemorySSA=*/false,
-                                              /*UseBlockFrequencyInfo=*/false));
+                                              /*UseMemorySSA=*/false));
 
   // Delete small array after loop unroll.
   FPM.addPass(SROAPass(SROAOptions::ModifyCFG));
@@ -769,7 +765,7 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
   FPM.addPass(createFunctionToLoopPassAdaptor(
       LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap,
                /*AllowSpeculation=*/true),
-      /*UseMemorySSA=*/true, /*UseBlockFrequencyInfo=*/false));
+      /*UseMemorySSA=*/true));
 
   FPM.addPass(CoroElidePass());
 
@@ -837,8 +833,7 @@ void PassBuilder::addPostPGOLoopRotation(ModulePassManager &MPM,
         createFunctionToLoopPassAdaptor(
             LoopRotatePass(EnableLoopHeaderDuplication ||
                            Level != OptimizationLevel::Oz),
-            /*UseMemorySSA=*/false,
-            /*UseBlockFrequencyInfo=*/false),
+            /*UseMemorySSA=*/false),
         PTO.EagerlyInvalidateAnalyses));
   }
 }
@@ -1354,8 +1349,7 @@ void PassBuilder::addVectorPasses(OptimizationLevel Level,
     LPM.addPass(SimpleLoopUnswitchPass(/* NonTrivial */ Level ==
                                        OptimizationLevel::O3));
     ExtraPasses.addPass(
-        createFunctionToLoopPassAdaptor(std::move(LPM), /*UseMemorySSA=*/true,
-                                        /*UseBlockFrequencyInfo=*/true));
+        createFunctionToLoopPassAdaptor(std::move(LPM), /*UseMemorySSA=*/true));
     ExtraPasses.addPass(
         SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(true)));
     ExtraPasses.addPass(InstCombinePass());
@@ -1433,7 +1427,7 @@ void PassBuilder::addVectorPasses(OptimizationLevel Level,
   FPM.addPass(createFunctionToLoopPassAdaptor(
       LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap,
                /*AllowSpeculation=*/true),
-      /*UseMemorySSA=*/true, /*UseBlockFrequencyInfo=*/false));
+      /*UseMemorySSA=*/true));
 
   // Now that we've vectorized and unrolled loops, we may have more refined
   // alignment information, try to re-derive it here.
@@ -1510,7 +1504,7 @@ PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
     OptimizePM.addPass(createFunctionToLoopPassAdaptor(
         LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap,
                  /*AllowSpeculation=*/true),
-        /*USeMemorySSA=*/true, /*UseBlockFrequencyInfo=*/false));
+        /*USeMemorySSA=*/true));
   }
 
   OptimizePM.addPass(Float2IntPass());
@@ -1550,8 +1544,8 @@ PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
   if (PTO.LoopInterchange)
     LPM.addPass(LoopInterchangePass());
 
-  OptimizePM.addPass(createFunctionToLoopPassAdaptor(
-      std::move(LPM), /*UseMemorySSA=*/false, /*UseBlockFrequencyInfo=*/false));
+  OptimizePM.addPass(
+      createFunctionToLoopPassAdaptor(std::move(LPM), /*UseMemorySSA=*/false));
 
   // FIXME: This may not be the right place in the pipeline.
   // We need to have the data to support the right place.
@@ -2100,7 +2094,7 @@ PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,
   MainFPM.addPass(createFunctionToLoopPassAdaptor(
       LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap,
                /*AllowSpeculation=*/true),
-      /*USeMemorySSA=*/true, /*UseBlockFrequencyInfo=*/false));
+      /*USeMemorySSA=*/true));
 
   if (RunNewGVN)
     MainFPM.addPass(NewGVNPass());
@@ -2130,8 +2124,8 @@ PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,
                                  PTO.ForgetAllSCEVInLoopUnroll));
   // The loop passes in LPM (LoopFullUnrollPass) do not preserve MemorySSA.
   // *All* loop passes must preserve it, in order to be able to use it.
-  MainFPM.addPass(createFunctionToLoopPassAdaptor(
-      std::move(LPM), /*UseMemorySSA=*/false, /*UseBlockFrequencyInfo=*/true));
+  MainFPM.addPass(
+      createFunctionToLoopPassAdaptor(std::move(LPM), /*UseMemorySSA=*/false));
 
   MainFPM.addPass(LoopDistributePass());
 
diff --git a/llvm/lib/Transforms/Scalar/LoopPassManager.cpp b/llvm/lib/Transforms/Scalar/LoopPassManager.cpp
index 4b26ce12a28db..47a1b95339186 100644
--- a/llvm/lib/Transforms/Scalar/LoopPassManager.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopPassManager.cpp
@@ -8,8 +8,6 @@
 
 #include "llvm/Transforms/Scalar/LoopPassManager.h"
 #include "llvm/Analysis/AssumptionCache.h"
-#include "llvm/Analysis/BlockFrequencyInfo.h"
-#include "llvm/Analysis/BranchProbabilityInfo.h"
 #include "llvm/Analysis/MemorySSA.h"
 #include "llvm/Analysis/ScalarEvolution.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
@@ -220,13 +218,6 @@ PreservedAnalyses FunctionToLoopPassAdaptor::run(Function &F,
   // Get the analysis results needed by loop passes.
   MemorySSA *MSSA =
       UseMemorySSA ? (&AM.getResult<MemorySSAAnalysis>(F).getMSSA()) : nullptr;
-  BlockFrequencyInfo *BFI = UseBlockFrequencyInfo && F.hasProfileData()
-                                ? (&AM.getResult<BlockFrequencyAnalysis>(F))
-                                : nullptr;
-  BranchProbabilityInfo *BPI =
-      UseBranchProbabilityInfo && F.hasProfileData()
-          ? (&AM.getResult<BranchProbabilityAnalysis>(F))
-          : nullptr;
   LoopStandardAnalysisResults LAR = {AM.getResult<AAManager>(F),
                                      AM.getResult<AssumptionAnalysis>(F),
                                      AM.getResult<DominatorTreeAnalysis>(F),
@@ -234,8 +225,6 @@ PreservedAnalyses FunctionToLoopPassAdaptor::run(Function &F,
                                      AM.getResult<ScalarEvolutionAnalysis>(F),
                                      AM.getResult<TargetLibraryAnalysis>(F),
                                      AM.getResult<TargetIRAnalysis>(F),
-                                     BFI,
-                                     BPI,
                                      MSSA};
 
   // Setup the loop analysis manager from its proxy. It is important that
diff --git a/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp b/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
index e4ba70d1bce16..5af6c96c56a06 100644
--- a/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
+++ b/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
@@ -27,7 +27,6 @@
 #include "llvm/Analysis/MemorySSA.h"
 #include "llvm/Analysis/MemorySSAUpdater.h"
 #include "llvm/Analysis/MustExecute.h"
-#include "llvm/Analysis/ProfileSummaryInfo.h"
 #include "llvm/Analysis/ScalarEvolution.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/Analysis/ValueTracking.h"
@@ -3611,8 +3610,7 @@ static bool unswitchLoop(Loop &L, DominatorTree &DT, LoopInfo &LI,
                          AssumptionCache &AC, AAResults &AA,
                          TargetTransformInfo &TTI, bool Trivial,
  ...
[truncated]

@llvmbot
Member

llvmbot commented Sep 18, 2025

@llvm/pr-subscribers-llvm-transforms


Contributor Author

I've moved the existing test to PhaseOrdering to show how using PGOForceFunctionAttrs also prevents unswitching this loop

Contributor Author

This was the test added in #157888 to make sure we invalidated BFI in the loop pass adaptor. But if we don't use BFI anymore then it doesn't get invalidated

@nikic
Contributor

nikic commented Sep 18, 2025

This only works if the whole function is cold, not if there is a cold loop inside a hot function, right?

@lukel97
Contributor Author

lukel97 commented Sep 18, 2025

This only works if the whole function is cold, not if there is a cold loop inside a hot function, right?

Yeah, I briefly touched on that in the PR description. I couldn't find anything in https://reviews.llvm.org/D129599 that explicitly mentioned cold loops in hot functions, so I hope that function-level granularity is good enough for whatever case motivated that patch. At least it seems to be good enough for the test case that was included with it, llvm/test/Transforms/PhaseOrdering/unswitch-cold-func.ll

@nikic
Contributor

nikic commented Sep 18, 2025

Okay, looking through the history of what happened here, we have:

  • f756f06 initially checked whether the loop is cold
  • fb45f3c switched this to check whether the function is cold (using isFunctionColdInCallGraph) to prevent SPEC regressions caused by the previous approach
  • dfb40d3 then switched to checking for a cold loop nest because of compile-time issues with the previous approach.

So with that history in mind, it does seem like just relying on the whole function being cold (via optsize) may be fine, as that was one of the intermediate solutions, which was only dropped due to compile-time issues. (Note that PGOForceFunctionAttrsPass uses isFunctionColdInCallGraph as well, so it should be equivalent to the second solution without the compile-time impact.)

Contributor

@nikic nikic left a comment

LGTM, but please leave some time for others to comment.

This was added to illustrate the lossy preservation of BPI in 452714f

But it fails now because we don't run OuterAnalysisManagerProxy beforehand. I'm not sure if it's still a useful thing to test given that llvm#159516 removes BPI from the adaptor, so I've gone ahead and deleted the test
@lukel97 lukel97 force-pushed the simpleloopunswitch-remove-bfi branch from 39d102c to 754f242 Compare September 18, 2025 12:57
Contributor

@fhahn fhahn left a comment

Overall checking for hot/cold loops in principle makes more sense to me; i.e. no point in unswitching a loop that's never executed in a hot function.

Not regressing SPEC is great, but still leaves a lot of room to regress other workloads. I don't really have any concrete suggestions at the moment; if we can't easily check for hotness of loops I guess this is fine for now and we can revisit if/when we discover regressions

@teresajohnson
Contributor

Overall checking for hot/cold loops in principle makes more sense to me; i.e. no point in unswitching a loop that's never executed in a hot function.

Agree, this change isn't going to be equivalent. I'm a little unclear of the problem being solved by this PR. What is the main motivation? There's a list of benefits but I'm not sure how important they are.

Not regressing SPEC is great, but still leaves a lot of room to regress other workloads. I don't really have any concrete suggestions at the moment; if we can't easily check for hotness of loops I guess this is fine for now and we can revisit if/when we discover regressions

@lukel97
Contributor Author

lukel97 commented Sep 18, 2025

Agree, this change isn't going to be equivalent. I'm a little unclear of the problem being solved by this PR. What is the main motivation? There's a list of benefits but I'm not sure how important they are.

The main motivation is to remove the last use of BlockFrequencyInfo from within a loop pass, because I'm not sure if there's much value in relying on a potentially inaccurate result given that it's only preserved lossily.

I noticed this originally when some of the lossy results escaped into the loop vectorizer in #157888, causing BFI to return some bogus frequencies. Not off by one or two etc, but returning the frequencies for different blocks altogether since the CFG had been modified. I'm a bit worried that SimpleLoopUnswitch is also affected by this e.g. thinking a loop is cold when it's not.

A secondary motivation is that avoiding code size increases in cold code should be handled by PGOForceFunctionAttrsPass. I think it's somewhat coincidental that the original patch checked that the loops were cold instead of the function. As Nikita mentions, fb45f3c seems to suggest that the original author was fine with checking the function.

@teresajohnson
Contributor

Agree, this change isn't going to be equivalent. I'm a little unclear of the problem being solved by this PR. What is the main motivation? There's a list of benefits but I'm not sure how important they are.

The main motivation is to remove the last use of BlockFrequencyInfo from within a loop pass, because I'm not sure if there's much value in relying on a potentially inaccurate result given that it's only preserved lossily.

I noticed this originally when some of the lossy results escaped into the loop vectorizer in #157888, causing BFI to return some bogus frequencies. Not off by one or two etc, but returning the frequencies for different blocks altogether since the CFG had been modified. I'm a bit worried that SimpleLoopUnswitch is also affected by this e.g. thinking a loop is cold when it's not.

Isn't this a larger problem though - downstream function passes such as BB layout will also be affected by incorrect BFI. I'm unclear why disabling use of BFI in this particular pass is the right approach to this issue.

There is a secondary motivation that avoiding code size increases should be handled by PGOForceFunctionAttrsPass.

I'm not sure I agree here - why should code size increases from cold code only be gated at the function level? (I see you filed #159595, that seems like a good approach longer term.)

I think it's somewhat coincidental that the original patch checked that the loops were cold instead of the function. As Nikita mentions, fb45f3c seems to suggest that the original author was fine with checking the function.

These messages reference SPEC, though, which isn't always representative of real-world code.

@lukel97
Contributor Author

lukel97 commented Sep 18, 2025

Isn't this a larger problem though - downstream function passes such as BB layout will also be affected by incorrect BFI. I'm unclear why disabling use of BFI in this particular pass is the right approach to this issue.

Incorrect BFI leaking through to function passes was fixed in #157888 (wasn't exactly the easiest thing to debug 😅). AFAIK SimpleLoopUnswitch is now the last user of lossy BFI.

@lukel97
Contributor Author

lukel97 commented Sep 23, 2025

@fhahn @teresajohnson I did some more experimenting and I think the usage of lossy BFI is a bigger problem than I had initially expected.

I changed SimpleLoopUnswitch to always invalidate BlockFrequencyAnalysis and BranchProbabilityAnalysis beforehand (leaving other analyses untouched), so that the BFI data is always fresh; a rough sketch of how I forced the recomputation is at the end of this comment. On arm64-apple-darwin -O3 there's a large difference in the number of "cold" loops skipped, specifically a lot of loops that were previously considered cold but no longer are:

lhs = lossy BFI
rhs = correct BFI

Program                                       simple-loop-unswitch.NumColdSkipped                
                                              lhs                                 rhs     diff
MultiSourc...ch/consumer-lame/consumer-lame    283.00                              282.00   -0.4%
MultiSourc...sumer-typeset/consumer-typeset    571.00                              564.00   -1.2%
MultiSource/Applications/siod/siod             176.00                              173.00   -1.7%
MultiSourc...e/Applications/SIBsim4/SIBsim4     56.00                               55.00   -1.8%
MultiSourc.../Applications/JM/lencod/lencod   1247.00                             1220.00   -2.2%
MultiSourc...arks/DOE-ProxyApps-C/CoMD/CoMD     75.00                               73.00   -2.7%
MultiSourc...xyApps-C/Pathfinder/PathFinder     65.00                               63.00   -3.1%
MultiSourc...OE-ProxyApps-C/XSBench/XSBench     20.00                               19.00   -5.0%
MultiSource/Applications/d/make_dparser        153.00                              143.00   -6.5%
MultiSource/Benchmarks/nbench/nbench            82.00                               76.00   -7.3%
MultiSource/Applications/oggenc/oggenc         472.00                              434.00   -8.1%
MultiSourc...hmarks/MallocBench/cfrac/cfrac     23.00                               21.00   -8.7%
MultiSourc...Benchmarks/SciMark2-C/scimark2     32.00                               29.00   -9.4%
MultiSourc...hmarks/ASC_Sequoia/AMGmk/AMGmk     89.00                               80.00  -10.1%
MultiSourc...rsaBench/beamformer/beamformer     19.00                               17.00  -10.5%
MultiSourc...e/Applications/obsequi/Obsequi     70.00                               62.00  -11.4%
MultiSource/Benchmarks/McCat/18-imp/imp         30.00                               26.00  -13.3%
MultiSourc...e-bitcount/automotive-bitcount      7.00                                6.00  -14.3%
MultiSource/Benchmarks/sim/sim                   7.00                                6.00  -14.3%
MultiSourc...SC_Sequoia/CrystalMk/CrystalMk     12.00                               10.00  -16.7%
MultiSource/Applications/sgefa/sgefa            48.00                               40.00  -16.7%
MultiSourc...OE-ProxyApps-C/RSBench/rsbench      6.00                                5.00  -16.7%
MultiSourc...Benchmarks/Ptrdist/yacr2/yacr2     35.00                               28.00  -20.0%
MultiSourc...marks/Prolangs-C/bison/mybison     75.00                               60.00  -20.0%
MultiSource/Benchmarks/Olden/mst/mst            15.00                               11.00  -26.7%
MultiSourc.../Rodinia/pathfinder/pathfinder     11.00                                8.00  -27.3%
MultiSource/Benchmarks/Olden/em3d/em3d          20.00                               14.00  -30.0%
MultiSourc.../Benchmarks/VersaBench/bmm/bmm     26.00                               18.00  -30.8%
MultiSourc...hmarks/FreeBench/neural/neural     28.00                               19.00  -32.1%
MultiSourc...enchmarks/VersaBench/dbms/dbms      6.00                                4.00  -33.3%
MultiSourc.../telecomm-CRC32/telecomm-CRC32      3.00                                2.00  -33.3%
MultiSourc...chmarks/VersaBench/8b10b/8b10b      5.00                                3.00  -40.0%
MultiSource/Benchmarks/McCat/09-vor/vor          2.00                                1.00  -50.0%
MultiSourc...chmarks/McCat/04-bisect/bisect      6.00                                3.00  -50.0%
MultiSource/Benchmarks/Ptrdist/ks/ks             8.00                                4.00  -50.0%
MultiSourc...Benchmarks/Olden/health/health      2.00                                1.00  -50.0%
MultiSourc...marks/Trimaran/enc-rc4/enc-rc4     16.00                                8.00  -50.0%
MultiSourc...rks/Trimaran/enc-3des/enc-3des     12.00                                3.00  -75.0%
MultiSourc...rks/BitBench/uuencode/uuencode      5.00                                1.00  -80.0%
MultiSourc...arks/FreeBench/distray/distray      2.00                                0.00 -100.0%
MultiSourc...reeBench/pcompress2/pcompress2      1.00                                0.00 -100.0%

This translates into a significant number of loops that we erroneously fail to unswitch because we mistakenly think they are cold: a 4.3% geomean increase in unswitched branches overall.

Metric: simple-loop-unswitch.NumBranches

Program                                       simple-loop-unswitch.NumBranches              
                                              lhs                              rhs    diff  
Applications/aha/aha                            1.00                             2.00 100.0%
Benchmarks/McCat/09-vor/vor                     1.00                             2.00 100.0%
Benchmarks/VersaBench/bmm/bmm                   6.00                            11.00  83.3%
Benchmarks/ASC_Sequoia/IRSmk/IRSmk              2.00                             3.00  50.0%
Benchmarks/nbench/nbench                        4.00                             6.00  50.0%
Benchmarks/ASC_Sequoia/AMGmk/AMGmk             11.00                            16.00  45.5%
Benchmarks...sumer-typeset/consumer-typeset    10.00                            14.00  40.0%
Benchmarks...iabench/g721/g721encode/encode     3.00                             4.00  33.3%
Benchmarks.../Rodinia/pathfinder/pathfinder     3.00                             4.00  33.3%
Benchmarks/DOE-ProxyApps-C/CoMD/CoMD           13.00                            17.00  30.8%
Benchmarks/McCat/18-imp/imp                    13.00                            17.00  30.8%
Benchmarks/FreeBench/neural/neural             10.00                            13.00  30.0%
Benchmarks...nch/mpeg2/mpeg2dec/mpeg2decode     4.00                             5.00  25.0%
Benchmarks/Trimaran/enc-3des/enc-3des           4.00                             5.00  25.0%
Benchmarks.../DOE-ProxyApps-C++/HPCCG/HPCCG     4.00                             5.00  25.0%
                           Geomean difference                                           4.3%

For this reason alone I think we should remove the use of BlockFrequencyInfo, as it's very likely impeding performance. In the above results, the increases in unswitched loops all come from hot loops.
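
For reference, this is roughly how I forced the recomputation for the experiment above. It's a minimal sketch assuming the invalidation happens where the adaptor would otherwise hand a cached BFI to the loop passes; the helper name is made up and this isn't the exact local patch I used:

#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/Analysis/BranchProbabilityInfo.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/PassManager.h"

// Drop only the cached BFI/BPI results so the next query recomputes them from
// the current CFG instead of reusing a lossily preserved (possibly stale) one.
static llvm::BlockFrequencyInfo &
getFreshBFI(llvm::Function &F, llvm::FunctionAnalysisManager &FAM) {
  llvm::PreservedAnalyses PA = llvm::PreservedAnalyses::all();
  PA.abandon<llvm::BlockFrequencyAnalysis>();
  PA.abandon<llvm::BranchProbabilityAnalysis>();
  FAM.invalidate(F, PA);
  return FAM.getResult<llvm::BlockFrequencyAnalysis>(F);
}

The abandon() calls are what limit the invalidation to BFI/BPI; everything else stays cached, matching the "leaving other analyses untouched" setup above.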

@teresajohnson
Contributor

Thanks for the data. That does look pretty compelling. We may have some binary size or compile time increase for loops that still are cold in non-cold functions - do you have any data on how many cold loops (after forcing BFI/BPA invalidation before the pass as above) will be unnecessarily unswitched after this change due to being in non-cold functions?

Also can you check a couple of instances where the loop was cold until BFI/BPA were invalidated and see why they were cold - were they new loops created by previous loop transformations? With any eventual support for loop attributes we'd need to ensure they were copied or synthesized on such new loop nests appropriately.

But generally I think this change makes sense. However, I have a question on a test change - it isn't clear to me whether the effects of this PR are being captured in a test currently.

define void @_Z11hotFunctionbiiPiS_S_(i1 %cond, i32 %M, i32 %N, ptr %A, ptr %B, ptr %C) !prof !36 {
; CHECK-LABEL: define void @_Z11hotFunctionbiiPiS_S_
; CHECK-SAME: (i1 [[COND:%.*]], i32 [[M:%.*]], i32 [[N:%.*]], ptr [[A:%.*]], ptr [[B:%.*]], ptr [[C:%.*]]) !prof [[PROF16:![0-9]+]] {
; CHECK-SAME: (i1 [[COND:%.*]], i32 [[M:%.*]], i32 [[N:%.*]], ptr [[A:%.*]], ptr [[B:%.*]], ptr [[C:%.*]]) #[[ATTR0:[0-9]+]] {{.*}}{
Contributor

ATTR0 is never matched with anything later on - what is this trying to check? I'm also a little unsure of what changed here: per the comments and the function name, this is a cold loop nest in a hot function, so shouldn't we stop seeing it be unswitched with this PR?

Contributor Author

As for the ATTR0 tag, that seems to be fallout from regenerating with UTC; I've separated that out in 9393f23.

As for the function name, that is indeed confusing. Looking at the profile data for it, the entry count is only 1:

!36 = !{!"function_entry_count", i64 1}

So I think this function is actually cold and the name is just misleading. I've renamed it to reflect that in 50117ac
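
In case it's useful, here's a minimal sketch of why an entry count of 1 ends up classified as cold, assuming a profile summary where 1 falls below the cold-count cutoff (the helper name is made up; ProfileSummaryInfo::isFunctionEntryCold does essentially this):

#include "llvm/Analysis/ProfileSummaryInfo.h"
#include "llvm/IR/Function.h"

// Reads the !prof function_entry_count metadata and compares it against the
// cold-count threshold derived from the module's profile summary.
static bool entryCountIsCold(const llvm::Function &F,
                             const llvm::ProfileSummaryInfo &PSI) {
  if (auto Count = F.getEntryCount())
    return PSI.isColdCount(Count->getCount());
  return false;
}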

@lukel97
Contributor Author

lukel97 commented Oct 16, 2025

Thanks for the review, apologies for the delay in getting back:

We may have some binary size or compile time increase for loops that still are cold in non-cold functions - do you have any data on how many cold loops (after forcing BFI/BPA invalidation before the pass as above) will be unnecessarily unswitched after this change due to being in non-cold functions?

There are slightly more cold loops unnecessarily unswitched, but it only seems to result in a binary size increase in a few benchmarks; the geomean difference is +0.1%:

./utils/compare.py  results.arm64-apple-darwin-O3-pgo.159522.before.json vs results.arm64-apple-darwin-O3-pgo.159522.after.json -m size
Tests: 317
Metric: size

Program                                       size                       
                                              lhs        rhs        diff 
SingleSour...arks/Adobe-C++/stepanov_vector     72320.00   88832.00 22.8%
MultiSourc.../mediabench/jpeg/jpeg-6a/cjpeg    198160.00  214672.00  8.3%
MultiSourc...enchmarks/mafft/pairlocalalign    373840.00  390352.00  4.4%
MultiSource/Applications/SPASS/SPASS           648112.00  664624.00  2.5%
MultiSourc.../Applications/JM/lencod/lencod    860480.00  876992.00  1.9%
MultiSourc...nchmarks/tramp3d-v4/tramp3d-v4   1018112.00 1034624.00  1.6%
MicroBench...on/LoopVectorizationBenchmarks    470368.00  470608.00  0.1%
MultiSourc...ALAC/decode/alacconvert-decode     75312.00   75328.00  0.0%
MultiSourc.../MallocBench/espresso/espresso    202608.00  202624.00  0.0%
SingleSour...enchmarks/Misc-C++/mandel-text     33472.00   33472.00  0.0%
SingleSour...ce/Benchmarks/Misc/ReedSolomon     33856.00   33856.00  0.0%
SingleSour...chmarks/Misc-C++/stepanov_v1p2     52432.00   52432.00  0.0%
SingleSour...ks/Misc-C++/stepanov_container     57232.00   57232.00  0.0%
SingleSour...enchmarks/Misc-C++/oopack_v1p8     54112.00   54112.00  0.0%
MicroBench...marks/Builtins/Int128/Builtins    353360.00  353360.00  0.0%
                           Geomean difference                        0.1%

With -mllvm -pgo-cold-func-opt=optsize to annotate cold functions as optsize, this mostly disappears:

Tests: 297
Metric: size

Program                                       size                     
                                              lhs       rhs       diff 
SingleSour...arks/Adobe-C++/stepanov_vector    72320.00  88832.00 22.8%
MultiSourc...ALAC/decode/alacconvert-decode    75312.00  75328.00  0.0%
SingleSource/Benchmarks/Misc/ffbench           33744.00  33744.00  0.0%
SingleSource/Benchmarks/Misc/evalloop          33744.00  33744.00  0.0%
SingleSource/Benchmarks/Misc/dt                33616.00  33616.00  0.0%
SingleSour...Misc/aarch64-init-cpu-features    33856.00  33856.00  0.0%
SingleSour...ce/Benchmarks/Misc/ReedSolomon    33856.00  33856.00  0.0%
SingleSour...chmarks/Misc-C++/stepanov_v1p2    52432.00  52432.00  0.0%
SingleSour...ks/Misc-C++/stepanov_container    57232.00  57232.00  0.0%
SingleSour...enchmarks/Misc-C++/oopack_v1p8    54160.00  54160.00  0.0%
SingleSour...enchmarks/Misc-C++/mandel-text    33472.00  33472.00  0.0%
SingleSource/Benchmarks/Misc-C++/bigfib        58128.00  58128.00  0.0%
SingleSour...rks/Misc-C++/Large/sphereflake    36960.00  36960.00  0.0%
SingleSour.../Benchmarks/Misc-C++/Large/ray    54288.00  54288.00  0.0%
SingleSour.../Benchmarks/Misc-C++-EH/spirit   431248.00 431248.00  0.0%
                           Geomean difference                      0.1%

Also can you check a couple of instances where the loop was cold until BFI/BPA were invalidated and see why they were cold - were they new loops created by previous loop transformations? With any eventual support for loop attributes we'd need to ensure they were copied or synthesized on such new loop nests appropriately.

From what I can tell, without invalidation BFI returns the block frequencies for the wrong loop entirely, i.e. it's not that the block frequencies of the loop itself weren't being updated correctly. I'm not sure if it's feasible to keep the block frequencies updated for new loops introduced by loop passes, but agreed that for loop attributes we would need to propagate them.

lukel97 added a commit that referenced this pull request Oct 16, 2025
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 16, 2025
Contributor

@teresajohnson teresajohnson left a comment

Thanks for digging into this, lgtm

@lukel97 lukel97 enabled auto-merge (squash) October 17, 2025 23:00
@lukel97 lukel97 merged commit df89564 into llvm:main Oct 17, 2025
9 of 10 checks passed
@llvm-ci
Collaborator

llvm-ci commented Oct 17, 2025

LLVM Buildbot has detected a new failure on builder lldb-aarch64-ubuntu running on linaro-lldb-aarch64-ubuntu while building llvm,polly at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/59/builds/25931

Here is the relevant piece of the build log for the reference
Step 6 (test) failure: build (failure)
...
PASS: lldb-api :: commands/help/TestHelp.py (191 of 2342)
PASS: lldb-api :: commands/platform/basic/TestPlatformPython.py (192 of 2342)
PASS: lldb-api :: commands/platform/basic/TestPlatformCommand.py (193 of 2342)
PASS: lldb-api :: commands/memory/write/TestMemoryWrite.py (194 of 2342)
PASS: lldb-api :: commands/platform/file/close/TestPlatformFileClose.py (195 of 2342)
PASS: lldb-api :: commands/platform/file/read/TestPlatformFileRead.py (196 of 2342)
PASS: lldb-api :: commands/memory/read/TestMemoryRead.py (197 of 2342)
UNSUPPORTED: lldb-api :: commands/platform/sdk/TestPlatformSDK.py (198 of 2342)
PASS: lldb-api :: commands/platform/connect/TestPlatformConnect.py (199 of 2342)
UNRESOLVED: lldb-api :: commands/gui/spawn-threads/TestGuiSpawnThreads.py (200 of 2342)
******************** TEST 'lldb-api :: commands/gui/spawn-threads/TestGuiSpawnThreads.py' FAILED ********************
Script:
--
/usr/bin/python3.10 /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --arch aarch64 --build-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/lldb --compiler /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/clang --dsymutil /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --lldb-obj-root /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/tools/lldb --lldb-libs-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib --cmake-build-type Release /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/commands/gui/spawn-threads -p TestGuiSpawnThreads.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 22.0.0git (https://github.com/llvm/llvm-project.git revision df8956442feda0171fda79393778f698eb0385cf)
  clang revision df8956442feda0171fda79393778f698eb0385cf
  llvm revision df8956442feda0171fda79393778f698eb0385cf
Skipping the following test categories: ['libc++', 'msvcstl', 'dsym', 'gmodules', 'debugserver', 'objc']

--
Command Output (stderr):
--
FAIL: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_gui (TestGuiSpawnThreads.TestGuiSpawnThreadsTest)
======================================================================
ERROR: test_gui (TestGuiSpawnThreads.TestGuiSpawnThreadsTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/decorators.py", line 156, in wrapper
    return func(*args, **kwargs)
  File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/commands/gui/spawn-threads/TestGuiSpawnThreads.py", line 44, in test_gui
    self.child.expect_exact(f"thread #{i + 2}: tid =")
  File "/usr/local/lib/python3.10/dist-packages/pexpect/spawnbase.py", line 432, in expect_exact
    return exp.expect_loop(timeout)
  File "/usr/local/lib/python3.10/dist-packages/pexpect/expect.py", line 179, in expect_loop
    return self.eof(e)
  File "/usr/local/lib/python3.10/dist-packages/pexpect/expect.py", line 122, in eof
    raise exc
pexpect.exceptions.EOF: End Of File (EOF). Exception style platform.
<pexpect.pty_spawn.spawn object at 0xe054a5ee5420>
command: /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/lldb
args: ['/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/lldb', '--no-lldbinit', '--no-use-colors', '-O', 'settings clear --all', '-O', 'settings set symbols.enable-external-lookup false', '-O', 'settings set target.inherit-tcc true', '-O', 'settings set target.disable-aslr false', '-O', 'settings set target.detach-on-error false', '-O', 'settings set target.auto-apply-fixits false', '-O', 'settings set plugin.process.gdb-remote.packet-timeout 60', '-O', 'settings set symbols.clang-modules-cache-path "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api"', '-O', 'settings set use-color false', '-O', 'settings set show-statusline false', '--file', '/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/commands/gui/spawn-threads/TestGuiSpawnThreads.test_gui/a.out']
buffer (last 100 chars): b''
before (last 100 chars): b'thread_create.c:442:8\n#29 0x0000fe2f25039e9c ./misc/../sysdeps/unix/sysv/linux/aarch64/clone.S:82:0\n'
after: <class 'pexpect.exceptions.EOF'>
