
@usha1830
Contributor

When a function F contains an indirect call that is not resolved until whole program devirtualization, the function attributes pass (FunctionAttrs) cannot infer the norecurse attribute for F.
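As a rough illustration of why the unresolved call blocks inference, here is a minimal C++ sketch of a conservative norecurse check on a toy call graph. It is not the FunctionAttrs implementation: `FunctionNode`, `CallGraph`, `mayReach`, and `canInferNoRecurse` are hypothetical names, and the real pass reasons over call-graph SCCs and callee attributes. The one assumption it models is that an unresolved indirect call may target any function, including F itself.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model of one function in a module's call graph.
struct FunctionNode {
  std::vector<std::string> DirectCallees; // callees known at compile time
  bool HasIndirectCall = false;           // call through an unresolved pointer
  bool IsNoCallbackDecl = false;          // external declaration assumed never to call back
};

using CallGraph = std::map<std::string, FunctionNode>;

// Returns true if Target may be re-entered starting from F. An unresolved
// indirect call, or a function with no entry in the graph, is treated as
// "may call anything".
static bool mayReach(const CallGraph &CG, const std::string &F,
                     const std::string &Target, std::set<std::string> &Visited) {
  if (!Visited.insert(F).second)
    return false; // already explored along another path
  auto It = CG.find(F);
  if (It == CG.end())
    return true; // unknown function: stay conservative
  const FunctionNode &N = It->second;
  if (N.IsNoCallbackDecl)
    return false;
  if (N.HasIndirectCall)
    return true; // the pointer could point back at Target
  for (const std::string &Callee : N.DirectCallees)
    if (Callee == Target || mayReach(CG, Callee, Target, Visited))
      return true;
  return false;
}

// F can be marked norecurse only if no call path can lead back to F.
bool canInferNoRecurse(const CallGraph &CG, const std::string &F) {
  std::set<std::string> Visited;
  return !mayReach(CG, F, F, Visited);
}
```

Modeling the test case below, `call_through_fp` starts with `HasIndirectCall = true` and fails the check; replacing the indirect call with a direct edge to `target`, which is what devirtualization does, lets it pass.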

GlobalOpt relies on the norecurse attribute to localize global variables that are stored once in F and used only within F, i.e. to replace such a global with a local stack variable.
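The norecurse requirement exists because a recursive activation of F could observe the value that an outer activation stored to the global. The hypothetical C++ functions below (not taken from LLVM) show that turning such a global into a local changes behavior once recursion is possible.

```cpp
// Hypothetical example: why a global may only become a local in a norecurse function.
static int Counter = 0; // plays the role of an internal global such as @testptr

// Stores to the global, then reads it back. A recursive call in between
// overwrites it, so the outer activation reads the innermost activation's value.
int readBackThroughGlobal(int Depth) {
  Counter = Depth;
  if (Depth > 0)
    readBackThroughGlobal(Depth - 1);
  return Counter;
}

// The same function after "localizing" the global into a stack variable:
// each activation now has its own copy, so the result differs.
int readBackThroughLocal(int Depth) {
  int Local = Depth;
  if (Depth > 0)
    readBackThroughLocal(Depth - 1);
  return Local;
}
```

`readBackThroughGlobal(2)` returns 0 while `readBackThroughLocal(2)` returns 2; without recursion (`Depth == 0`) the two agree. This is why the rewrite is only valid when F is known not to recurse.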

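However, in the current pass pipeline, the PostOrderFunctionAttrsPass instance that runs after whole program devirtualization is scheduled only after GlobalOpt has already run, so GlobalOpt never sees the newly inferred norecurse attribute and these optimization opportunities are missed.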

Current pass order in the default LTO pipeline:

    CGSCCPassManager CGPM;
    CGPM.addPass(PostOrderFunctionAttrsPass());  // Some indirect calls are not resolved yet.
    // ... (more passes for CGPM)
    MPM.addPass(createModuleToPostOrderCGSCCPassAdaptor(std::move(CGPM)));
    // ... (more MPM passes)
    MPM.addPass(WholeProgramDevirtPass(ExportSummary, nullptr));  // Indirect calls resolved here.
    // ... (more MPM passes)
    MPM.addPass(GlobalOptPass());
    // ... (more MPM and FPM passes)
    MPM.addPass(GlobalOptPass());
    // ... (more passes)
    MPM.addPass(
        createModuleToPostOrderCGSCCPassAdaptor(PostOrderFunctionAttrsPass()));  // Too late: GlobalOpt already ran.

This PR moves the last instance of PostOrderFunctionAttrsPass so that it runs immediately before the preceding GlobalOpt pass (the one that follows the inliner), allowing GlobalOpt to use the inferred norecurse attribute.
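The phase-ordering problem can be sketched as a toy C++ simulation, assuming each pass only acts on facts established by earlier passes. The pass names follow the `opt -passes=` spelling used in the test's RUN lines; `ToyState`, `runPass`, and `runPipeline` are hypothetical and model nothing about the real passes beyond this one dependency chain.

```cpp
#include <string>
#include <vector>

// Hypothetical toy state for the single function @call_through_fp.
struct ToyState {
  bool IndirectCallResolved = false; // set by whole program devirtualization
  bool NoRecurse = false;            // set by function attribute inference
  bool GlobalLocalized = false;      // set by GlobalOpt
};

// Each "pass" only acts on facts that earlier passes have established.
void runPass(const std::string &Name, ToyState &S) {
  if (Name == "wholeprogramdevirt")
    S.IndirectCallResolved = true;
  else if (Name == "function-attrs")
    S.NoRecurse = S.NoRecurse || S.IndirectCallResolved;
  else if (Name == "globalopt")
    S.GlobalLocalized = S.GlobalLocalized || S.NoRecurse;
}

// Runs the passes in order on a fresh state and returns the final state.
ToyState runPipeline(const std::vector<std::string> &Passes) {
  ToyState S;
  for (const std::string &P : Passes)
    runPass(P, S);
  return S;
}
```

Running the old order (`function-attrs, wholeprogramdevirt, globalopt, globalopt, function-attrs`) leaves `GlobalLocalized` false, while the new order (`function-attrs, wholeprogramdevirt, globalopt, function-attrs, globalopt`) sets it.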

…ion of variables based on norecurse attribute
@llvmbot
Member

llvmbot commented Jun 10, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Usha Gupta (usha1830)

Changes



Full diff: https://github.com/llvm/llvm-project/pull/143535.diff

3 Files Affected:

  • (modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+5-4)
  • (modified) llvm/test/Other/new-pm-lto-defaults.ll (+1-1)
  • (added) llvm/test/Transforms/GlobalOpt/passorder_wpd_funcattrs_globalopt.ll (+91)
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index a99146d5eaa34..60ec777841799 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -2022,7 +2022,11 @@ PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,
         /*Summary=*/nullptr,
         PGOOpt && PGOOpt->Action == PGOOptions::SampleUse));
 
-  // Optimize globals again after we ran the inliner.
+  MPM.addPass(
+      createModuleToPostOrderCGSCCPassAdaptor(PostOrderFunctionAttrsPass()));
+
+  // Optimize globals again after we ran the inliner and function attribute
+  // inference.
   MPM.addPass(GlobalOptPass());
 
   // Run the OpenMPOpt pass again after global optimizations.
@@ -2075,9 +2079,6 @@ PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,
   MPM.addPass(createModuleToFunctionPassAdaptor(std::move(FPM),
                                                 PTO.EagerlyInvalidateAnalyses));
 
-  MPM.addPass(
-      createModuleToPostOrderCGSCCPassAdaptor(PostOrderFunctionAttrsPass()));
-
   // Require the GlobalsAA analysis for the module so we can query it within
   // MainFPM.
   if (EnableGlobalAnalyses) {
diff --git a/llvm/test/Other/new-pm-lto-defaults.ll b/llvm/test/Other/new-pm-lto-defaults.ll
index 3aea0f2061f3e..cc0d1c166d723 100644
--- a/llvm/test/Other/new-pm-lto-defaults.ll
+++ b/llvm/test/Other/new-pm-lto-defaults.ll
@@ -83,6 +83,7 @@
 ; CHECK-O23SZ-NEXT: Running pass: InlinerPass
 ; CHECK-O23SZ-NEXT: Running pass: InlinerPass
 ; CHECK-O23SZ-NEXT: Invalidating analysis: InlineAdvisorAnalysis
+; CHECK-O23SZ-NEXT: Running pass: PostOrderFunctionAttrsPass on (foo)
 ; CHECK-O23SZ-NEXT: Running pass: GlobalOptPass
 ; CHECK-O23SZ-NEXT: Running pass: OpenMPOptPass
 ; CHECK-O23SZ-NEXT: Running pass: GlobalDCEPass
@@ -98,7 +99,6 @@
 ; CHECK-O23SZ-NEXT: Running analysis: LazyValueAnalysis
 ; CHECK-O23SZ-NEXT: Running pass: SROAPass on foo
 ; CHECK-O23SZ-NEXT: Running pass: TailCallElimPass on foo
-; CHECK-O23SZ-NEXT: Running pass: PostOrderFunctionAttrsPass on (foo)
 ; CHECK-O23SZ-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
 ; CHECK-O23SZ-NEXT: Running analysis: GlobalsAA on [module]
 ; CHECK-O23SZ-NEXT: Running analysis: CallGraphAnalysis on [module]
diff --git a/llvm/test/Transforms/GlobalOpt/passorder_wpd_funcattrs_globalopt.ll b/llvm/test/Transforms/GlobalOpt/passorder_wpd_funcattrs_globalopt.ll
new file mode 100644
index 0000000000000..77ffff911ae68
--- /dev/null
+++ b/llvm/test/Transforms/GlobalOpt/passorder_wpd_funcattrs_globalopt.ll
@@ -0,0 +1,91 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-attributes --version 5
+; RUN: opt -passes="wholeprogramdevirt,globalopt" -whole-program-visibility -S %s -o - | FileCheck %s --check-prefix=CHECK-BEFORE
+; RUN: opt -passes="wholeprogramdevirt,function-attrs,globalopt" -whole-program-visibility -S %s -o - | FileCheck %s --check-prefix=CHECK-AFTER
+
+; This test demonstrates a case simulating the pass order in LTO default
+; pass pipeline. In this case, there is an indirect call in a function 
+; which is only resolved by whole program devirtualization.
+; This call resolution leads Function Attr pass to infer norecurse attribute
+; for this function. This function stores a global variable (only once) and
+; there is a use of this global variable within this function.
+; Based on the norecurse attribute, GlobalOpt pass can localize this global
+; variable. However, in current pass pipeline, the pass sequence after whole
+; program devirtualization is as follows:
+; Whole Program Devirtualization -> (more passes) -> GlobalOpt -> (more passes)
+; -> PostOrderFunctionAttr 
+; This test verifies that adding Function Attribute pass between Whole program
+; devirtualization and GlobalOpt would help in this kind of cases.
+
+; Global variable which should become unused after GlobalOpt
+@testptr = internal unnamed_addr global double 0.0, align 8
+
+declare double @sqrt(double) readnone nounwind nocallback
+declare { ptr, i1 } @llvm.type.checked.load(ptr, i32, metadata)
+@llvm.compiler.used = appending global [1 x ptr] [ptr @call_through_fp], section "llvm.metadata"
+
+; Indirect call target
+define dso_local void @target(ptr %a) !type !0 {
+; CHECK-BEFORE-LABEL: define dso_local void @target(
+; CHECK-BEFORE-SAME: ptr [[A:%.*]]) !type [[META0:![0-9]+]] {
+; CHECK-BEFORE-NEXT:    ret void
+;
+; CHECK-AFTER: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+; CHECK-AFTER-LABEL: define dso_local void @target(
+; CHECK-AFTER-SAME: ptr readnone captures(none) [[A:%.*]]) #[[ATTR1:[0-9]+]] !type [[META0:![0-9]+]] {
+; CHECK-AFTER-NEXT:    ret void
+;
+  ret void
+}
+
+@vtable_for_target = constant { ptr } { ptr @target }, !type !0
+
+define dso_local void @call_through_fp() {
+; CHECK-BEFORE-LABEL: define dso_local void @call_through_fp() {
+; CHECK-BEFORE-NEXT:  [[ENTRY:.*:]]
+; CHECK-BEFORE-NEXT:    [[VTABLE_ENTRY:%.*]] = getelementptr { ptr }, ptr @vtable_for_target, i32 0, i32 0
+; CHECK-BEFORE-NEXT:    [[TMP0:%.*]] = getelementptr i8, ptr [[VTABLE_ENTRY]], i32 0
+; CHECK-BEFORE-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[TMP0]], align 8
+; CHECK-BEFORE-NEXT:    call void @target(ptr null)
+; CHECK-BEFORE-NEXT:    [[VAL:%.*]] = fadd double 1.000000e+00, 2.000000e+00
+; CHECK-BEFORE-NEXT:    store double [[VAL]], ptr @testptr, align 8
+; CHECK-BEFORE-NEXT:    [[X:%.*]] = load double, ptr @testptr, align 8
+; CHECK-BEFORE-NEXT:    [[RES2:%.*]] = call double @sqrt(double [[X]])
+; CHECK-BEFORE-NEXT:    ret void
+;
+; CHECK-AFTER: Function Attrs: norecurse nounwind memory(readwrite, argmem: none, inaccessiblemem: none)
+; CHECK-AFTER-LABEL: define dso_local void @call_through_fp(
+; CHECK-AFTER-SAME: ) #[[ATTR2:[0-9]+]] {
+; CHECK-AFTER-NEXT:  [[ENTRY:.*:]]
+; CHECK-AFTER-NEXT:    [[TESTPTR:%.*]] = alloca double, align 8
+; CHECK-AFTER-NEXT:    store double 0.000000e+00, ptr [[TESTPTR]], align 8
+; CHECK-AFTER-NEXT:    [[VTABLE_ENTRY:%.*]] = getelementptr { ptr }, ptr @vtable_for_target, i32 0, i32 0
+; CHECK-AFTER-NEXT:    [[TMP0:%.*]] = getelementptr i8, ptr [[VTABLE_ENTRY]], i32 0
+; CHECK-AFTER-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[TMP0]], align 8
+; CHECK-AFTER-NEXT:    call void @target(ptr null)
+; CHECK-AFTER-NEXT:    [[VAL:%.*]] = fadd double 1.000000e+00, 2.000000e+00
+; CHECK-AFTER-NEXT:    store double [[VAL]], ptr [[TESTPTR]], align 8
+; CHECK-AFTER-NEXT:    [[X:%.*]] = load double, ptr [[TESTPTR]], align 8
+; CHECK-AFTER-NEXT:    [[RES2:%.*]] = call double @sqrt(double [[X]])
+; CHECK-AFTER-NEXT:    ret void
+;
+entry:
+  %vtable_entry = getelementptr { ptr }, ptr @vtable_for_target, i32 0, i32 0
+  %res = call { ptr, i1 } @llvm.type.checked.load(ptr %vtable_entry, i32 0, metadata !"typeid")
+  %fptr = extractvalue { ptr, i1 } %res, 0
+  call void %fptr(ptr null)
+  %val = fadd double 1.0, 2.0
+  store double %val, ptr @testptr
+  %x = load double, ptr @testptr
+  %res2 = call double @sqrt(double %x)
+  ret void
+}
+
+!0 = !{i32 0, !"typeid", i32 0}
+!llvm.type.metadata = !{!1}
+!1 = !{ptr @vtable_for_target, !"typeid", i64 0}
+
+;.
+; CHECK-BEFORE: [[META0]] = !{i32 0, !"typeid", i32 0}
+;.
+; CHECK-AFTER: [[META0]] = !{i32 0, !"typeid", i32 0}
+;.

Contributor


This should be a PhaseOrdering test using the full LTO pipeline.

Contributor Author


@nikic Thanks for reviewing.
Sorry, I didn't quite understand. Do you mean the test should be moved to the PhaseOrdering test directory, or replicated there? Or that it should be modified to use the full LTO pipeline instead of just the three passes I am currently using?

usha1830 marked this pull request as draft on June 24, 2025 15:51
usha1830 closed this on Oct 8, 2025