Skip to content

Conversation

@wlei-llvm
Copy link
Contributor

@wlei-llvm wlei-llvm commented Sep 2, 2025

Add a new option (under --mark-all-context-preinlined) that marks all function samples with the ContextShouldBeInlined attribute during post-processing to make the profile as preinlined. This can be useful for experiments outside of the CS preinliner, e.g. to fully replay the inlining for a given profile.

@WenleiHe
Copy link
Member

WenleiHe commented Sep 2, 2025

surprised that we don't have flag for this already. how did we do probe only (implies replay inlining) experiments earlier?

@wlei-llvm
Copy link
Contributor Author

surprised that we don't have flag for this already. how did we do probe only (implies replay inlining) experiments earlier?

Good question! Let me first confirm we’re aligned on term: the previous "probe-only" refers to using -ignore-stack-samples on llvm-profgen and no extra flags on compiler side, right? If so:

yes, we have done something to make it behave like "replay inlining", there are some settings on compiler side to remove the limits: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/SampleProfile.cpp#L2053-L2056, set the max/min inlinelimit to int::max, so it then works as it will replay without any limit.

However, there are still some important and subtle differences:

one major is the inlining callsite threshold: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/SampleProfile.cpp#L1386-L1406

  • In the previous probe-only, the cost is compared against a SampleThreshold which can be SampleColdCallSiteThreshold(45) and SampleHotCallSiteThreshold(3000) depending on the hotness.

  • In preinlined setting(this PR), it use the "always_inline" cost(getAlways("preinliner")).

Another one is for the external/importing functions: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/SampleProfile.cpp#L1060-L1062
In preinliner setting, the threshold is zero while in previous probe-only, it's the HotCountThreshold.

Therefore, the previous "probe-only" was not a pure replay.

I don't remember whether this is a "bug"(that we intentionally want a pure reply but missed the above setting) or it's just by design as we may just want it to do more like the classic line-number based AutoFDO which has those limits.

Yeah, alternatively we can have a "fix" in compiler to remove the limits for all pseudo-probe based profile if it's a "bug".

@WenleiHe
Copy link
Member

WenleiHe commented Sep 8, 2025

surprised that we don't have flag for this already. how did we do probe only (implies replay inlining) experiments earlier?

Good question! Let me first confirm we’re aligned on term: the previous "probe-only" refers to using -ignore-stack-samples on llvm-profgen and no extra flags on compiler side, right? If so:

yes, we have done something to make it behave like "replay inlining", there are some settings on compiler side to remove the limits: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/SampleProfile.cpp#L2053-L2056, set the max/min inlinelimit to int::max, so it then works as it will replay without any limit.

However, there are still some important and subtle differences:

one major is the inlining callsite threshold: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/SampleProfile.cpp#L1386-L1406

  • In the previous probe-only, the cost is compared against a SampleThreshold which can be SampleColdCallSiteThreshold(45) and SampleHotCallSiteThreshold(3000) depending on the hotness.
  • In preinlined setting(this PR), it use the "always_inline" cost(getAlways("preinliner")).

Another one is for the external/importing functions: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/SampleProfile.cpp#L1060-L1062 In preinliner setting, the threshold is zero while in previous probe-only, it's the HotCountThreshold.

Therefore, the previous "probe-only" was not a pure replay.

I don't remember whether this is a "bug"(that we intentionally want a pure reply but missed the above setting) or it's just by design as we may just want it to do more like the classic line-number based AutoFDO which has those limits.

Yeah, alternatively we can have a "fix" in compiler to remove the limits for all pseudo-probe based profile if it's a "bug".

I was thinking about csspgo-replay-preinline: https://github.com/llvm/llvm-project/blame/main/llvm/tools/llvm-profgen/CSPreInliner.cpp#L50

I thought we run pre-inliner as long as csspgo-preinliner is on, and that flag is on by default and also not affected by whether we ignore stack samples?

Judging from your change, it seems like we don't run pre-inliner for probe-only (-ignore-stack-samples)?

@wlei-llvm
Copy link
Contributor Author

I was thinking about csspgo-replay-preinline: https://github.com/llvm/llvm-project/blame/main/llvm/tools/llvm-profgen/CSPreInliner.cpp#L50

I see, thanks for the check! Seems like it's indeed equivalent, IIRC, the ContextWasInlined is from the leaf frame's dwarf/probe context so it should be same to the context directly from the length-1 call-stack (-ignore-stack-samples).

I thought we run pre-inliner as long as csspgo-preinliner is on, and that flag is on by default and also not affected by whether we ignore stack samples?

Judging from your change, it seems like we don't run pre-inliner for probe-only (-ignore-stack-samples)?

IIRC, csspgo-preinliner is only available in CS mode: https://github.com/llvm/llvm-project/blob/main/llvm/tools/llvm-profgen/ProfileGenerator.cpp#L1109C7-L1115, so we don't run/have this for -ignore-stack-samples. For my previous experiments, I always use the -ignore-stack-samples mode because it consumes significantly less memory and running time.

@WenleiHe
Copy link
Member

WenleiHe commented Sep 9, 2025

IIRC, csspgo-preinliner is only available in CS mode: https://github.com/llvm/llvm-project/blob/main/llvm/tools/llvm-profgen/ProfileGenerator.cpp#L1109C7-L1115, so we don't run/have this for -ignore-stack-samples. For my previous experiments, I always use the -ignore-stack-samples mode because it consumes significantly less memory and running time.

Ok.. that explains it. In this case the change looks good. Good fine, thanks!

Copy link
Member

@WenleiHe WenleiHe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with a nit on naming.

cl::cat(ProfGenCategory));

static cl::opt<bool> ForceProfilePreinlined(
"force-profile-preinlined",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: mark-all-context-preinlined? we are not force doing actual reinline, but only marking preinlined flag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks for the suggestion!

@llvmbot llvmbot added the PGO Profile Guided Optimizations label Sep 9, 2025
@llvmbot
Copy link
Member

llvmbot commented Sep 9, 2025

@llvm/pr-subscribers-pgo

Author: Lei Wang (wlei-llvm)

Changes

Add a new flag (--force-profile-preinlined) that marks all function samples with the ContextShouldBeInlined attribute during post-processing. This can be useful for experiments outside of the preinliner, e.g. to fully replay the inlining for a given profile.


Full diff: https://github.com/llvm/llvm-project/pull/156501.diff

6 Files Affected:

  • (modified) llvm/include/llvm/ProfileData/SampleProf.h (+11)
  • (modified) llvm/test/tools/llvm-profgen/inline-cs-pseudoprobe.test (+24-1)
  • (modified) llvm/test/tools/llvm-profgen/inline-noprobe.test (+36-14)
  • (modified) llvm/test/tools/llvm-profgen/inline-pseudoprobe.test (+22)
  • (modified) llvm/tools/llvm-profgen/ProfileGenerator.cpp (+17)
  • (modified) llvm/tools/llvm-profgen/ProfileGenerator.h (+2)
diff --git a/llvm/include/llvm/ProfileData/SampleProf.h b/llvm/include/llvm/ProfileData/SampleProf.h
index fb2d4d3cc50ed..a626071d23915 100644
--- a/llvm/include/llvm/ProfileData/SampleProf.h
+++ b/llvm/include/llvm/ProfileData/SampleProf.h
@@ -868,6 +868,17 @@ class FunctionSamples {
     }
   }
 
+  // Propagate the given attribute to this profile context and all callee
+  // contexts.
+  void setContextAttribute(ContextAttributeMask Attr) {
+    Context.setAttribute(Attr);
+    for (auto &I : CallsiteSamples) {
+      for (auto &CS : I.second) {
+        CS.second.setContextAttribute(Attr);
+      }
+    }
+  }
+
   // Query the stale profile matching results and remap the location.
   const LineLocation &mapIRLocToProfileLoc(const LineLocation &IRLoc) const {
     // There is no remapping if the profile is not stale or the matching gives
diff --git a/llvm/test/tools/llvm-profgen/inline-cs-pseudoprobe.test b/llvm/test/tools/llvm-profgen/inline-cs-pseudoprobe.test
index 205e467091352..5d0d3d3f3a9ca 100644
--- a/llvm/test/tools/llvm-profgen/inline-cs-pseudoprobe.test
+++ b/llvm/test/tools/llvm-profgen/inline-cs-pseudoprobe.test
@@ -6,6 +6,9 @@
 ; RUN: FileCheck %s --input-file %t --check-prefix=CHECK-UNWINDER-OFFSET2
 ; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/inline-cs-pseudoprobe.perfscript --binary=%S/Inputs/inline-cs-pseudoprobe.perfbin --output=%t --profile-summary-cold-count=0 --csspgo-preinliner=0 --gen-cs-nested-profile=0
 ; RUN: FileCheck %s --input-file %t
+; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/inline-cs-pseudoprobe.perfscript --binary=%S/Inputs/inline-cs-pseudoprobe.perfbin --output=%t --profile-summary-cold-count=0 --csspgo-preinliner=0 --gen-cs-nested-profile=0 --mark-all-context-preinlined
+; RUN: FileCheck %s --input-file %t --check-prefix=CHECK-MARK-PREINLINED
+
 
 ; CHECK:     [main:2 @ foo]:74:0
 ; CHECK-NEXT: 1: 0
@@ -18,10 +21,30 @@
 ; CHECK-NEXT: 8: 14 bar:14
 ; CHECK-NEXT: 9: 0
 ; CHECK-NEXT: !CFGChecksum: 563088904013236
-; CHECK:[main:2 @ foo:8 @ bar]:28:14
+; CHECK-NEXT: !Attributes: 1
+; CHECK-NEXT:[main:2 @ foo:8 @ bar]:28:14
 ; CHECK-NEXT: 1: 14
 ; CHECK-NEXT: 4: 14
 ; CHECK-NEXT: !CFGChecksum: 72617220756
+; CHECK-NEXT: !Attributes: 1
+
+; CHECK-MARK-PREINLINED:     [main:2 @ foo]:74:0
+; CHECK-MARK-PREINLINED-NEXT: 1: 0
+; CHECK-MARK-PREINLINED-NEXT: 2: 15
+; CHECK-MARK-PREINLINED-NEXT: 3: 15
+; CHECK-MARK-PREINLINED-NEXT: 4: 14
+; CHECK-MARK-PREINLINED-NEXT: 5: 1
+; CHECK-MARK-PREINLINED-NEXT: 6: 15
+; CHECK-MARK-PREINLINED-NEXT: 7: 0
+; CHECK-MARK-PREINLINED-NEXT: 8: 14 bar:14
+; CHECK-MARK-PREINLINED-NEXT: 9: 0
+; CHECK-MARK-PREINLINED-NEXT: !CFGChecksum: 563088904013236
+; CHECK-MARK-PREINLINED-NEXT: !Attributes: 3
+; CHECK-MARK-PREINLINED-NEXT:[main:2 @ foo:8 @ bar]:28:14
+; CHECK-MARK-PREINLINED-NEXT: 1: 14
+; CHECK-MARK-PREINLINED-NEXT: 4: 14
+; CHECK-MARK-PREINLINED-NEXT: !CFGChecksum: 72617220756
+; CHECK-MARK-PREINLINED-NEXT: !Attributes: 3
 
 ; CHECK-UNWINDER:      3
 ; CHECK-UNWINDER-NEXT: 201800-201858:1
diff --git a/llvm/test/tools/llvm-profgen/inline-noprobe.test b/llvm/test/tools/llvm-profgen/inline-noprobe.test
index f960f074fefff..421dfcdb78e33 100644
--- a/llvm/test/tools/llvm-profgen/inline-noprobe.test
+++ b/llvm/test/tools/llvm-profgen/inline-noprobe.test
@@ -8,26 +8,48 @@
 ; RUN: FileCheck %s --input-file %t1 --check-prefix=CHECK-UPDATE-TOTAL-SAMPLE
 ; RUN: llvm-profgen --format=text --use-dwarf-correlation --perfscript=%S/Inputs/inline-noprobe.perfscript --binary=%S/Inputs/inline-noprobe.perfbin --output=%t
 ; RUN: FileCheck %s --input-file %t --check-prefix=CHECK
+; RUN: llvm-profgen --format=text --use-dwarf-correlation --perfscript=%S/Inputs/inline-noprobe.perfscript --binary=%S/Inputs/inline-noprobe.perfbin --output=%t --mark-all-context-preinlined
+; RUN: FileCheck %s --input-file %t --check-prefix=CHECK-MARK-PREINLINED
+
 ; RUN: echo -e "0\n0" > %t
 ; RUN: llvm-profgen --format=text --unsymbolized-profile=%t --binary=%S/Inputs/inline-noprobe.perfbin --output=%t1 --fill-zero-for-all-funcs
 ; RUN: FileCheck %s --input-file %t1 --check-prefix=CHECK-ALL-ZERO
 ; RUN: llvm-profgen --format=text --unsymbolized-profile=%S/Inputs/out-of-bounds.raw.prof --binary=%S/Inputs/inline-noprobe.perfbin --output=%t1
 ; RUN: FileCheck %s --input-file %t1 --check-prefix=CHECK-OB
 
-CHECK: main:2609:0
-CHECK:  0: 0
-CHECK:  2: 0
-CHECK:  1: foo:2609
-CHECK:   2.1: 42
-CHECK:   3: 62
-CHECK:   3.2: 21
-CHECK:   4: 0
-CHECK:   65526: 62
-CHECK:   3.1: bar:546
-CHECK:    1: 42
-CHECK:    65533: 42
-CHECK:   3.2: bar:189
-CHECK:    1: 21
+CHECK:      main:2609:0
+CHECK-NEXT:  0: 0
+CHECK-NEXT:  2: 0
+CHECK-NEXT:  1: foo:2609
+CHECK-NEXT:   2.1: 42
+CHECK-NEXT:   3: 62
+CHECK-NEXT:   3.2: 21
+CHECK-NEXT:   4: 0
+CHECK-NEXT:   65526: 62
+CHECK-NEXT:   3.1: bar:546
+CHECK-NEXT:    1: 42
+CHECK-NEXT:    65533: 42
+CHECK-NEXT:   3.2: bar:189
+CHECK-NEXT:    1: 21
+
+CHECK-MARK-PREINLINED:      main:2609:0
+CHECK-MARK-PREINLINED-NEXT:  0: 0
+CHECK-MARK-PREINLINED-NEXT:  2: 0
+CHECK-MARK-PREINLINED-NEXT:  1: foo:2609
+CHECK-MARK-PREINLINED-NEXT:   2.1: 42
+CHECK-MARK-PREINLINED-NEXT:   3: 62
+CHECK-MARK-PREINLINED-NEXT:   3.2: 21
+CHECK-MARK-PREINLINED-NEXT:   4: 0
+CHECK-MARK-PREINLINED-NEXT:   65526: 62
+CHECK-MARK-PREINLINED-NEXT:   3.1: bar:546
+CHECK-MARK-PREINLINED-NEXT:    1: 42
+CHECK-MARK-PREINLINED-NEXT:    65533: 42
+CHECK-MARK-PREINLINED-NEXT:    !Attributes: 2
+CHECK-MARK-PREINLINED-NEXT:   3.2: bar:189
+CHECK-MARK-PREINLINED-NEXT:    1: 21
+CHECK-MARK-PREINLINED-NEXT:    !Attributes: 2
+CHECK-MARK-PREINLINED-NEXT:   !Attributes: 2
+CHECK-MARK-PREINLINED-NEXT:  !Attributes: 2
 
 CHECK-UPDATE-TOTAL-SAMPLE: main:292:0
 CHECK-UPDATE-TOTAL-SAMPLE:  0: 0
diff --git a/llvm/test/tools/llvm-profgen/inline-pseudoprobe.test b/llvm/test/tools/llvm-profgen/inline-pseudoprobe.test
index 2dacf7fef8f13..c417b6c7f47c6 100644
--- a/llvm/test/tools/llvm-profgen/inline-pseudoprobe.test
+++ b/llvm/test/tools/llvm-profgen/inline-pseudoprobe.test
@@ -1,5 +1,7 @@
 ; RUN: llvm-profgen --format=text --ignore-stack-samples --perfscript=%S/Inputs/inline-cs-pseudoprobe.perfscript --binary=%S/Inputs/inline-cs-pseudoprobe.perfbin --output=%t --profile-summary-cold-count=0
 ; RUN: FileCheck %s --input-file %t
+; RUN: llvm-profgen --format=text --ignore-stack-samples --perfscript=%S/Inputs/inline-cs-pseudoprobe.perfscript --binary=%S/Inputs/inline-cs-pseudoprobe.perfbin --output=%t1 --profile-summary-cold-count=0 --mark-all-context-preinlined
+; RUN: FileCheck %s --input-file %t1 --check-prefix=CHECK-MARK-PREINLINED
 
 ; CHECK:     main:88:0
 ; CHECK-NEXT: 1: 0
@@ -19,6 +21,26 @@
 ; CHECK-NEXT:  !CFGChecksum: 563088904013236
 ; CHECK-NEXT: !CFGChecksum: 281479271677951
 
+; CHECK-MARK-PREINLINED:     main:88:0
+; CHECK-MARK-PREINLINED-NEXT: 1: 0
+; CHECK-MARK-PREINLINED-NEXT: 2: foo:88
+; CHECK-MARK-PREINLINED-NEXT:  1: 0
+; CHECK-MARK-PREINLINED-NEXT:  2: 15
+; CHECK-MARK-PREINLINED-NEXT:  3: 15
+; CHECK-MARK-PREINLINED-NEXT:  4: 14
+; CHECK-MARK-PREINLINED-NEXT:  5: 1
+; CHECK-MARK-PREINLINED-NEXT:  6: 15
+; CHECK-MARK-PREINLINED-NEXT:  7: 0
+; CHECK-MARK-PREINLINED-NEXT:  9: 0
+; CHECK-MARK-PREINLINED-NEXT:  8: bar:28
+; CHECK-MARK-PREINLINED-NEXT:   1: 14
+; CHECK-MARK-PREINLINED-NEXT:   4: 14
+; CHECK-MARK-PREINLINED-NEXT:   !CFGChecksum: 72617220756
+; CHECK-MARK-PREINLINED-NEXT:   !Attributes: 2 
+; CHECK-MARK-PREINLINED-NEXT:  !CFGChecksum: 563088904013236
+; CHECK-MARK-PREINLINED-NEXT:  !Attributes: 2
+; CHECK-MARK-PREINLINED-NEXT: !CFGChecksum: 281479271677951
+; CHECK-MARK-PREINLINED-NEXT: !Attributes: 2
 
 ; clang -O3 -fuse-ld=lld -fpseudo-probe-for-profiling
 ; -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -Xclang -mdisable-tail-calls
diff --git a/llvm/tools/llvm-profgen/ProfileGenerator.cpp b/llvm/tools/llvm-profgen/ProfileGenerator.cpp
index 33575b9c67625..c611e410ebd58 100644
--- a/llvm/tools/llvm-profgen/ProfileGenerator.cpp
+++ b/llvm/tools/llvm-profgen/ProfileGenerator.cpp
@@ -64,6 +64,12 @@ static cl::opt<bool>
                              "than threshold, it will be trimmed."),
                     cl::cat(ProfGenCategory));
 
+static cl::opt<bool> MarkAllContextPreinlined(
+    "mark-all-context-preinlined",
+    cl::desc("Mark all function samples as preinlined(set "
+             "ContextShouldBeInlined attribute)."),
+    cl::init(false));
+
 static cl::opt<bool> CSProfMergeColdContext(
     "csprof-merge-cold-context", cl::init(true),
     cl::desc("If the total count of context profile is smaller than "
@@ -511,10 +517,19 @@ void ProfileGenerator::generateProfile() {
   postProcessProfiles();
 }
 
+void ProfileGeneratorBase::markAllContextPreinlined(
+    SampleProfileMap &ProfileMap) {
+  for (auto &I : ProfileMap)
+    I.second.setContextAttribute(ContextShouldBeInlined);
+  FunctionSamples::ProfileIsPreInlined = true;
+}
+
 void ProfileGenerator::postProcessProfiles() {
   computeSummaryAndThreshold(ProfileMap);
   trimColdProfiles(ProfileMap, ColdCountThreshold);
   filterAmbiguousProfile(ProfileMap);
+  if (MarkAllContextPreinlined)
+    markAllContextPreinlined(ProfileMap);
   calculateAndShowDensity(ProfileMap);
 }
 
@@ -1130,6 +1145,8 @@ void CSProfileGenerator::postProcessProfiles() {
     FunctionSamples::ProfileIsCS = false;
   }
   filterAmbiguousProfile(ProfileMap);
+  if (MarkAllContextPreinlined)
+    markAllContextPreinlined(ProfileMap);
   ProfileGeneratorBase::calculateAndShowDensity(ProfileMap);
 }
 
diff --git a/llvm/tools/llvm-profgen/ProfileGenerator.h b/llvm/tools/llvm-profgen/ProfileGenerator.h
index d3e04563a81c2..dbf9d469063e1 100644
--- a/llvm/tools/llvm-profgen/ProfileGenerator.h
+++ b/llvm/tools/llvm-profgen/ProfileGenerator.h
@@ -123,6 +123,8 @@ class ProfileGeneratorBase {
 
   void showDensitySuggestion(double Density);
 
+  void markAllContextPreinlined(SampleProfileMap &ProfileMap);
+
   void collectProfiledFunctions();
 
   bool collectFunctionsFromRawProfile(

@wlei-llvm wlei-llvm changed the title [llvm-profgen] Add an option to force marking the profile as preinlined [llvm-profgen] Add an option to mark all the profile context as preinlined Sep 9, 2025
@wlei-llvm wlei-llvm merged commit 9bc4200 into llvm:main Sep 9, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

PGO Profile Guided Optimizations

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants