@teresajohnson
Contributor

Allow users to set the minimum absolute count for inlining of indirect
calls promoted during cloning. This is primarily meant to enable
generation of synthetic vp metadata introduced in PR141164 when
profiling memprof-optimized binaries.

@llvmbot added the LTO (Link time optimization: regular/full LTO or ThinLTO) and llvm:transforms labels on Jul 10, 2025
@llvmbot
Member

llvmbot commented Jul 10, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Teresa Johnson (teresajohnson)

Changes

Allow users to set the minimum absolute count for inlining of indirect
calls promoted during cloning. This is primarily meant to enable
generation of synthetic vp metadata introduced in PR141164 when
profiling memprof-optimized binaries.


Full diff: https://github.com/llvm/llvm-project/pull/148001.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp (+9)
  • (modified) llvm/test/ThinLTO/X86/memprof-icp-recursive.ll (+44-7)
diff --git a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
index c0f84456d2b27..3a7c3cd773b0a 100644
--- a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
+++ b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
@@ -181,6 +181,12 @@ static cl::opt<bool> AllowRecursiveContexts(
     "memprof-allow-recursive-contexts", cl::init(true), cl::Hidden,
     cl::desc("Allow cloning of contexts having recursive cycles"));
 
+// Set the minimum absolute count threshold for inlining of indirect calls
+// promoted during cloning.
+static cl::opt<unsigned> ICPInlineMinimumCountThreshold(
+    "memprof-icp-inline-minimum-count-threshold", cl::init(0), cl::Hidden,
+    cl::desc("Minimum absolute count for promoted target to be inlinable"));
+
 namespace llvm {
 cl::opt<bool> EnableMemProfContextDisambiguation(
     "enable-memprof-context-disambiguation", cl::init(false), cl::Hidden,
@@ -5573,6 +5579,9 @@ void MemProfContextDisambiguation::performICP(
                                  .getCallee());
         }
         DirectCall.setCalledFunction(TargetToUse);
+        if (ICPInlineMinimumCountThreshold &&
+            Candidate.Count < ICPInlineMinimumCountThreshold)
+          DirectCall.setIsNoInline();
         ORE.emit(OptimizationRemark(DEBUG_TYPE, "MemprofCall", CBClone)
                  << ore::NV("Call", CBClone) << " in clone "
                  << ore::NV("Caller", CBClone->getFunction())
diff --git a/llvm/test/ThinLTO/X86/memprof-icp-recursive.ll b/llvm/test/ThinLTO/X86/memprof-icp-recursive.ll
index f8dcd80d4e141..9e2f6e6130829 100644
--- a/llvm/test/ThinLTO/X86/memprof-icp-recursive.ll
+++ b/llvm/test/ThinLTO/X86/memprof-icp-recursive.ll
@@ -54,7 +54,39 @@
 ; RUN:  -o %t.out 2>&1 | FileCheck %s --check-prefix=STATS \
 ; RUN:  --check-prefix=REMARKS
 
-; RUN: llvm-dis %t.out.2.4.opt.bc -o - | FileCheck %s --check-prefix=IR
+; RUN: llvm-dis %t.out.2.4.opt.bc -o - | FileCheck %s --check-prefixes=IR,IR-INLINE
+
+;; Next, add a threshold to prevent inlining of small count promoted calls.
+; RUN: llvm-lto2 run %t/main.o %t/foo.o -enable-memprof-context-disambiguation \
+; RUN:	-memprof-icp-inline-minimum-count-threshold=3 \
+; RUN:	-enable-memprof-indirect-call-support=true \
+; RUN:  -memprof-allow-recursive-callsites \
+; RUN:  -supports-hot-cold-new \
+; RUN:  -r=%t/foo.o,_Z3fooR2B0j,plx \
+; RUN:  -r=%t/foo.o,_ZN2B03barEj, \
+; RUN:  -r=%t/foo.o,_ZN1B3barEj, \
+; RUN:  -r=%t/main.o,_Z3fooR2B0j, \
+; RUN:  -r=%t/main.o,_Znwm, \
+; RUN:  -r=%t/main.o,_ZdlPvm, \
+; RUN:  -r=%t/main.o,_Z8externalPi, \
+; RUN:  -r=%t/main.o,main,plx \
+; RUN:  -r=%t/main.o,_ZN2B03barEj,plx \
+; RUN:  -r=%t/main.o,_ZN1B3barEj,plx \
+; RUN:  -r=%t/main.o,_ZTV1B,plx \
+; RUN:  -r=%t/main.o,_ZTVN10__cxxabiv120__si_class_type_infoE,plx \
+; RUN:  -r=%t/main.o,_ZTS1B,plx \
+; RUN:  -r=%t/main.o,_ZTVN10__cxxabiv117__class_type_infoE,plx \
+; RUN:  -r=%t/main.o,_ZTS2B0,plx \
+; RUN:  -r=%t/main.o,_ZTI2B0,plx \
+; RUN:  -r=%t/main.o,_ZTI1B,plx \
+; RUN:  -r=%t/main.o,_ZTV2B0,plx \
+; RUN:	-thinlto-threads=1 \
+; RUN:  -memprof-verify-ccg -memprof-verify-nodes -stats \
+; RUN:  -pass-remarks=. -save-temps \
+; RUN:  -o %t.out 2>&1 | FileCheck %s --check-prefix=STATS \
+; RUN:  --check-prefix=REMARKS
+
+; RUN: llvm-dis %t.out.2.4.opt.bc -o - | FileCheck %s --check-prefixes=IR,IR-NOINLINE
 
 ; REMARKS: call in clone main assigned to call function clone _Z3fooR2B0j.memprof.1
 ; REMARKS: call in clone main assigned to call function clone _Z3fooR2B0j.memprof.1
@@ -98,12 +130,14 @@
 ; IR:   %[[R1:[0-9]+]] = icmp eq ptr %0, @_ZN1B3barEj
 ; IR:   br i1 %[[R1]], label %if.true.direct_targ, label %if.false.orig_indirect
 ; IR: if.true.direct_targ:
-; IR:   call {{.*}} @_Znwm(i64 noundef 4) #[[NOTCOLD:[0-9]+]]
+; IR-INLINE:   call {{.*}} @_Znwm(i64 noundef 4) #[[NOTCOLD:[0-9]+]]
+; IR-NOINLINE:   call {{.*}} @_ZN1B3barEj(ptr null, i32 0) #[[NOINLINE:[0-9]+]]
 ; IR: if.false.orig_indirect:
 ; IR:   %[[R2:[0-9]+]] = icmp eq ptr %0, @_ZN2B03barEj
 ; IR:   br i1 %[[R2]], label %if.true.direct_targ1, label %if.false.orig_indirect2
 ; IR: if.true.direct_targ1:
-; IR:   call {{.*}} @_Znwm(i64 noundef 4) #[[NOTCOLD]]
+; IR-INLINE:   call {{.*}} @_Znwm(i64 noundef 4) #[[NOTCOLD]]
+; IR-NOINLINE:   call {{.*}} @_ZN2B03barEj(ptr null, i32 0) #[[NOINLINE]]
 ; IR: if.false.orig_indirect2:
 ; IR:   call {{.*}} %0
 
@@ -114,17 +148,20 @@
 ; IR:   %[[R3:[0-9]+]] = icmp eq ptr %0, @_ZN1B3barEj
 ; IR:   br i1 %[[R3]], label %if.true.direct_targ, label %if.false.orig_indirect
 ; IR: if.true.direct_targ:
-; IR:   call {{.*}} @_Znwm(i64 noundef 4) #[[COLD:[0-9]+]]
+; IR-INLINE:   call {{.*}} @_Znwm(i64 noundef 4) #[[COLD:[0-9]+]]
+; IR-NOINLINE:   call {{.*}} @_ZN1B3barEj.memprof.1(ptr null, i32 0) #[[NOINLINE]]
 ; IR: if.false.orig_indirect:
 ; IR:   %[[R4:[0-9]+]] = icmp eq ptr %0, @_ZN2B03barEj
 ; IR:   br i1 %[[R4]], label %if.true.direct_targ1, label %if.false.orig_indirect2
 ; IR: if.true.direct_targ1:
-; IR:   call {{.*}} @_Znwm(i64 noundef 4) #[[COLD]]
+; IR-INLINE:   call {{.*}} @_Znwm(i64 noundef 4) #[[COLD]]
+; IR-NOINLINE:   call {{.*}} @_ZN2B03barEj.memprof.1(ptr null, i32 0) #[[NOINLINE]]
 ; IR: if.false.orig_indirect2:
 ; IR:   call {{.*}} %0
 
-; IR: attributes #[[NOTCOLD]] = {{.*}} "memprof"="notcold"
-; IR: attributes #[[COLD]] = {{.*}} "memprof"="cold"
+; IR-INLINE: attributes #[[NOTCOLD]] = {{.*}} "memprof"="notcold"
+; IR-INLINE: attributes #[[COLD]] = {{.*}} "memprof"="cold"
+; IR-NOINLINE: attributes #[[NOINLINE]] = { noinline }
 
 ;--- foo.ll
 target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
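
The condition added to performICP in the diff above can be sketched in isolation. This is a minimal standalone illustration, not the pass's code: `PromotedCandidate` and `shouldMarkNoInline` are hypothetical names, and only the comparison itself is taken from the diff. A threshold of 0 leaves the check disabled, and the value 3 matches the one passed in the new RUN line of the test.

```cpp
#include <cstdint>

// Hypothetical stand-in for the pass's promotion candidate; only the
// profile count matters for this check.
struct PromotedCandidate {
  std::uint64_t Count;
};

// Mirrors the condition in the diff: a threshold of 0 disables the check;
// otherwise a promoted call whose count is below the threshold would be
// marked noinline.
constexpr bool shouldMarkNoInline(const PromotedCandidate &C,
                                  unsigned Threshold) {
  return Threshold != 0 && C.Count < Threshold;
}

// Compile-time checks of the three interesting cases.
static_assert(!shouldMarkNoInline({1}, 0), "threshold 0 disables the check");
static_assert(shouldMarkNoInline({2}, 3), "count below threshold");
static_assert(!shouldMarkNoInline({3}, 3), "count equal to threshold");
```

The comparison is strict, so a candidate whose count equals the threshold stays eligible for inlining.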

@llvmbot
Member

llvmbot commented Jul 10, 2025

@llvm/pr-subscribers-lto


@teresajohnson requested a review from snehasish on July 10, 2025 16:59

// Set the minimum absolute count threshold for inlining of indirect calls
// promoted during cloning.
static cl::opt<unsigned> ICPInlineMinimumCountThreshold(

I would prefer a variable name that includes MemProf, to distinguish it from the general ICP knobs. Also, this variable allows inlining but doesn't guarantee it. How about renaming it to MemProfICPNoInlineThreshold?



done

// Set the minimum absolute count threshold for inlining of indirect calls
// promoted during cloning.
static cl::opt<unsigned> ICPInlineMinimumCountThreshold(
"memprof-icp-inline-minimum-count-threshold", cl::init(0), cl::Hidden,

Should we set the default value to 2 so that the synthetic calls we insert don't get inlined?



good idea, done
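
The effect of a default of 2 can be illustrated with a small standalone sketch. Everything here is hypothetical: the `Candidate` struct, the `noInlineTargets` helper, and in particular the assumption that a synthetic entry carries a count of 1. The thread only states that a default of 2 is meant to keep the synthetic calls from being inlined.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical promotion candidate: a target name plus its profile count.
struct Candidate {
  std::string Target;
  std::uint64_t Count;
};

// Default value suggested in the review comment above.
constexpr unsigned kDefaultNoInlineThreshold = 2;

// Returns the targets whose promoted direct calls would be marked noinline
// under the given threshold (0 disables the check entirely).
std::vector<std::string>
noInlineTargets(const std::vector<Candidate> &Candidates,
                unsigned Threshold = kDefaultNoInlineThreshold) {
  std::vector<std::string> Result;
  for (const Candidate &C : Candidates)
    if (Threshold != 0 && C.Count < Threshold)
      Result.push_back(C.Target);
  return Result;
}
```

Under these assumptions, calling `noInlineTargets` on candidates with counts 1, 2, and 100 selects only the count-1 entry at the default threshold, selects nothing at threshold 0, and selects the first two at threshold 3.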

@snehasish left a comment

lgtm

.getCallee());
}
DirectCall.setCalledFunction(TargetToUse);
if (MemProfICPNoInlineThreshold &&

Maybe add a comment here to document why we need to do this?



done

@teresajohnson merged commit 838701a into llvm:main on Jul 10, 2025
8 of 9 checks passed