Skip to content

Conversation

@melver
Copy link
Contributor

@melver melver commented Nov 24, 2025

Remove explicit insertion of the AllocTokenPass, which is now handled by
the PassBuilder. Emit AllocToken configuration as LLVM module flags to
persist into the backend.

Specifically, this also means it will now be handled by LTO backend
phases; this avoids interference with other optimizations (e.g. PGHO)
and enable late heap-allocation optimizations with LTO enabled.


This change is part of the following series:

  1. [InstCombine][MemProf] Preserve all metadata #169242
  2. [Clang][MemProf] Add end-to-end test for PGHO rewriting #169243
  3. [LTO][AllocToken] Support AllocToken instrumentation in backend #169358
  4. [Clang][CodeGen] Remove explicit insertion of AllocToken pass #169360

Created using spr 1.3.8-beta.1
@github-actions
Copy link

github-actions bot commented Nov 24, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Created using spr 1.3.8-beta.1
@melver melver marked this pull request as ready for review November 24, 2025 16:49
@llvmbot llvmbot added clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:codegen IR generation bugs: mangling, exceptions, etc. LTO Link time optimization (regular/full LTO or ThinLTO) labels Nov 24, 2025
@llvmbot
Copy link
Member

llvmbot commented Nov 24, 2025

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-lto

@llvm/pr-subscribers-clang-codegen

Author: Marco Elver (melver)

Changes

With LTO enabled, move AllocTokenPass from the pre-link phase to the
backend LTO phase to avoid interference with other optimizations (e.g.
PGHO) and enable late heap-allocation optimizations.

For FullLTO and normal ThinLTO, pass AllocToken options to the linker
plugin.

For distributed ThinLTO, we need a new PassBuilderCallback in
lto::Config to inject passes into the LTO backend pipeline when
running in Clang.


This change is part of the following series:

  1. [InstCombine][MemProf] Preserve all metadata #169242
  2. [Clang][MemProf] Add end-to-end test for PGHO rewriting #169243
  3. [LTO][AllocToken] Support AllocToken instrumentation in backend #169358
  4. [Clang] Make -falloc-token-mode a hidden frontend option #169359
  5. [Clang][CodeGen] Remove explicit insertion of AllocToken pass #169360

Full diff: https://github.com/llvm/llvm-project/pull/169360.diff

8 Files Affected:

  • (modified) clang/include/clang/Driver/SanitizerArgs.h (+4)
  • (modified) clang/lib/CodeGen/BackendUtil.cpp (+12-1)
  • (modified) clang/lib/Driver/ToolChains/CommonArgs.cpp (+18)
  • (added) clang/test/CodeGen/distributed-thin-lto/memprof-pgho.cpp (+65)
  • (modified) clang/test/CodeGen/lto-newpm-pipeline.c (+2-6)
  • (modified) clang/test/Driver/fsanitize-alloc-token.c (+24)
  • (modified) llvm/include/llvm/LTO/Config.h (+2)
  • (modified) llvm/lib/LTO/LTOBackend.cpp (+2)
diff --git a/clang/include/clang/Driver/SanitizerArgs.h b/clang/include/clang/Driver/SanitizerArgs.h
index 08e3c147d0557..4cc3811bb547a 100644
--- a/clang/include/clang/Driver/SanitizerArgs.h
+++ b/clang/include/clang/Driver/SanitizerArgs.h
@@ -140,6 +140,10 @@ class SanitizerArgs {
     return Sanitizers.has(SanitizerKind::ShadowCallStack);
   }
 
+  bool hasAllocToken() const {
+    return Sanitizers.has(SanitizerKind::AllocToken);
+  }
+
   bool requiresPIE() const;
   bool needsUnwindTables() const;
   bool needsLTO() const;
diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp
index 5590d217e96ff..4317ea7c0a7a5 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -1144,7 +1144,12 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
     if (!IsThinLTOPostLink) {
       addSanitizers(TargetTriple, CodeGenOpts, LangOpts, PB);
       addKCFIPass(TargetTriple, LangOpts, PB);
-      addAllocTokenPass(TargetTriple, CodeGenOpts, LangOpts, PB);
+
+      // On ThinLTO or FullLTO pre-link, skip AllocTokenPass; it runs during the
+      // LTO backend compile phase to enable late heap-allocation optimizations
+      // to remain compatible with AllocToken instrumentation.
+      if (!PrepareForThinLTO && !PrepareForLTO)
+        addAllocTokenPass(TargetTriple, CodeGenOpts, LangOpts, PB);
     }
 
     if (std::optional<GCOVOptions> Options =
@@ -1425,6 +1430,12 @@ runThinLTOBackend(CompilerInstance &CI, ModuleSummaryIndex *CombinedIndex,
   Conf.RemarksFormat = CGOpts.OptRecordFormat;
   Conf.SplitDwarfFile = CGOpts.SplitDwarfFile;
   Conf.SplitDwarfOutput = CGOpts.SplitDwarfOutput;
+  Conf.PassBuilderCallback = [&](PassBuilder &PB) {
+    // Skipped during pre-link phase to avoid instrumentation interfering with
+    // backend LTO optimizations, and instead we run it as late as possible.
+    // This case handles distributed ThinLTO.
+    addAllocTokenPass(CI.getTarget().getTriple(), CGOpts, CI.getLangOpts(), PB);
+  };
   switch (Action) {
   case Backend_EmitNothing:
     Conf.PreCodeGenModuleHook = [](size_t Task, const llvm::Module &Mod) {
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index d3539a594df11..8ea84c72b6a2e 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -1385,6 +1385,24 @@ void tools::addLTOOptions(const ToolChain &ToolChain, const ArgList &Args,
     CmdArgs.push_back(
         Args.MakeArgString(Twine(PluginOptPrefix) + "-time-passes"));
 
+  const SanitizerArgs &SanArgs = ToolChain.getSanitizerArgs(Args);
+  if (SanArgs.hasAllocToken()) {
+    StringRef Mode = "default";
+    if (Arg *A = Args.getLastArg(options::OPT_falloc_token_mode_EQ))
+      Mode = A->getValue();
+    CmdArgs.push_back(Args.MakeArgString(Twine(PluginOptPrefix) +
+                                         "-lto-alloc-token-mode=" + Mode));
+    if (Args.hasArg(options::OPT_fsanitize_alloc_token_fast_abi))
+      CmdArgs.push_back(
+          Args.MakeArgString(Twine(PluginOptPrefix) + "-alloc-token-fast-abi"));
+    if (Args.hasArg(options::OPT_fsanitize_alloc_token_extended))
+      CmdArgs.push_back(
+          Args.MakeArgString(Twine(PluginOptPrefix) + "-alloc-token-extended"));
+    if (Arg *A = Args.getLastArg(options::OPT_falloc_token_max_EQ))
+      CmdArgs.push_back(Args.MakeArgString(
+          Twine(PluginOptPrefix) + "-alloc-token-max=" + A->getValue()));
+  }
+
   addDTLTOOptions(ToolChain, Args, CmdArgs);
 }
 
diff --git a/clang/test/CodeGen/distributed-thin-lto/memprof-pgho.cpp b/clang/test/CodeGen/distributed-thin-lto/memprof-pgho.cpp
new file mode 100644
index 0000000000000..3cde4a25349d4
--- /dev/null
+++ b/clang/test/CodeGen/distributed-thin-lto/memprof-pgho.cpp
@@ -0,0 +1,65 @@
+// Test end-to-end ThinLTO optimization pipeline with PGHO, that it does not
+// interfere with other allocation instrumentation features.
+//
+// RUN: split-file %s %t
+// RUN: llvm-profdata merge %t/memprof.yaml -o %t/use.memprofdata
+//
+// RUN: %clangxx -O2 -flto=thin -g -fmemory-profile-use=%t/use.memprofdata %t/src.cpp -c -o %t.o
+// RUN: llvm-lto2 run %t.o -thinlto-distributed-indexes -supports-hot-cold-new -r=%t.o,main,plx -r=%t.o,_Z3foov,plx -r=%t.o,_Znam, -o %t.out
+// RUN: %clang_cc1 -O2 -x ir %t.o -fthinlto-index=%t.o.thinlto.bc -mllvm -optimize-hot-cold-new -emit-llvm -o - 2>&1 | FileCheck %s --check-prefixes=CHECK,DEFAULT
+//
+// RUN: %clangxx -O2 -flto=thin -g -fsanitize=alloc-token -fmemory-profile-use=%t/use.memprofdata %t/src.cpp -c -o %t.o
+// RUN: llvm-lto2 run %t.o -thinlto-distributed-indexes -supports-hot-cold-new -r=%t.o,main,plx -r=%t.o,_Z3foov,plx -r=%t.o,_Znam, -o %t.out
+// RUN: %clang_cc1 -O2 -x ir %t.o -fsanitize=alloc-token -fthinlto-index=%t.o.thinlto.bc -mllvm -optimize-hot-cold-new -emit-llvm -o - 2>&1 | FileCheck %s --check-prefixes=CHECK,ALLOCTOKEN
+
+//--- memprof.yaml
+---
+HeapProfileRecords:
+  - GUID: 0x7f8d88fcc70a347b
+    AllocSites:
+    - Callstack:
+      - { Function: 0x7f8d88fcc70a347b, LineOffset: 1, Column: 10, IsInlineFrame: false }
+      - { Function: 0xdb956436e78dd5fa, LineOffset: 1, Column: 13, IsInlineFrame: false }
+      MemInfoBlock:
+        AllocCount: 1
+        TotalAccessCount: 0
+        MinAccessCount: 0
+        MaxAccessCount: 0
+        TotalSize: 10
+        MinSize: 10
+        MaxSize: 10
+        AllocTimestamp: 100
+        DeallocTimestamp: 100
+        TotalLifetime: 100000
+        MinLifetime: 100000
+        MaxLifetime: 100000
+        AllocCpuId: 0
+        DeallocCpuId: 0
+        NumMigratedCpu: 0
+        NumLifetimeOverlaps: 0
+        NumSameAllocCpu: 0
+        NumSameDeallocCpu: 0
+        DataTypeId: 0
+        TotalAccessDensity: 0
+        MinAccessDensity: 0
+        MaxAccessDensity: 0
+        TotalLifetimeAccessDensity: 0
+        MinLifetimeAccessDensity: 0
+        MaxLifetimeAccessDensity: 0
+        AccessHistogramSize: 0
+        AccessHistogram: 0
+...
+
+//--- src.cpp
+// CHECK-LABEL: define{{.*}} ptr @_Z3foov()
+// DEFAULT:    call {{.*}} ptr @_Znam12__hot_cold_t(i64 10, i8 -128)
+// ALLOCTOKEN: call {{.*}} ptr @__alloc_token__Znam12__hot_cold_t(i64 10, i8 -128, i64 1538840549748785101){{.*}} !alloc_token
+char *foo() {
+  return new char[10];
+}
+
+int main() {
+  char *a = foo();
+  delete[] a;
+  return 0;
+}
diff --git a/clang/test/CodeGen/lto-newpm-pipeline.c b/clang/test/CodeGen/lto-newpm-pipeline.c
index dceaaf136ebfc..ea9784a76f923 100644
--- a/clang/test/CodeGen/lto-newpm-pipeline.c
+++ b/clang/test/CodeGen/lto-newpm-pipeline.c
@@ -32,12 +32,10 @@
 // CHECK-FULL-O0-NEXT: Running pass: AlwaysInlinerPass
 // CHECK-FULL-O0-NEXT: Running analysis: ProfileSummaryAnalysis
 // CHECK-FULL-O0-NEXT: Running pass: CoroConditionalWrapper
-// CHECK-FULL-O0-NEXT: Running pass: AllocTokenPass
-// CHECK-FULL-O0-NEXT: Running analysis: OptimizationRemarkEmitterAnalysis
-// CHECK-FULL-O0-NEXT: Running analysis: TargetLibraryAnalysis
 // CHECK-FULL-O0-NEXT: Running pass: CanonicalizeAliasesPass
 // CHECK-FULL-O0-NEXT: Running pass: NameAnonGlobalPass
 // CHECK-FULL-O0-NEXT: Running pass: AnnotationRemarksPass
+// CHECK-FULL-O0-NEXT: Running analysis: TargetLibraryAnalysis
 // CHECK-FULL-O0-NEXT: Running pass: VerifierPass
 // CHECK-FULL-O0-NEXT: Running pass: BitcodeWriterPass
 
@@ -48,12 +46,10 @@
 // CHECK-THIN-O0-NEXT: Running pass: AlwaysInlinerPass
 // CHECK-THIN-O0-NEXT: Running analysis: ProfileSummaryAnalysis
 // CHECK-THIN-O0-NEXT: Running pass: CoroConditionalWrapper
-// CHECK-THIN-O0-NEXT: Running pass: AllocTokenPass
-// CHECK-THIN-O0-NEXT: Running analysis: OptimizationRemarkEmitterAnalysis
-// CHECK-THIN-O0-NEXT: Running analysis: TargetLibraryAnalysis
 // CHECK-THIN-O0-NEXT: Running pass: CanonicalizeAliasesPass
 // CHECK-THIN-O0-NEXT: Running pass: NameAnonGlobalPass
 // CHECK-THIN-O0-NEXT: Running pass: AnnotationRemarksPass
+// CHECK-THIN-O0-NEXT: Running analysis: TargetLibraryAnalysis
 // CHECK-THIN-O0-NEXT: Running pass: VerifierPass
 // CHECK-THIN-O0-NEXT: Running pass: ThinLTOBitcodeWriterPass
 
diff --git a/clang/test/Driver/fsanitize-alloc-token.c b/clang/test/Driver/fsanitize-alloc-token.c
index 073b211d7b671..515b62e30b9ec 100644
--- a/clang/test/Driver/fsanitize-alloc-token.c
+++ b/clang/test/Driver/fsanitize-alloc-token.c
@@ -53,3 +53,27 @@
 // CHECK-MODE-TYPEHASHPTRSPLIT: "-falloc-token-mode=typehashpointersplit"
 // RUN: not %clang --target=x86_64-linux-gnu -falloc-token-mode=asdf %s 2>&1 | FileCheck -check-prefix=CHECK-INVALID-MODE %s
 // CHECK-INVALID-MODE: error: invalid value 'asdf'
+
+// RUN: %clang --target=x86_64-linux-gnu -flto=thin -fsanitize=alloc-token %s -### 2>&1 | FileCheck --check-prefix=CHECK-LTO %s
+// RUN: %clang --target=x86_64-linux-gnu -flto=full -fsanitize=alloc-token %s -### 2>&1 | FileCheck --check-prefix=CHECK-LTO %s
+// RUN: %clang --target=x86_64-linux-gnu -flto=thin -fno-sanitize=alloc-token -fsanitize=alloc-token %s -### 2>&1 | FileCheck --check-prefix=CHECK-LTO %s
+// RUN: %clang --target=x86_64-linux-gnu -flto=full -fno-sanitize=alloc-token -fsanitize=alloc-token %s -### 2>&1 | FileCheck --check-prefix=CHECK-LTO %s
+// CHECK-LTO: "-plugin-opt=-lto-alloc-token-mode=default"
+
+// RUN: %clang --target=x86_64-linux-gnu -flto=thin -fsanitize=alloc-token -fno-sanitize=alloc-token %s -### 2>&1 | FileCheck --check-prefix=CHECK-LTO-NO %s
+// RUN: %clang --target=x86_64-linux-gnu -flto=full -fsanitize=alloc-token -fno-sanitize=alloc-token %s -### 2>&1 | FileCheck --check-prefix=CHECK-LTO-NO %s
+// CHECK-LTO-NO-NOT: "-plugin-opt=-lto-alloc-token-mode=default"
+
+// RUN: %clang --target=x86_64-linux-gnu -flto=thin -fsanitize=alloc-token -fsanitize-alloc-token-fast-abi %s -### 2>&1 | FileCheck --check-prefix=CHECK-LTO-FAST %s
+// RUN: %clang --target=x86_64-linux-gnu -flto=full -fsanitize=alloc-token -fsanitize-alloc-token-fast-abi %s -### 2>&1 | FileCheck --check-prefix=CHECK-LTO-FAST %s
+// CHECK-LTO-FAST: "-plugin-opt=-lto-alloc-token-mode=default"
+// CHECK-LTO-FAST: "-plugin-opt=-alloc-token-fast-abi"
+
+// RUN: %clang --target=x86_64-linux-gnu -flto=thin -fsanitize=alloc-token -falloc-token-max=100 %s -### 2>&1 | FileCheck --check-prefix=CHECK-LTO-MAX %s
+// RUN: %clang --target=x86_64-linux-gnu -flto=full -fsanitize=alloc-token -falloc-token-max=100 %s -### 2>&1 | FileCheck --check-prefix=CHECK-LTO-MAX %s
+// CHECK-LTO-MAX: "-plugin-opt=-lto-alloc-token-mode=default"
+// CHECK-LTO-MAX: "-plugin-opt=-alloc-token-max=100"
+
+// RUN: %clang --target=x86_64-linux-gnu -flto=thin -fsanitize=alloc-token -falloc-token-mode=random %s -### 2>&1 | FileCheck --check-prefix=CHECK-LTO-MODE %s
+// RUN: %clang --target=x86_64-linux-gnu -flto=full -fsanitize=alloc-token -falloc-token-mode=random %s -### 2>&1 | FileCheck --check-prefix=CHECK-LTO-MODE %s
+// CHECK-LTO-MODE: "-plugin-opt=-lto-alloc-token-mode=random"
diff --git a/llvm/include/llvm/LTO/Config.h b/llvm/include/llvm/LTO/Config.h
index 566a87ed1a790..cd1c725fa3fc0 100644
--- a/llvm/include/llvm/LTO/Config.h
+++ b/llvm/include/llvm/LTO/Config.h
@@ -51,6 +51,8 @@ struct Config {
   std::vector<std::string> MAttrs;
   std::vector<std::string> MllvmArgs;
   std::vector<std::string> PassPlugins;
+  /// Callback to customize the PassBuilder.
+  std::function<void(PassBuilder &)> PassBuilderCallback;
   /// For adding passes that run right before codegen.
   std::function<void(legacy::PassManager &)> PreCodeGenPassesHook;
   std::optional<Reloc::Model> RelocModel = Reloc::PIC_;
diff --git a/llvm/lib/LTO/LTOBackend.cpp b/llvm/lib/LTO/LTOBackend.cpp
index 190cec4f5701c..cd20590f6a7b8 100644
--- a/llvm/lib/LTO/LTOBackend.cpp
+++ b/llvm/lib/LTO/LTOBackend.cpp
@@ -307,6 +307,8 @@ static void runNewPMPasses(const Config &Conf, Module &Mod, TargetMachine *TM,
   PassBuilder PB(TM, Conf.PTO, PGOOpt, &PIC);
 
   RegisterPassPlugins(Conf.PassPlugins, PB);
+  if (Conf.PassBuilderCallback)
+    Conf.PassBuilderCallback(PB);
 
   registerBackendInstrumentation(PB);
 

@vitalybuka
Copy link
Collaborator

Why AllocToken is a problem for other optimization?
Can PGHO be aware of AllocToken?

AllocToken replies on metadata, later it applied, less the quality.

Copy link
Contributor

@teresajohnson teresajohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than having different ways to inject this pass for the different LTO backend pipelines, and for distributed vs in process ThinLTO, and also rather than passing internal options to the linker driver which we're trying to get away from, can the pass be added unconditionally from the PassBuilder::build*LTODefaultPipeline in the correct location and guarded by something in the IR? In fact, looking at the AllocToken pass, it already appears to be a noop without the necessary attributes and intrinsics. Can the mode also be passed down in the IR via an attribute or at least a module flag to simplify the enablement?

Created using spr 1.3.8-beta.1
Created using spr 1.3.8-beta.1
@melver
Copy link
Contributor Author

melver commented Nov 25, 2025

Why AllocToken is a problem for other optimization? Can PGHO be aware of AllocToken?

PGHO rewrites in LibCallSimplifier::optimizeNew rely in LibFunc identification. If we rewrite them it won't recognize them.

PGHO could be made aware of AllocToken if we teach it to recognize the ABI, but I think it's generally infeasible; perhaps checking for the __alloc_token prefix is easy, but we also need to support 2 calling conventions. On the other hand PGHO rewrites are relatively limited (few hot_cold_t variants) and its rewrites are part of known LibFuncs, which AllocToken already recognizes as-is.

Similarly, it's going to be easier for other similar optimizations that rely on specific known LibCall identification.

AllocToken replies on metadata, later it applied, less the quality.

I agree, but I think if we can make it work in the LTO backend it will be simpler for other optimizations incl. PGHO.

Rather than having different ways to inject this pass for the different LTO backend pipelines, and for distributed vs in process ThinLTO, and also rather than passing internal options to the linker driver which we're trying to get away from, can the pass be added unconditionally from the PassBuilder::build*LTODefaultPipeline in the correct location and guarded by something in the IR? In fact, looking at the AllocToken pass, it already appears to be a noop without the necessary attributes and intrinsics. Can the mode also be passed down in the IR via an attribute or at least a module flag to simplify the enablement?

I can look at the module flag, maybe that can work to derive the options.

Created using spr 1.3.8-beta.1
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Nov 26, 2025
@melver
Copy link
Contributor Author

melver commented Nov 26, 2025

Rather than having different ways to inject this pass for the different LTO backend pipelines, and for distributed vs in process ThinLTO, and also rather than passing internal options to the linker driver which we're trying to get away from, can the pass be added unconditionally from the PassBuilder::build*LTODefaultPipeline in the correct location and guarded by something in the IR?

I reworked it and it's added in PassBuilder now.

In fact, looking at the AllocToken pass, it already appears to be a noop without the necessary attributes and intrinsics. Can the mode also be passed down in the IR via an attribute or at least a module flag to simplify the enablement?

Reworked to use module flags.

PTAL. Thanks!

@melver melver requested review from pcc and teresajohnson November 26, 2025 15:58
Created using spr 1.3.8-beta.1
Created using spr 1.3.8-beta.1
Copy link
Contributor

@teresajohnson teresajohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Created using spr 1.3.8-beta.1
@melver melver changed the title [Clang][CodeGen] Move AllocToken pass to backend LTO phases [Clang][CodeGen] Remove explicit insertion of AllocToken pass Nov 27, 2025
Created using spr 1.3.8-beta.1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:codegen IR generation bugs: mangling, exceptions, etc. clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category LTO Link time optimization (regular/full LTO or ThinLTO)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants