Skip to content

Conversation

rlavaee
Copy link
Contributor

@rlavaee rlavaee commented Oct 13, 2025

When available, the Propeller CFG profile provides more fidelity than the PGO profile. It's also what has been used to optimize the function layout. So we should emit that in the PGO analysis map instead.

We still need a way to distinguish where the data comes from. We will handle this in a later PR.

@llvmbot
Copy link
Member

llvmbot commented Oct 13, 2025

@llvm/pr-subscribers-backend-x86

Author: Rahman Lavaee (rlavaee)

Changes

When available, the Propeller CFG profile provides more fidelity than the PGO profile. It's also what has been used to optimize the function layout. So we should emit that in the PGO analysis map instead.

We still need a way to distinguish where the data comes from. We will handle this in a later PR.


Full diff: https://github.com/llvm/llvm-project/pull/163252.diff

4 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/BasicBlockSectionsProfileReader.h (+36-13)
  • (modified) llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp (+38-12)
  • (modified) llvm/lib/CodeGen/BasicBlockSectionsProfileReader.cpp (+2-23)
  • (added) llvm/test/CodeGen/X86/basic-block-sections-pgo-features.ll (+92)
diff --git a/llvm/include/llvm/CodeGen/BasicBlockSectionsProfileReader.h b/llvm/include/llvm/CodeGen/BasicBlockSectionsProfileReader.h
index 82dd5feb31dba..3c0e26bad7a5b 100644
--- a/llvm/include/llvm/CodeGen/BasicBlockSectionsProfileReader.h
+++ b/llvm/include/llvm/CodeGen/BasicBlockSectionsProfileReader.h
@@ -42,18 +42,40 @@ struct BBClusterInfo {
   unsigned PositionInCluster;
 };
 
+// This represents the CFG profile data for a function.
+struct CFGProfile {
+  // Node counts for each basic block.
+  DenseMap<UniqueBBID, uint64_t> NodeCounts;
+  // Edge counts for each edge, stored as a nested map.
+  DenseMap<UniqueBBID, DenseMap<UniqueBBID, uint64_t>> EdgeCounts;
+
+  // Returns the profile count for the given basic block or zero if it does not
+  // exist.
+  uint64_t getNodeCount(const UniqueBBID &BBID) const {
+    return NodeCounts.lookup(BBID);
+  }
+
+  // Returns the profile count for the edge from `SrcBBID` to `SinkBBID` or
+  // zero if it does not exist.
+  uint64_t getEdgeCount(const UniqueBBID &SrcBBID,
+                        const UniqueBBID &SinkBBID) const {
+    auto It = EdgeCounts.find(SrcBBID);
+    if (It == EdgeCounts.end())
+      return 0;
+    return It->second.lookup(SinkBBID);
+  }
+};
+
 // This represents the raw input profile for one function.
-struct FunctionPathAndClusterInfo {
+struct FunctionProfile {
   // BB Cluster information specified by `UniqueBBID`s.
   SmallVector<BBClusterInfo> ClusterInfo;
   // Paths to clone. A path a -> b -> c -> d implies cloning b, c, and d along
   // the edge a -> b (a is not cloned). The index of the path in this vector
   // determines the `UniqueBBID::CloneID` of the cloned blocks in that path.
   SmallVector<SmallVector<unsigned>> ClonePaths;
-  // Node counts for each basic block.
-  DenseMap<UniqueBBID, uint64_t> NodeCounts;
-  // Edge counts for each edge, stored as a nested map.
-  DenseMap<UniqueBBID, DenseMap<UniqueBBID, uint64_t>> EdgeCounts;
+  // CFG profile data.
+  CFGProfile CFG;
 };
 
 class BasicBlockSectionsProfileReader {
@@ -81,10 +103,14 @@ class BasicBlockSectionsProfileReader {
   SmallVector<SmallVector<unsigned>>
   getClonePathsForFunction(StringRef FuncName) const;
 
-  // Returns the profile count for the edge from `SrcBBID` to `SinkBBID` in
-  // function `FuncName` or zero if it does not exist.
-  uint64_t getEdgeCount(StringRef FuncName, const UniqueBBID &SrcBBID,
-                        const UniqueBBID &SinkBBID) const;
+  // Returns a pointer to the CFGProfile for the given function.
+  // Returns nullptr if no profile data is available for the function.
+  const CFGProfile *getFunctionCFGProfile(StringRef FuncName) const {
+    auto It = ProgramPathAndClusterInfo.find(getAliasName(FuncName));
+    if (It == ProgramPathAndClusterInfo.end())
+      return nullptr;
+    return &It->second.CFG;
+  }
 
 private:
   StringRef getAliasName(StringRef FuncName) const {
@@ -132,7 +158,7 @@ class BasicBlockSectionsProfileReader {
   // for (all or some of) its basic blocks. The cluster information for every
   // basic block includes its cluster ID along with the position of the basic
   // block in that cluster.
-  StringMap<FunctionPathAndClusterInfo> ProgramPathAndClusterInfo;
+  StringMap<FunctionProfile> ProgramPathAndClusterInfo;
 
   // Some functions have alias names. We use this map to find the main alias
   // name which appears in ProgramPathAndClusterInfo as a key.
@@ -192,9 +218,6 @@ class BasicBlockSectionsProfileReaderWrapperPass : public ImmutablePass {
   SmallVector<SmallVector<unsigned>>
   getClonePathsForFunction(StringRef FuncName) const;
 
-  uint64_t getEdgeCount(StringRef FuncName, const UniqueBBID &SrcBBID,
-                        const UniqueBBID &DestBBID) const;
-
   // Initializes the FunctionNameToDIFilename map for the current module and
   // then reads the profile for the matching functions.
   bool doInitialization(Module &M) override;
diff --git a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
index 219bbc9d5cdd4..9adc65ee0c1b1 100644
--- a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
@@ -37,6 +37,7 @@
 #include "llvm/BinaryFormat/COFF.h"
 #include "llvm/BinaryFormat/Dwarf.h"
 #include "llvm/BinaryFormat/ELF.h"
+#include "llvm/CodeGen/BasicBlockSectionsProfileReader.h"
 #include "llvm/CodeGen/GCMetadata.h"
 #include "llvm/CodeGen/GCMetadataPrinter.h"
 #include "llvm/CodeGen/LazyMachineBlockFrequencyInfo.h"
@@ -1531,12 +1532,17 @@ void AsmPrinter::emitBBAddrMapSection(const MachineFunction &MF) {
     assert(BBAddrMapVersion >= 2 &&
            "PGOAnalysisMap only supports version 2 or later");
 
-    if (Features.FuncEntryCount) {
-      OutStreamer->AddComment("function entry count");
-      auto MaybeEntryCount = MF.getFunction().getEntryCount();
-      OutStreamer->emitULEB128IntValue(
-          MaybeEntryCount ? MaybeEntryCount->getCount() : 0);
-    }
+    // We will emit the BBSPR profile data if availale. Otherwise, we fall back
+    // to MBFI and MBPI.
+    const BasicBlockSectionsProfileReader *BBSPR = nullptr;
+    if (auto *BBSPRPass = getAnalysisIfAvailable<
+            BasicBlockSectionsProfileReaderWrapperPass>())
+      BBSPR = &BBSPRPass->getBBSPR();
+
+    const CFGProfile *FuncCFGProfile = nullptr;
+    if (BBSPR)
+      FuncCFGProfile = BBSPR->getFunctionCFGProfile(MF.getFunction().getName());
+
     const MachineBlockFrequencyInfo *MBFI =
         Features.BBFreq
             ? &getAnalysis<LazyMachineBlockFrequencyInfoPass>().getBFI()
@@ -1546,23 +1552,43 @@ void AsmPrinter::emitBBAddrMapSection(const MachineFunction &MF) {
             ? &getAnalysis<MachineBranchProbabilityInfoWrapperPass>().getMBPI()
             : nullptr;
 
+    if (Features.FuncEntryCount) {
+      OutStreamer->AddComment("function entry count");
+      uint64_t EntryCount = 0;
+      if (FuncCFGProfile) {
+        EntryCount = FuncCFGProfile->getNodeCount(*MF.front().getBBID());
+      } else {
+        auto MaybeEntryCount = MF.getFunction().getEntryCount();
+        EntryCount = MaybeEntryCount ? MaybeEntryCount->getCount() : 0;
+      }
+      OutStreamer->emitULEB128IntValue(EntryCount);
+    }
+
     if (Features.BBFreq || Features.BrProb) {
       for (const MachineBasicBlock &MBB : MF) {
+
         if (Features.BBFreq) {
           OutStreamer->AddComment("basic block frequency");
-          OutStreamer->emitULEB128IntValue(
-              MBFI->getBlockFreq(&MBB).getFrequency());
+          uint64_t BlockFrequency =
+              FuncCFGProfile ? FuncCFGProfile->getNodeCount(*MBB.getBBID())
+                             : MBFI->getBlockFreq(&MBB).getFrequency();
+          OutStreamer->emitULEB128IntValue(BlockFrequency);
         }
         if (Features.BrProb) {
-          unsigned SuccCount = MBB.succ_size();
           OutStreamer->AddComment("basic block successor count");
-          OutStreamer->emitULEB128IntValue(SuccCount);
+          OutStreamer->emitULEB128IntValue(MBB.succ_size());
           for (const MachineBasicBlock *SuccMBB : MBB.successors()) {
             OutStreamer->AddComment("successor BB ID");
             OutStreamer->emitULEB128IntValue(SuccMBB->getBBID()->BaseID);
             OutStreamer->AddComment("successor branch probability");
-            OutStreamer->emitULEB128IntValue(
-                MBPI->getEdgeProbability(&MBB, SuccMBB).getNumerator());
+            // For MPBI, we emit the numerator of the probability. For BBSPR, we
+            // emit the raw edge count.
+            uint64_t EdgeFrequency =
+                FuncCFGProfile
+                    ? FuncCFGProfile->getEdgeCount(*MBB.getBBID(),
+                                                   *SuccMBB->getBBID())
+                    : MBPI->getEdgeProbability(&MBB, SuccMBB).getNumerator();
+            OutStreamer->emitULEB128IntValue(EdgeFrequency);
           }
         }
       }
diff --git a/llvm/lib/CodeGen/BasicBlockSectionsProfileReader.cpp b/llvm/lib/CodeGen/BasicBlockSectionsProfileReader.cpp
index fbcd614b85d18..71a12e79fdd85 100644
--- a/llvm/lib/CodeGen/BasicBlockSectionsProfileReader.cpp
+++ b/llvm/lib/CodeGen/BasicBlockSectionsProfileReader.cpp
@@ -76,21 +76,6 @@ BasicBlockSectionsProfileReader::getClonePathsForFunction(
   return ProgramPathAndClusterInfo.lookup(getAliasName(FuncName)).ClonePaths;
 }
 
-uint64_t BasicBlockSectionsProfileReader::getEdgeCount(
-    StringRef FuncName, const UniqueBBID &SrcBBID,
-    const UniqueBBID &SinkBBID) const {
-  auto It = ProgramPathAndClusterInfo.find(getAliasName(FuncName));
-  if (It == ProgramPathAndClusterInfo.end())
-    return 0;
-  auto NodeIt = It->second.EdgeCounts.find(SrcBBID);
-  if (NodeIt == It->second.EdgeCounts.end())
-    return 0;
-  auto EdgeIt = NodeIt->second.find(SinkBBID);
-  if (EdgeIt == NodeIt->second.end())
-    return 0;
-  return EdgeIt->second;
-}
-
 // Reads the version 1 basic block sections profile. Profile for each function
 // is encoded as follows:
 //   m <module_name>
@@ -279,10 +264,10 @@ Error BasicBlockSectionsProfileReader::ReadV1Profile() {
                 Twine("unsigned integer expected: '") + CountStr + "'");
           if (i == 0) {
             // The first element represents the source and its total count.
-            FI->second.NodeCounts[SrcBBID = *BBID] = Count;
+            FI->second.CFG.NodeCounts[SrcBBID = *BBID] = Count;
             continue;
           }
-          FI->second.EdgeCounts[SrcBBID][*BBID] = Count;
+          FI->second.CFG.EdgeCounts[SrcBBID][*BBID] = Count;
         }
       }
       continue;
@@ -487,12 +472,6 @@ BasicBlockSectionsProfileReaderWrapperPass::getClonePathsForFunction(
   return BBSPR.getClonePathsForFunction(FuncName);
 }
 
-uint64_t BasicBlockSectionsProfileReaderWrapperPass::getEdgeCount(
-    StringRef FuncName, const UniqueBBID &SrcBBID,
-    const UniqueBBID &SinkBBID) const {
-  return BBSPR.getEdgeCount(FuncName, SrcBBID, SinkBBID);
-}
-
 BasicBlockSectionsProfileReader &
 BasicBlockSectionsProfileReaderWrapperPass::getBBSPR() {
   return BBSPR;
diff --git a/llvm/test/CodeGen/X86/basic-block-sections-pgo-features.ll b/llvm/test/CodeGen/X86/basic-block-sections-pgo-features.ll
new file mode 100644
index 0000000000000..ff97066659f4d
--- /dev/null
+++ b/llvm/test/CodeGen/X86/basic-block-sections-pgo-features.ll
@@ -0,0 +1,92 @@
+; Verify PGO analysis map features with basic block sections profile.
+;
+; RUN: echo 'v1' > %t
+; RUN: echo 'f foo' >> %t
+; RUN: echo 'g 0:1000,1:800,2:200 1:800,3:800 2:200,3:200 3:1000' >> %t
+; RUN: echo 'c 0 1 2' >> %t
+;
+; RUN: llc < %s -O0 -mtriple=x86_64-pc-linux -function-sections -basic-block-sections=%t -basic-block-address-map -pgo-analysis-map=all | FileCheck %s
+
+define void @foo() nounwind !prof !0 {
+entry:
+  br label %bb1
+
+bb1:
+  br i1 undef, label %bb2, label %bb3, !prof !1
+
+bb2:
+  br label %bb3
+
+bb3:
+  ret void
+}
+
+!0 = !{!"function_entry_count", i64 1500}
+!1 = !{!"branch_weights", i32 1200, i32 300}
+
+;; Verify that foo gets its PGO map from its Propeller CFG profile.
+
+; CHECK: 	.section	.llvm_bb_addr_map,"o",@llvm_bb_addr_map,.text.foo
+; CHECK-NEXT:	.byte	3		# version
+; CHECK-NEXT:	.byte	15		# feature
+; CHECK:	.quad	.Lfunc_begin0	# base address
+; CHECK:	.byte	0		# BB id
+; CHECK:	.byte	1		# BB id
+; CHECK:	.byte	2		# BB id
+; CHECK:	.byte	3		# BB id
+
+; PGO Analysis Map for foo
+; CHECK:	.ascii	"\350\007"	# function entry count
+; CHECK-NEXT:	.ascii	"\350\007"	# basic block frequency
+; CHECK-NEXT:	.byte	1		# basic block successor count
+; CHECK-NEXT:	.byte	1		# successor BB ID
+; CHECK-NEXT:	.ascii	"\240\006"	# successor branch probability
+; CHECK-NEXT:	.ascii	"\240\006"	# basic block frequency
+; CHECK-NEXT:	.byte	2		# basic block successor count
+; CHECK-NEXT:	.byte	2		# successor BB ID
+; CHECK-NEXT:	.byte	0		# successor branch probability
+; CHECK-NEXT:	.byte	3		# successor BB ID
+; CHECK-NEXT:	.ascii	"\240\006"	# successor branch probability
+; CHECK-NEXT:	.ascii	"\310\001"	# basic block frequency
+; CHECK-NEXT:	.byte	1		# basic block successor count
+; CHECK-NEXT:	.byte	3		# successor BB ID
+; CHECK-NEXT:	.ascii	"\310\001"	# successor branch probability
+; CHECK-NEXT:	.ascii	"\350\007"	# basic block frequency
+; CHECK-NEXT:	.byte	0		# basic block successor count
+
+define void @bar() nounwind !prof !2 {
+entry:
+  br i1 undef, label %bb1, label %bb2, !prof !3
+
+bb1:
+  ret void
+
+bb2:
+  ret void
+}
+
+!2 = !{!"function_entry_count", i64 80}
+!3 = !{!"branch_weights", i32 2, i32 78}
+
+;; Verify that we emit the PGO map for bar although it doesn't have Propeller profile.
+
+; CHECK: 	.section	.llvm_bb_addr_map,"o",@llvm_bb_addr_map,.text.bar
+; CHECK-NEXT:	.byte	3		# version
+; CHECK-NEXT:	.byte	7		# feature
+; CHECK:	.quad	.Lfunc_begin1	# function address
+; CHECK:	.byte	0		# BB id
+; CHECK:	.byte	1		# BB id
+; CHECK:	.byte	2		# BB id
+
+; CHECK:	.byte	80                              # function entry count
+; CHECK-NEXT:	.ascii	"\200\200\200\200\200\200\200 " # basic block frequency
+; CHECK-NEXT:	.byte	2                               # basic block successor count
+; CHECK-NEXT:	.byte	1                               # successor BB ID
+; CHECK-NEXT:	.ascii	"\200\200\200\200\004"          # successor branch probability
+; CHECK-NEXT:	.byte	2                               # successor BB ID
+; CHECK-NEXT:	.ascii	"\200\200\200\200\004"          # successor branch probability
+; CHECK-NEXT:	.ascii	"\200\200\200\200\200\200\200\020" # basic block frequency
+; CHECK-NEXT:	.byte	0                               # basic block successor count
+; CHECK-NEXT:	.ascii	"\200\200\200\200\200\200\200\020" # basic block frequency
+; CHECK-NEXT:	.byte	0                               # basic block successor count
+

Copy link

github-actions bot commented Oct 13, 2025

✅ With the latest revision this PR passed the undef deprecator.

Copy link
Contributor

@boomanaiden154 boomanaiden154 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of this?

The only current users of PGOAnalysisMap that I'm aware of are using it to validate PGO information, not the Propeller Profile.

@rlavaee
Copy link
Contributor Author

rlavaee commented Oct 13, 2025

What's the point of this?

The only current users of PGOAnalysisMap that I'm aware of are using it to validate PGO information, not the Propeller Profile.

We are going to compare fleet profiles against the compiler profile to verify profile fidelity. When binary is optimized using Propeller, the info coming from PGO aren't as useful.

@boomanaiden154
Copy link
Contributor

We are going to compare fleet profiles against the compiler profile to verify profile fidelity. When binary is optimized using Propeller, the info coming from PGO aren't as useful.

Verify profile fidelity of what? IRPGO? CSPGO? The propeller profile?

@rlavaee
Copy link
Contributor Author

rlavaee commented Oct 13, 2025

We are going to compare fleet profiles against the compiler profile to verify profile fidelity. When binary is optimized using Propeller, the info coming from PGO aren't as useful.

Verify profile fidelity of what? IRPGO? CSPGO? The propeller profile?

Any profile that was used (if the function has Propeller profile, it will be used instead). I am open to suggestions. Do you think we should emit the Propeller info separate from the PGO?

@boomanaiden154
Copy link
Contributor

Any profile that was used (if the function has Propeller profile, it will be used instead). I am open to suggestions. Do you think we should emit the Propeller info separate from the PGO?

I don't think there's a need to emit separately. I think a flag to explicitly choose between data computed from BFI/BPI and Propeller would probably make the most sense for this given the use case (and error out if we can trivially detect that what was requested is not available, which probably is only possible in the propeller case).

Emitting separately I guess would enable validating multiple profiling types with a single build, but I would think that's a pretty niche use case and probably better handled by just doing the build multiple times.

@rlavaee
Copy link
Contributor Author

rlavaee commented Oct 13, 2025

Any profile that was used (if the function has Propeller profile, it will be used instead). I am open to suggestions. Do you think we should emit the Propeller info separate from the PGO?

I don't think there's a need to emit separately. I think a flag to explicitly choose between data computed from BFI/BPI and Propeller would probably make the most sense for this given the use case (and error out if we can trivially detect that what was requested is not available, which probably is only possible in the propeller case).

Emitting separately I guess would enable validating multiple profiling types with a single build, but I would think that's a pretty niche use case and probably better handled by just doing the build multiple times.

Thanks for the feedback. I added an option -pgo-analysis-map-use-propeller-cfg.

To clarify, I do want to keep the BFI/PBI profile for functions which don't have any Propeller profile, though I am undecided about functions with both Propeller and BFI profiles. The BFI profile can still be useful for the Propeller-cold part of the function. However, I think we can start with the single source PGO map for now and iterate as we move forward.

Regarding the rebuilding idea, we decided to include the PGO analysis map in the production binary and avoid rebuilding the binary to make the process simpler and to avoid being affected by non-determinism.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants