-
Notifications
You must be signed in to change notification settings - Fork 14.9k
Use the Propeller CFG profile in the PGO analysis map if it is available. #163252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-backend-x86 Author: Rahman Lavaee (rlavaee) ChangesWhen available, the Propeller CFG profile provides more fidelity than the PGO profile. It's also what has been used to optimize the function layout. So we should emit that in the PGO analysis map instead. We still need a way to distinguish where the data comes from. We will handle this in a later PR. Full diff: https://github.com/llvm/llvm-project/pull/163252.diff 4 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/BasicBlockSectionsProfileReader.h b/llvm/include/llvm/CodeGen/BasicBlockSectionsProfileReader.h
index 82dd5feb31dba..3c0e26bad7a5b 100644
--- a/llvm/include/llvm/CodeGen/BasicBlockSectionsProfileReader.h
+++ b/llvm/include/llvm/CodeGen/BasicBlockSectionsProfileReader.h
@@ -42,18 +42,40 @@ struct BBClusterInfo {
unsigned PositionInCluster;
};
+// This represents the CFG profile data for a function.
+struct CFGProfile {
+ // Node counts for each basic block.
+ DenseMap<UniqueBBID, uint64_t> NodeCounts;
+ // Edge counts for each edge, stored as a nested map.
+ DenseMap<UniqueBBID, DenseMap<UniqueBBID, uint64_t>> EdgeCounts;
+
+ // Returns the profile count for the given basic block or zero if it does not
+ // exist.
+ uint64_t getNodeCount(const UniqueBBID &BBID) const {
+ return NodeCounts.lookup(BBID);
+ }
+
+ // Returns the profile count for the edge from `SrcBBID` to `SinkBBID` or
+ // zero if it does not exist.
+ uint64_t getEdgeCount(const UniqueBBID &SrcBBID,
+ const UniqueBBID &SinkBBID) const {
+ auto It = EdgeCounts.find(SrcBBID);
+ if (It == EdgeCounts.end())
+ return 0;
+ return It->second.lookup(SinkBBID);
+ }
+};
+
// This represents the raw input profile for one function.
-struct FunctionPathAndClusterInfo {
+struct FunctionProfile {
// BB Cluster information specified by `UniqueBBID`s.
SmallVector<BBClusterInfo> ClusterInfo;
// Paths to clone. A path a -> b -> c -> d implies cloning b, c, and d along
// the edge a -> b (a is not cloned). The index of the path in this vector
// determines the `UniqueBBID::CloneID` of the cloned blocks in that path.
SmallVector<SmallVector<unsigned>> ClonePaths;
- // Node counts for each basic block.
- DenseMap<UniqueBBID, uint64_t> NodeCounts;
- // Edge counts for each edge, stored as a nested map.
- DenseMap<UniqueBBID, DenseMap<UniqueBBID, uint64_t>> EdgeCounts;
+ // CFG profile data.
+ CFGProfile CFG;
};
class BasicBlockSectionsProfileReader {
@@ -81,10 +103,14 @@ class BasicBlockSectionsProfileReader {
SmallVector<SmallVector<unsigned>>
getClonePathsForFunction(StringRef FuncName) const;
- // Returns the profile count for the edge from `SrcBBID` to `SinkBBID` in
- // function `FuncName` or zero if it does not exist.
- uint64_t getEdgeCount(StringRef FuncName, const UniqueBBID &SrcBBID,
- const UniqueBBID &SinkBBID) const;
+ // Returns a pointer to the CFGProfile for the given function.
+ // Returns nullptr if no profile data is available for the function.
+ const CFGProfile *getFunctionCFGProfile(StringRef FuncName) const {
+ auto It = ProgramPathAndClusterInfo.find(getAliasName(FuncName));
+ if (It == ProgramPathAndClusterInfo.end())
+ return nullptr;
+ return &It->second.CFG;
+ }
private:
StringRef getAliasName(StringRef FuncName) const {
@@ -132,7 +158,7 @@ class BasicBlockSectionsProfileReader {
// for (all or some of) its basic blocks. The cluster information for every
// basic block includes its cluster ID along with the position of the basic
// block in that cluster.
- StringMap<FunctionPathAndClusterInfo> ProgramPathAndClusterInfo;
+ StringMap<FunctionProfile> ProgramPathAndClusterInfo;
// Some functions have alias names. We use this map to find the main alias
// name which appears in ProgramPathAndClusterInfo as a key.
@@ -192,9 +218,6 @@ class BasicBlockSectionsProfileReaderWrapperPass : public ImmutablePass {
SmallVector<SmallVector<unsigned>>
getClonePathsForFunction(StringRef FuncName) const;
- uint64_t getEdgeCount(StringRef FuncName, const UniqueBBID &SrcBBID,
- const UniqueBBID &DestBBID) const;
-
// Initializes the FunctionNameToDIFilename map for the current module and
// then reads the profile for the matching functions.
bool doInitialization(Module &M) override;
diff --git a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
index 219bbc9d5cdd4..9adc65ee0c1b1 100644
--- a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
@@ -37,6 +37,7 @@
#include "llvm/BinaryFormat/COFF.h"
#include "llvm/BinaryFormat/Dwarf.h"
#include "llvm/BinaryFormat/ELF.h"
+#include "llvm/CodeGen/BasicBlockSectionsProfileReader.h"
#include "llvm/CodeGen/GCMetadata.h"
#include "llvm/CodeGen/GCMetadataPrinter.h"
#include "llvm/CodeGen/LazyMachineBlockFrequencyInfo.h"
@@ -1531,12 +1532,17 @@ void AsmPrinter::emitBBAddrMapSection(const MachineFunction &MF) {
assert(BBAddrMapVersion >= 2 &&
"PGOAnalysisMap only supports version 2 or later");
- if (Features.FuncEntryCount) {
- OutStreamer->AddComment("function entry count");
- auto MaybeEntryCount = MF.getFunction().getEntryCount();
- OutStreamer->emitULEB128IntValue(
- MaybeEntryCount ? MaybeEntryCount->getCount() : 0);
- }
+ // We will emit the BBSPR profile data if availale. Otherwise, we fall back
+ // to MBFI and MBPI.
+ const BasicBlockSectionsProfileReader *BBSPR = nullptr;
+ if (auto *BBSPRPass = getAnalysisIfAvailable<
+ BasicBlockSectionsProfileReaderWrapperPass>())
+ BBSPR = &BBSPRPass->getBBSPR();
+
+ const CFGProfile *FuncCFGProfile = nullptr;
+ if (BBSPR)
+ FuncCFGProfile = BBSPR->getFunctionCFGProfile(MF.getFunction().getName());
+
const MachineBlockFrequencyInfo *MBFI =
Features.BBFreq
? &getAnalysis<LazyMachineBlockFrequencyInfoPass>().getBFI()
@@ -1546,23 +1552,43 @@ void AsmPrinter::emitBBAddrMapSection(const MachineFunction &MF) {
? &getAnalysis<MachineBranchProbabilityInfoWrapperPass>().getMBPI()
: nullptr;
+ if (Features.FuncEntryCount) {
+ OutStreamer->AddComment("function entry count");
+ uint64_t EntryCount = 0;
+ if (FuncCFGProfile) {
+ EntryCount = FuncCFGProfile->getNodeCount(*MF.front().getBBID());
+ } else {
+ auto MaybeEntryCount = MF.getFunction().getEntryCount();
+ EntryCount = MaybeEntryCount ? MaybeEntryCount->getCount() : 0;
+ }
+ OutStreamer->emitULEB128IntValue(EntryCount);
+ }
+
if (Features.BBFreq || Features.BrProb) {
for (const MachineBasicBlock &MBB : MF) {
+
if (Features.BBFreq) {
OutStreamer->AddComment("basic block frequency");
- OutStreamer->emitULEB128IntValue(
- MBFI->getBlockFreq(&MBB).getFrequency());
+ uint64_t BlockFrequency =
+ FuncCFGProfile ? FuncCFGProfile->getNodeCount(*MBB.getBBID())
+ : MBFI->getBlockFreq(&MBB).getFrequency();
+ OutStreamer->emitULEB128IntValue(BlockFrequency);
}
if (Features.BrProb) {
- unsigned SuccCount = MBB.succ_size();
OutStreamer->AddComment("basic block successor count");
- OutStreamer->emitULEB128IntValue(SuccCount);
+ OutStreamer->emitULEB128IntValue(MBB.succ_size());
for (const MachineBasicBlock *SuccMBB : MBB.successors()) {
OutStreamer->AddComment("successor BB ID");
OutStreamer->emitULEB128IntValue(SuccMBB->getBBID()->BaseID);
OutStreamer->AddComment("successor branch probability");
- OutStreamer->emitULEB128IntValue(
- MBPI->getEdgeProbability(&MBB, SuccMBB).getNumerator());
+ // For MPBI, we emit the numerator of the probability. For BBSPR, we
+ // emit the raw edge count.
+ uint64_t EdgeFrequency =
+ FuncCFGProfile
+ ? FuncCFGProfile->getEdgeCount(*MBB.getBBID(),
+ *SuccMBB->getBBID())
+ : MBPI->getEdgeProbability(&MBB, SuccMBB).getNumerator();
+ OutStreamer->emitULEB128IntValue(EdgeFrequency);
}
}
}
diff --git a/llvm/lib/CodeGen/BasicBlockSectionsProfileReader.cpp b/llvm/lib/CodeGen/BasicBlockSectionsProfileReader.cpp
index fbcd614b85d18..71a12e79fdd85 100644
--- a/llvm/lib/CodeGen/BasicBlockSectionsProfileReader.cpp
+++ b/llvm/lib/CodeGen/BasicBlockSectionsProfileReader.cpp
@@ -76,21 +76,6 @@ BasicBlockSectionsProfileReader::getClonePathsForFunction(
return ProgramPathAndClusterInfo.lookup(getAliasName(FuncName)).ClonePaths;
}
-uint64_t BasicBlockSectionsProfileReader::getEdgeCount(
- StringRef FuncName, const UniqueBBID &SrcBBID,
- const UniqueBBID &SinkBBID) const {
- auto It = ProgramPathAndClusterInfo.find(getAliasName(FuncName));
- if (It == ProgramPathAndClusterInfo.end())
- return 0;
- auto NodeIt = It->second.EdgeCounts.find(SrcBBID);
- if (NodeIt == It->second.EdgeCounts.end())
- return 0;
- auto EdgeIt = NodeIt->second.find(SinkBBID);
- if (EdgeIt == NodeIt->second.end())
- return 0;
- return EdgeIt->second;
-}
-
// Reads the version 1 basic block sections profile. Profile for each function
// is encoded as follows:
// m <module_name>
@@ -279,10 +264,10 @@ Error BasicBlockSectionsProfileReader::ReadV1Profile() {
Twine("unsigned integer expected: '") + CountStr + "'");
if (i == 0) {
// The first element represents the source and its total count.
- FI->second.NodeCounts[SrcBBID = *BBID] = Count;
+ FI->second.CFG.NodeCounts[SrcBBID = *BBID] = Count;
continue;
}
- FI->second.EdgeCounts[SrcBBID][*BBID] = Count;
+ FI->second.CFG.EdgeCounts[SrcBBID][*BBID] = Count;
}
}
continue;
@@ -487,12 +472,6 @@ BasicBlockSectionsProfileReaderWrapperPass::getClonePathsForFunction(
return BBSPR.getClonePathsForFunction(FuncName);
}
-uint64_t BasicBlockSectionsProfileReaderWrapperPass::getEdgeCount(
- StringRef FuncName, const UniqueBBID &SrcBBID,
- const UniqueBBID &SinkBBID) const {
- return BBSPR.getEdgeCount(FuncName, SrcBBID, SinkBBID);
-}
-
BasicBlockSectionsProfileReader &
BasicBlockSectionsProfileReaderWrapperPass::getBBSPR() {
return BBSPR;
diff --git a/llvm/test/CodeGen/X86/basic-block-sections-pgo-features.ll b/llvm/test/CodeGen/X86/basic-block-sections-pgo-features.ll
new file mode 100644
index 0000000000000..ff97066659f4d
--- /dev/null
+++ b/llvm/test/CodeGen/X86/basic-block-sections-pgo-features.ll
@@ -0,0 +1,92 @@
+; Verify PGO analysis map features with basic block sections profile.
+;
+; RUN: echo 'v1' > %t
+; RUN: echo 'f foo' >> %t
+; RUN: echo 'g 0:1000,1:800,2:200 1:800,3:800 2:200,3:200 3:1000' >> %t
+; RUN: echo 'c 0 1 2' >> %t
+;
+; RUN: llc < %s -O0 -mtriple=x86_64-pc-linux -function-sections -basic-block-sections=%t -basic-block-address-map -pgo-analysis-map=all | FileCheck %s
+
+define void @foo() nounwind !prof !0 {
+entry:
+ br label %bb1
+
+bb1:
+ br i1 undef, label %bb2, label %bb3, !prof !1
+
+bb2:
+ br label %bb3
+
+bb3:
+ ret void
+}
+
+!0 = !{!"function_entry_count", i64 1500}
+!1 = !{!"branch_weights", i32 1200, i32 300}
+
+;; Verify that foo gets its PGO map from its Propeller CFG profile.
+
+; CHECK: .section .llvm_bb_addr_map,"o",@llvm_bb_addr_map,.text.foo
+; CHECK-NEXT: .byte 3 # version
+; CHECK-NEXT: .byte 15 # feature
+; CHECK: .quad .Lfunc_begin0 # base address
+; CHECK: .byte 0 # BB id
+; CHECK: .byte 1 # BB id
+; CHECK: .byte 2 # BB id
+; CHECK: .byte 3 # BB id
+
+; PGO Analysis Map for foo
+; CHECK: .ascii "\350\007" # function entry count
+; CHECK-NEXT: .ascii "\350\007" # basic block frequency
+; CHECK-NEXT: .byte 1 # basic block successor count
+; CHECK-NEXT: .byte 1 # successor BB ID
+; CHECK-NEXT: .ascii "\240\006" # successor branch probability
+; CHECK-NEXT: .ascii "\240\006" # basic block frequency
+; CHECK-NEXT: .byte 2 # basic block successor count
+; CHECK-NEXT: .byte 2 # successor BB ID
+; CHECK-NEXT: .byte 0 # successor branch probability
+; CHECK-NEXT: .byte 3 # successor BB ID
+; CHECK-NEXT: .ascii "\240\006" # successor branch probability
+; CHECK-NEXT: .ascii "\310\001" # basic block frequency
+; CHECK-NEXT: .byte 1 # basic block successor count
+; CHECK-NEXT: .byte 3 # successor BB ID
+; CHECK-NEXT: .ascii "\310\001" # successor branch probability
+; CHECK-NEXT: .ascii "\350\007" # basic block frequency
+; CHECK-NEXT: .byte 0 # basic block successor count
+
+define void @bar() nounwind !prof !2 {
+entry:
+ br i1 undef, label %bb1, label %bb2, !prof !3
+
+bb1:
+ ret void
+
+bb2:
+ ret void
+}
+
+!2 = !{!"function_entry_count", i64 80}
+!3 = !{!"branch_weights", i32 2, i32 78}
+
+;; Verify that we emit the PGO map for bar although it doesn't have Propeller profile.
+
+; CHECK: .section .llvm_bb_addr_map,"o",@llvm_bb_addr_map,.text.bar
+; CHECK-NEXT: .byte 3 # version
+; CHECK-NEXT: .byte 7 # feature
+; CHECK: .quad .Lfunc_begin1 # function address
+; CHECK: .byte 0 # BB id
+; CHECK: .byte 1 # BB id
+; CHECK: .byte 2 # BB id
+
+; CHECK: .byte 80 # function entry count
+; CHECK-NEXT: .ascii "\200\200\200\200\200\200\200 " # basic block frequency
+; CHECK-NEXT: .byte 2 # basic block successor count
+; CHECK-NEXT: .byte 1 # successor BB ID
+; CHECK-NEXT: .ascii "\200\200\200\200\004" # successor branch probability
+; CHECK-NEXT: .byte 2 # successor BB ID
+; CHECK-NEXT: .ascii "\200\200\200\200\004" # successor branch probability
+; CHECK-NEXT: .ascii "\200\200\200\200\200\200\200\020" # basic block frequency
+; CHECK-NEXT: .byte 0 # basic block successor count
+; CHECK-NEXT: .ascii "\200\200\200\200\200\200\200\020" # basic block frequency
+; CHECK-NEXT: .byte 0 # basic block successor count
+
|
✅ With the latest revision this PR passed the undef deprecator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the point of this?
The only current users of PGOAnalysisMap that I'm aware of are using it to validate PGO information, not the Propeller Profile.
We are going to compare fleet profiles against the compiler profile to verify profile fidelity. When binary is optimized using Propeller, the info coming from PGO aren't as useful. |
Verify profile fidelity of what? IRPGO? CSPGO? The propeller profile? |
Any profile that was used (if the function has Propeller profile, it will be used instead). I am open to suggestions. Do you think we should emit the Propeller info separate from the PGO? |
I don't think there's a need to emit separately. I think a flag to explicitly choose between data computed from BFI/BPI and Propeller would probably make the most sense for this given the use case (and error out if we can trivially detect that what was requested is not available, which probably is only possible in the propeller case). Emitting separately I guess would enable validating multiple profiling types with a single build, but I would think that's a pretty niche use case and probably better handled by just doing the build multiple times. |
Thanks for the feedback. I added an option To clarify, I do want to keep the BFI/PBI profile for functions which don't have any Propeller profile, though I am undecided about functions with both Propeller and BFI profiles. The BFI profile can still be useful for the Propeller-cold part of the function. However, I think we can start with the single source PGO map for now and iterate as we move forward. Regarding the rebuilding idea, we decided to include the PGO analysis map in the production binary and avoid rebuilding the binary to make the process simpler and to avoid being affected by non-determinism. |
When available, the Propeller CFG profile provides more fidelity than the PGO profile. It's also what has been used to optimize the function layout. So we should emit that in the PGO analysis map instead.
We still need a way to distinguish where the data comes from. We will handle this in a later PR.