-
Notifications
You must be signed in to change notification settings - Fork 15.4k
[MemProf] Convert Hot contexts to NotCold early #124219
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@llvm/pr-subscribers-pgo @llvm/pr-subscribers-llvm-analysis Author: Teresa Johnson (teresajohnson) ChangesBy default we were marking some contexts as hot, and adding hot hints to While we convert hot contexts to notcold contexts during the cloning Additionally the generation of any hot hints has been disabled by This change resulted in significant overhead reductions for a large Note the same effect occurs from either disabling hot hints by default Full diff: https://github.com/llvm/llvm-project/pull/124219.diff 5 Files Affected:
diff --git a/llvm/include/llvm/Analysis/MemoryProfileInfo.h b/llvm/include/llvm/Analysis/MemoryProfileInfo.h
index 215139caef696d..11325da3b86461 100644
--- a/llvm/include/llvm/Analysis/MemoryProfileInfo.h
+++ b/llvm/include/llvm/Analysis/MemoryProfileInfo.h
@@ -85,6 +85,11 @@ class CallStackTrie {
void collectContextSizeInfo(CallStackTrieNode *Node,
std::vector<ContextTotalSize> &ContextSizeInfo);
+ // Recursively convert hot allocation types to notcold, since we don't
+ // actually do any cloning for hot contexts, to facilitate more aggressive
+ // pruning of contexts.
+ void convertHotToNotCold(CallStackTrieNode *Node);
+
// Recursive helper to trim contexts and create metadata nodes.
bool buildMIBNodes(CallStackTrieNode *Node, LLVMContext &Ctx,
std::vector<uint64_t> &MIBCallStack,
diff --git a/llvm/lib/Analysis/MemoryProfileInfo.cpp b/llvm/lib/Analysis/MemoryProfileInfo.cpp
index 2f3c87a89f9f98..d140ac08fa49e2 100644
--- a/llvm/lib/Analysis/MemoryProfileInfo.cpp
+++ b/llvm/lib/Analysis/MemoryProfileInfo.cpp
@@ -42,6 +42,11 @@ cl::opt<unsigned> MemProfMinAveLifetimeAccessDensityHotThreshold(
cl::desc("The minimum TotalLifetimeAccessDensity / AllocCount for an "
"allocation to be considered hot"));
+cl::opt<bool>
+ MemProfUseHotHints("memprof-use-hot-hints", cl::init(false), cl::Hidden,
+ cl::desc("Enable use of hot hints (only supported for "
+ "unambigously hot allocations)"));
+
cl::opt<bool> MemProfReportHintedSizes(
"memprof-report-hinted-sizes", cl::init(false), cl::Hidden,
cl::desc("Report total allocation sizes of hinted allocations"));
@@ -60,8 +65,9 @@ AllocationType llvm::memprof::getAllocType(uint64_t TotalLifetimeAccessDensity,
// The access densities are multiplied by 100 to hold 2 decimal places of
// precision, so need to divide by 100.
- if (((float)TotalLifetimeAccessDensity) / AllocCount / 100 >
- MemProfMinAveLifetimeAccessDensityHotThreshold)
+ if (MemProfUseHotHints &&
+ ((float)TotalLifetimeAccessDensity) / AllocCount / 100 >
+ MemProfMinAveLifetimeAccessDensityHotThreshold)
return AllocationType::Hot;
return AllocationType::NotCold;
@@ -222,6 +228,15 @@ void CallStackTrie::collectContextSizeInfo(
collectContextSizeInfo(Caller.second, ContextSizeInfo);
}
+void CallStackTrie::convertHotToNotCold(CallStackTrieNode *Node) {
+ if (Node->AllocTypes & static_cast<uint8_t>(AllocationType::Hot)) {
+ Node->AllocTypes &= ~static_cast<uint8_t>(AllocationType::Hot);
+ Node->AllocTypes |= static_cast<uint8_t>(AllocationType::NotCold);
+ }
+ for (auto &Caller : Node->Callers)
+ convertHotToNotCold(Caller.second);
+}
+
// Recursive helper to trim contexts and create metadata nodes.
// Caller should have pushed Node's loc to MIBCallStack. Doing this in the
// caller makes it simpler to handle the many early returns in this method.
@@ -301,6 +316,22 @@ bool CallStackTrie::buildAndAttachMIBMetadata(CallBase *CI) {
"single");
return false;
}
+ // If there were any hot allocation contexts, the Alloc trie node would have
+ // the Hot type set. If so, because we don't currently support cloning for hot
+ // contexts, they should be converted to NotCold. This happens in the cloning
+ // support anyway, however, doing this now enables more aggressive context
+ // trimming when building the MIB metadata (and possibly may make the
+ // allocation have a single NotCold allocation type), greatly reducing
+ // overheads in bitcode, cloning memory and cloning time.
+ if (Alloc->AllocTypes & static_cast<uint8_t>(AllocationType::Hot)) {
+ convertHotToNotCold(Alloc);
+ // Check whether we now have a single alloc type.
+ if (hasSingleAllocType(Alloc->AllocTypes)) {
+ addSingleAllocTypeAttribute(CI, (AllocationType)Alloc->AllocTypes,
+ "single");
+ return false;
+ }
+ }
auto &Ctx = CI->getContext();
std::vector<uint64_t> MIBCallStack;
MIBCallStack.push_back(AllocStackId);
diff --git a/llvm/test/Transforms/PGOProfile/memprof.ll b/llvm/test/Transforms/PGOProfile/memprof.ll
index c0e44cccbf16ff..6aa2d307a1dc88 100644
--- a/llvm/test/Transforms/PGOProfile/memprof.ll
+++ b/llvm/test/Transforms/PGOProfile/memprof.ll
@@ -408,8 +408,8 @@ for.end: ; preds = %for.cond
; MEMPROFRAND2: !"cold"
; MEMPROFRAND2: !"cold"
; MEMPROFRAND2: !"cold"
-; MEMPROFRAND2: !"hot"
-; MEMPROFRAND2: !"hot"
+; MEMPROFRAND2: !"notcold"
+; MEMPROFRAND2: !"notcold"
; MEMPROFSTATS: 8 memprof - Number of alloc contexts in memory profile.
; MEMPROFSTATS: 10 memprof - Number of callsites in memory profile.
diff --git a/llvm/test/Transforms/PGOProfile/memprof_loop_unroll.ll b/llvm/test/Transforms/PGOProfile/memprof_loop_unroll.ll
index 9bc1282ab4529e..2461ca32e98212 100644
--- a/llvm/test/Transforms/PGOProfile/memprof_loop_unroll.ll
+++ b/llvm/test/Transforms/PGOProfile/memprof_loop_unroll.ll
@@ -10,7 +10,9 @@
;; $ clang++ -gmlt -fdebug-info-for-profiling -S %S/Inputs/memprof_loop_unroll_b.cc -emit-llvm
; RUN: llvm-profdata merge %S/Inputs/memprof_loop_unroll.memprofraw --profiled-binary %S/Inputs/memprof_loop_unroll.exe -o %t.memprofdata
-; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -S -memprof-report-hinted-sizes 2>&1 | FileCheck %s
+;; Set the minimum lifetime threshold to 0 to ensure that one context is
+;; considered cold (the other will be notcold).
+; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -S -memprof-report-hinted-sizes -memprof-ave-lifetime-cold-threshold=0 2>&1 | FileCheck %s
;; Conservatively annotate as not cold. We get two messages as there are two
;; unrolled copies of the allocation.
diff --git a/llvm/unittests/Analysis/MemoryProfileInfoTest.cpp b/llvm/unittests/Analysis/MemoryProfileInfoTest.cpp
index 4c177ae8446905..4161be0fadca73 100644
--- a/llvm/unittests/Analysis/MemoryProfileInfoTest.cpp
+++ b/llvm/unittests/Analysis/MemoryProfileInfoTest.cpp
@@ -25,6 +25,7 @@ using namespace llvm::memprof;
extern cl::opt<float> MemProfLifetimeAccessDensityColdThreshold;
extern cl::opt<unsigned> MemProfAveLifetimeColdThreshold;
extern cl::opt<unsigned> MemProfMinAveLifetimeAccessDensityHotThreshold;
+extern cl::opt<bool> MemProfUseHotHints;
namespace {
@@ -81,14 +82,22 @@ TEST_F(MemoryProfileInfoTest, GetAllocType) {
// MemProfMinAveLifetimeAccessDensityHotThreshold
// so compute the HotTotalLifetimeAccessDensityThreshold at the threshold.
const uint64_t HotTotalLifetimeAccessDensityThreshold =
- (uint64_t)(MemProfMinAveLifetimeAccessDensityHotThreshold * AllocCount * 100);
-
-
+ (uint64_t)(MemProfMinAveLifetimeAccessDensityHotThreshold * AllocCount * 100);
+
+ // Make sure the option for detecting hot allocations is set.
+ MemProfUseHotHints = true;
// Test Hot
// More accesses per byte per sec than hot threshold is hot.
EXPECT_EQ(getAllocType(HotTotalLifetimeAccessDensityThreshold + 1, AllocCount,
ColdTotalLifetimeThreshold + 1),
- AllocationType::Hot);
+ AllocationType::Hot);
+ // Undo the manual set of the option above.
+ cl::ResetAllOptionOccurrences();
+
+ // Without MemProfUseHotHints (default) we should treat simply as NotCold.
+ EXPECT_EQ(getAllocType(HotTotalLifetimeAccessDensityThreshold + 1, AllocCount,
+ ColdTotalLifetimeThreshold + 1),
+ AllocationType::NotCold);
// Test Cold
// Long lived with less accesses per byte per sec than cold threshold is cold.
@@ -155,6 +164,8 @@ define i32* @test() {
%1 = bitcast i8* %call2 to i32*
%call3 = call noalias dereferenceable_or_null(40) i8* @malloc(i64 noundef 40)
%2 = bitcast i8* %call3 to i32*
+ %call4 = call noalias dereferenceable_or_null(40) i8* @malloc(i64 noundef 40)
+ %3 = bitcast i8* %call4 to i32*
ret i32* %1
}
declare dso_local noalias noundef i8* @malloc(i64 noundef)
@@ -194,6 +205,18 @@ declare dso_local noalias noundef i8* @malloc(i64 noundef)
EXPECT_FALSE(Call3->hasMetadata(LLVMContext::MD_memprof));
EXPECT_TRUE(Call3->hasFnAttr("memprof"));
EXPECT_EQ(Call3->getFnAttr("memprof").getValueAsString(), "hot");
+
+ // Fourth call has hot and non-cold contexts. These should be treated as
+ // notcold and given a notcold attribute.
+ CallStackTrie Trie4;
+ Trie4.addCallStack(AllocationType::Hot, {5, 6});
+ Trie4.addCallStack(AllocationType::NotCold, {5, 7, 8});
+ CallBase *Call4 = findCall(*Func, "call4");
+ Trie4.buildAndAttachMIBMetadata(Call4);
+
+ EXPECT_FALSE(Call4->hasMetadata(LLVMContext::MD_memprof));
+ EXPECT_TRUE(Call4->hasFnAttr("memprof"));
+ EXPECT_EQ(Call4->getFnAttr("memprof").getValueAsString(), "notcold");
}
// Test CallStackTrie::addCallStack interface taking allocation type and list of
@@ -289,56 +312,8 @@ declare dso_local noalias noundef i8* @malloc(i64 noundef)
EXPECT_EQ(getMIBAllocType(MIB), AllocationType::Cold);
else {
ASSERT_EQ(StackId->getZExtValue(), 3u);
- EXPECT_EQ(getMIBAllocType(MIB), AllocationType::Hot);
- }
- }
-}
-
-// Test CallStackTrie::addCallStack interface taking allocation type and list of
-// call stack ids.
-// Test that an allocation call reached by both non cold and hot call stacks
-// gets memprof metadata representing the different allocation type contexts.
-TEST_F(MemoryProfileInfoTest, NotColdAndHotMIB) {
- LLVMContext C;
- std::unique_ptr<Module> M = makeLLVMModule(C,
- R"IR(
-target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
-target triple = "x86_64-pc-linux-gnu"
-define i32* @test() {
-entry:
- %call = call noalias dereferenceable_or_null(40) i8* @malloc(i64 noundef 40)
- %0 = bitcast i8* %call to i32*
- ret i32* %0
-}
-declare dso_local noalias noundef i8* @malloc(i64 noundef)
-)IR");
-
- Function *Func = M->getFunction("test");
-
- CallStackTrie Trie;
- Trie.addCallStack(AllocationType::NotCold, {1, 2});
- Trie.addCallStack(AllocationType::Hot, {1, 3});
-
- CallBase *Call = findCall(*Func, "call");
- Trie.buildAndAttachMIBMetadata(Call);
-
- EXPECT_FALSE(Call->hasFnAttr("memprof"));
- EXPECT_TRUE(Call->hasMetadata(LLVMContext::MD_memprof));
- MDNode *MemProfMD = Call->getMetadata(LLVMContext::MD_memprof);
- ASSERT_EQ(MemProfMD->getNumOperands(), 2u);
- for (auto &MIBOp : MemProfMD->operands()) {
- MDNode *MIB = dyn_cast<MDNode>(MIBOp);
- MDNode *StackMD = getMIBStackNode(MIB);
- ASSERT_NE(StackMD, nullptr);
- ASSERT_EQ(StackMD->getNumOperands(), 2u);
- auto *StackId = mdconst::dyn_extract<ConstantInt>(StackMD->getOperand(0));
- ASSERT_EQ(StackId->getZExtValue(), 1u);
- StackId = mdconst::dyn_extract<ConstantInt>(StackMD->getOperand(1));
- if (StackId->getZExtValue() == 2u)
+ // Hot contexts are converted to NotCold when building the metadata.
EXPECT_EQ(getMIBAllocType(MIB), AllocationType::NotCold);
- else {
- ASSERT_EQ(StackId->getZExtValue(), 3u);
- EXPECT_EQ(getMIBAllocType(MIB), AllocationType::Hot);
}
}
}
@@ -391,7 +366,8 @@ declare dso_local noalias noundef i8* @malloc(i64 noundef)
EXPECT_EQ(getMIBAllocType(MIB), AllocationType::NotCold);
} else {
ASSERT_EQ(StackId->getZExtValue(), 4u);
- EXPECT_EQ(getMIBAllocType(MIB), AllocationType::Hot);
+ // Hot contexts are converted to NotCold when building the metadata.
+ EXPECT_EQ(getMIBAllocType(MIB), AllocationType::NotCold);
}
}
}
@@ -453,7 +429,8 @@ declare dso_local noalias noundef i8* @malloc(i64 noundef)
EXPECT_EQ(getMIBAllocType(MIB), AllocationType::NotCold);
else {
ASSERT_EQ(StackId->getZExtValue(), 8u);
- EXPECT_EQ(getMIBAllocType(MIB), AllocationType::Hot);
+ // Hot contexts are converted to NotCold when building the metadata.
+ EXPECT_EQ(getMIBAllocType(MIB), AllocationType::NotCold);
}
}
}
@@ -596,7 +573,8 @@ declare dso_local noalias noundef i8* @malloc(i64 noundef)
EXPECT_EQ(getMIBAllocType(MIB), AllocationType::NotCold);
else {
ASSERT_EQ(StackId->getZExtValue(), 8u);
- EXPECT_EQ(getMIBAllocType(MIB), AllocationType::Hot);
+ // Hot contexts are converted to NotCold when building the new metadata.
+ EXPECT_EQ(getMIBAllocType(MIB), AllocationType::NotCold);
}
}
}
|
|
|
||
| ; RUN: llvm-profdata merge %S/Inputs/memprof_loop_unroll.memprofraw --profiled-binary %S/Inputs/memprof_loop_unroll.exe -o %t.memprofdata | ||
| ; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -S -memprof-report-hinted-sizes 2>&1 | FileCheck %s | ||
| ;; Set the minimum lifetime threshold to 0 to ensure that one context is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is necessary because previously one context was NotCold and the other was Hot, and now they were both NotCold, changing the test behavior.
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add some helpers to Node so that we don't have to sprinkle static_cast and bit masking operations in different places?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not clear on why we need this. If we set MemProfUseHotHints to false, then no alloctypes should have hot type and we'll never enter this condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By default hot hint generation is off, however, since it is still optionally supported this avoids a big overhead increase if that is re-enabled for experimentation etc. If we keep that support here, even if optional, I want to ensure things keep working the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, so this change really does 2 things -- optimize when hot hints are used and allow disabling them altogether. The description now makes more sense to me. Could you commit this as two separate changes -- adding the flag and the optimization (conversion to notcold)? For the former it would be good to have a small test for the flag usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok I can do that. There is a unittest that tests with the flag, but I'll update llvm/test/Transforms/PGOProfile/memprof.ll to test with this as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See PR124338. I'll update this one to be just about the conversion once that goes in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, ptal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the former it would be good to have a small test for the flag usage.
I had to remove the test I added there as it checks metadata, and I had forgotten that with this change that would no longer happen. I will submit a follow on test that checks the option, using an unambiguously hot allocation.
30330f1 to
4ee2fe7
Compare
While we convert hot contexts to notcold contexts during the cloning step, their existence was greatly limiting the context trimming performed when we add the MemProf profile to the IR. To address this, any hot contexts are converted to notcold contexts immediately after first checking for unambiguous allocation types, and before checking it again and before adding metadata while performing context trimming. Note that hot hints are now disabled by default, however, this avoids adding unnecessary overhead if they are re-enabled.
4ee2fe7 to
3afe178
Compare
snehasish
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm.
here defeats that. I will add a separate hot hint test for the option as a follow on.
While we convert hot contexts to notcold contexts during the cloning
step, their existence was greatly limiting the context trimming
performed when we add the MemProf profile to the IR. To address this,
any hot contexts are converted to notcold contexts immediately after
first checking for unambiguous allocation types, and before checking it
again and before adding metadata while performing context trimming.
Note that hot hints are now disabled by default, however, this avoids
adding unnecessary overhead if they are re-enabled.