Skip to content

Commit dc90472

Browse files
[MemProf] Ensure node merging happens for newly created nodes (#151593)
We weren't performing node merging on newly created nodes in some cases. Use a simple iteration over the node and its callers until no more opportunities are found. I confirmed that for several large codes the max iterations is 3 (meaning we only needed to do any work on the first 2, as expected). This can potentially be made more elegant in the future, but it is a simple and effective solution. Also fix a bug exposed by the test case, getting the function for a call instruction in the FullLTO handling, using an existing method to look through aliases if needed.
1 parent 9fdb5e1 commit dc90472

File tree

4 files changed

+1146
-9
lines changed

4 files changed

+1146
-9
lines changed

llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp

Lines changed: 37 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -99,6 +99,9 @@ STATISTIC(SkippedCallsCloning,
9999
"Number of calls skipped during cloning due to unexpected operand");
100100
STATISTIC(MismatchedCloneAssignments,
101101
"Number of callsites assigned to call multiple non-matching clones");
102+
STATISTIC(TotalMergeInvokes, "Number of merge invocations for nodes");
103+
STATISTIC(TotalMergeIters, "Number of merge iterations for nodes");
104+
STATISTIC(MaxMergeIters, "Max merge iterations for nodes");
102105

103106
static cl::opt<std::string> DotFilePathPrefix(
104107
"memprof-dot-file-path-prefix", cl::init(""), cl::Hidden,
@@ -109,6 +112,11 @@ static cl::opt<bool> ExportToDot("memprof-export-to-dot", cl::init(false),
109112
cl::Hidden,
110113
cl::desc("Export graph to dot files."));
111114

115+
// TODO: Remove this option once new handling is validated more widely.
116+
static cl::opt<bool> DoMergeIteration(
117+
"memprof-merge-iteration", cl::init(true), cl::Hidden,
118+
cl::desc("Iteratively apply merging on a node to catch new callers"));
119+
112120
// How much of the graph to export to dot.
113121
enum DotScope {
114122
All, // The full CCG graph.
@@ -3995,7 +4003,7 @@ IndexCallsiteContextGraph::getAllocationCallType(const CallInfo &Call) const {
39954003

39964004
void ModuleCallsiteContextGraph::updateCall(CallInfo &CallerCall,
39974005
FuncInfo CalleeFunc) {
3998-
auto *CurF = cast<CallBase>(CallerCall.call())->getCalledFunction();
4006+
auto *CurF = getCalleeFunc(CallerCall.call());
39994007
auto NewCalleeCloneNo = CalleeFunc.cloneNo();
40004008
if (isMemProfClone(*CurF)) {
40014009
// If we already assigned this callsite to call a specific non-default
@@ -4191,16 +4199,36 @@ void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::mergeClones(
41914199
if (!Inserted.second)
41924200
return;
41934201

4194-
// Make a copy since the recursive call may move a caller edge to a new
4195-
// callee, messing up the iterator.
4196-
auto CallerEdges = Node->CallerEdges;
4197-
for (auto CallerEdge : CallerEdges) {
4198-
// Skip any caller edge moved onto a different callee during recursion.
4199-
if (CallerEdge->Callee != Node)
4200-
continue;
4201-
mergeClones(CallerEdge->Caller, Visited, ContextIdToAllocationNode);
4202+
// Iteratively perform merging on this node to handle new caller nodes created
4203+
// during the recursive traversal. We could do something more elegant such as
4204+
// maintain a worklist, but this is a simple approach that doesn't cause a
4205+
// measureable compile time effect, as most nodes don't have many caller
4206+
// edges to check.
4207+
bool FoundUnvisited = true;
4208+
unsigned Iters = 0;
4209+
while (FoundUnvisited) {
4210+
Iters++;
4211+
FoundUnvisited = false;
4212+
// Make a copy since the recursive call may move a caller edge to a new
4213+
// callee, messing up the iterator.
4214+
auto CallerEdges = Node->CallerEdges;
4215+
for (auto CallerEdge : CallerEdges) {
4216+
// Skip any caller edge moved onto a different callee during recursion.
4217+
if (CallerEdge->Callee != Node)
4218+
continue;
4219+
// If we found an unvisited caller, note that we should check the caller
4220+
// edges again as mergeClones may add or change caller nodes.
4221+
if (DoMergeIteration && !Visited.contains(CallerEdge->Caller))
4222+
FoundUnvisited = true;
4223+
mergeClones(CallerEdge->Caller, Visited, ContextIdToAllocationNode);
4224+
}
42024225
}
42034226

4227+
TotalMergeInvokes++;
4228+
TotalMergeIters += Iters;
4229+
if (Iters > MaxMergeIters)
4230+
MaxMergeIters = Iters;
4231+
42044232
// Merge for this node after we handle its callers.
42054233
mergeNodeCalleeClones(Node, Visited, ContextIdToAllocationNode);
42064234
}

0 commit comments

Comments
 (0)