
Commit a498be9

spupyrev authored and memfrob committed
improving hfsort+ algorithm
Summary: A few improvements to the hfsort+ algorithm. The goal of the diff is (i) to make the resulting function order more i-cache "friendly" and (ii) to fix a bug with incorrect input edge weights. The specific changes are as follows:

- The "samples" field of CallGraph::Node should be at least the sum of the incoming edge weights. Fixed with a new method, CallGraph::adjustArcWeights().
- A new optimization pass for hfsort+ in which pairs of functions that call each other with very high probability (>= 0.99) are always merged; a simplified sketch of this idea is given below. This improves i-cache utilization but may worsen i-TLB behavior. See the new method HFSortPlus::runPassOne().
- Adjusted the optimization goal to make the resulting ordering more i-cache "friendly"; see HFSortPlus::expectedCalls and HFSortPlus::mergeGain.
- Functions without samples are now reordered too (they are placed at the end of the list of hot functions). These functions do appear in the call graph, as some of their basic blocks have samples in the LBR dataset. See HFSortPlus::initializeClusters.

(cherry picked from FBD6248850)
1 parent 7de9bba commit a498be9
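
To make the first new pass concrete, here is a minimal, self-contained C++ sketch of the "merge pairs that call each other with probability >= 0.99" idea. It only illustrates the summary above and is not the actual HFSortPlus::runPassOne implementation; the Cluster, ClusterArc, and MergeProbability names below are hypothetical stand-ins.

```cpp
#include <vector>

struct Cluster {
  std::vector<unsigned> FuncIds; // functions laid out in this cluster, in order

  // Append this cluster's functions to Other and empty this cluster.
  void mergeInto(Cluster &Other) {
    Other.FuncIds.insert(Other.FuncIds.end(), FuncIds.begin(), FuncIds.end());
    FuncIds.clear();
  }
};

struct ClusterArc {
  Cluster *Caller;
  Cluster *Callee;
  double CallProbability; // fraction of calls to Callee that come from Caller
};

constexpr double MergeProbability = 0.99;

// Pass one (simplified): glue together caller/callee clusters connected by an
// arc whose call probability is almost 1.0, so the pair becomes adjacent in
// the final layout. This helps i-cache locality but can grow clusters beyond
// a page, which is the i-TLB downside mentioned in the summary.
void runPassOneSketch(std::vector<ClusterArc> &Arcs) {
  for (ClusterArc &A : Arcs) {
    if (A.Caller == A.Callee || A.Caller->FuncIds.empty() ||
        A.Callee->FuncIds.empty())
      continue; // skip self-calls and clusters already merged away
    if (A.CallProbability >= MergeProbability)
      A.Callee->mergeInto(*A.Caller);
  }
}
```

Placing such a pair in one cluster keeps a caller and its dominant callee adjacent in the layout, which is why the summary notes an i-cache win at a possible i-TLB cost.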

5 files changed: +254 -151 lines changed


bolt/Passes/CallGraph.cpp

Lines changed: 13 additions & 1 deletion
@@ -94,7 +94,6 @@ const CallGraph::Arc &CallGraph::incArcWeight(NodeId Src, NodeId Dst, double W,
 }
 
 void CallGraph::normalizeArcWeights() {
-  // Normalize arc weights
   for (NodeId FuncId = 0; FuncId < numNodes(); ++FuncId) {
     auto& Func = getNode(FuncId);
     for (auto Caller : Func.predecessors()) {
@@ -108,5 +107,18 @@ void CallGraph::normalizeArcWeights() {
   }
 }
 
+void CallGraph::adjustArcWeights() {
+  for (NodeId FuncId = 0; FuncId < numNodes(); ++FuncId) {
+    auto& Func = getNode(FuncId);
+    uint64_t InWeight = 0;
+    for (auto Caller : Func.predecessors()) {
+      auto Arc = findArc(Caller, FuncId);
+      InWeight += (uint64_t)Arc->weight();
+    }
+    if (Func.samples() < InWeight)
+      setSamples(FuncId, InWeight);
+  }
+}
+
 }
 }
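
As a toy illustration of the adjustment above, with made-up numbers: the flat Samples/Edge vectors below are only stand-ins for CallGraph's Nodes and Arcs, but the logic mirrors the new method.

```cpp
// Raise every node's sample count to at least the sum of its incoming
// edge weights, as adjustArcWeights() does for the call graph.
#include <cassert>
#include <cstdint>
#include <vector>

struct Edge { size_t Src, Dst; uint64_t Weight; };

void adjustSamples(std::vector<uint64_t> &Samples,
                   const std::vector<Edge> &Edges) {
  std::vector<uint64_t> InWeight(Samples.size(), 0);
  for (const Edge &E : Edges)
    InWeight[E.Dst] += E.Weight;
  for (size_t I = 0; I < Samples.size(); ++I)
    if (Samples[I] < InWeight[I])
      Samples[I] = InWeight[I];
}

int main() {
  // A (node 0) calls B (node 2) with weight 40, C (node 1) calls B with
  // weight 30, but B itself recorded only 50 samples in the profile.
  std::vector<uint64_t> Samples = {100, 80, 50};
  std::vector<Edge> Edges = {{0, 2, 40}, {1, 2, 30}};
  adjustSamples(Samples, Edges);
  assert(Samples[2] == 70); // B's samples raised to the incoming-arc total
  return 0;
}
```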

bolt/Passes/CallGraph.h

Lines changed: 10 additions & 0 deletions
@@ -153,11 +153,21 @@ class CallGraph {
     return double(Arcs.size()) / (Nodes.size()*Nodes.size());
   }
 
+  // Initialize NormalizedWeight field for every arc
   void normalizeArcWeights();
+  // Make sure that the sum of incoming arc weights is at least the number of
+  // samples for every node
+  void adjustArcWeights();
 
   template <typename L>
   void printDot(char* fileName, L getLabel) const;
+
 private:
+  void setSamples(const NodeId Id, uint64_t Samples) {
+    assert(Id < Nodes.size());
+    Nodes[Id].Samples = Samples;
+  }
+
   std::vector<Node> Nodes;
   ArcsType Arcs;
 };
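
Why the extra method matters: assuming (this is an assumption, not shown in the diff) that normalizeArcWeights() turns each arc weight into a fraction of the callee's sample count, a node whose samples undercount its incoming calls would get per-arc "probabilities" summing to more than 1. A small sketch with hypothetical numbers:

```cpp
// A callee with incoming arcs of weight 40 and 30 but only 50 recorded
// samples. After the adjustment its sample count is at least 70, so the
// normalized arc weights form a valid distribution (sum <= 1).
#include <algorithm>
#include <cassert>
#include <vector>

int main() {
  std::vector<double> ArcWeights = {40.0, 30.0};
  double CalleeSamples = 50.0; // would give "probabilities" summing to 1.4

  // What adjustArcWeights() guarantees for this node.
  CalleeSamples = std::max(CalleeSamples, ArcWeights[0] + ArcWeights[1]);

  double Sum = 0.0;
  for (double W : ArcWeights)
    Sum += W / CalleeSamples; // assumed normalization: weight / callee samples
  assert(Sum <= 1.0);
  return 0;
}
```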

bolt/Passes/HFSort.h

Lines changed: 1 addition & 1 deletion
@@ -103,7 +103,7 @@ std::vector<Cluster> clusterize(const CallGraph &Cg);
 /*
  * Optimize function placement for iTLB cache and i-cache.
  */
-std::vector<Cluster> hfsortPlus(const CallGraph &Cg,
+std::vector<Cluster> hfsortPlus(CallGraph &Cg,
                                 bool UseGainCache = true,
                                 bool UseShortCallCache = true);
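
The only change here is that hfsortPlus() now takes the call graph by non-const reference. A plausible reason, given the CallGraph::adjustArcWeights() method added in this commit, is that the pass needs to mutate node sample counts before clustering; that call site is not shown in this hunk, so the sketch below uses hypothetical stand-in types (MiniCallGraph, MiniCluster, hfsortPlusSketch) purely to illustrate the calling convention.

```cpp
// Hypothetical stand-ins; only the shape of the new signature is real.
#include <vector>

struct MiniCallGraph {
  std::vector<unsigned long long> Samples;  // per-node sample counts
  void adjustArcWeights() { /* raise Samples to incoming-arc totals */ }
};

struct MiniCluster { std::vector<unsigned> FuncIds; };

// Mirrors the new declaration: the graph is passed by mutable reference so
// the pass itself can fix up sample counts before clustering (assumption).
std::vector<MiniCluster> hfsortPlusSketch(MiniCallGraph &Cg,
                                          bool UseGainCache = true,
                                          bool UseShortCallCache = true) {
  (void)UseGainCache;
  (void)UseShortCallCache;
  Cg.adjustArcWeights();
  return {};                                // clustering omitted in the sketch
}

int main() {
  MiniCallGraph Cg;
  auto Clusters = hfsortPlusSketch(Cg);     // would not compile if Cg
  (void)Clusters;                           // were a const object
  return 0;
}
```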

0 commit comments
