Skip to content

Commit 7a39522

Browse files
authored
[NFC] DeadArgumentElimination: Compute callers once (#8053)
This is an over-approximation, as later iterations may have fewer calls - if we remove a call. Such removal is rare, however, and the cost of computing this map is actually very high. Computing it once up front, and using that over- approximation, turns out to be much faster, even if in theory it can lead to a little wasted work (more scanning; but no optimizations are missed, as we do not use this mapping to decide what/how to optimize). This makes the pass 30-40% faster on several large testcases I tried, from Dart, Java, and C++. I did see two testcases that benefited only by 10-20%, from GraalVM and Rust, but I saw none that did not benefit significantly. This was the slowest pass in some of those 30-40% cases.
1 parent 5190098 commit 7a39522

File tree

1 file changed

+33
-6
lines changed

1 file changed

+33
-6
lines changed

src/passes/DeadArgumentElimination.cpp

Lines changed: 33 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -34,6 +34,7 @@
3434
// watch for here).
3535
//
3636

37+
#include <algorithm>
3738
#include <unordered_map>
3839
#include <unordered_set>
3940

@@ -218,6 +219,20 @@ struct DAE : public Pass {
218219
}
219220
}
220221

222+
// For each function, the set of callers. This is used to propagate changes,
223+
// e.g. if we remove a return value from a function, the calls might benefit
224+
// from optimization. It is ok if this is an over-approximation, that is, if
225+
// we think there are more callers than there are, as it would just lead to
226+
// unneeded extra scanning of calling functions (in the example just given, if
227+
// a caller did not actually call, they would not benefit from the extra
228+
// optimization, but no harm is done, and no optimization missed). Such over-
229+
// approximation can happen in later optimization iterations: We may manage to
230+
// remove a call from a function to another (say, after applying a constant
231+
// param, we see the call is not reached). This is somewhat rare, and the cost
232+
// of computing this map is significant, so we compute it once at the start
233+
// and then use that possibly-over-approximating data.
234+
std::vector<std::vector<Name>> callers;
235+
221236
bool iteration(Module* module, DAEFunctionInfoMap& infoMap) {
222237
allDroppedCalls.clear();
223238

@@ -246,15 +261,10 @@ struct DAE : public Pass {
246261
std::vector<bool> tailCallees(numFunctions);
247262
std::vector<bool> hasUnseenCalls(numFunctions);
248263

249-
// For each function, the set of callers.
250-
std::vector<std::unordered_set<Name>> callers(numFunctions);
251-
252264
for (auto& [func, info] : infoMap) {
253265
for (auto& [name, calls] : info.calls) {
254-
auto targetIndex = indexes[name];
255-
auto& allCallsToName = allCalls[targetIndex];
266+
auto& allCallsToName = allCalls[indexes[name]];
256267
allCallsToName.insert(allCallsToName.end(), calls.begin(), calls.end());
257-
callers[targetIndex].insert(func);
258268
}
259269
for (auto& callee : info.tailCallees) {
260270
tailCallees[indexes[callee]] = true;
@@ -273,6 +283,23 @@ struct DAE : public Pass {
273283
}
274284
}
275285

286+
// See comment above, we compute callers once and never again.
287+
if (callers.empty()) {
288+
// Compute first as sets, to deduplicate.
289+
std::vector<std::unordered_set<Name>> callersSets(numFunctions);
290+
for (auto& [func, info] : infoMap) {
291+
for (auto& [name, calls] : info.calls) {
292+
callersSets[indexes[name]].insert(func);
293+
}
294+
}
295+
// Copy into efficient vectors.
296+
callers.resize(numFunctions);
297+
for (Index i = 0; i < numFunctions; ++i) {
298+
auto& set = callersSets[i];
299+
callers[i] = std::vector<Name>(set.begin(), set.end());
300+
}
301+
}
302+
276303
// Track which functions we changed that are worth re-optimizing at the end.
277304
std::unordered_set<Function*> worthOptimizing;
278305

0 commit comments

Comments
 (0)