Skip to content

Commit 9de4aca

Browse files
authored
Precompute: Rewrite logic for handling children to carefully decide which to keep (#7863)
The main changes here are: * Rather than precompute sometimes while ignoring effects of tees etc. and sometimes not, do so in a single manner: while considering the effects carefully and deciding which children to keep. * This lets us remove the dual cache from #7857, as now there is a single mode. But really, this is a rewrite of that core logic from scratch in a cleaner and less hackish way, while fixing issues with the dual cache and even the earlier single cache that it fixed: * We must have a single cached object for each expression. Having a dual cache opened us up to bugs, because it turns out we might actually cache an object in the propagate phase, and use it in the main phase, and each used a different cache. Now both phases do the same thing, so there is no risk. * We must compute effects when there are effects, because they are state in the PrecomputingExpressionRunner, a source of bugs with all previous caches. I realized that the solution here is simple: note when there are effects, and if so, just compute them. This is fine, because the quadratic case happens in global objects, which have no effects anyhow (and even inside functions it is rare to have such effects). And, after computing the effects, we use the single cached heap location, keeping identity stable (a key fix here). Then, the main visitExpression is straightforward: compute in the most general manner: NOT trying to replace the entire expression, which requires no side effects, but allowing them, and looking at the children afterwards to see which are actually needed. This is necessary to avoid a regression in this PR, but it actually ends up as a progression, since we can handle more cases, like (ref.eq (tee) (get)). Before the tee would stop us, and propagate doesn't handle this if it isn't written to a local, but now we can just compute it, and keep the tee around. Also, I figured out how to avoid the monotonically increasing code size problem, which e.g. GUFA has, where you see expression A, figure out it evaluates to constant C, but has effects you must keep, so you emit (A, C). That lets the constant get optimized, but if you run twice you can add C twice, unless you carefully look at the parent, which is annoying. Here, that is avoided because while we may add such a constant, regressing size, we still make progress because we remove the main expression itself - we may keep some children, but never the parent, so the increase is bounded. This improves not just GC code but is an improvement in some Emscripten size tests. There are also some minor theoretical regressions, as a few tests show, but those are things other passes handle better (like (return (return ..))), so they only happen when running the pass by itself (production code using the full pipeline should only get better). This is also a slight improvement in compile times.
1 parent 126b4f7 commit 9de4aca

File tree

7 files changed

+683
-170
lines changed

7 files changed

+683
-170
lines changed

src/passes/Precompute.cpp

Lines changed: 173 additions & 94 deletions
Original file line numberDiff line numberDiff line change
@@ -72,20 +72,19 @@ using GetValues = std::unordered_map<LocalGet*, Literals>;
7272
// possible input values than that struct.new, which means we will not infer
7373
// a value for it, and not attempt to say anything about comparisons of $x.
7474
struct HeapValues {
75-
// Store two maps, one for effects and one without. The one with effects is
76-
// used when PRESERVE_SIDEEFFECTS is on, and the other when not. This is
77-
// necessary because when we preserve effects then nested effects in a GC
78-
// allocation can cause us to end up as nonconstant (nothing can be
79-
// precomputed), and we do not want to mix results between the two modes (if
80-
// we did, we might cache a result when we ignore effects that we later use
81-
// when not ignoring them, which would forget the effects).
82-
std::unordered_map<Expression*, std::shared_ptr<GCData>> withEffects,
83-
withoutEffects;
84-
85-
void clear() {
86-
withEffects.clear();
87-
withoutEffects.clear();
88-
}
75+
struct Entry {
76+
// The GC data for an expression.
77+
std::shared_ptr<GCData> data;
78+
// Whether the expression has effects. If it does then we must recompute it
79+
// each time we see it, even though we return |data| to represent it.
80+
// (Recomputing will apply those effects each time, so we don't forget them
81+
// when we read from the cache. This recomputing is rare, and doesn't happen
82+
// e.g. in global GC objects, where most of the work happens, so this cache
83+
// still saves a lot.)
84+
bool hasEffects;
85+
};
86+
87+
std::unordered_map<Expression*, Entry> map;
8988
};
9089

9190
// Precomputes an expression. Errors if we hit anything that can't be
@@ -202,23 +201,28 @@ class PrecomputingExpressionRunner
202201

203202
// Generates heap info for a heap-allocating expression.
204203
Flow getGCAllocation(Expression* curr, std::function<Flow()> visitFunc) {
205-
auto& heapValuesMap = (flags & FlagValues::PRESERVE_SIDEEFFECTS)
206-
? heapValues.withEffects
207-
: heapValues.withoutEffects;
208204
// We must return a literal that refers to the canonical location for this
209205
// source expression, so that each time we compute a specific *.new then
210206
// we get the same identity.
211-
auto iter = heapValuesMap.find(curr);
212-
if (iter != heapValuesMap.end()) {
207+
auto iter = heapValues.map.find(curr);
208+
if (iter != heapValues.map.end()) {
209+
auto& [data, hasEffects] = iter->second;
210+
if (hasEffects) {
211+
// Visit, so we recompute the effects. (This is rare, see comment
212+
// above.)
213+
visitFunc();
214+
}
213215
// Refer to the same canonical GCData that we already created.
214-
return Literal(iter->second, curr->type.getHeapType());
216+
return Literal(data, curr->type.getHeapType());
215217
}
216-
// Only call the visitor function here, so we do it once per allocation.
218+
// Only call the visitor function here, so we do it once per allocation. See
219+
// if we have effects while doing so.
217220
auto flow = visitFunc();
218221
if (flow.breaking()) {
219222
return flow;
220223
}
221-
heapValuesMap[curr] = flow.getSingleValue().getGCData();
224+
heapValues.map[curr] =
225+
HeapValues::Entry{flow.getSingleValue().getGCData(), hasEffectfulSets()};
222226
return flow;
223227
}
224228

@@ -301,94 +305,169 @@ struct Precompute
301305
// unlikely chance, we leave such things for later.
302306
}
303307

304-
template<typename T> void reuseConstantNode(T* curr, Flow flow) {
305-
if (flow.values.isConcrete()) {
306-
// reuse a const / ref.null / ref.func node if there is one
307-
if (curr->value && flow.values.size() == 1) {
308-
Literal singleValue = flow.getSingleValue();
309-
if (singleValue.type.isNumber()) {
310-
if (auto* c = curr->value->template dynCast<Const>()) {
311-
c->value = singleValue;
312-
c->finalize();
313-
curr->finalize();
314-
return;
315-
}
316-
} else if (singleValue.isNull()) {
317-
if (auto* n = curr->value->template dynCast<RefNull>()) {
318-
n->finalize(singleValue.type);
319-
curr->finalize();
320-
return;
321-
}
322-
} else if (singleValue.type.isRef() &&
323-
singleValue.type.getHeapType().isSignature()) {
324-
if (auto* r = curr->value->template dynCast<RefFunc>()) {
325-
r->func = singleValue.getFunc();
326-
auto heapType = getModule()->getFunction(r->func)->type;
327-
r->finalize(heapType);
328-
curr->finalize();
329-
return;
330-
}
331-
}
308+
void visitExpression(Expression* curr) {
309+
// Ignore trivial things like constants, nops, local/global.set (which have
310+
// an effect we cannot remove, and it is simpler to ignore them here than
311+
// later below), return (which we cannot improve), and loop (which it is
312+
// simpler to leave for other passes).
313+
if (Properties::isConstantExpression(curr) || curr->is<Nop>() ||
314+
curr->is<LocalSet>() || curr->is<GlobalSet>() || curr->is<Return>() ||
315+
curr->is<Loop>()) {
316+
return;
317+
}
318+
// Breaks with conditions can be simplified, but unconditional ones are like
319+
// returns, and we cannot improve.
320+
if (auto* br = curr->dynCast<Break>()) {
321+
if (!br->condition) {
322+
return;
332323
}
333-
curr->value = flow.getConstExpression(*getModule());
334-
} else {
335-
curr->value = nullptr;
336324
}
337-
curr->finalize();
338-
}
339325

340-
void visitExpression(Expression* curr) {
341-
// TODO: if local.get, only replace with a constant if we don't care about
342-
// size...?
343-
if (Properties::isConstantExpression(curr) || curr->is<Nop>()) {
326+
// See if we can precompute the value that flows out. We set
327+
// |replaceExpression| to false because we do not necessarily want to
328+
// replace it entirely, see below - we may keep parts, in some cases, if we
329+
// can still simplify it to a precomputed value.
330+
Flow flow;
331+
PrecomputingExpressionRunner runner(
332+
getModule(), getValues, heapValues, false /* replaceExpression */);
333+
try {
334+
flow = runner.visit(curr);
335+
} catch (NonconstantException&) {
344336
return;
345337
}
346-
// try to evaluate this into a const
347-
Flow flow = precomputeExpression(curr);
338+
// The resulting value must be of a type we can emit a constant for (or
339+
// there must be no value at all, in which case the value is a nop).
348340
if (!canEmitConstantFor(flow.values)) {
349341
return;
350342
}
343+
if (flow.breakTo == NONCONSTANT_FLOW) {
344+
// This cannot be turned into a constant, but perhaps we can partially
345+
// precompute it.
346+
considerPartiallyPrecomputing(curr);
347+
return;
348+
}
349+
// TODO: Handle suspends somehow?
350+
if (flow.suspendTag) {
351+
return;
352+
}
353+
354+
// This looks like a promising precomputation: We have found that its value,
355+
// if any, can be emitted as a constant (or there is no value, and it is a
356+
// nop or break etc.). Build that value, so we can replace the expression
357+
// with it.
358+
Builder builder(*getModule());
359+
Expression* value = nullptr;
360+
if (flow.values.isConcrete()) {
361+
value = flow.getConstExpression(*getModule());
362+
}
351363
if (flow.breaking()) {
352-
if (flow.breakTo == NONCONSTANT_FLOW) {
353-
// This cannot be turned into a constant, but perhaps we can partially
354-
// precompute it.
355-
considerPartiallyPrecomputing(curr);
364+
if (flow.breakTo == RETURN_FLOW) {
365+
// We avoided trivial returns earlier (by doing so, we avoid wasted
366+
// work replacing a return with itself).
367+
assert(!curr->is<Return>());
368+
value = builder.makeReturn(value);
369+
} else {
370+
value = builder.makeBreak(flow.breakTo, value);
371+
}
372+
// Note we don't need to handle RETURN_CALL_FLOW, as the call there has
373+
// effects that would stop us earlier.
374+
}
375+
376+
// We have something to replace the expression. While precomputing the
377+
// expression, we verified it has no effects that cause problems - no traps
378+
// or exceptions etc., as those things would lead to NONCONSTANT_FLOW. We
379+
// can therefore replace this with what flows out of it. The only exception
380+
// is that we set replaceExpression to false, above, which means we run the
381+
// interpreter without PRESERVE_SIDEEFFECTS. That allows local and global
382+
// sets to happen (to help optimize small code fragments with sets and
383+
// gets). To handle that, keep relevant children if we have such sets.
384+
if (runner.hasEffectfulSets()) {
385+
if (curr->is<Block>() || curr->is<If>() || curr->is<Try>()) {
386+
// These control flow structures have children that might not execute.
387+
// We know that some of the children have effectful sets, but not which,
388+
// and we can't just keep them all, so give up.
389+
// TODO: Check if this would be useful to improve, but other passes
390+
// might do enough already.
356391
return;
357392
}
358-
if (flow.breakTo == RETURN_FLOW) {
359-
// this expression causes a return. if it's already a return, reuse the
360-
// node
361-
if (auto* ret = curr->dynCast<Return>()) {
362-
reuseConstantNode(ret, flow);
363-
} else {
364-
Builder builder(*getModule());
365-
replaceCurrent(builder.makeReturn(
366-
flow.values.isConcrete() ? flow.getConstExpression(*getModule())
367-
: nullptr));
368-
}
393+
394+
// To keep things simple, stop here if we are precomputing to a break/
395+
// return. Handling that case requires ordering considerations:
396+
//
397+
// (foo
398+
// (br)
399+
// (call)
400+
// )
401+
//
402+
// Here we know we need to keep the call, and can remove foo, but this
403+
// would be wrong:
404+
//
405+
// (block
406+
// ;; removed br
407+
// (call)
408+
// (br) ;; the value we precompute to, added at the end
409+
// )
410+
//
411+
// Instead we must keep the br, leaving this for later opts to improve:
412+
//
413+
// (block
414+
// (br)
415+
// (call)
416+
// (br)
417+
// )
418+
//
419+
// That is, we cannot remove unneeded children easily in this case, where
420+
// control flow might transfer, so we need to keep all children when we
421+
// remove foo. In that case, it's not clear we are helping much, and other
422+
// passes can do better with the break/return anyhow. After dismissing
423+
// this situation, we know no transfer of control flow needs to be handled
424+
// in the code below (because we executed the code, and found it did not
425+
// do so).
426+
if (flow.breaking()) {
369427
return;
370428
}
371-
// this expression causes a break, emit it directly. if it's already a br,
372-
// reuse the node.
373-
if (auto* br = curr->dynCast<Break>()) {
374-
br->name = flow.breakTo;
375-
br->condition = nullptr;
376-
reuseConstantNode(br, flow);
377-
} else {
378-
Builder builder(*getModule());
379-
replaceCurrent(builder.makeBreak(
380-
flow.breakTo,
381-
flow.values.isConcrete() ? flow.getConstExpression(*getModule())
382-
: nullptr));
429+
430+
// Find the necessary children that we must keep.
431+
SmallVector<Expression*, 10> kept;
432+
for (auto* child : ChildIterator(curr)) {
433+
EffectAnalyzer effects(getPassOptions(), *getModule(), child);
434+
if (!effects.localsWritten.empty() || !effects.globalsWritten.empty()) {
435+
kept.push_back(builder.makeDrop(child));
436+
}
437+
}
438+
// Find all the things we must keep, which might include |value|.
439+
if (!kept.empty()) {
440+
if (value) {
441+
kept.push_back(value);
442+
}
443+
if (kept.size() == 1) {
444+
value = kept[0];
445+
} else {
446+
// We are returning a block with some kept children + some value. This
447+
// may seem to increase code size in some cases, but it cannot do so
448+
// monotonically: while doing all this we are definitely removing
449+
// |curr| itself, so we are making progress, even if we emit a new
450+
// constant that we weren't before. That is, we are not in this
451+
// situation:
452+
//
453+
// (foo A B) => (block (foo A B) (value))
454+
//
455+
// We are in this one:
456+
//
457+
// (foo A B) => (block A B (value))
458+
//
459+
// where foo vanishes.
460+
value = builder.makeBlock(kept);
461+
}
383462
}
384-
return;
385463
}
386-
// this was precomputed
387-
if (flow.values.isConcrete()) {
388-
replaceCurrent(flow.getConstExpression(*getModule()));
389-
} else {
464+
if (!value) {
465+
// We don't need to replace this with anything: there is no value or other
466+
// code that we need. Just nop it.
390467
ExpressionManipulator::nop(curr);
468+
return;
391469
}
470+
replaceCurrent(value);
392471
}
393472

394473
void visitBlock(Block* curr) {
@@ -695,7 +774,7 @@ struct Precompute
695774
// |parent|. Results here must not be cached for later.
696775
HeapValues temp;
697776
auto ifTrue = precomputeExpression(parent, true, &temp);
698-
temp.clear();
777+
temp.map.clear();
699778
if (isValidPrecomputation(ifTrue)) {
700779
*pointerToSelect = select->ifFalse;
701780
auto ifFalse = precomputeExpression(parent, true, &temp);

src/wasm-interpreter.h

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2741,6 +2741,11 @@ class ConstantExpressionRunner : public ExpressionRunner<SubType> {
27412741
globalValues[name] = values;
27422742
}
27432743

2744+
// Returns true if we set a local or a global.
2745+
bool hasEffectfulSets() const {
2746+
return !localValues.empty() || !globalValues.empty();
2747+
}
2748+
27442749
Flow visitLocalGet(LocalGet* curr) {
27452750
// Check if a constant value has been set in the context of this runner.
27462751
auto iter = localValues.find(curr->index);

0 commit comments

Comments
 (0)