Precompute: Rewrite logic for handling children to carefully decide which to keep (#7863)
The main changes here are:
* Rather than sometimes precomputing while ignoring the effects of tees etc.
and sometimes not, always do so in a single manner: consider the effects
carefully and decide which children to keep.
* This lets us remove the dual cache from #7857, as now there is a single
mode.
But really, this is a rewrite of that core logic from scratch in a cleaner and
less hackish way, while fixing issues with the dual cache and even with the
earlier single cache that the dual cache was meant to fix:
* We must have a single cached object for each expression. Having a dual
cache opened us up to bugs, because it turns out we might actually
cache an object in the propagate phase, and use it in the main phase, and
each used a different cache. Now both phases do the same thing, so there
is no risk.
* We must compute effects when there are effects, because they are
state in the PrecomputingExpressionRunner, a source of bugs with all
previous caches. I realized that the solution here is simple: note when there
are effects, and if so, just compute them. This is fine, because the quadratic
case happens in global objects, which have no effects anyhow (and even inside
functions it is rare to have such effects). And, after computing the effects, we
use the single cached heap location, keeping identity stable (a key fix here).
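
To illustrate the two points above, here is a minimal toy sketch in Python (not
the Binaryen C++ code). All names in it (ToyRunner, heap_cache, has_effects,
and the expression classes) are hypothetical stand-ins. It assumes a runner
whose only mutable state is a map of local values: when an allocation has
effects in its children they are computed again, but the result is always the
single cached heap object.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False keeps identity-based equality and hashing
class HeapObject:
    fields: list


@dataclass(eq=False)
class Const:
    value: int


@dataclass(eq=False)
class LocalTee:
    index: int
    value: object


@dataclass(eq=False)
class StructNew:
    operands: list


def has_effects(expr):
    """True if evaluating expr changes runner state (here: writes a local)."""
    if isinstance(expr, LocalTee):
        return True
    if isinstance(expr, StructNew):
        return any(has_effects(op) for op in expr.operands)
    return False


class ToyRunner:
    def __init__(self):
        self.locals = {}      # mutable state that effects write to
        self.heap_cache = {}  # the single cache: allocation expr -> HeapObject

    def visit(self, expr):
        if isinstance(expr, Const):
            return expr.value
        if isinstance(expr, LocalTee):
            value = self.visit(expr.value)
            self.locals[expr.index] = value
            return value
        if isinstance(expr, StructNew):
            cached = self.heap_cache.get(expr)
            if cached is not None and not has_effects(expr):
                return cached  # no effects: skip the children entirely
            # Either the first visit, or there are effects to compute again.
            fields = [self.visit(op) for op in expr.operands]
            if cached is None:
                cached = HeapObject(fields)
                self.heap_cache[expr] = cached
            return cached  # always the same object for this expression
        raise TypeError(f"unhandled expression: {expr!r}")


runner = ToyRunner()
alloc = StructNew([LocalTee(0, Const(7))])
first = runner.visit(alloc)
runner.locals.clear()         # forget local state, as a later visit might
second = runner.visit(alloc)  # effects are computed again, identity is reused
```

In this sketch the second visit restores local 0 and still returns the same
object as the first, which is the property that matters for ref.eq.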
Then, the main visitExpression is straightforward: compute in the most
general manner. That is, do NOT try to replace the entire expression, which
requires no side effects; instead, allow side effects, and look at the
children afterwards to see which are actually needed. This is necessary to
avoid a regression in this PR, but it actually ends up as a progression, since
we can handle more cases, like (ref.eq (tee) (get)). Before, the tee would stop
us, and propagate doesn't handle this if it isn't written to a local, but now
we can just compute it, and keep the tee around.
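
The shape of that logic can be sketched on a toy IR in Python. This is an
illustration only, not the actual visitExpression: expressions are plain
tuples, an integer "eq" stands in for ref.eq, and the helper names (evaluate,
has_shallow_effects, precompute) are hypothetical.

```python
def children(expr):
    kind = expr[0]
    if kind == "local.tee":
        return [expr[2]]
    if kind == "eq":
        return [expr[1], expr[2]]
    return []  # const, local.get


def has_shallow_effects(expr):
    # Effects of the node itself, ignoring its children.
    return expr[0] == "local.tee"


def has_effects(expr):
    return has_shallow_effects(expr) or any(has_effects(c) for c in children(expr))


def evaluate(expr, local_values):
    """Return the constant value of expr, or None if it is unknown."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "local.get":
        return local_values.get(expr[1])
    if kind == "local.tee":
        value = evaluate(expr[2], local_values)
        local_values[expr[1]] = value  # the side effect: runner state changes
        return value
    if kind == "eq":
        left = evaluate(expr[1], local_values)
        right = evaluate(expr[2], local_values)
        if left is None or right is None:
            return None
        return int(left == right)
    raise ValueError(f"unhandled expression: {expr!r}")


def precompute(expr):
    # Compute in the general manner, allowing side effects in children.
    value = evaluate(expr, {})
    if value is None or expr[0] == "const" or has_shallow_effects(expr):
        return expr  # unknown, already constant, or the parent must stay
    # The parent goes away; keep only the children that are still needed.
    kept = [("drop", c) for c in children(expr) if has_effects(c)]
    constant = ("const", value)
    return ("block", *kept, constant) if kept else constant


tee_then_get = ("eq", ("local.tee", "x", ("const", 5)), ("local.get", "x"))
optimized = precompute(tee_then_get)
```

Here the comparison is replaced by its constant result, the tee is kept in a
drop, and the get, which has no effects, is discarded.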
Also, I figured out how to avoid the monotonically increasing code size
problem, which e.g. GUFA has, where you see expression A, figure out it
evaluates to constant C, but has effects you must keep, so you emit (A, C).
That lets the constant get optimized, but if you run twice you can add C
twice, unless you carefully look at the parent, which is annoying. Here,
that is avoided because while we may add such a constant, regressing
size, we still make progress because we remove the main expression itself -
we may keep some children, but never the parent, so the increase is
bounded.
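
A small Python sketch of why the growth is bounded, comparing the two
strategies when the pass runs twice. It is a toy model under stated
assumptions: tuple-based expressions, a top-down walk, and hypothetical helper
names (wrap_parent models emitting (A, C) without the parent check,
keep_children models the approach here). It is not GUFA's or Precompute's
actual code.

```python
def children(expr):
    kind = expr[0]
    if kind == "const":
        return []
    if kind == "tee":
        return [expr[2]]
    return list(expr[1:])  # add, drop, block


def with_children(expr, new_children):
    kind = expr[0]
    if kind == "const":
        return expr
    if kind == "tee":
        return ("tee", expr[1], *new_children)
    return (kind, *new_children)


def evaluate(expr):
    """Constant value of expr, or None if this toy model cannot tell."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "tee":
        return evaluate(expr[2])
    if kind == "add":
        left, right = evaluate(expr[1]), evaluate(expr[2])
        return None if left is None or right is None else left + right
    return None  # drop and block are not evaluated here


def has_effects(expr):
    return expr[0] == "tee" or any(has_effects(c) for c in children(expr))


def size(expr):
    return 1 + sum(size(c) for c in children(expr))


def wrap_parent(expr):
    # Emit (A, C): keep all of A for its effects, then add the constant.
    value = evaluate(expr)
    if value is None or expr[0] == "const":
        return None
    if not has_effects(expr):
        return ("const", value)
    return ("block", ("drop", expr), ("const", value))


def keep_children(expr):
    # Remove the parent itself; keep only the children that have effects.
    value = evaluate(expr)
    if value is None or expr[0] in ("const", "tee"):
        return None
    kept = [("drop", c) for c in children(expr) if has_effects(c)]
    constant = ("const", value)
    return ("block", *kept, constant) if kept else constant


def run_pass(expr, rewrite):
    replacement = rewrite(expr)
    if replacement is not None:
        return replacement
    return with_children(expr, [run_pass(c, rewrite) for c in children(expr)])


A = ("add", ("tee", "x", ("const", 5)), ("const", 1))  # 4 nodes

wrapped_once = run_pass(A, wrap_parent)
wrapped_twice = run_pass(wrapped_once, wrap_parent)  # finds A again, wraps again

kept_once = run_pass(A, keep_children)
kept_twice = run_pass(kept_once, keep_children)      # A is gone, nothing to redo
```

In this model the first strategy keeps growing on every run, while the second
regresses size once (the added constant) and then reaches a fixed point,
because the parent it replaced is no longer there to be found again.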
This improves not just GC code but also some Emscripten size tests.
There are also some minor theoretical regressions, as a few tests
show, but those are things other passes handle better (like
(return (return ..))), so they only happen when running the pass
by itself (production code using the full pipeline should only get
better).
This is also a slight improvement in compile times.