Temporary automatic reference counting(ish) pass for inserting async deallocations. (#20765)
This performs local analysis only and bails on almost anything beyond
the most trivial cases (calls, control flow (cf/scf), etc.). It does
seem to work well for current programs that require it, though.
The intent is that the ARC pass would grow to cover all programs via
global analysis and handle program boundary cases around I/O using the
new `stream.resource.retain`/`stream.resource.release` ops. Currently
those cases are ignored.
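A minimal sketch of the local-only idea, assuming a toy straight-line IR. The `Op` class, the op names and the `UNSUPPORTED` set are hypothetical and are not IREE's actual data structures. Each resource allocated in the block gets a dealloca right after its last use unless it escapes through the return, and the whole block is rejected if it contains a call or control flow. Real deallocas are also ordered on the async timeline via timepoints, which this sketch omits.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str                                      # e.g. "alloca", "execute", "return"
    results: list = field(default_factory=list)    # values defined by this op
    operands: list = field(default_factory=list)   # values used by this op


# Hypothetical set of ops the local analysis refuses to reason about.
UNSUPPORTED = {"call", "cf.br", "cf.cond_br", "scf.if", "scf.for"}


def insert_deallocas(block):
    """Return a new op list with a dealloca after the last use of each
    locally allocated resource, or None if the block must be skipped."""
    if any(op.name in UNSUPPORTED for op in block):
        return None  # bail: lifetimes across calls/control flow need global analysis
    allocated = {r for op in block if op.name == "alloca" for r in op.results}
    # Resources returned from the block cross the program boundary; leave them alone.
    escaping = {v for op in block if op.name == "return" for v in op.operands}
    last_use = {}
    for i, op in enumerate(block):
        for v in op.operands:
            if v in allocated:
                last_use[v] = i
    out = []
    for i, op in enumerate(block):
        out.append(op)
        for v in sorted(allocated):  # sorted for deterministic output
            if last_use.get(v) == i and v not in escaping:
                out.append(Op("dealloca", operands=[v]))
    return out
```

For a block that allocates `t0`, uses it in one `execute`, and returns an unrelated value, the sketch places the dealloca directly after that `execute`; a block containing a `call` returns `None`, mirroring the "bail on anything non-trivial" behavior described above.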
This opens up the opportunity for in-compiler reuse analysis that runs
after the ARC pass, finds dealloca -> (join) -> alloca sequences of the
same affinity and size, and reuses the deallocated resource for the new
allocation. In extremely fragmented programs (like sharded tensor-level
parallel models) this could eliminate nearly all allocations within the
program. A simple local ReuseAllocationsPass was added to handle the
basic cases; in TP 405B it reduces the total number of allocations from
36k to 4k (which is still way too high). Now that deallocations are
modeled, future global passes that track timelines better, or options
that let users aggressively reuse allocations that are not temporally
adjacent, could drop that number significantly.
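A minimal sketch of the local reuse idea, assuming a single serialized timeline and a hypothetical tuple encoding of ops: `("alloca", name, affinity, size)`, `("dealloca", name)` and `("use", name)`. A dealloca followed later by an alloca with the same affinity and size is cancelled, and the new allocation is replaced by the freed resource. The real pass must also respect the timepoint join between the two ops, which is omitted here.

```python
from collections import defaultdict


def reuse_allocations(ops):
    """Cancel dealloca -> alloca pairs with matching (affinity, size)."""
    key_of = {}               # resource name -> (affinity, size)
    alias = {}                # eliminated resource name -> reused resource name
    free = defaultdict(list)  # (affinity, size) -> [(resource, index of its dealloca)]
    out, dropped = [], set()
    for op in ops:
        kind = op[0]
        name = alias.get(op[1], op[1])
        if kind == "alloca":
            key = (op[2], op[3])
            if free[key]:
                reused, idx = free[key].pop()
                dropped.add(idx)        # the earlier dealloca is cancelled...
                alias[op[1]] = reused   # ...and this alloca reuses that resource
                continue
            key_of[name] = key
            out.append(op)
        elif kind == "dealloca":
            free[key_of[name]].append((name, len(out)))
            out.append(("dealloca", name))
        else:
            out.append((kind, name))    # uses are rewritten to the reused resource
    return [op for i, op in enumerate(out) if i not in dropped]
```

On a sequence that allocates and frees `a` on `dev0`, then allocates `b` of the same size on `dev0`, then allocates `c` on `dev1`, the sketch keeps only two allocas: `b` becomes `a`, while `c` cannot reuse `a` because its affinity differs.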
A few canonicalization patterns were added for common cases that we want
to eagerly fix in the IR, such as erasing unused allocations and
flattening deallocation chains. Some timepoint patterns are also needed,
but more testing is required to know whether they are performance
positive or neutral.
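The first canonicalization mentioned above, erasing unused allocations, can be sketched over a hypothetical tuple encoding of ops (`("alloca", name)`, `("dealloca", name)`, `("use", name)`); this is an illustration, not the actual IREE pattern. An alloca whose resource is only ever deallocated is dead, so both the alloca and its dealloca are dropped. Flattening deallocation chains depends on timepoint details not described here and is not shown.

```python
def erase_unused_allocations(ops):
    """Drop alloca/dealloca pairs whose resource has no other uses."""
    alloc_ops = ("alloca", "dealloca")
    # Any op other than alloca/dealloca counts as a real use of its resource.
    used = {op[1] for op in ops if op[0] not in alloc_ops}
    dead = {op[1] for op in ops if op[0] == "alloca" and op[1] not in used}
    return [op for op in ops if not (op[0] in alloc_ops and op[1] in dead)]
```

For example, `[("alloca", "a"), ("dealloca", "a"), ("alloca", "b"), ("use", "b"), ("dealloca", "b")]` keeps only the three ops that touch `b`.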
On the TP 405B model we now end up with perfect pairings of deallocas to
allocas in all but the boundary cases:
https://gist.github.com/benvanik/6f7a8abdca4fca389955882e6e98cf9d
After the new ReuseAllocationsPass the model is able to drop most
transients moving between devices:
