@@ -80,21 +80,21 @@ consists of the following fields:
8080- VReg, or virtual register. *Every* operand mentions a virtual
8181 register, even if it is constrained to a single physical register in
8282 practice. This is because we track liveranges uniformly by vreg.
83-
83+
8484- Policy, or "constraint". Every reference to a vreg can apply some
8585 constraint to the vreg at that point in the program. Valid policies are:
86-
86+
8787 - Any location;
8888 - Any register of the vreg's class;
8989 - Any stack slot;
9090 - A particular fixed physical register; or
9191 - For a def (output), a *reuse* of an input register.
92-
92+
9393- The "kind" of reference to this vreg: Def, Use, Mod. A def
9494 (definition) writes to the vreg, and disregards any possible earlier
9595 value. A mod (modify) reads the current value then writes a new
9696 one. A use simply reads the vreg's value.
97-
97+
9898- The position: before or after the instruction.
9999 - Note that to have a def (output) register available in a way that
100100 does not conflict with inputs, the def should be placed at the
@@ -159,7 +159,7 @@ block parameters must provide values for those parameters via
159159operands. When a branch has more than one successor, it provides
160160separate operands for each possible successor. These block parameters
161161are equivalent to phi-nodes; we chose this representation because they
162- are in many ways a more consistent representation of SSA.
162+ are in many ways a more consistent representation of SSA.
163163
164164To see why we believe block parameters are a slightly nicer design
165165choice than use of phi nodes, consider: phis are special
@@ -176,8 +176,8 @@ reasonable to handle.
176176## Output
177177
178178The allocator produces two main data structures as output: an array of
179- `Allocation`s and a sequence of edits. Some other data, such as
180- stackmap slot info, is also provided.
179+ `Allocation`s and a sequence of edits. Some other miscellaneous data is also
180+ provided.
181181
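As a rough illustration of the two outputs described above, the following sketch models a flat allocation array plus a position-tagged edit list. Everything here is an assumption made for the example: the `Output`, `Edit`, and `inst_offsets` names, the per-operand layout of the array, and the simplified `Allocation` enum are hypothetical stand-ins, not the allocator's actual definitions.

```rust
/// Where a value lives after allocation (hypothetical simplification).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Allocation {
    Reg(u8),
    Stack(u32),
}

/// A move the client must insert into the instruction stream (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Edit {
    /// Index of the instruction the move is placed before.
    before_inst: usize,
    from: Allocation,
    to: Allocation,
}

/// The two main outputs: an array of allocations and a sequence of edits.
struct Output {
    /// Assumed layout: one entry per operand, flattened across instructions.
    allocs: Vec<Allocation>,
    /// `allocs[inst_offsets[i]..inst_offsets[i + 1]]` belongs to instruction `i`.
    inst_offsets: Vec<usize>,
    /// Edits, assumed sorted by program position.
    edits: Vec<Edit>,
}

impl Output {
    /// Allocations for the operands of one instruction.
    fn allocs_for(&self, inst: usize) -> &[Allocation] {
        &self.allocs[self.inst_offsets[inst]..self.inst_offsets[inst + 1]]
    }

    /// Edits that must be emitted immediately before one instruction.
    fn edits_before(&self, inst: usize) -> Vec<Edit> {
        self.edits
            .iter()
            .copied()
            .filter(|e| e.before_inst == inst)
            .collect()
    }
}
```

A client would walk its instructions in order, emit the moves returned by `edits_before(i)`, and then rewrite instruction `i` using `allocs_for(i)`.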
182182### Allocations
183183
@@ -229,8 +229,7 @@ The livein and liveout bitsets (`liveins` and `liveouts` on the `Env`)
229229are allocated one per basic block and record, per block, which vregs
230230are live entering and leaving that block. They are computed using a
231231standard backward iterative dataflow analysis and are exact; they do
232- not over-approximate (this turns out to be important for performance,
233- and is also necessary for correctness in the case of stackmaps).
232+ not over-approximate (this turns out to be important for performance).
234233
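The backward iterative dataflow described above can be sketched as a worklist fixpoint. This is a minimal illustration, not the allocator's code: it uses `BTreeSet<u32>` in place of the real bitsets for brevity, and the `Block` summary type, its fields, and `compute_liveness` are hypothetical names introduced for the example.

```rust
use std::collections::{BTreeSet, VecDeque};

/// Per-block summary (hypothetical input format for this sketch).
struct Block {
    /// Vregs read in the block before any write to them in the block.
    upward_uses: BTreeSet<u32>,
    /// Vregs written in the block.
    defs: BTreeSet<u32>,
    /// Indices of successor blocks.
    succs: Vec<usize>,
}

/// Returns `(liveins, liveouts)`, one set per block.
fn compute_liveness(blocks: &[Block]) -> (Vec<BTreeSet<u32>>, Vec<BTreeSet<u32>>) {
    let n = blocks.len();
    let mut liveins: Vec<BTreeSet<u32>> = vec![BTreeSet::new(); n];
    let mut liveouts: Vec<BTreeSet<u32>> = vec![BTreeSet::new(); n];

    // Predecessor lists tell us which blocks to revisit when a livein grows.
    let mut preds: Vec<Vec<usize>> = vec![Vec::new(); n];
    for (b, block) in blocks.iter().enumerate() {
        for &s in &block.succs {
            preds[s].push(b);
        }
    }

    // Seed the worklist with every block, last block first.
    let mut worklist: VecDeque<usize> = (0..n).rev().collect();
    let mut queued = vec![true; n];

    while let Some(b) = worklist.pop_front() {
        queued[b] = false;

        // liveout[b] = union of livein[s] over all successors s.
        let mut out: BTreeSet<u32> = BTreeSet::new();
        for &s in &blocks[b].succs {
            out.extend(liveins[s].iter().copied());
        }

        // livein[b] = upward_uses[b] ∪ (liveout[b] \ defs[b]).
        let mut live_in = blocks[b].upward_uses.clone();
        live_in.extend(out.difference(&blocks[b].defs).copied());

        liveouts[b] = out;
        if live_in != liveins[b] {
            liveins[b] = live_in;
            // A changed livein can change every predecessor's liveout.
            for &p in &preds[b] {
                if !queued[p] {
                    queued[p] = true;
                    worklist.push_back(p);
                }
            }
        }
    }
    (liveins, liveouts)
}
```

Because the sets start empty and only grow until nothing changes, the result is the least fixpoint of the equations in the comments, so a vreg is reported live only where some path actually reaches a use.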
235234### Blockparam Vectors: Source-Side and Dest-Side
236235
@@ -631,7 +630,7 @@ them all here.
631630 across its entire range. This has the effect of causing bundles to
632631 be more important (more likely to evict others) the more they are
633632 split.
634-
633+
635634- Requirement: a bundle's requirement is a value in a lattice that we
636635 have defined, where top is "Unknown" and bottom is
637636 "Conflict". Between these two, we have: any register (of a class);
@@ -640,7 +639,7 @@ them all here.
640639 different requirements meets to Conflict. Requirements are derived
641640 from the operand constraints for all uses in all liveranges in a
642641 bundle, and then merged with the lattice meet-function.
643-
642+
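Merging requirements with a lattice meet can be sketched as follows. This is a simplified illustration under stated assumptions: register classes are omitted, the intermediate elements are assumed to be "any register", "any stack slot", and "a fixed register", and a fixed register is assumed to refine "any register". The `Requirement` variants, `meet`, and `bundle_requirement` are hypothetical names for this example only.

```rust
/// Simplified requirement lattice (register classes omitted).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Requirement {
    /// Top: no constraint seen yet.
    Unknown,
    /// Any register.
    AnyReg,
    /// Any stack slot.
    AnyStack,
    /// One particular physical register, identified by number.
    FixedReg(u8),
    /// Bottom: the constraints cannot all be satisfied at once.
    Conflict,
}

/// Lattice meet: the most general requirement that satisfies both inputs.
fn meet(a: Requirement, b: Requirement) -> Requirement {
    use Requirement::*;
    match (a, b) {
        // Unknown is the identity element.
        (Unknown, x) | (x, Unknown) => x,
        // Conflict absorbs everything.
        (Conflict, _) | (_, Conflict) => Conflict,
        // Equal requirements meet to themselves.
        (x, y) if x == y => x,
        // Assumed ordering: a fixed register refines "any register".
        (AnyReg, FixedReg(r)) | (FixedReg(r), AnyReg) => FixedReg(r),
        // All other pairs are unordered and meet to Conflict.
        _ => Conflict,
    }
}

/// Fold the requirements of every use in a bundle into one value.
fn bundle_requirement(uses: &[Requirement]) -> Requirement {
    uses.iter().copied().fold(Requirement::Unknown, meet)
}
```

Starting the fold at `Unknown` means a bundle with no constrained uses stays unconstrained, while a single incompatible pair anywhere in the bundle drives the result to `Conflict`.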
644643The lattice is as follows (diagram simplified to remove multiple
645644classes and multiple fixed registers which parameterize nodes; any two
646645differently-parameterized values are unordered with respect to each
@@ -1176,13 +1175,13 @@ similarities than the differences.
11761175
11771176* The core abstractions of "liverange", "bundle", "vreg", "preg", and
11781177 "operand" (with policies/constraints) are the same.
1179-
1178+
11801179* The overall allocator pipeline is the same, and the top-level
11811180 structure of each stage should look similar. Both allocators begin
11821181 by computing liveranges, then merging bundles, then handling bundles
11831182 and splitting/evicting as necessary, then doing second-chance
11841183 allocation, then reifying the decisions.
1185-
1184+
11861185* The cost functions are very similar, though the heuristics that make
11871186 decisions based on them are not.
11881187
@@ -1204,33 +1203,33 @@ Several notable high-level differences are:
12041203 and does not depend on scanning the code at all. In general, we
12051204 should be able to state simple invariants and see by inspection (as
12061205 well as fuzzing -- see above) that they hold.
1207-
1206+
12081207* The data structures themselves are simplified. Where IonMonkey uses
12091208 linked lists in many places, this allocator stores simple inline
12101209 smallvecs of liveranges on bundles and vregs, and smallvecs of uses
12111210 on liveranges. We also (i) find a way to construct liveranges
12121211 in-order immediately, without any need for splicing, unlike
12131212 IonMonkey, and (ii) relax sorting invariants where possible to allow
12141213 for cheap append operations in many cases.
1215-
1214+
12161215* The splitting heuristics are significantly reworked. Whereas
12171216 IonMonkey has an all-at-once approach to splitting an entire bundle,
12181217 and has a list of complex heuristics to choose where to split, this
12191218 allocator does conflict-based splitting, and tries to decide whether
12201219 to split or evict and which split to take based on cost heuristics.
1221-
1220+
12221221* The liverange computation is exact, whereas IonMonkey approximates
12231222 using a single-pass algorithm that makes vregs live across entire
12241223 loop bodies. We have found that precise liveness improves allocation
12251224 performance and generated code quality, even though the liveness
12261225 itself is slightly more expensive to compute.
1227-
1226+
12281227* Many of the algorithms in the IonMonkey allocator are built with
12291228 helper functions that do linear scans. These "small quadratic" loops
12301229 are likely not a huge issue in practice, but nevertheless have the
12311230 potential to be in corner cases. As much as possible, all work in
12321231 this allocator is done in linear scans.
1233-
1232+
12341233* There are novel schemes for solving certain interesting design
12351234 challenges. One example: in IonMonkey, liveranges are connected
12361235 across blocks by, when reaching one end of a control-flow edge in a
@@ -1246,7 +1245,7 @@ Several notable high-level differences are:
12461245 for the core regalloc. Ion instead has to tweak its definition of
12471246 minimal bundles and create two liveranges that overlap (!) to
12481247 represent the two uses.
1249-
1248+
12501249* Using block parameters rather than phi-nodes significantly
12511250 simplifies handling of inter-block data movement. IonMonkey had to
12521251 special-case phis in many ways because they are actually quite
@@ -1257,7 +1256,7 @@ Several notable high-level differences are:
12571256* The allocator supports irreducible control flow and arbitrary block
12581257 ordering (its only CFG requirement is that critical edges are
12591258 split).
1260-
1259+
12611260* The allocator supports non-SSA code, and has native support for
12621261 handling program moves specially.
12631262
@@ -1278,7 +1277,7 @@ number of general principles:
12781277 an allocation map for each PReg. This turned out to be significantly
12791278 (!) less efficient than Rust's built-in BTree data structures, for
12801279 the usual cache-efficiency vs. pointer-chasing reasons.
1281-
1280+
12821281* We initially used dense bitvecs, as IonMonkey does, for
12831282 livein/liveout bits. It turned out that a chunked sparse design (see
12841283 below) was much more efficient.
@@ -1302,7 +1301,7 @@ number of general principles:
13021301 append liveranges to in-progress vreg liverange vectors and then
13031302 reverse at the end. The expensive part is a single pass; only the
13041303 bitset computation is a fixpoint loop.
1305-
1304+
13061305* Sorts are better than always-sorted data structures (like btrees):
13071306 they amortize all the comparison and update cost to one phase, and
13081307 this phase is much more cache-friendly than a bunch of spread-out