Commit 0130fee

Fastalloc doc (#198)
Added design doc for fastalloc.
1 parent 24e5c9d commit 0130fee

3 files changed

+536
-208
lines changed

doc/FASTALLOC.md

Lines changed: 321 additions & 0 deletions
@@ -0,0 +1,321 @@
1+
# Fastalloc Design Overview
2+
3+
Fastalloc is a register allocator made specifically for fast
4+
compile times. It's based on the reverse linear scan register
5+
allocation/SSRA algorithm.
6+
This document describes the data structures used and the allocation steps.
7+
8+
# Data Structures
9+
10+
The main data structures that Fastalloc uses to track its state are
11+
described below.
12+
13+
## Current VReg Allocations (`vreg_allocs`)
14+
15+
This is a vector that is used to hold the current allocation for every
16+
VReg during execution.
17+
18+
## VReg Spillslots (`vreg_spillslots`)
19+
20+
Whenever a VReg needs a spillslot, a dedicated slot is allocated for it.
21+
This vector is where every VReg's spillslot is stored.
22+
23+
## Live VRegs (`live_vregs`)
24+
25+
Live VReg information is kept in a `VRegSet`, a doubly linked list
26+
based on a vector. This is used for quick insertion, removal, and
27+
iteration.
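
A vector-backed doubly linked list of this kind can be sketched roughly as follows. This is only an illustration: the field layout, the sentinel at slot 0, and the method names are assumptions, not the actual `VRegSet` implementation, and VRegs are assumed to be dense `usize` indices.

```rust
/// Hypothetical vector-backed, circular doubly linked list of VReg indices.
/// Slot 0 is a sentinel; VReg `v` lives in slot `v + 1`.
pub struct VRegSet {
    prev: Vec<usize>,
    next: Vec<usize>,
    present: Vec<bool>,
}

impl VRegSet {
    pub fn with_capacity(num_vregs: usize) -> Self {
        let n = num_vregs + 1;
        // Every node starts linked to itself; the sentinel alone is the empty list.
        VRegSet {
            prev: (0..n).collect(),
            next: (0..n).collect(),
            present: vec![false; n],
        }
    }

    /// O(1) insertion right after the sentinel.
    pub fn insert(&mut self, vreg: usize) {
        let node = vreg + 1;
        if self.present[node] {
            return;
        }
        let old_first = self.next[0];
        self.next[0] = node;
        self.prev[node] = 0;
        self.next[node] = old_first;
        self.prev[old_first] = node;
        self.present[node] = true;
    }

    /// O(1) removal by unlinking the node from its neighbours.
    pub fn remove(&mut self, vreg: usize) {
        let node = vreg + 1;
        if !self.present[node] {
            return;
        }
        let (p, n) = (self.prev[node], self.next[node]);
        self.next[p] = n;
        self.prev[n] = p;
        self.present[node] = false;
    }

    /// Visits only the live entries, most recently inserted first.
    pub fn iter(&self) -> impl Iterator<Item = usize> + '_ {
        let mut cur = self.next[0];
        std::iter::from_fn(move || {
            if cur == 0 {
                return None;
            }
            let vreg = cur - 1;
            cur = self.next[cur];
            Some(vreg)
        })
    }
}
```

Because the links are plain indices into preallocated vectors, insertion and removal touch a constant number of entries and never allocate, and iteration skips VRegs that are not live.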
28+
29+
## Least Recently Used Caches (`lrus`)
30+
31+
Every register class (int, float, and vector) has its own LRU and they
32+
are stored together in an array: `lrus`. An LRU is represented similarly
33+
to a `VRegSet`: it's a circular, doubly-linked list based on a vector.
34+
35+
The last PReg in an LRU is the least-recently allocated PReg:
36+
37+
most recently used PReg (head) -> 2nd MRU PReg -> ... -> LRU PReg
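
The ordering above can be sketched with a small circular list stored in vectors. The `Lru` type, its `poke` and `order` methods, and the use of plain `usize` PReg numbers are assumptions made for illustration; the real structure may differ.

```rust
/// Hypothetical LRU over the PRegs of one register class, stored as a
/// circular doubly linked list inside vectors indexed by PReg number.
pub struct Lru {
    head: usize, // most recently used PReg
    prev: Vec<usize>,
    next: Vec<usize>,
}

impl Lru {
    /// Assumes `num_pregs >= 1`. Initially PReg 0 is the head and
    /// PReg `num_pregs - 1` is the least recently used.
    pub fn new(num_pregs: usize) -> Self {
        Lru {
            head: 0,
            prev: (0..num_pregs).map(|i| (i + num_pregs - 1) % num_pregs).collect(),
            next: (0..num_pregs).map(|i| (i + 1) % num_pregs).collect(),
        }
    }

    /// The tail of the list is the head's predecessor in the circle.
    pub fn least_recently_used(&self) -> usize {
        self.prev[self.head]
    }

    /// Marks `preg` as most recently used by moving it to the head.
    pub fn poke(&mut self, preg: usize) {
        if preg == self.head {
            return;
        }
        // Unlink `preg` from its current position.
        let (p, n) = (self.prev[preg], self.next[preg]);
        self.next[p] = n;
        self.prev[n] = p;
        // Relink it between the tail and the old head, then make it the head.
        let tail = self.prev[self.head];
        self.next[tail] = preg;
        self.prev[preg] = tail;
        self.next[preg] = self.head;
        self.prev[self.head] = preg;
        self.head = preg;
    }

    /// Lists PRegs from most to least recently used.
    pub fn order(&self) -> Vec<usize> {
        let mut out = vec![self.head];
        let mut cur = self.next[self.head];
        while cur != self.head {
            out.push(cur);
            cur = self.next[cur];
        }
        out
    }
}
```

Making the list circular means the least recently used PReg is always one `prev` hop away from the head, so no separate tail pointer has to be maintained.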
38+
39+
## Current VReg In PReg Info (`vreg_in_preg`)
40+
41+
During allocation, it's necessary to determine which VReg is in a PReg
42+
to generate the right move(s) for eviction.
43+
`vreg_in_preg` is a vector that stores this information.
44+
45+
## Available PRegs For Use In Instruction (`available_pregs`)
46+
47+
This is a 2-tuple of `PRegSet`s (bitsets of physical registers), one for
48+
the instruction's early phase and one for the late phase.
49+
They are used to determine which registers are available for use in the
50+
early/late phases of an instruction.
51+
52+
Prior to the beginning of any instruction's allocation, this set is reset
53+
to include all allocatable physical registers, some of which may already
54+
contain a VReg.
55+
56+
## VReg Liverange Location Info (`vreg_to_live_inst_range`)
57+
58+
This is a vector of 3-tuples containing the beginning and the end
59+
of every VReg's liverange, along with an allocation it is guaranteed
60+
to be in throughout that liverange.
61+
This is used to build the debug locations vector after allocation
62+
is complete.
63+
64+
# Allocation Process Breakdown
65+
66+
Allocation proceeds in reverse: from the last block to the first block,
67+
and in each block: from the last instruction to the first instruction.
68+
69+
The allocation for each operand in an instruction can be viewed as happening
70+
in four phases: selection, assignment, eviction, and edit insertion.
71+
72+
## Allocation Phase: Selection
73+
74+
In this phase, a PReg is selected from `available_pregs` for the
75+
operand based on the operand constraints. Depending on the operand's
76+
position, the selected PReg is removed from either the early or late
77+
phase or both, indicating that the PReg is no longer available for
78+
allocation by other operands in that phase.
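
As a rough sketch of the bookkeeping in this phase, the snippet below models the early and late sets as 64-bit masks and removes a selected PReg according to the operand's position. The names `AvailablePRegs`, `OperandPos`, and `mark_selected` are hypothetical, and the 64-register limit is an assumption made to keep the example small.

```rust
/// Hypothetical bitset of up to 64 physical registers (bit i = PReg i).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct PRegSet(pub u64);

impl PRegSet {
    pub fn contains(self, preg: u8) -> bool {
        self.0 & (1u64 << preg) != 0
    }
    pub fn remove(&mut self, preg: u8) {
        self.0 &= !(1u64 << preg);
    }
}

/// Where in the instruction an operand's register must be valid.
#[derive(Clone, Copy)]
pub enum OperandPos {
    EarlyUse,
    LateUse,
    EarlyDef,
    LateDef,
}

/// One set for the early phase and one for the late phase.
pub struct AvailablePRegs {
    pub early: PRegSet,
    pub late: PRegSet,
}

impl AvailablePRegs {
    /// Reset before each instruction to every allocatable PReg.
    pub fn reset(allocatable: PRegSet) -> Self {
        AvailablePRegs { early: allocatable, late: allocatable }
    }

    /// Records that `preg` (assumed < 64) was selected for an operand at `pos`.
    pub fn mark_selected(&mut self, preg: u8, pos: OperandPos) {
        match pos {
            OperandPos::EarlyUse => self.early.remove(preg),
            OperandPos::LateDef => self.late.remove(preg),
            // Late uses and early defs occupy the register in both phases.
            OperandPos::LateUse | OperandPos::EarlyDef => {
                self.early.remove(preg);
                self.late.remove(preg);
            }
        }
    }
}
```

With this shape, a PReg picked for an early use stays in the late set, so a late def in the same instruction may still select it.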
79+
80+
## Allocation Phase: Assignment
81+
82+
In this phase, the selected PReg is set as the allocation for
83+
the operand in the final output.
84+
85+
## Allocation Phase: Eviction
86+
87+
In this phase, the VReg that previously occupied the allocation assigned
88+
to an operand, if any, is evicted.
89+
90+
During eviction, a dedicated spillslot is allocated for the evicted
91+
VReg and an edit is inserted after the instruction to move from the
92+
slot to the allocation it's expected to be in after the instruction.
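
The eviction step can be sketched as below. The `State` struct, the `EditAfter` record, and the bump-counter slot allocation are all assumptions made for illustration; only the three effects (dedicated slot, edit after the instruction, `vreg_allocs` update) come from the description above.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Allocation {
    None,
    Reg(u8),
    Stack(u32),
}

/// A move to be placed right after instruction `inst` (hypothetical record).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct EditAfter {
    pub inst: usize,
    pub from: Allocation,
    pub to: Allocation,
}

/// Hypothetical slice of the allocator state needed for eviction.
pub struct State {
    pub vreg_allocs: Vec<Allocation>,
    pub vreg_spillslots: Vec<Option<u32>>,
    pub vreg_in_preg: Vec<Option<usize>>,
    pub next_slot: u32, // assumption: slots are handed out by a simple counter
    pub edits: Vec<EditAfter>,
}

impl State {
    /// Evicts whatever VReg currently sits in `preg` while allocating `inst`.
    pub fn evict(&mut self, preg: u8, inst: usize) {
        let vreg = match self.vreg_in_preg[preg as usize].take() {
            Some(v) => v,
            None => return, // nothing to evict
        };
        // Allocate the dedicated spillslot on first need only.
        let existing = self.vreg_spillslots[vreg];
        let slot = match existing {
            Some(s) => s,
            None => {
                let s = self.next_slot;
                self.next_slot += 1;
                self.vreg_spillslots[vreg] = Some(s);
                s
            }
        };
        // After the instruction, the VReg is expected back in `preg`.
        self.edits.push(EditAfter {
            inst,
            from: Allocation::Stack(slot),
            to: Allocation::Reg(preg),
        });
        // Earlier instructions (processed later) now see the VReg in its slot.
        self.vreg_allocs[vreg] = Allocation::Stack(slot);
    }
}
```

Because allocation runs backwards, the inserted move reads from the slot and writes to the register: it restores the location that later instructions were already allocated against.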
93+
94+
## Allocation Phase: Edit Insertion
95+
96+
In this phase, edits are inserted to ensure that the dataflow from
97+
before the instruction to the selected allocation to after
98+
the instruction remains correct.
99+
100+
# Invariants
101+
102+
Some invariants that remain true throughout execution:
103+
104+
1. During processing, the allocation of a VReg at any point in time
105+
as indicated in `vreg_allocs` changes exactly twice or thrice.
106+
Initially it is set to none. When it's allocated, it is
107+
changed to that allocation. After this, it doesn't change unless
108+
it's evicted or spilled across a block boundary;
109+
if it is, then its current allocation will change to its dedicated
110+
spillslot. After this, it doesn't change again until its definition
111+
is reached and it's deallocated, during which its `vreg_allocs`
112+
entry is set to none. The only exception is block parameters that
113+
are never used: these are never allocated.
114+
115+
2. A virtual register that outlives the block it was defined in will
116+
be in its dedicated spillslot by the end of the block.
117+
118+
3. At the end of a block, before edits are inserted to move values
119+
from branch arguments to block parameter spillslots, all branch
120+
arguments will be in their dedicated spillslots.
121+
122+
4. At the beginning of a block, all block parameters and livein
123+
virtual registers will be in their dedicated spillslots.
124+
125+
# Instruction Allocation
126+
127+
To allocate a single instruction, the first step is to reset the
128+
`available_pregs` sets to all allocatable PRegs.
129+
130+
Next, the selection phase is carried out for all operands with
131+
fixed register constraints: the registers they are constrained to use are
132+
marked as unavailable in the `available_pregs` set, depending on the
133+
phase that they are valid in. If the operand is an early use or late
134+
def operand, then the register will be marked as unavailable in the
135+
early set or late set, respectively. Otherwise, the PReg is marked
136+
as unavailable in both the early and late sets, because a PReg
137+
assigned to an early def or late use operand cannot be reused by another
138+
operand in the same instruction.
139+
140+
After selection for fixed register operands, the eviction phase is
141+
carried out for fixed register operands. Any VReg in their selected
142+
registers, indicated by `vreg_in_preg`, is evicted: a dedicated
143+
spillslot is allocated for the VReg (if it doesn't have one already),
144+
an edit is inserted to move from the slot to the PReg, which is where
145+
the VReg is expected to be after the instruction, and its current
146+
allocation in `vreg_allocs` is set to the spillslot.
147+
148+
Next, all clobbers are removed from the early and late `available_pregs`
149+
sets to avoid allocating a clobbered PReg to a def.
150+
151+
Next, the selection, assignment, eviction, and edit insertion phases are
152+
carried out for all def operands. When each def operand's allocation is
153+
complete, the def operand is immediately freed, marking the end of the
154+
VReg's liverange. It is removed from the `live_vregs` set, its allocation
155+
in `vreg_allocs` is set to none, and if it was in a PReg, that PReg's
156+
entry in `vreg_in_preg` is set to none. The selection and eviction phases
157+
are omitted if the operand has a fixed constraint, as those phases have
158+
already been carried out.
159+
160+
Next, the selection, assignment, and eviction phases are carried out for all
161+
use operands. As with def operands, the selection and eviction phases are
162+
omitted if the operand has a fixed constraint, as those phases have already
163+
been carried out.
164+
165+
Then the edit insertion phase is carried out for all use operands.
166+
167+
Lastly, if the instruction being processed is a branch instruction, the
168+
parallel move resolver is used to insert edits before the instruction
169+
to move from the branch argument spillslots to the block parameter
170+
spillslots.
171+
172+
## Operand Allocation
173+
174+
During the allocation of an operand, a check is first made to
175+
see if the VReg's current allocation as indicated in
176+
`vreg_allocs` is within the operand constraints.
177+
178+
If it is, the assignment phase is carried out, setting the final
179+
allocation output's entry for that operand to the allocation.
180+
The selection phase is carried out, marking the PReg
181+
(if the allocation is a PReg) as unavailable in the respective
182+
early/late sets. The state of the LRUs is also updated to reflect
183+
the new most recently used PReg.
184+
No eviction needs to be done since the VReg is already in the
185+
allocation and no edit insertion needs to be done either.
186+
187+
On the other hand, if the VReg's current allocation is not within
188+
constraints, the selection and eviction phases are carried out for
189+
non-fixed operands. First, a set of PRegs that can be drawn from is
190+
created from `available_pregs`. For early uses and late defs,
191+
this draw-from set is the early set or late set respectively.
192+
For late uses and early defs, the draw-from set is an intersection
193+
of the available early and late sets (because a PReg used for a late
194+
use can't be reassigned to another operand in the early phase;
195+
likewise, a PReg used for an early def can't be reassigned to another
196+
operand in the late phase).
197+
The LRU for the VReg's regclass is then traversed from the end to find
198+
the least-recently used PReg in the draw-from set. Once a PReg is found,
199+
it is marked as the most recently used in the LRU, unavailable in the
200+
`available_pregs` sets, and whatever VReg was in it before is evicted.
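
The draw-from set and the LRU walk described above can be sketched as a single function. For brevity the LRU is passed as a slice ordered from most to least recently used rather than as a linked list, and the sets are 64-bit masks; `select_preg` and `OperandPos` are hypothetical names.

```rust
/// Where in the instruction an operand's register must be valid.
#[derive(Clone, Copy)]
pub enum OperandPos {
    EarlyUse,
    LateUse,
    EarlyDef,
    LateDef,
}

/// Picks the least recently used PReg that is available for `pos`.
///
/// `lru_order` lists PRegs (each assumed < 64) from most to least recently
/// used; `early` and `late` are bitmasks of available PRegs (bit i = PReg i).
pub fn select_preg(lru_order: &[u8], early: u64, late: u64, pos: OperandPos) -> Option<u8> {
    let draw_from = match pos {
        OperandPos::EarlyUse => early,
        OperandPos::LateDef => late,
        // These occupy the register in both phases, so it must be free in both.
        OperandPos::LateUse | OperandPos::EarlyDef => early & late,
    };
    // Walk from the LRU end towards the head and take the first match.
    lru_order
        .iter()
        .rev()
        .copied()
        .find(|&preg| draw_from & (1u64 << preg) != 0)
}
```

A `None` result means no PReg in the class satisfies the position's availability requirement; how that case is handled is outside the scope of this sketch.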
201+
202+
The assignment phase is carried out next: the final allocation for the
203+
operand is set to the selected register.
204+
205+
If the newly allocated operand has not been allocated before, that is,
206+
this is the first use/def of the VReg encountered, the VReg is
207+
inserted into `live_vregs` and marked as the value in the allocated
208+
PReg in `vreg_in_preg`.
209+
210+
Otherwise, if the VReg has been allocated before, then an edit will need
211+
to be inserted to ensure that the dataflow remains correct.
212+
The edit insertion phase is now carried out if the operand is a def
213+
operand: an edit is inserted after the instruction to move from the
214+
new allocation to the allocation it's expected to be in after the
215+
instruction.
216+
217+
The edit insertion phase for use operands is done after all operands
218+
have been processed. Edits are inserted to move from the current
219+
allocations in `vreg_allocs` to the final allocated position before
220+
the instruction. This is to account for the possibility of multiple
221+
uses of the same operand in the instruction.
222+
223+
## Reuse Operands
224+
225+
Reuse def operands are handled by creating a new operand identical to the
226+
reuse def, except that its constraints are the constraints of the
227+
reused input and allocating that in its place.
228+
229+
Reused inputs are handled by creating a new operand with a fixed register
230+
constraint to use whatever register was assigned to the reuse def.
231+
232+
Because of the way reuse operands and reused inputs are handled, when
233+
selecting a register for an early use operand with a fixed constraint,
234+
the PReg is also marked as unavailable in the `available_pregs` late
235+
set if the operand is a reused input. And when selecting a register
236+
for reuse def operands, the selected register is marked as unavailable
237+
in the `available_pregs` early set.
238+
239+
## VReg Spillslots
240+
241+
Whenever a VReg needs a spillslot, a suitable one is allocated and
242+
marked as the VReg's dedicated spillslot in `vreg_spillslots`.
243+
If a VReg never needs a spillslot, none is allocated for it.
244+
To ensure that a VReg will always be in its spillslot when expected,
245+
during the processing of a def operand, before it's deallocated,
246+
an edit is inserted to move from its current allocation as indicated
247+
in `vreg_allocs` to its dedicated spillslot, if one is present in
248+
`vreg_spillslots`.
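
A minimal sketch of that def-time step follows. The function `free_def`, the `MoveAfter` record, and the choice to skip the move when the VReg is already in its slot are assumptions for illustration only.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Allocation {
    None,
    Reg(u8),
    Stack(u32),
}

/// A move to be placed right after the defining instruction (hypothetical).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct MoveAfter {
    pub inst: usize,
    pub from: Allocation,
    pub to: Allocation,
}

/// Frees a def operand's VReg, first saving its value to the dedicated
/// spillslot if the VReg ever needed one.
pub fn free_def(
    vreg: usize,
    inst: usize,
    vreg_allocs: &mut [Allocation],
    vreg_spillslots: &[Option<u32>],
    edits: &mut Vec<MoveAfter>,
) {
    let current = vreg_allocs[vreg];
    if let Some(slot) = vreg_spillslots[vreg] {
        let slot = Allocation::Stack(slot);
        // Assumption: no move is needed if the def already writes the slot.
        if current != slot {
            edits.push(MoveAfter { inst, from: current, to: slot });
        }
    }
    // The def marks the start of the liverange, so the VReg is now dead.
    vreg_allocs[vreg] = Allocation::None;
}
```

A VReg with no entry in `vreg_spillslots` produces no edit at all, which matches the rule that a slot is only allocated when it is actually needed.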
249+
250+
## Branch Instructions
251+
252+
As an invariant, all branch arguments will be in their dedicated
253+
spillslots at the end of the block before edits are inserted to
254+
move from those spillslots to the block parameter spillslots
255+
of the successor blocks.
256+
257+
If a branch argument is already in an allocation that isn't
258+
its spillslot (this could happen if the branch argument is used
259+
as an operand in the same instruction, because all normal
260+
instruction processing is completed before branch-specific
261+
processing), then an edit is inserted
262+
to move from the spillslot to that allocation and its current
263+
allocation in `vreg_allocs` is set to the spillslot.
264+
265+
It's after these edits have been inserted that the parallel move
266+
resolver is then used to generate and insert edits to move from
267+
those spillslots to the spillslots of the block parameters.
268+
269+
# Across Blocks
270+
271+
When a block completes processing, some VRegs will still be live.
272+
These VRegs are either block parameters or livein VRegs.
273+
As an invariant, prior to the first instruction in a block, all
274+
block parameters and livein VRegs will be in their dedicated spillslots.
275+
276+
To maintain this invariant, after a block completes processing, edits
277+
are inserted at the beginning of the block to move from the block
278+
parameter and livein spillslots to the allocation they are expected
279+
to be in from the first instruction.
280+
All block parameters are freed, just like defs, and liveins' current
281+
allocations in `vreg_allocs` are set to their spillslots.
282+
283+
# Edits Order
284+
285+
`regalloc2`'s outward interface guarantees that edits are in
286+
sorted order. Since allocation proceeds in reverse, all edits
287+
are also added in reverse. After all blocks have completed
288+
processing, the edits are simply reversed to put them in the
289+
correct order.
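
The effect of reversing can be seen in a small sketch that records only the program point of each edit. The `Pos` enum, the `ProgPoint` alias, and `collect_edit_points` are hypothetical; the sketch assumes one edit after and one edit before every instruction.

```rust
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
pub enum Pos {
    Before,
    After,
}

/// Program point of an edit: (instruction index, position). Tuple ordering
/// sorts by instruction first, then `Before` < `After`.
pub type ProgPoint = (usize, Pos);

/// Emits edit points the way a reverse allocator would, then reverses once.
pub fn collect_edit_points(num_insts: usize) -> Vec<ProgPoint> {
    let mut edits = Vec::new();
    for inst in (0..num_insts).rev() {
        // Edits after the instruction are pushed first...
        edits.push((inst, Pos::After));
        // ...and edits before it are pushed second.
        edits.push((inst, Pos::Before));
    }
    // A single reversal yields ascending program order.
    edits.reverse();
    edits
}
```

If the "before" edits of an instruction were pushed ahead of its "after" edits, the reversed vector would no longer be sorted, which is why the push order within an instruction matters.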
290+
291+
One of the reasons why the allocation order proceeds the way it
292+
does is because of this edit-order constraint. All edits that
293+
occur after the instruction must be inserted before all edits
294+
that occur before the instruction.
295+
296+
# Debug Info
297+
298+
After all blocks have completed processing, the debug locations
299+
vector is built.
300+
The information it's built from is assembled from liverange info
301+
that is tracked throughout the allocation.
302+
Whenever a VReg is allocated for the first time, its liverange end
303+
is saved in the VReg's slot in the `vreg_to_live_inst_range`
304+
vector. Whenever a VReg's definition is encountered, its liverange
305+
beginning is saved, too. And the allocation it will be in
306+
throughout that range is also saved alongside.
307+
308+
To determine the allocation the VReg will be in throughout the
309+
liverange, the first invariant is used: the first time a VReg
310+
is allocated, its current allocation in `vreg_allocs` doesn't
311+
change unless it's evicted or spilled across block boundaries.
312+
Using this info, if by the time the def of a VReg is allocated,
313+
that VReg has no dedicated spillslot,
314+
that implies that the VReg was never evicted or spilled, so whatever
315+
value its `vreg_allocs` entry says is the location it will be in
316+
throughout its liverange. Otherwise, if it has a spillslot
317+
allocated to it, that implies that the VReg was either evicted
318+
at some point or it was a livein of a predecessor or a block parameter.
319+
Either way, since all spillslots are dedicated to their respective VRegs,
320+
it is safe to record the spillslot as the allocation for the
321+
`vreg_to_live_inst_range` info.
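
The rule for choosing the recorded location can be sketched as follows. The function `liverange_entry` and the plain `(begin, end, allocation)` tuple are assumptions for illustration; instruction indices stand in for whatever program-point type is actually used.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Allocation {
    None,
    Reg(u8),
    Stack(u32),
}

/// Builds the (begin, end, location) entry for a VReg when its def is reached.
///
/// `current_alloc` is the VReg's `vreg_allocs` entry at that moment and
/// `spillslot` is its `vreg_spillslots` entry, if any.
pub fn liverange_entry(
    begin: usize,
    end: usize,
    current_alloc: Allocation,
    spillslot: Option<u32>,
) -> (usize, usize, Allocation) {
    let location = match spillslot {
        // A dedicated slot exists, so the value is in it for the whole range.
        Some(slot) => Allocation::Stack(slot),
        // Never evicted or spilled: the allocation never changed.
        None => current_alloc,
    };
    (begin, end, location)
}
```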
