
Commit 5c78ae4

Fastalloc Def branch arg on branch & Bug Fixes (#240)
Resolves bytecodealliance/wasmtime#11545 and bytecodealliance/wasmtime#11544.

- Add support for branch arguments with any, fixed-reg and stack-only constraints being defined in their own branch instruction.
- Remove the over-constraint on reserving registers for operands with any-reg constraints. Instead, use counters to decide when it is safe to allocate registers to operands with no constraints.
- Correct `allocd_within_constraint` to account for the allocation of clobbers to late-phase registers.
- Correct `select_suitable_reg_in_lru` so that early defs and late uses are only given PRegs that are available in both the early and late phases.
- Allocate late operands first, followed by early operands, instead of defs then uses.
- Remove an over-constraint on available registers by removing clobbers only from the late available register set.

Previously, the registers for all any-reg operands were reserved as if every such operand were a late use. This avoided a situation where a register needed by an operand valid in both the early and late phases is allocated to an operand valid in only one of them, potentially leaving no valid register for the early-and-late operand. That over-constraint led to bytecodealliance/wasmtime#11544. It is resolved by dropping the reservation of any-reg operands entirely, in favor of counters that determine whether it is safe to allocate a register to an operand with no constraints.

Another issue:

```
use v0 fixed(p0), def v1 fixed(p0), use late v0 any
```

In this scenario, p0 is fixed to both v1 and v0, but that shouldn't be a problem because they are in different phases. Prior to this PR it was a problem, because all defs were allocated first and then all uses, giving this allocation order for the example above:

```
p0 -> v1 (this is a def, so it's freed and vreg_allocs[v1] is set to none)
p0 -> v0 (vreg_allocs[v0] = p0)
p0 -> v0 (vreg_allocs[v0] is p0, which is within constraints, so it is selected)
```

This is incorrect: the late use of v0 and the (late) def of v1 both end up in p0 during the late phase. The root cause is that during allocation `vreg_allocs[vi]` holds the current allocation of VReg `vi`, but when the late v0 operand is being allocated, `vreg_allocs[v0]` holds v0's allocation in the early phase of the instruction, not the late phase. Since allocation proceeds in reverse, visiting the early phase before the late phase is the wrong order; it should always proceed from the late phase to the early phase.

To resolve this, late operands are now allocated first, followed by early operands, instead of all defs followed by all uses. This is still safe: defs were allocated first so that registers allocated to late defs could be reused by early uses, and the new order preserves that property.

Fuzzed overnight for 8-9 hours. I also ran Wasmtime's tests. Most pass; the ones that didn't did not appear to fail because of register allocation (for example, the disas tests check against hardcoded output).
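To make the conflict concrete, the sketch below checks a single instruction's allocations for two different VRegs that share a PReg in the same phase. It is not part of regalloc2. `Kind`, `Pos`, `Allocated`, `occupies` and `has_conflict` are hypothetical names. The occupancy rule follows the FASTALLOC.md text in this commit: early uses and late defs hold their register in one phase, while late uses and early defs hold it in both.

```rust
use std::collections::HashMap;

// Simplified operand model for this sketch; not regalloc2's `Operand`.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum Kind {
    Use,
    Def,
}

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum Pos {
    Early,
    Late,
}

/// One operand after allocation: its VReg, kind, position and assigned PReg.
#[derive(Clone, Copy, Debug)]
struct Allocated {
    vreg: u32,
    kind: Kind,
    pos: Pos,
    preg: u8,
}

/// Does an operand with this kind/position hold its PReg during `phase`?
/// Early uses and late defs occupy one phase; late uses and early defs both.
fn occupies(kind: Kind, pos: Pos, phase: Pos) -> bool {
    match (kind, pos) {
        (Kind::Use, Pos::Early) => phase == Pos::Early,
        (Kind::Def, Pos::Late) => phase == Pos::Late,
        _ => true,
    }
}

/// Returns true if two different VRegs share a PReg in the same phase.
fn has_conflict(allocs: &[Allocated]) -> bool {
    // (preg, phase) -> VReg currently occupying it.
    let mut owner: HashMap<(u8, Pos), u32> = HashMap::new();
    for a in allocs {
        for phase in [Pos::Early, Pos::Late] {
            if !occupies(a.kind, a.pos, phase) {
                continue;
            }
            match owner.insert((a.preg, phase), a.vreg) {
                Some(prev) if prev != a.vreg => return true,
                _ => {}
            }
        }
    }
    false
}
```

Under the old defs-then-uses order, all three operands land in p0 and the check reports a conflict. Once late operands are allocated first, the late use of v0 is kept off p0 (p1 here, chosen arbitrarily) and the check passes.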
1 parent 454cd01 commit 5c78ae4


4 files changed (+626 additions, -332 deletions)


doc/FASTALLOC.md

Lines changed: 156 additions & 113 deletions
```diff
@@ -42,21 +42,14 @@ During allocation, it's necessary to determine which VReg is in a PReg
 to generate the right move(s) for eviction.
 `vreg_in_preg` is a vector that stores this information.
 
-## Available PRegs For Use In Instruction (`available_pregs_for_regs`, `available_pregs_for_any`)
+## Available PRegs For Use In Instruction (`available_pregs`)
 
-These are a 2-tuples of `PRegSet`s, a bitset of physical registers, one for
-the instruction's early phase and one for the late phase.
-They are used to determine which registers are available for use in the
-early/late phases of an instruction.
+This is a 2-tuple of PRegSets, a bitset of physical registers, one for the
+instruction's early phase and one for the late phase. They are used to determine
+which registers are available for use in the early/late phases of an instruction.
 
-Prior to the beginning of any instruction's allocation, `available_pregs_for_regs`
-is reset to include all allocatable physical registers, some of which may already
-contain a VReg.
-
-The two sets have the same function, except that `available_pregs_for_regs` is
-used to determine which registers are available for operands with a register-only
-constraint while `available_pregs_for_any` is used to determine which registers
-are available for operands with no constraints.
+Prior to the beginning of any instruction's allocation, this set is reset to
+include all allocatable physical registers, some of which may already contain a VReg.
 
 ## VReg Liverange Location Info (`vreg_to_live_inst_range`)
 
```
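Here is a minimal sketch of the two-phase `available_pregs` bookkeeping. It assumes a `u64` bitset stands in for `PRegSet`, with bit i set meaning PReg i is free. `AvailablePregs`, `Kind`, `Pos` and `mark_unavailable` are illustrative names, not regalloc2's API. The marking rule is the one the Instruction Allocation section applies to fixed-register operands.

```rust
/// Bitset stand-in for `PRegSet`: bit i set means PReg i is available.
type PRegBits = u64;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Kind {
    Use,
    Def,
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Pos {
    Early,
    Late,
}

/// Hypothetical model of `available_pregs`: one set per instruction phase.
#[derive(Debug, Default)]
struct AvailablePregs {
    early: PRegBits,
    late: PRegBits,
}

impl AvailablePregs {
    /// Called before each instruction: every allocatable PReg is free again.
    fn reset(&mut self, allocatable: PRegBits) {
        self.early = allocatable;
        self.late = allocatable;
    }

    /// Mark `preg` as taken by a fixed-register operand.
    fn mark_unavailable(&mut self, preg: u8, kind: Kind, pos: Pos) {
        let bit: PRegBits = 1 << preg;
        match (kind, pos) {
            // An early use only occupies the early phase...
            (Kind::Use, Pos::Early) => self.early &= !bit,
            // ...and a late def only the late phase.
            (Kind::Def, Pos::Late) => self.late &= !bit,
            // Early defs and late uses keep the PReg busy in both phases.
            _ => {
                self.early &= !bit;
                self.late &= !bit;
            }
        }
    }
}
```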
```diff
@@ -66,6 +59,35 @@ to be in throughout that liverange.
 This is used to build the debug locations vector after allocation
 is complete.
 
+## Number of Available Registers (`num_available_registers`)
+
+These are counters that keep track of the number of registers that
+can be allocated to any-reg and anywhere operands for int, float and
+vector registers, in the late, early and both phases of an instruction.
+
+Prior to the beginning of any instruction, this set is reset to
+include the number of all allocatable physical registers.
+
+## Number of Any-Reg Operands (`num_any_reg_operands`)
+
+These are counters that keep track of the number of any-reg
+operands that are yet to be allocated in an instruction.
+
+It is closely associated with `num_available_registers` and
+are used together for the same purpose.
+The two counters are used together to avoid allocating too many
+registers to anywhere operands when any-reg operands need them.
+When register reservations are made, the corresponding number
+of available registers in `num_available_registers` are decremented.
+When an any-reg operand is allocated, the corresponding
+`num_any_reg_operands` is decremented.
+The sole purpose of this is so that when anywhere operands are
+allocated, a check can be made to see if the available registers
+`num_available_registers` are enough to cover the remaining
+any-reg operands in the instruction `num_any_reg_operands`,
+to determine whether or not it is safe to allocate a register to
+the operand instead of a spillslot.
+
 # Allocation Process Breakdown
 
 Allocation proceeds in reverse: from the last block to the first block,
```
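The sketch below shows, with hedges, the safety check that `num_available_registers` and `num_any_reg_operands` exist to support. It collapses the per-class, per-phase counters into a single pair of `usize`s, so it models one register class and one phase. `RegCounters` and its methods are hypothetical.

```rust
/// Hypothetical counters for one register class and one phase, modelled on
/// `num_available_registers` and `num_any_reg_operands`.
#[derive(Debug)]
struct RegCounters {
    /// Registers that can still be handed to any-reg or anywhere operands.
    available: usize,
    /// Any-reg operands in the current instruction not yet allocated.
    any_reg_remaining: usize,
}

impl RegCounters {
    /// An anywhere operand may take a register only if, afterwards, there
    /// are still enough registers for every pending any-reg operand.
    fn anywhere_may_take_reg(&self) -> bool {
        self.available > self.any_reg_remaining
    }

    /// A register was reserved (e.g. for a fixed or anywhere operand).
    fn take_reg(&mut self) {
        self.available -= 1;
    }

    /// An any-reg operand received its register.
    fn allocate_any_reg(&mut self) {
        self.take_reg();
        self.any_reg_remaining -= 1;
    }
}
```

The comparison is strict because the anywhere operand consumes a register itself. With three free registers and three pending any-reg operands, giving one to the anywhere operand would leave only two.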
```diff
@@ -76,11 +98,11 @@ in four phases: selection, assignment, eviction, and edit insertion.
 
 ## Allocation Phase: Selection
 
-In this phase, a PReg is selected from `available_pregs_for_regs` or
-`available_pregs_for_any` for the operand based on the operand constraints.
-Depending on the operand's position, the selected PReg is removed from either
-the early or late phase or both, indicating that the PReg is no longer available
-for allocation by other operands in that phase.
+In this phase, a PReg is selected from available_pregs for the operand
+based on the operand constraints. Depending on the operand's position
+the selected PReg is removed from either the early or late phase or both,
+indicating that the PReg is no longer available for allocation by other
+operands in that phase.
 
 ## Allocation Phase: Assignment
 
```
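Selection ends in the LRU traversal described under Operand Allocation. The allocator walks the register class's LRU from its least recently used end and takes the first PReg in the draw-from set. The sketch assumes a plain `Vec` ordered from most to least recently used, which is only a stand-in for regalloc2's `Lru`. `Lru::select` is a hypothetical name.

```rust
/// Vec-based stand-in for an LRU: index 0 is the most recently used PReg,
/// the last element the least recently used.
#[derive(Debug)]
struct Lru {
    order: Vec<u8>,
}

impl Lru {
    /// Walk from the least recently used end, take the first PReg that is in
    /// `draw_from` (a bitset of available PRegs), and make it most recently used.
    fn select(&mut self, draw_from: u64) -> Option<u8> {
        let idx = self
            .order
            .iter()
            .rposition(|&p| draw_from & (1u64 << p) != 0)?;
        let preg = self.order.remove(idx);
        self.order.insert(0, preg);
        Some(preg)
    }
}
```

For example, with LRU order `[3, 1, 0, 2]` and only p0, p1 and p3 available, p2 is skipped and p0 is chosen and moved to the front.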
```diff
@@ -128,59 +150,59 @@ arguments will be in their dedicated spillslots.
 4. At the beginning of a block, all branch parameters and livein
 virtual registers will be in their dedicated spillslots.
 
-# Instruction Allocation
-
-To allocate a single instruction, the first step is to reset the
-`available_pregs_for_regs` sets to all allocatable PRegs.
-
-Next, the selection phase is carried out for all operands with
-fixed register constraints: the registers they are constrained to use are
-marked as unavailable in the `available_pregs_for_regs` set, depending on the
-phase that they are valid in. If the operand is an early use or late
-def operand, then the register will be marked as unavailable in the
-early set or late set, respectively. Otherwise, the PReg is marked
-as unavailable in both the early and late sets, because a PReg
-assigned to an early def or late use operand cannot be reused by another
-operand in the same instruction.
-
-Next, all clobbers are removed from the early and late `available_pregs_for_regs`
-sets to avoid allocating a clobber to a def.
-
-Next, registers are reserved for register-only operands and marked as
-unavailable in `available_pregs_for_regs`.
-Then `available_pregs_for_any` for the instruction is derived from
-`available_pregs_for_regs` by marking all other registers not reserved as
-available. This is to avoid a situation where operands with no
-constraints take up all available registers, leaving none for operands
-with register-only constraints.
-
-After selection for register-only operands, the eviction phase is
-carried out for fixed register operands. Any VReg in their selected
-registers, indicated by `vreg_in_preg`, is evicted: a dedicated
-spillslot is allocated for the VReg (if it doesn't have one already),
-an edit is inserted to move from the slot to the PReg, which is where
-the VReg expected to be after the instruction, and its current
-allocation in `vreg_allocs` is set to the spillslot.
-The same is then done for clobbers, then register-only operands.
-
-Next, the selection, assignment, eviction, and edit insertion phases are
-carried out for all def operands. When each def operand's allocation is
-complete, the def operand is immediately freed, marking the end of the
-VReg's liverange. It is removed from the `live_vregs` set, its allocation
-in `vreg_allocs` is set to none, and if it was in a PReg, that PReg's
-entry in `vreg_in_preg` is set to none. The selection and eviction phases
-are omitted if the operand has a fixed constraint, as those phases have
-already been carried out.
-
-Next, the selection, assignment, and eviction phases are carried out for all
-use operands. As with def operands, the selection and eviction phases are
-omitted if the operand has a fixed constraint, as those phases have already
-been carried out.
+There is an exception to invariant 2 and 3: if a branch instruction defines
+the VReg used as a branch arg, then there may be no opportunity for
+the VReg to be placed in its spillslot.
 
-Then the edit insertion phase is carried out for all use operands.
+# Instruction Allocation
 
-Lastly, if the instruction being processed is a branch instruction, the
-parallel move resolver is used to insert edits before the instruction
-to move from the branch arguments spillslots to the block parameter
-spillslots.
+To allocate a single instruction, the first step is to reset the
+`available_pregs` sets to all allocatable PRegs.
+
+Next, the selection phase is carried out for all operands with
+fixed register constraints: the registers they are constrained
+to use are marked as unavailable in the `available_pregs` set,
+depending on the phase that they are valid in. If the operand
+is an early use or late def operand, then the register will be
+marked as unavailable in the early set or late set, respectively.
+Otherwise, the PReg is marked as unavailable in both the early
+and late sets, because a PReg assigned to an early def or late
+use operand cannot be reused by another operand in the same instruction.
+
+After selection for fixed register operands, the eviction phase
+is carried out for fixed register operands. Any VReg in their
+selected registers, indicated by vreg_in_preg, is evicted: a
+dedicated spillslot is allocated for the VReg (if it doesn't
+have one already), an edit is inserted to move from the slot to
+the PReg, which is where the VReg expected to be after the instruction,
+and its current allocation in vreg_allocs is set to the spillslot.
+
+Next, all clobbers are removed from the late `available_pregs` set
+to avoid allocating a clobber to a late operand.
+
+Next, the selection, assignment, eviction, and edit insertion
+phases are carried out for all late operands, both defs and uses.
+Then the early operands are processed in the same manner, after the
+late operands.
+
+In both late and early processing, when a def operand's
+allocation is complete, the def operand is immediately freed,
+marking the end of the VReg's liverange. It is removed from the
+`live_vregs` set, its allocation in `vreg_allocs` is set to none,
+and if it was in a PReg, that PReg's entry in `vreg_in_preg` is
+set to none. The selection and eviction phases are omitted if the
+operand has a fixed constraint, as those phases have already been
+carried out.
+
+When a use operand is processed, the selection, assignment, and eviction
+phases only are carried out. As with def operands, the selection and
+eviction phases are omitted if the operand has a fixed constraint, as
+those phases have already been carried out.
+
+After the late and early operands have completed processing,
+the edit insertion phase is carried out for all use operands.
+
+Lastly, if the instruction being processed is a branch instruction,
+the parallel move resolver is used to insert edits before the instruction
+to move from the branch arguments spillslots to the block parameter spillslots.
 
```
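Removing clobbers from the late set alone is enough to keep them away from every operand that needs its register in the late phase. This follows from how draw-from sets are built, as described under Operand Allocation. The sketch assumes `u64` bitsets and ignores fixed-register markings. `phase_sets`, `draw_from`, `Kind` and `Pos` are hypothetical.

```rust
type PRegBits = u64;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Kind {
    Use,
    Def,
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Pos {
    Early,
    Late,
}

/// Per-instruction setup: clobbers are removed from the late set only.
fn phase_sets(allocatable: PRegBits, clobbers: PRegBits) -> (PRegBits, PRegBits) {
    (allocatable, allocatable & !clobbers)
}

/// Set of PRegs a non-fixed operand may be given.
fn draw_from(early: PRegBits, late: PRegBits, kind: Kind, pos: Pos) -> PRegBits {
    match (kind, pos) {
        (Kind::Use, Pos::Early) => early,
        (Kind::Def, Pos::Late) => late,
        // Late uses and early defs need the PReg free in both phases.
        _ => early & late,
    }
}
```

With p0 and p1 clobbered, late defs, late uses and early defs all draw from sets that exclude p0 and p1. An early use may still be placed in a clobbered register, which the previous approach (removing clobbers from both sets) disallowed.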
```diff
@@ -187,55 +209,53 @@
 ## Operand Allocation
 
 During the allocation of an operand, a check is first made to
 see if the VReg's current allocation as indicated in
 `vreg_allocs` is within the operand constraints.
 
-If it is, the assignment phase is carried out, setting the final
-allocation output's entry for that operand to the allocation.
-The selection phase is carried out, marking the PReg
-(if the allocation is a PReg) as unavailable in the respective
-early/late sets. The state of the LRUs is also updated to reflect
-the new most recently used PReg.
-No eviction needs to be done since the VReg is already in the
-allocation and no edit insertion needs to be done either.
-
-On the other hand, if the VReg's current allocation is not within
-constraints, the selection and eviction phases are carried out for
-non-fixed operands. First, a set of PRegs that can be drawn from is
-created from `available_pregs_for_regs` or `available_pregs_for_any`,
-depending on whether the operand has a register-only constraint
-or no constraint. For early uses and late defs,
-this draw-from set is the early set or late set, respectively.
-For late uses and early defs, the draw-from set is an intersection
-of the available early and late sets (because a PReg used for a late
-use can't be reassigned to another operand in the early phase;
-likewise, a PReg used for an early def can't be reassigned to another
-operand in the late phase).
-The LRU for the VReg's regclass is then traversed from the end to find
-the least recently used PReg in the draw-from set. Once a PReg is found,
-it is marked as the most recently used in the LRU, unavailable in both
-available pregs sets, and whatever VReg was in it before is evicted.
-
-The assignment phase is carried out next. The final allocation for the
+If it is, the assignment phase is carried out, setting the
+final allocation output's entry for that operand to the allocation.
+The selection phase is carried out, marking the PReg (if the
+allocation is a PReg) as unavailable in the respective early/late
+sets. The state of the LRUs is also updated to reflect the new
+most recently used PReg. No eviction needs to be done since the
+VReg is already in the allocation and no edit insertion needs to
+be done either.
+
+On the other hand, if the VReg's current allocation is not within
+constraints, the selection and eviction phases are carried out
+for non-fixed operands. First, a set of PRegs that can be drawn
+from is created from `available_pregs`. For early uses and late
+defs, this draw-from set is the early set or late set, respectively.
+For late uses and early defs, the draw-from set is an intersection
+of the available early and late sets (because a PReg used for a
+late use can't be reassigned to another operand in the early phase;
+likewise, a PReg used for an early def can't be reassigned to another
+operand in the late phase). The LRU for the VReg's regclass is then
+traversed from the end to find the least recently used PReg in the
+draw-from set. Once a PReg is found, it is marked as the most recently
+used in the LRU, unavailable in the `available_pregs` sets, and whatever
+VReg was in it before is evicted.
+
+The assignment phase is carried out next. The final allocation for the
 operand is set to the selected register.
 
-If the newly allocated operand has not been allocated before, that is,
-this is the first use/def of the VReg encountered; the VReg is
-inserted into `live_vregs` and marked as the value in the allocated
-PReg in `vreg_in_preg`.
+If the newly allocated operand has not been allocated before,
+that is, this is the first use/def of the VReg encountered;
+the VReg is inserted into live_vregs and marked as the value
+in the allocated PReg in vreg_in_preg.
 
-Otherwise, if the VReg has been allocated before, then an edit will need
-to be inserted to ensure that the dataflow remains correct.
-The edit insertion phase is now carried out if the operand is a def
-operand: an edit is inserted after the instruction to move from the
-new allocation to the allocation it's expected to be in after the
-instruction.
+Otherwise, if the VReg has been allocated before, then an edit
+will need to be inserted to ensure that the dataflow remains correct.
+The edit insertion phase is now carried out if the operand is a
+def operand: an edit is inserted after the instruction to move
+from the new allocation to the allocation it's expected to be
+in after the instruction.
 
-The edit insertion phase for use operands is done after all operands
-have been processed. Edits are inserted to move from the current
-allocations in `vreg_allocs` to the final allocated position before
-the instruction. This is to account for the possibility of multiple
-uses of the same operand in the instruction.
+The edit insertion phase for use operands is done after all
+operands have been processed. Edits are inserted to move from
+the current allocations in `vreg_allocs` to the final allocated
+position before the instruction. This is to account for the
+possibility of multiple uses of the same operand in the instruction.
 
 ## Reuse Operands
 
```
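The deferred edit insertion for use operands can be sketched as follows. The sketch assumes a `HashMap` stands in for `vreg_allocs` and that each use records its final allocation. `Alloc`, `UseOperand` and `use_edits` are hypothetical. Each returned `(from, to)` pair is a move to insert before the instruction, and a use that is already in place produces no move.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Alloc {
    Reg(u8),
    Stack(u32),
}

/// A use operand after allocation: its VReg and the allocation chosen for it.
struct UseOperand {
    vreg: u32,
    final_alloc: Alloc,
}

/// Moves to insert before the instruction: from each VReg's current
/// allocation in `vreg_allocs` to every use's final allocation.
fn use_edits(uses: &[UseOperand], vreg_allocs: &HashMap<u32, Alloc>) -> Vec<(Alloc, Alloc)> {
    uses.iter()
        .filter_map(|u| {
            let current = *vreg_allocs.get(&u.vreg)?;
            // No move is needed when the use is already where the VReg lives.
            (current != u.final_alloc).then(|| (current, u.final_alloc))
        })
        .collect()
}
```

Because this step waits until every operand is processed, two uses of the same VReg in different allocations (for example v0 in p0 and in p1) are both fed from the single allocation recorded in `vreg_allocs`.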
```diff
@@ -283,6 +303,15 @@ It's after these edits have been inserted that the parallel move
 resolver is then used to generate and insert edits to move from
 those spillslots to the spillslots of the block parameters.
 
+There is an exception to the invariant - it's possible that the
+branch argument is defined in the same branch instruction.
+If the branch argument VReg has a fixed-reg constraint, the move
+will have to be done in the successor.
+If it has an stack or anywhere constraint, it is allocated directly
+into the block param's spillslot, so there is no need to insert moves.
+The other constraints, reuse and any-reg, are not supported in this
+case.
+
 # Across Blocks
 
 When a block completes processing, some VRegs will still be live.
```
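The per-constraint rule for a branch argument that is defined by its own branch instruction can be written as a small decision function. `Constraint`, `BranchArgPlan` and `plan_for_branch_defined_arg` are hypothetical names that mirror the text above; they are not regalloc2 types.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Constraint {
    Any,
    Reg,
    Stack,
    FixedReg(u8),
    Reuse(usize),
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum BranchArgPlan {
    /// Allocate the def straight into the block param's spillslot.
    DefineIntoParamSlot,
    /// Leave the value in the fixed PReg; the successor inserts the move.
    MoveInSuccessor(u8),
    /// Reuse and any-reg constraints are not supported in this case.
    Unsupported,
}

fn plan_for_branch_defined_arg(c: Constraint) -> BranchArgPlan {
    match c {
        Constraint::FixedReg(p) => BranchArgPlan::MoveInSuccessor(p),
        Constraint::Stack | Constraint::Any => BranchArgPlan::DefineIntoParamSlot,
        Constraint::Reg | Constraint::Reuse(_) => BranchArgPlan::Unsupported,
    }
}
```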
```diff
@@ -297,6 +326,20 @@ to be in from the first instruction.
 All block parameters are freed, just like defs, and liveins' current
 allocations in `vreg_allocs` are set to their spillslots.
 
+Any block parameter that receives a branch argument from a predecessor
+where the argument VReg was defined in the branch instruction will
+also need moves inserted at the block beginning because the predecessor
+couldn't have inserted the required moves.
+All predecessors branch arguments to the block are checked to see if any
+are defined in the same branch instruction. For all branch arguments that
+are defined in the branch instruction and have fixed-reg constraints, a
+move will be inserted from the fixed-reg to the block param's spillslot
+at the beginning of the block. In the case of stack and anywhere constraints,
+nothing is done, because in that case, the VRegs used as the branch arguments
+will be defined directly into the block param's spillslot. Reuse and any-reg
+constraints are not supported and aren't handled.
+
+
 # Edits Order
 
 `regalloc2`'s outward interface guarantees that edits are in
```
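On the successor side, the block-start moves reduce to a filter over the incoming branch arguments. The sketch assumes each argument is described by three things: its block param's spillslot, whether it was defined in the branch, and its constraint. `IncomingArg`, `Constraint` and `block_start_moves` are hypothetical.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Constraint {
    Any,
    Reg,
    Stack,
    FixedReg(u8),
    Reuse(usize),
}

/// One branch argument flowing into this block from a predecessor.
struct IncomingArg {
    /// Spillslot of the block param receiving the argument.
    param_slot: u32,
    /// Was the argument VReg defined by the branch instruction itself?
    defined_in_branch: bool,
    constraint: Constraint,
}

/// Moves (from fixed PReg, to param spillslot) to insert at the block start.
fn block_start_moves(args: &[IncomingArg]) -> Vec<(u8, u32)> {
    args.iter()
        .filter(|a| a.defined_in_branch)
        .filter_map(|a| match a.constraint {
            Constraint::FixedReg(p) => Some((p, a.param_slot)),
            // Stack/anywhere args were defined directly into the slot;
            // reuse/any-reg are unsupported and skipped.
            _ => None,
        })
        .collect()
}
```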

src/fastalloc/iter.rs

Lines changed: 9 additions & 1 deletion
```diff
@@ -1,4 +1,4 @@
-use crate::{Operand, OperandConstraint, OperandKind};
+use crate::{Operand, OperandConstraint, OperandKind, OperandPos};
 
 pub struct Operands<'a>(pub &'a [Operand]);
 
@@ -37,6 +37,14 @@ impl<'a> Operands<'a> {
     pub fn any_reg(&self) -> impl Iterator<Item = (usize, Operand)> + 'a {
         self.matches(|op| matches!(op.constraint(), OperandConstraint::Reg))
     }
+
+    pub fn late(&self) -> impl Iterator<Item = (usize, Operand)> + 'a {
+        self.matches(|op| op.pos() == OperandPos::Late)
+    }
+
+    pub fn early(&self) -> impl Iterator<Item = (usize, Operand)> + 'a {
+        self.matches(|op| op.pos() == OperandPos::Early)
+    }
 }
 
 impl<'a> core::ops::Index<usize> for Operands<'a> {
```
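With the new `late()` and `early()` iterators, the late-then-early processing order is simply a chain of the two. The sketch reimplements just enough of `Operands` to run standalone. The `Operand` struct and the body of `matches` are assumptions, since the diff only shows the call sites, so treat this as illustrative rather than regalloc2's code. The operands reproduce the `use v0 fixed(p0), def v1 fixed(p0), use late v0 any` example from the commit message.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum OperandPos {
    Early,
    Late,
}

/// Stand-in for regalloc2's `Operand`: only a VReg number and a position.
#[derive(Clone, Copy, Debug)]
struct Operand {
    vreg: u32,
    pos: OperandPos,
}

impl Operand {
    fn pos(&self) -> OperandPos {
        self.pos
    }
}

struct Operands<'a>(&'a [Operand]);

impl<'a> Operands<'a> {
    /// Assumed shape of `matches`: yield (index, operand) pairs passing `pred`.
    fn matches<F>(&self, pred: F) -> impl Iterator<Item = (usize, Operand)> + 'a
    where
        F: Fn(Operand) -> bool + 'a,
    {
        let ops: &'a [Operand] = self.0;
        ops.iter().copied().enumerate().filter(move |(_, op)| pred(*op))
    }

    fn late(&self) -> impl Iterator<Item = (usize, Operand)> + 'a {
        self.matches(|op| op.pos() == OperandPos::Late)
    }

    fn early(&self) -> impl Iterator<Item = (usize, Operand)> + 'a {
        self.matches(|op| op.pos() == OperandPos::Early)
    }
}
```

Chaining `ops.late()` with `ops.early()` visits the def of v1 and the late use of v0 before the early fixed use of v0.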

src/fastalloc/lru.rs

Lines changed: 8 additions & 0 deletions
```diff
@@ -272,6 +272,8 @@ pub struct PartedByRegClass<T> {
     pub items: [T; 3],
 }
 
+impl<T: Copy> Copy for PartedByRegClass<T> {}
+
 impl<T> Index<RegClass> for PartedByRegClass<T> {
     type Output = T;
 
@@ -286,6 +288,12 @@ impl<T> IndexMut<RegClass> for PartedByRegClass<T> {
     }
 }
 
+impl<T: PartialEq> PartialEq for PartedByRegClass<T> {
+    fn eq(&self, other: &Self) -> bool {
+        self.items.eq(&other.items)
+    }
+}
+
 /// Least-recently-used caches for register classes Int, Float, and Vector, respectively.
 pub type Lrus = PartedByRegClass<Lru>;
 
```
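The new `Copy` and `PartialEq` impls let a `PartedByRegClass` value be snapshotted and compared cheaply. One plausible use, which is an assumption because the fourth changed file is not shown here, is resetting per-class counters such as `num_available_registers` before each instruction. The sketch re-declares a minimal `RegClass` and `PartedByRegClass` so that it compiles on its own.

```rust
use std::ops::{Index, IndexMut};

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum RegClass {
    Int = 0,
    Float = 1,
    Vector = 2,
}

/// Mirrors the shape in the diff: one item per register class.
#[derive(Clone, Debug)]
struct PartedByRegClass<T> {
    items: [T; 3],
}

// Same impls as the diff adds.
impl<T: Copy> Copy for PartedByRegClass<T> {}

impl<T: PartialEq> PartialEq for PartedByRegClass<T> {
    fn eq(&self, other: &Self) -> bool {
        self.items.eq(&other.items)
    }
}

impl<T> Index<RegClass> for PartedByRegClass<T> {
    type Output = T;
    fn index(&self, class: RegClass) -> &T {
        &self.items[class as usize]
    }
}

impl<T> IndexMut<RegClass> for PartedByRegClass<T> {
    fn index_mut(&mut self, class: RegClass) -> &mut T {
        &mut self.items[class as usize]
    }
}
```

With `Copy`, the reset is a plain assignment from a saved value (`counts = fresh;`), and `PartialEq` makes it easy to assert that a reset happened.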