Skip to content

Commit 5f5f7d2

Browse files
committed
elaborate on choosing N
1 parent 9dc26ab commit 5f5f7d2

File tree

1 file changed

+83
-30
lines changed

1 file changed

+83
-30
lines changed

text/3838-repr-scalable.md

Lines changed: 83 additions & 30 deletions
Original file line numberDiff line numberDiff line change
@@ -7,11 +7,11 @@
77
[summary]: #summary
88

99
Extends Rust's existing SIMD infrastructure, `#[repr(simd)]`, with a
10-
complementary scalable representation, `#[repr(scalable)]`, to support scalable
11-
vector types, such as Arm's Scalable Vector Extension (SVE), or RISC-V's Vector
12-
Extension (RVV).
10+
complementary scalable representation, `#[repr(scalable(N)]`, to support
11+
scalable vector types, such as Arm's Scalable Vector Extension (SVE), or
12+
RISC-V's Vector Extension (RVV).
1313

14-
Like the existing `repr(simd)` representation, `repr(scalable)` is internal
14+
Like the existing `repr(simd)` representation, `repr(scalable(N))` is internal
1515
compiler infrastructure that will be used only in the standard library to
1616
introduce scalable vector types which can then be stablised. Only the
1717
infrastructure to define these types are introduced in this RFC, not the types
@@ -39,7 +39,7 @@ Instead of releasing more extensions with ever increasing register bit widths,
3939
AArch64 has introduced a Scalable Vector Extension (SVE). Similarly, RISC-V has
4040
a Vector Extension (RVV). These extensions have vector registers whose width
4141
depends on the CPU implementation and bit-width-agnostic intrinsics for
42-
operating on these registers. By using scalale vectors, code won't need to be
42+
operating on these registers. By using scalable vectors, code won't need to be
4343
re-written using new architecture extensions with larger registers, new types
4444
and intrinsics, but instead will work on newer processors with different vector
4545
register lengths and performance characteristics.
@@ -132,9 +132,9 @@ Types annotated with the `#[repr(simd)]` attribute contains either an array
132132
field or multiple fields to indicate the intended size of the SIMD vector that
133133
the type represents.
134134

135-
Similarly, a `scalable` repr is introduced to define a scalable vector type.
136-
`scalable` accepts an integer to determine the minimum number of elements the
137-
vector contains. For example:
135+
Similarly, a `scalable(N)` representation is introduced to define a scalable
136+
vector type. `scalable(N)` accepts an integer to determine the minimum number of
137+
elements the vector contains. For example:
138138

139139
```rust
140140
#[repr(simd, scalable(4))]
@@ -144,6 +144,37 @@ pub struct svfloat32_t { _ty: [f32], }
144144
As with the existing `repr(simd)`, `_ty` is purely a type marker, used to get
145145
the element type for the codegen backend.
146146

147+
`svfloat32_t` is a scalable vector with a minimum of four `f32` elements and
148+
potentially more depending on the length of the vector register at runtime.
149+
150+
## Choosing `N`
151+
[choosing-n]: #choosing-n
152+
153+
Many intrinsics using scalable vectors accept both a predicate vector argument
154+
and data vector arguments. Predicate vectors determine whether a lane is on or
155+
off for the operation performed by any given intrinsic. Predicate vectors may
156+
use different registers of sizes to the vectors containing data.
157+
`repr(scalable)` is used to define vectors containing both data and predicates.
158+
159+
As `repr(scalable(N))` is intended to be a permanently unstable attribute, any
160+
value of `N` is accepted by the attribute and it is the responsibility of
161+
whomever is defining the type to provide a valid value. A correct value for `N`
162+
depends on the purpose of the specific scalable vector type and the
163+
architecture.
164+
165+
For example, with SVE, the scalable vector register length is a minimum of 128
166+
bits, must be a multiple of 128 bits and a power of 2; and predicate registers
167+
have one bit for each byte in the vector registers. So, for `svfloat32_t`
168+
defined shown above, an `f32` is 32-bits and with `N=4`, the entire minimum
169+
register length of 128 bits is used (4 x 32 = 128). An intrinsic that takes a
170+
`svfloat32_t` may also want to accept as an argument a predicate vector with a
171+
matching four elements (`N=4`), which would only use 4 bits of the predicate
172+
register rather than the full 16 bits.
173+
174+
See
175+
[*Manually-chosen or compiler-calculated element count*][manual-or-calculated-element-count]
176+
for a discussion on why `N` is not calculated by the compiler.
177+
147178
## Properties of scalable vectors
148179
[properties-of-scalable-vector-types]: #properties-of-scalable-vectors
149180

@@ -217,28 +248,17 @@ Most of the complexity of SVE is handled by LLVM: lowering Rust's scalable
217248
vectors to the correct type in LLVM and the `vscale` modifier that is applied to
218249
LLVM's vector types.
219250

220-
LLVM's scalable vector type is of the form `<vscale x elements x type>`:
221-
222-
- `elements` multiplied by `size_of::<$ty>` gives the smallest allowed register
223-
size and the increment size
224-
- `vscale` is a runtime constant that is used to determine the actual vector
225-
register size
251+
LLVM's scalable vector type is of the form `<vscale x element_count x type>`.
252+
`vscale` is the scaling factor determined by the hardware at runtime, it can be
253+
any value providing it gives a legal vector register size for the architecture.
226254

227-
For example, with SVE, the scalable vector register (`Z` register) size has to
228-
be a multiple of 128 bits and a power of 2. Only the value of `elements` can be
229-
chosen by compiler. For `f32`, `elements` must always be four, as with the
230-
minimum `vscale` of one, `1 * 4 * sizeof(f32)` is the 128-bit minimum register
231-
size.
255+
For example, a `<vscale x 4 x f32>` is a scalable vector with a minimum of four
256+
`f32` elements and with SVE, `vscale` could then be any power of two which would
257+
result in register sizes of 128, 256, 512, 1024 or 2048 and 4, 8, 16, 32, or 64
258+
`f32` elements respectively.
232259

233-
At runtime `vscale` could then be any power of two which would result in
234-
register sizes of 128, 256, 512, 1024 and 2048. `vscale` could be any value
235-
providing it gives a legal vector register size for the architecture.
236-
237-
`repr(scalable)` expects the number of `elements` to be provided rather than
238-
calculating it. This avoids needing to teach the compiler how to calculate the
239-
required `element` count, particularly as some of these scalable types can have
240-
different element counts. For instance, the predicates used in SVE have
241-
different element counts depending on the types they are a predicate for.
260+
The `N` in the `#[repr(scalable(N))]` determines the `element_count` used in the
261+
LLVM type for a scalable vector.
242262

243263
While it is possible to change the vector length at runtime using a
244264
[`prctl()`][prctl] call to the kernel, this would require that `vscale` change,
@@ -250,8 +270,8 @@ behaviour, consistent with C and C++.
250270
# Drawbacks
251271
[drawbacks]: #drawbacks
252272

253-
- `repr(scalable)` is inherently additional complexity to the language, despite
254-
being largely hidden from users.
273+
- `repr(scalable(N))` is inherently additional complexity to the language,
274+
despite being largely hidden from users.
255275

256276
# Rationale and alternatives
257277
[rationale-and-alternatives]: #rationale-and-alternatives
@@ -267,6 +287,39 @@ By aligning with the approach taken by C (discussed in the
267287
[*Prior art*][prior-art] below), most of the documentation that already exists
268288
for scalable vector intrinsics in C should still be applicable to Rust.
269289

290+
## Manually-chosen or compiler-calculated element count
291+
[manual-or-calculated-element-count]: #manually-chosen-or-compiler-calculated-element-count
292+
293+
`repr(scalable(N))` expects `N` to be provided rather than calculating it. This
294+
avoids needing to teach the compiler how to calculate the required `element`
295+
count, which isn't always trivial.
296+
297+
Many of the intrinsics which accept scalable vectors as an argument also accept
298+
a predicate vector. Predicate vectors decide which lanes are on or off for an
299+
operation (i.e. which elements in the vector are operated on). Predicate vectors
300+
can be in different and smaller registers than the data. For example,
301+
`<vscale x 16 x i1>` could be the predicate vector for a `<vscale x 16 x u8>`
302+
vector
303+
304+
For non-predicate scalable vectors, it will be typical that `N` will be
305+
`$minimum_register_length / $type_size` (e.g. `4` for `f32` or `8` for `f16`
306+
with a minimum 128-bit register length). In this circumstance, `N` could be
307+
trivially calculated by the compiler.
308+
309+
For predicate vectors, it is desirable to be able to to define types where `N`
310+
matches the number of elements in the non-predicate vector, i.e. a
311+
`<vscale x 4 x i1>` to match a `<vscale x 4 x f32>`, `<vscale x 8 x i1>` to
312+
match `<vscale x 8 x u16>`, or `<vscale x 16 x i1>` to match
313+
`<vscale x 16 x u8>`. In this circumstance, it might still be possible to give
314+
rustc all of the relevant information such that it could compute `N`, but it
315+
would add extra complexity.
316+
317+
This RFC takes the position that the additional complexity required to have the
318+
compiler always be able to calculate `N` isn't justified given the permanently
319+
unstable nature of the `repr(scalable(N))` attribute and the scalable vector
320+
types defined in `std::arch` are likely to be few in number, automatically
321+
generated and well-tested.
322+
270323
# Prior art
271324
[prior-art]: #prior-art
272325

0 commit comments

Comments
 (0)