7
7
[ summary ] : #summary
8
8
9
9
Extends Rust's existing SIMD infrastructure, ` #[repr(simd)] ` , with a
10
- complementary scalable representation, ` #[repr(scalable)] ` , to support scalable
11
- vector types, such as Arm's Scalable Vector Extension (SVE), or RISC-V's Vector
12
- Extension (RVV).
10
+ complementary scalable representation, ` #[repr(scalable(N )] ` , to support
11
+ scalable vector types, such as Arm's Scalable Vector Extension (SVE), or
12
+ RISC-V's Vector Extension (RVV).
13
13
14
- Like the existing ` repr(simd) ` representation, ` repr(scalable) ` is internal
14
+ Like the existing ` repr(simd) ` representation, ` repr(scalable(N) ) ` is internal
15
15
compiler infrastructure that will be used only in the standard library to
16
16
introduce scalable vector types which can then be stablised. Only the
17
17
infrastructure to define these types are introduced in this RFC, not the types
@@ -39,7 +39,7 @@ Instead of releasing more extensions with ever increasing register bit widths,
39
39
AArch64 has introduced a Scalable Vector Extension (SVE). Similarly, RISC-V has
40
40
a Vector Extension (RVV). These extensions have vector registers whose width
41
41
depends on the CPU implementation and bit-width-agnostic intrinsics for
42
- operating on these registers. By using scalale vectors, code won't need to be
42
+ operating on these registers. By using scalable vectors, code won't need to be
43
43
re-written using new architecture extensions with larger registers, new types
44
44
and intrinsics, but instead will work on newer processors with different vector
45
45
register lengths and performance characteristics.
@@ -132,9 +132,9 @@ Types annotated with the `#[repr(simd)]` attribute contains either an array
132
132
field or multiple fields to indicate the intended size of the SIMD vector that
133
133
the type represents.
134
134
135
- Similarly, a ` scalable ` repr is introduced to define a scalable vector type.
136
- ` scalable ` accepts an integer to determine the minimum number of elements the
137
- vector contains. For example:
135
+ Similarly, a ` scalable(N) ` representation is introduced to define a scalable
136
+ vector type. ` scalable(N) ` accepts an integer to determine the minimum number of
137
+ elements the vector contains. For example:
138
138
139
139
``` rust
140
140
#[repr(simd, scalable(4))]
@@ -144,6 +144,37 @@ pub struct svfloat32_t { _ty: [f32], }
144
144
As with the existing ` repr(simd) ` , ` _ty ` is purely a type marker, used to get
145
145
the element type for the codegen backend.
146
146
147
+ ` svfloat32_t ` is a scalable vector with a minimum of four ` f32 ` elements and
148
+ potentially more depending on the length of the vector register at runtime.
149
+
150
+ ## Choosing ` N `
151
+ [ choosing-n ] : #choosing-n
152
+
153
+ Many intrinsics using scalable vectors accept both a predicate vector argument
154
+ and data vector arguments. Predicate vectors determine whether a lane is on or
155
+ off for the operation performed by any given intrinsic. Predicate vectors may
156
+ use different registers of sizes to the vectors containing data.
157
+ ` repr(scalable) ` is used to define vectors containing both data and predicates.
158
+
159
+ As ` repr(scalable(N)) ` is intended to be a permanently unstable attribute, any
160
+ value of ` N ` is accepted by the attribute and it is the responsibility of
161
+ whomever is defining the type to provide a valid value. A correct value for ` N `
162
+ depends on the purpose of the specific scalable vector type and the
163
+ architecture.
164
+
165
+ For example, with SVE, the scalable vector register length is a minimum of 128
166
+ bits, must be a multiple of 128 bits and a power of 2; and predicate registers
167
+ have one bit for each byte in the vector registers. So, for ` svfloat32_t `
168
+ defined shown above, an ` f32 ` is 32-bits and with ` N=4 ` , the entire minimum
169
+ register length of 128 bits is used (4 x 32 = 128). An intrinsic that takes a
170
+ ` svfloat32_t ` may also want to accept as an argument a predicate vector with a
171
+ matching four elements (` N=4 ` ), which would only use 4 bits of the predicate
172
+ register rather than the full 16 bits.
173
+
174
+ See
175
+ [ * Manually-chosen or compiler-calculated element count* ] [ manual-or-calculated-element-count ]
176
+ for a discussion on why ` N ` is not calculated by the compiler.
177
+
147
178
## Properties of scalable vectors
148
179
[ properties-of-scalable-vector-types ] : #properties-of-scalable-vectors
149
180
@@ -217,28 +248,17 @@ Most of the complexity of SVE is handled by LLVM: lowering Rust's scalable
217
248
vectors to the correct type in LLVM and the ` vscale ` modifier that is applied to
218
249
LLVM's vector types.
219
250
220
- LLVM's scalable vector type is of the form ` <vscale x elements x type> ` :
221
-
222
- - ` elements ` multiplied by ` size_of::<$ty> ` gives the smallest allowed register
223
- size and the increment size
224
- - ` vscale ` is a runtime constant that is used to determine the actual vector
225
- register size
251
+ LLVM's scalable vector type is of the form ` <vscale x element_count x type> ` .
252
+ ` vscale ` is the scaling factor determined by the hardware at runtime, it can be
253
+ any value providing it gives a legal vector register size for the architecture.
226
254
227
- For example, with SVE, the scalable vector register (` Z ` register) size has to
228
- be a multiple of 128 bits and a power of 2. Only the value of ` elements ` can be
229
- chosen by compiler. For ` f32 ` , ` elements ` must always be four, as with the
230
- minimum ` vscale ` of one, ` 1 * 4 * sizeof(f32) ` is the 128-bit minimum register
231
- size.
255
+ For example, a ` <vscale x 4 x f32> ` is a scalable vector with a minimum of four
256
+ ` f32 ` elements and with SVE, ` vscale ` could then be any power of two which would
257
+ result in register sizes of 128, 256, 512, 1024 or 2048 and 4, 8, 16, 32, or 64
258
+ ` f32 ` elements respectively.
232
259
233
- At runtime ` vscale ` could then be any power of two which would result in
234
- register sizes of 128, 256, 512, 1024 and 2048. ` vscale ` could be any value
235
- providing it gives a legal vector register size for the architecture.
236
-
237
- ` repr(scalable) ` expects the number of ` elements ` to be provided rather than
238
- calculating it. This avoids needing to teach the compiler how to calculate the
239
- required ` element ` count, particularly as some of these scalable types can have
240
- different element counts. For instance, the predicates used in SVE have
241
- different element counts depending on the types they are a predicate for.
260
+ The ` N ` in the ` #[repr(scalable(N))] ` determines the ` element_count ` used in the
261
+ LLVM type for a scalable vector.
242
262
243
263
While it is possible to change the vector length at runtime using a
244
264
[ ` prctl() ` ] [ prctl ] call to the kernel, this would require that ` vscale ` change,
@@ -250,8 +270,8 @@ behaviour, consistent with C and C++.
250
270
# Drawbacks
251
271
[ drawbacks ] : #drawbacks
252
272
253
- - ` repr(scalable) ` is inherently additional complexity to the language, despite
254
- being largely hidden from users.
273
+ - ` repr(scalable(N)) ` is inherently additional complexity to the language,
274
+ despite being largely hidden from users.
255
275
256
276
# Rationale and alternatives
257
277
[ rationale-and-alternatives ] : #rationale-and-alternatives
@@ -267,6 +287,39 @@ By aligning with the approach taken by C (discussed in the
267
287
[ * Prior art* ] [ prior-art ] below), most of the documentation that already exists
268
288
for scalable vector intrinsics in C should still be applicable to Rust.
269
289
290
+ ## Manually-chosen or compiler-calculated element count
291
+ [ manual-or-calculated-element-count ] : #manually-chosen-or-compiler-calculated-element-count
292
+
293
+ ` repr(scalable(N)) ` expects ` N ` to be provided rather than calculating it. This
294
+ avoids needing to teach the compiler how to calculate the required ` element `
295
+ count, which isn't always trivial.
296
+
297
+ Many of the intrinsics which accept scalable vectors as an argument also accept
298
+ a predicate vector. Predicate vectors decide which lanes are on or off for an
299
+ operation (i.e. which elements in the vector are operated on). Predicate vectors
300
+ can be in different and smaller registers than the data. For example,
301
+ ` <vscale x 16 x i1> ` could be the predicate vector for a ` <vscale x 16 x u8> `
302
+ vector
303
+
304
+ For non-predicate scalable vectors, it will be typical that ` N ` will be
305
+ ` $minimum_register_length / $type_size ` (e.g. ` 4 ` for ` f32 ` or ` 8 ` for ` f16 `
306
+ with a minimum 128-bit register length). In this circumstance, ` N ` could be
307
+ trivially calculated by the compiler.
308
+
309
+ For predicate vectors, it is desirable to be able to to define types where ` N `
310
+ matches the number of elements in the non-predicate vector, i.e. a
311
+ ` <vscale x 4 x i1> ` to match a ` <vscale x 4 x f32> ` , ` <vscale x 8 x i1> ` to
312
+ match ` <vscale x 8 x u16> ` , or ` <vscale x 16 x i1> ` to match
313
+ ` <vscale x 16 x u8> ` . In this circumstance, it might still be possible to give
314
+ rustc all of the relevant information such that it could compute ` N ` , but it
315
+ would add extra complexity.
316
+
317
+ This RFC takes the position that the additional complexity required to have the
318
+ compiler always be able to calculate ` N ` isn't justified given the permanently
319
+ unstable nature of the ` repr(scalable(N)) ` attribute and the scalable vector
320
+ types defined in ` std::arch ` are likely to be few in number, automatically
321
+ generated and well-tested.
322
+
270
323
# Prior art
271
324
[ prior-art ] : #prior-art
272
325
0 commit comments