Skip to content

Commit 93964b6

Browse files
README.md
1 parent 8da4fc0 commit 93964b6

File tree

2 files changed

+120
-39
lines changed

2 files changed

+120
-39
lines changed

testable-simd-models/README.md

Lines changed: 119 additions & 38 deletions
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,7 @@ Each such folder has 3 sub-folders, `models`, `tests`, and `specs`.
1212

1313
The `models` folder contains the models of the intrinsics, with a file
1414
corresponding to different target features, and are written using the
15-
various abstractions implementedin `crate::abstractions`, especially
15+
various abstractions implemented in `crate::abstractions`, especially
1616
those in `crate::abstractions::simd`. These models are meant to
1717
closely resemble their implementations within the Rust core itself.
1818

@@ -25,9 +25,111 @@ outputs.
2525

2626
The tests can run by executing `cargo test`.
2727

28-
## Modeling Process
29-
The process of adding a specific intrinsic's model goes as follows.
30-
For this example, let us say the intrinsic we are adding is
28+
## Modeling a SIMD Intrinsic
29+
30+
There are three kinds of SIMD intrinsics we find in `core::arch`.
31+
32+
The first kind are builtin Rust compiler intrinsics, some of which are
33+
in the [`intrinsics/simd.rs` file](https://github.com/model-checking/verify-rust-std/blob/main/library/core/src/intrinsics/simd.rs)
34+
in the `core` crate, and others are in the [`simd.rs` file of `core_arch`](https://github.com/model-checking/verify-rust-std/blob/main/library/stdarch/crates/core_arch/src/simd.rs).
35+
These builtin intrinsics define generic SIMD operations that the Rust compiler knows how to implement on each platform.
36+
37+
The second kind are `extern` intrinsics that are links to definitions in LLVM.
38+
See, for example, [this list](https://github.com/rust-lang/stdarch/blob/master/crates/core_arch/src/x86/avx2.rs#L3596C8-L3596C14)
39+
of `extern` intrinsics used in the Intel x86 AVX2 library.
40+
These extern intrinsics are typically platform-specific functions that map to low-level instructions.
41+
42+
The third kind are `defined` intrinsics that are given proper definitions in Rust, and their code may
43+
depend on the builtin intrinsics or the extern intrinsics. There defined intrinsics represent higher-level
44+
operations that are wrappers around one or more assembly instructions.
45+
46+
### Modeling builtin intrinsics manually
47+
48+
We model all three kinds of intrinsics, but in slightly different
49+
ways. For the builtin intrinsics, we can write implementations once
50+
and for all, and to this end, we use a library within the
51+
`abstractions/simd.rs` file, where we copy the signatures of the
52+
intrinsics from Rust but give them our own implementation. In
53+
particular, we model each SIMD vector as an array of scalars, and
54+
define each generic operation as functions over such arrays. This can
55+
be seen as a reference implementation of the builtin intrinsics.
56+
57+
Hence, for example, the SIMD add intrinsic `simd_add` is modeled as follows,
58+
it takes two arrays of machine integers and adds them pointwise using a
59+
`wrapping_add` operation:
60+
61+
```rust
62+
pub fn simd_add<const N: u64, T: MachineInteger + Copy>(
63+
x: FunArray<N, T>,
64+
y: FunArray<N, T>,
65+
) -> FunArray<N, T> {
66+
FunArray::from_fn(|i| (x[i].wrapping_add(y[i])))
67+
}
68+
```
69+
70+
Notably, we model a strongly typed version of `simd_add`, in contrast to the compiler
71+
intrinsic which is too generic and unimplementable in safe Rust:
72+
73+
```rust
74+
/// Adds two simd vectors elementwise.
75+
///
76+
/// `T` must be a vector of integers or floats.
77+
#[rustc_intrinsic]
78+
#[rustc_nounwind]
79+
pub unsafe fn simd_add<T>(x: T, y: T) -> T;
80+
```
81+
82+
The main rules for writing these models is that they should be simple and self-contained,
83+
relying only on the libraries in `abstractions` or on builtin Rust language features or on
84+
other testable models. In particular, they should not themselves directly call Rust libraries
85+
or external crates, without going through the abstractions API.
86+
87+
88+
### Modeling extern intrinsics manually
89+
90+
For each file in `core::arch`, we split the code into extern
91+
intrinsics that must be modeled by hand and defined intrinsics whose
92+
models can be derived semi-automatically. The extern intrinsics are
93+
placed in a module suffixed with `_handwritten`. Hence, for example,
94+
the extern intrinsics used in `avx2.rs` can be found in `avx2_handwritten.rs`.
95+
96+
Modeling extern intrinsics is similar to modeling the builtin ones,
97+
in that the models are written by hand and treat the SIMD vectors
98+
as arrays of machine integers. The main difference is that these intrinsics
99+
are platform specific and so their modeling requires looking at the Intel or ARM
100+
documentation for the underlying operation.
101+
102+
For example, the extern intrinsic `phaddw` used in `avx2` corresponds to an
103+
Intel instruction called "Packed Horizontal Add" and is used in AVX2 intrinsics
104+
like `_mm256_hadd_epi16` documented [here](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_hadd_epi16&ig_expand=3667_)
105+
By inspecting the Intel documentation, we can write a Rust model for it
106+
as follows
107+
108+
```rust
109+
pub fn phaddw(a: i16x16, b: i16x16) -> i16x16 {
110+
i16x16::from_fn(|i| {
111+
if i < 4 {
112+
a[2 * i].wrapping_add(a[2 * i + 1])
113+
} else if i < 8 {
114+
b[2 * (i - 4)].wrapping_add(b[2 * (i - 4) + 1])
115+
} else if i < 12 {
116+
a[2 * (i - 4)].wrapping_add(a[2 * (i - 4) + 1])
117+
} else {
118+
b[2 * (i - 8)].wrapping_add(b[2 * (i - 8) + 1])
119+
}
120+
})
121+
}
122+
```
123+
124+
### Modeling defined intrinsics semi-automatically
125+
126+
To model the third category of intrinsics, we copy the Rust code of
127+
the intrinsic and adapt it to use our underlying abstractions. The
128+
changes needed to the code are sometimes scriptable, and indeed most
129+
of our models were generated from a script, but some changes are still
130+
needed by hand.
131+
132+
For example, let us say the intrinsic we are modeling is
31133
`_mm256_bsrli_epi128` from the avx2 feature set.
32134

33135
1. We go to [rust-lang/stdarch/crates/core_arch/src/x86/](https://github.com/rust-lang/stdarch/tree/master/crates/core_arch/src/x86/), and find the implementation of the intrinsic in `avx2.rs`.
@@ -69,39 +171,17 @@ pub fn _mm256_bsrli_epi128<const IMM8: i32>(a: __m256i) -> __m256i {
69171
transmute(r)
70172
}
71173
}
72-
```
73-
Thus, we then go to to `core_arch/x86/models/avx2.rs`, and add the implementation. After some modification, it ends up looking like this.
74-
``` rust
75-
/// Shifts 128-bit lanes in `a` right by `imm8` bytes while shifting in zeros.
76-
///
77-
/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_bsrli_epi128)
174+
```
78175

79-
pub fn _mm256_bsrli_epi128<const IMM8: i32>(a: __m256i) -> __m256i {
80-
const fn mask(shift: i32, i: u32) -> u64 {
81-
let shift = shift as u32 & 0xff;
82-
if shift > 15 || (15 - (i % 16)) < shift {
83-
0 as u64
84-
} else {
85-
(32 + (i + shift)) as u64
86-
}
87-
}
88-
89-
let a = BitVec::to_i8x32(a);
90-
let r: i8x32 = simd_shuffle(
91-
i8x32::from_fn(|_| 0),
92-
a,
93-
[
94-
mask(IMM8, 0),
95-
mask(IMM8, 1),
96-
mask(IMM8, 2),
97-
mask(IMM8, 3),
98-
...
99-
mask(IMM8, 31),
100-
],
101-
);
102-
r.into()
103-
}
104-
```
176+
Thus, we then go to to `core_arch/x86/models/avx2.rs`, and add this implementation.
177+
The only change it requires here is that the `simd_shuffle` macro is a function in our model,
178+
and we discard all the function attributes.
179+
180+
For other intrinsics, sometimes we need to make more changes. Since our model of the builtin intrinsics
181+
are more precise with respect to the type of their arguments compared to their Rust counterparts, we
182+
sometimes need to add more type annotations in our defined models. We also remove all `unsafe` guards,
183+
since our models are always in safe Rust. Otherwise, our code for the defined intrinsics looks very
184+
similar to the upstream code in `core::arch`.
105185

106186
3. Next, we add a test for this intrinsic. For this, we navigate to `core_arch/avx2/tests/avx2.rs`. Since the value of
107187
`IMM8` can be up to 8 bits, we want to test constant arguments up to 255. Thus, we write the following macro invocation.
@@ -116,9 +196,10 @@ pub fn _mm256_bsrli_epi128<const IMM8: i32>(a: __m256i) -> __m256i {
116196
To contribute new models of intrinsics, we expect the author to follow
117197
the above steps and provide comprehensive tests. It is important that
118198
the model author look carefully at both the Intel/ARM specification
119-
and the Rust `stdarch` implementation, because the Rust implementation
120-
may not necessarily be correct.
199+
and the Rust `stdarch` implementation, because they may look quite different
200+
from each other.
121201

202+
In some cases, the Rust implementation may not be correct.
122203
Indeed, the previous implementation of `_mm256_bsrli_epi128` (and a
123204
similar intrinsic called `_mm512_bsrli_epi128`) in `stdarch` had a
124205
bug, which we found during the process of modeling and testing this

testable-simd-models/src/lib.rs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -25,7 +25,7 @@
2525
//! By providing a readable, testable, well-specified version of `core`'s behavior, it serves as a foundation for
2626
//! proof assistants and other verification tools.
2727
28-
// This recursion limit is necessary for mk! macro sued for tests.
28+
// This recursion limit is necessary for mk! macro used for tests.
2929
// We test functions with const generics, the macro generate a test per possible (const generic) control value.
3030
#![recursion_limit = "4096"]
3131
pub mod abstractions;

0 commit comments

Comments
 (0)