Commit b5ee135

updating README
1 parent 51d28c8 commit b5ee135

1 file changed

+169
-1
lines changed

testable-simd-models/README.md

Lines changed: 169 additions & 1 deletion
# testable-simd-models

Rust models for the Core Library

This crate contains models for the intrinsics provided by `core::arch`. Its structure is based on
[rust-lang/stdarch/crates/core_arch](https://github.com/rust-lang/stdarch/tree/master/crates/core_arch). Within the `core_arch` folder in this crate, there is a separate
folder for each architecture whose intrinsics are being modeled (corresponding to the folders in the linked repository). Each such
folder has three sub-folders: `models`, `tests`, and `specs`.

The `models` folder contains the models of the intrinsics, with one file per target feature.
The models are written using the abstractions implemented in `crate::abstractions`, especially those
in `crate::abstractions::simd`, and are meant to closely resemble their implementations in
the Rust core library.

The `tests` folder contains the tests for these models and is structured the same way as `models`. Each file
additionally defines a macro that makes writing these tests easier. The tests
check the models against the corresponding intrinsics in the Rust core library by feeding both the same random inputs
(generally 1000) and comparing their outputs.
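
The random-comparison idea can be sketched in a few lines of plain Rust. This is only an illustration under stated assumptions: `XorShift64`, `first_mismatch`, and the two toy functions are hypothetical names invented for this sketch, not part of the crate, and the real tests compare models against actual `core::arch` intrinsics through the crate's own macro.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Runs `model` and `reference` on `trials` pseudo-random inputs and returns
/// the first input on which they disagree, if any.
fn first_mismatch(
    model: impl Fn(u64) -> u64,
    reference: impl Fn(u64) -> u64,
    trials: usize,
    seed: u64,
) -> Option<u64> {
    // The xorshift state must be non-zero, so force the lowest bit on.
    let mut rng = XorShift64(seed | 1);
    for _ in 0..trials {
        let input = rng.next_u64();
        if model(input) != reference(input) {
            return Some(input);
        }
    }
    None
}

/// Hypothetical "model": shift right by one byte, written byte by byte.
fn model_shift_right_one_byte(x: u64) -> u64 {
    let bytes = x.to_le_bytes();
    let mut out = [0u8; 8];
    for i in 0..7 {
        out[i] = bytes[i + 1];
    }
    u64::from_le_bytes(out)
}

/// Hypothetical "reference": the same operation as a plain integer shift.
fn reference_shift_right_one_byte(x: u64) -> u64 {
    x >> 8
}
```

Calling `first_mismatch(model_shift_right_one_byte, reference_shift_right_one_byte, 1000, 42)` returns `None` when the two functions agree on all 1000 sampled inputs; returning the offending input instead of a plain `bool` makes a failure easier to reproduce.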

The `specs` folder contains specifications. These are implementations written without
using the function abstractions in `crate::abstractions::simd`, and are written to
match their vendor specifications as closely as possible.

The process of adding a model for a specific intrinsic goes as follows. For this example,
let us say the intrinsic we are adding is `_mm256_bsrli_epi128` from the avx2 feature set.

1. We go to [rust-lang/stdarch/crates/core_arch/src/x86/](https://github.com/rust-lang/stdarch/tree/master/crates/core_arch/src/x86/), and find the implementation of the intrinsic in `avx2.rs`.
2. We see that the implementation looks like this:
``` rust
/// Shifts 128-bit lanes in `a` right by `imm8` bytes while shifting in zeros.
///
/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_bsrli_epi128)
#[inline]
#[target_feature(enable = "avx2")]
#[cfg_attr(test, assert_instr(vpsrldq, IMM8 = 1))]
#[rustc_legacy_const_generics(1)]
#[stable(feature = "simd_x86", since = "1.27.0")]
pub fn _mm256_bsrli_epi128<const IMM8: i32>(a: __m256i) -> __m256i {
    static_assert_uimm_bits!(IMM8, 8);
    const fn mask(shift: i32, i: u32) -> u32 {
        let shift = shift as u32 & 0xff;
        if shift > 15 || (15 - (i % 16)) < shift {
            0
        } else {
            32 + (i + shift)
        }
    }
    unsafe {
        let a = a.as_i8x32();
        let r: i8x32 = simd_shuffle!(
            i8x32::ZERO,
            a,
            [
                mask(IMM8, 0),
                mask(IMM8, 1),
                mask(IMM8, 2),
                mask(IMM8, 3),
                mask(IMM8, 4),
                mask(IMM8, 5),
                mask(IMM8, 6),
                mask(IMM8, 7),
                mask(IMM8, 8),
                mask(IMM8, 9),
                mask(IMM8, 10),
                mask(IMM8, 11),
                mask(IMM8, 12),
                mask(IMM8, 13),
                mask(IMM8, 14),
                mask(IMM8, 15),
                mask(IMM8, 16),
                mask(IMM8, 17),
                mask(IMM8, 18),
                mask(IMM8, 19),
                mask(IMM8, 20),
                mask(IMM8, 21),
                mask(IMM8, 22),
                mask(IMM8, 23),
                mask(IMM8, 24),
                mask(IMM8, 25),
                mask(IMM8, 26),
                mask(IMM8, 27),
                mask(IMM8, 28),
                mask(IMM8, 29),
                mask(IMM8, 30),
                mask(IMM8, 31),
            ],
        );
        transmute(r)
    }
}
```
Thus, we then go to `core_arch/x86/models/avx2.rs` and add the implementation. After some modification, it ends up looking like this:
``` rust
/// Shifts 128-bit lanes in `a` right by `imm8` bytes while shifting in zeros.
///
/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_bsrli_epi128)
pub fn _mm256_bsrli_epi128<const IMM8: i32>(a: __m256i) -> __m256i {
    const fn mask(shift: i32, i: u32) -> u64 {
        let shift = shift as u32 & 0xff;
        if shift > 15 || (15 - (i % 16)) < shift {
            0 as u64
        } else {
            (32 + (i + shift)) as u64
        }
    }

    let a = BitVec::to_i8x32(a);
    let r: i8x32 = simd_shuffle(
        i8x32::from_fn(|_| 0),
        a,
        [
            mask(IMM8, 0),
            mask(IMM8, 1),
            mask(IMM8, 2),
            mask(IMM8, 3),
            mask(IMM8, 4),
            mask(IMM8, 5),
            mask(IMM8, 6),
            mask(IMM8, 7),
            mask(IMM8, 8),
            mask(IMM8, 9),
            mask(IMM8, 10),
            mask(IMM8, 11),
            mask(IMM8, 12),
            mask(IMM8, 13),
            mask(IMM8, 14),
            mask(IMM8, 15),
            mask(IMM8, 16),
            mask(IMM8, 17),
            mask(IMM8, 18),
            mask(IMM8, 19),
            mask(IMM8, 20),
            mask(IMM8, 21),
            mask(IMM8, 22),
            mask(IMM8, 23),
            mask(IMM8, 24),
            mask(IMM8, 25),
            mask(IMM8, 26),
            mask(IMM8, 27),
            mask(IMM8, 28),
            mask(IMM8, 29),
            mask(IMM8, 30),
            mask(IMM8, 31),
        ],
    );
    r.into()
}
```
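
To see what the `mask` indices accomplish, the following scalar sketch replays the shuffle on plain byte arrays. It assumes the usual two-input shuffle convention (indices 0 to 31 read from the first vector, 32 to 63 from the second); `shuffle_bytes` and `bsrli_bytes` are hypothetical helpers written for this illustration and are not the crate's `simd_shuffle`.

```rust
/// Same index computation as the `mask` helper in the model: index 0 selects
/// a byte of the all-zero vector, index 32 + j selects byte j of `a`.
const fn mask(shift: i32, i: u32) -> u64 {
    let shift = shift as u32 & 0xff;
    if shift > 15 || (15 - (i % 16)) < shift {
        0
    } else {
        (32 + (i + shift)) as u64
    }
}

/// Simplified stand-in for a two-input shuffle: indices below 32 read from
/// `x`, indices from 32 to 63 read from `y`.
fn shuffle_bytes(x: [u8; 32], y: [u8; 32], idx: [u64; 32]) -> [u8; 32] {
    std::array::from_fn(|i| {
        let j = idx[i] as usize;
        if j < 32 { x[j] } else { y[j - 32] }
    })
}

/// Scalar sketch of the per-lane byte shift for a runtime `shift` value.
fn bsrli_bytes(a: [u8; 32], shift: i32) -> [u8; 32] {
    let idx: [u64; 32] = std::array::from_fn(|i| mask(shift, i as u32));
    shuffle_bytes([0u8; 32], a, idx)
}
```

With `a[i] = i + 1`, calling `bsrli_bytes(a, 3)` moves bytes 3 to 15 of each 16-byte lane down to positions 0 to 12 and fills the top three bytes of each lane with zeros; a shift of 16 or more clears both lanes.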

3. Next, we add a test for this intrinsic. For this, we navigate to `core_arch/x86/tests/avx2.rs`. Since `IMM8` is an
8-bit immediate, we want to test every constant argument from 0 to 255. Thus, we write the following macro invocation.
```rust
mk!([100]_mm256_bsrli_epi128{<0>,<1>,<2>,<3>,<4>,<5>,<6>,<7>,<8>,<9>,<10>,<11>,<12>,<13>,<14>,<15>,<16>,<17>,<18>,<19>,<20>,<21>,<22>,<23>,<24>,<25>,<26>,<27>,<28>,<29>,<30>,<31>,<32>,<33>,<34>,<35>,<36>,<37>,<38>,<39>,<40>,<41>,<42>,<43>,<44>,<45>,<46>,<47>,<48>,<49>,<50>,<51>,<52>,<53>,<54>,<55>,<56>,<57>,<58>,<59>,<60>,<61>,<62>,<63>,<64>,<65>,<66>,<67>,<68>,<69>,<70>,<71>,<72>,<73>,<74>,<75>,<76>,<77>,<78>,<79>,<80>,<81>,<82>,<83>,<84>,<85>,<86>,<87>,<88>,<89>,<90>,<91>,<92>,<93>,<94>,<95>,<96>,<97>,<98>,<99>,<100>,<101>,<102>,<103>,<104>,<105>,<106>,<107>,<108>,<109>,<110>,<111>,<112>,<113>,<114>,<115>,<116>,<117>,<118>,<119>,<120>,<121>,<122>,<123>,<124>,<125>,<126>,<127>,<128>,<129>,<130>,<131>,<132>,<133>,<134>,<135>,<136>,<137>,<138>,<139>,<140>,<141>,<142>,<143>,<144>,<145>,<146>,<147>,<148>,<149>,<150>,<151>,<152>,<153>,<154>,<155>,<156>,<157>,<158>,<159>,<160>,<161>,<162>,<163>,<164>,<165>,<166>,<167>,<168>,<169>,<170>,<171>,<172>,<173>,<174>,<175>,<176>,<177>,<178>,<179>,<180>,<181>,<182>,<183>,<184>,<185>,<186>,<187>,<188>,<189>,<190>,<191>,<192>,<193>,<194>,<195>,<196>,<197>,<198>,<199>,<200>,<201>,<202>,<203>,<204>,<205>,<206>,<207>,<208>,<209>,<210>,<211>,<212>,<213>,<214>,<215>,<216>,<217>,<218>,<219>,<220>,<221>,<222>,<223>,<224>,<225>,<226>,<227>,<228>,<229>,<230>,<231>,<232>,<233>,<234>,<235>,<236>,<237>,<238>,<239>,<240>,<241>,<242>,<243>,<244>,<245>,<246>,<247>,<248>,<249>,<250>,<251>,<252>,<253>,<254>,<255>}(a: BitVec));
```
Here, the `[100]` means we test 100 random inputs for each constant value. This concludes the necessary steps for implementing an intrinsic.
4. Optionally, we may want to add a specification, since the code for the Rust implementation is not straightforward. For this, we look up the [Intel Documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_bsrli_epi128).
Based on the documentation, we may write the following specification.
```rust
pub fn _mm256_bsrli_epi128<const IMM8: i32>(a: __m256i) -> __m256i {
    let a = BitVec::to_i128x2(a);
    let a = i128x2::from_fn(|i| {
        // Only the low 8 bits of the immediate are used.
        let tmp = IMM8 % 256;
        if tmp > 15 {
            // Intel's pseudocode clamps the byte count to 16, which shifts
            // the whole 128-bit lane out and leaves zero.
            0
        } else {
            ((a[i] as u128) >> (tmp * 8)) as i128
        }
    });
    BitVec::from_i128x2(a)
}
```
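
As a sanity check on the lane-wise reading of the vendor pseudocode (byte counts above 15 clear the lane), the sketch below compares a `u128` shift against a byte-by-byte version for every 8-bit immediate. It uses plain arrays instead of the crate's `BitVec` and `i128x2` types, and all three function names are hypothetical.

```rust
/// Lane-wise sketch following the vendor pseudocode: byte counts above 15
/// clear the lane, otherwise each 128-bit lane is shifted right by
/// `count * 8` bits.
fn bsrli_lanes(a: [u128; 2], imm8: u32) -> [u128; 2] {
    let count = imm8 & 0xff;
    a.map(|lane| if count > 15 { 0 } else { lane >> (count * 8) })
}

/// Byte-wise sketch of the same operation on one lane: output byte `i` is
/// input byte `i + count` when that byte exists, and zero otherwise.
fn bsrli_lane_bytes(lane: u128, imm8: u32) -> u128 {
    let count = (imm8 & 0xff) as usize;
    let bytes = lane.to_le_bytes();
    let mut out = [0u8; 16];
    for i in 0..16 {
        if i + count < 16 {
            out[i] = bytes[i + count];
        }
    }
    u128::from_le_bytes(out)
}

/// Returns true when both sketches agree on `a` for every 8-bit immediate.
fn agree_for_all_immediates(a: [u128; 2]) -> bool {
    (0u32..=255).all(|imm8| {
        let expected = a.map(|lane| bsrli_lane_bytes(lane, imm8));
        bsrli_lanes(a, imm8) == expected
    })
}
```

Because the immediate only has 256 possible values, the check over immediates can be exhaustive; only the vector input needs random sampling.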