2121
2222## Features
2323
24+ [All supported intrinsics here](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#techs=MMX&ssetechs=SSE,SSE2,SSE3,SSSE3,SSE4_1,SSE4_2&avxnewtechs=AVX,AVX2).
25+
2426### SIMD intrinsics with `_mm_` prefix
2527
2628| | DMD x86/x86_64 | LDC x86/x86_64 | LDC arm64 | GDC x86_64 |
2729| -------| -----------------------| ------------------------| ----------------------| -------------------------|
28- | MMX | Yes but ([#42](https://github.com/AuburnSounds/intel-intrinsics/issues/42)) | Yes | Yes | Yes |
30+ | MMX | Yes | Yes | Yes | Yes |
2931| SSE | Yes | Yes | Yes | Yes |
30- | SSE2 | Yes but ([#42](https://github.com/AuburnSounds/intel-intrinsics/issues/42)) | Yes | Yes | Yes |
31- | SSE3 | Yes but ([#42](https://github.com/AuburnSounds/intel-intrinsics/issues/42)) | Yes (`-mattr=+sse3`) | Yes | Yes (`-msse3`) |
32+ | SSE2 | Yes | Yes | Yes | Yes |
33+ | SSE3 | Yes | Yes (`-mattr=+sse3`) | Yes | Yes (`-msse3`) |
3234| SSSE3 | Yes (`-mcpu`) | Yes (`-mattr=+ssse3`) | Yes | Yes (`-mssse3`) |
33- | SSE4.1| Yes but ([#42](https://github.com/AuburnSounds/intel-intrinsics/issues/42)) | Yes (`-mattr=+sse4.1`) | Yes | Yes (`-msse4.1`) |
34- | SSE4.2| Yes but ([#42](https://github.com/AuburnSounds/intel-intrinsics/issues/42)) | Yes (`-mattr=+sse4.2`) | Yes (`-mattr=+crc`) | Yes (`-msse4.2`) |
35- | BMI2 | Yes but ([#42](https://github.com/AuburnSounds/intel-intrinsics/issues/42)) | Yes (`-mattr=+bmi2`) | Yes | Yes (`-mbmi2`) |
36- | AVX | Yes but ([#42](https://github.com/AuburnSounds/intel-intrinsics/issues/42)) | Yes (`-mattr=+avx`) | Yes | Yes (`-mavx`) |
37- | F16C | WIP, ([#42](https://github.com/AuburnSounds/intel-intrinsics/issues/42)) | WIP (`-mattr=+f16c`) | WIP | WIP (`-mf16c`) |
38- | AVX2 | WIP and ([#42](https://github.com/AuburnSounds/intel-intrinsics/issues/42)) | WIP (`-mattr=+avx2`) | WIP | WIP (`-mavx2`) |
35+ | SSE4.1| Yes | Yes (`-mattr=+sse4.1`) | Yes | Yes (`-msse4.1`) |
36+ | SSE4.2| Yes | Yes (`-mattr=+sse4.2`) | Yes (`-mattr=+crc`) | Yes (`-msse4.2`) |
37+ | BMI2 | Yes | Yes (`-mattr=+bmi2`) | Yes | Yes (`-mbmi2`) |
38+ | AVX | Yes | Yes (`-mattr=+avx`) | Yes | Yes (`-mavx`) |
39+ | F16C | WIP | WIP (`-mattr=+f16c`) | WIP | WIP (`-mf16c`) |
40+ | AVX2 | WIP | WIP (`-mattr=+avx2`) | WIP | WIP (`-mavx2`) |
3941
40- The intrinsics implemented follow the syntax and semantics at: https://software.intel.com/sites/landingpage/IntrinsicsGuide/
42+ The intrinsics implemented follow the syntax and semantics at:
43+ - https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html
44+ - https://www.officedaytime.com/simd512e/
4145
4246The philosophy (and guarantee) of `intel-intrinsics` is:
4347 - `intel-intrinsics` generates optimal code, else it's a bug.
@@ -54,11 +58,11 @@ The philosophy (and guarantee) of `intel-intrinsics` is:
5458
5559though most of the time you will deal with:
5660```d
57- alias __m128 = float4;
61+ alias __m128 = float4;
5862alias __m128i = int4;
5963alias __m128d = double2;
60- alias __m64 = long1;
61- alias __m256 = float8;
64+ alias __m64 = long1;
65+ alias __m256 = float8;
6266alias __m256i = long4;
6367alias __m256d = double4;
6468```
@@ -92,15 +96,14 @@ __m128 add_4x_floats(__m128 a, __m128 b)
9296
9397### Individual element access
9498
95- It is recommended to do it in that way for maximum portability:
9699```d
97100__m128i A;
98101
99- // recommended portable way to set a single SIMD element
100- A.ptr[0] = 42;
102+ // set a single SIMD element (here, in an int4)
103+ A[0] = 42;
101104
102- // recommended portable way to get a single SIMD element
103- int elem = A.array[0];
105+ // get a single SIMD element (here, in an int4)
106+ int elem = A[0];
104107```
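
The element access shown in the updated lines can be exercised end to end. The following is a minimal sketch, assuming a dub project that depends on `intel-intrinsics` and a compiler recent enough to index SIMD vectors directly, as the new README text implies; `replaceFirstLane` is a hypothetical helper named only for illustration.

```d
import inteli.emmintrin; // SSE2 module of intel-intrinsics: __m128i, _mm_setr_epi32

// Hypothetical helper (not part of the library): writes lane 0, then reads it back.
int replaceFirstLane(__m128i v, int value)
{
    v[0] = value;   // set a single SIMD element
    return v[0];    // get a single SIMD element
}

void main()
{
    __m128i A = _mm_setr_epi32(1, 2, 3, 4); // lanes given in memory order
    assert(replaceFirstLane(A, 42) == 42);
    assert(A[2] == 3); // A is unchanged here: the vector was passed by value
}
```

The removed lines used `A.ptr[0]` and `A.array[0]` for the same purpose; the sketch sticks to the plain indexing form that replaces them.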
105108
106109
@@ -120,9 +123,17 @@ The problem with introducing new names is that you need hundreds of new identifi
120123
121124- **Documentation**
122125There is a convenient online guide provided by Intel:
123- https://software.intel.com/sites/landingpage/IntrinsicsGuide/
126+ https://www.intel.com/content/www/us/en/docs/intrinsics-guide/
124127Without that Intel documentation, it's impractical to write sizeable SIMD code.
125128
129+ ## Recommended for maximum reach on consumer machines
130+
131+ If you'd like to distribute software to consumers, it's safest to
132+ target SSE3 with `dflags: ["-mattr=+sse3"]`.
133+ - Apple Rosetta supports up to AVX2.
134+ - Microsoft Prism supports up to SSE4.2.
135+
136+ **Hence, targeting anything above SSE4.2 limits your reach on consumer machines.**
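
As an illustration of what an SSE3 target buys, here is a minimal sketch of a horizontal sum built on `_mm_hadd_ps`, an SSE3 intrinsic. It assumes a dub project depending on `intel-intrinsics` and, for LDC, the `-mattr=+sse3` flag from the table above; `horizontalSum` is a hypothetical helper, not part of the library.

```d
import inteli.xmmintrin; // SSE: __m128, _mm_setr_ps
import inteli.pmmintrin; // SSE3: _mm_hadd_ps

// Hypothetical helper (not part of the library): sums the four floats of v.
float horizontalSum(__m128 v)
{
    __m128 s = _mm_hadd_ps(v, v); // [v0+v1, v2+v3, v0+v1, v2+v3]
    s = _mm_hadd_ps(s, s);        // every lane now holds v0+v1+v2+v3
    return s.array[0];
}

void main()
{
    __m128 v = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);
    assert(horizontalSum(v) == 10.0f);
}
```

The same source should also build without the SSE3 flag, in which case the library is expected to fall back to a slower equivalent sequence.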
126137
127138### Who is using it?
128139