Commit cfe83f8

author
Guillaume Piolat
committed
update documentation
1 parent 3f20f92 commit cfe83f8

2 files changed

+31
-20
lines changed

README.md

Lines changed: 30 additions & 19 deletions
Original file line number | Diff line number | Diff line change
@@ -21,23 +21,27 @@
2121

2222
## Features
2323

24+
[All supported intrinsics here](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#techs=MMX&ssetechs=SSE,SSE2,SSE3,SSSE3,SSE4_1,SSE4_2&avxnewtechs=AVX,AVX2).
25+
2426
### SIMD intrinsics with `_mm_` prefix
2527

2628
| | DMD x86/x86_64 | LDC x86/x86_64 | LDC arm64 | GDC x86_64 |
2729
|-------|-----------------------|------------------------|----------------------|-------------------------|
28-
| MMX | Yes but ([#42](https://github.com/AuburnSounds/intel-intrinsics/issues/42)) | Yes | Yes | Yes |
30+
| MMX | Yes | Yes | Yes | Yes |
2931
| SSE | Yes | Yes | Yes | Yes |
30-
| SSE2 | Yes but ([#42](https://github.com/AuburnSounds/intel-intrinsics/issues/42)) | Yes | Yes | Yes |
31-
| SSE3 | Yes but ([#42](https://github.com/AuburnSounds/intel-intrinsics/issues/42)) | Yes (`-mattr=+sse3`) | Yes | Yes (`-msse3`) |
32+
| SSE2 | Yes | Yes | Yes | Yes |
33+
| SSE3 | Yes | Yes (`-mattr=+sse3`) | Yes | Yes (`-msse3`) |
3234
| SSSE3 | Yes (`-mcpu`) | Yes (`-mattr=+ssse3`) | Yes | Yes (`-mssse3`) |
33-
| SSE4.1| Yes but ([#42](https://github.com/AuburnSounds/intel-intrinsics/issues/42)) | Yes (`-mattr=+sse4.1`) | Yes | Yes (`-msse4.1`) |
34-
| SSE4.2| Yes but ([#42](https://github.com/AuburnSounds/intel-intrinsics/issues/42)) | Yes (`-mattr=+sse4.2`) | Yes (`-mattr=+crc`) | Yes (`-msse4.2`) |
35-
| BMI2 | Yes but ([#42](https://github.com/AuburnSounds/intel-intrinsics/issues/42)) | Yes (`-mattr=+bmi2`) | Yes | Yes (`-mbmi2`) |
36-
| AVX | Yes but ([#42](https://github.com/AuburnSounds/intel-intrinsics/issues/42)) | Yes (`-mattr=+avx`) | Yes | Yes (`-mavx`) |
37-
| F16C | WIP, ([#42](https://github.com/AuburnSounds/intel-intrinsics/issues/42)) | WIP (`-mattr=+f16c`) | WIP | WIP (`-mf16c`) |
38-
| AVX2 | WIP and ([#42](https://github.com/AuburnSounds/intel-intrinsics/issues/42)) | WIP (`-mattr=+avx2`) | WIP | WIP (`-mavx2`) |
35+
| SSE4.1| Yes | Yes (`-mattr=+sse4.1`) | Yes | Yes (`-msse4.1`) |
36+
| SSE4.2| Yes | Yes (`-mattr=+sse4.2`) | Yes (`-mattr=+crc`) | Yes (`-msse4.2`) |
37+
| BMI2 | Yes | Yes (`-mattr=+bmi2`) | Yes | Yes (`-mbmi2`) |
38+
| AVX | Yes | Yes (`-mattr=+avx`) | Yes | Yes (`-mavx`) |
39+
| F16C | WIP | WIP (`-mattr=+f16c`) | WIP | WIP (`-mf16c`) |
40+
| AVX2 | WIP | WIP (`-mattr=+avx2`) | WIP | WIP (`-mavx2`) |
3941

40-
The intrinsics implemented follow the syntax and semantics at: https://software.intel.com/sites/landingpage/IntrinsicsGuide/
42+
The intrinsics implemented follow the syntax and semantics at:
43+
- https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html
44+
- https://www.officedaytime.com/simd512e/
4145

4246
The philosophy (and guarantee) of `intel-intrinsics` is:
4347
- `intel-intrinsics` generates optimal code else it's a bug.
@@ -54,11 +58,11 @@ The philosophy (and guarantee) of `intel-intrinsics` is:
5458

5559
though most of the time you will deal with:
5660
```d
57-
alias __m128 = float4;
61+
alias __m128 = float4;
5862
alias __m128i = int4;
5963
alias __m128d = double2;
60-
alias __m64 = long1;
61-
alias __m256 = float8;
64+
alias __m64 = long1;
65+
alias __m256 = float8;
6266
alias __m256i = long4;
6367
alias __m256d = double4;
6468
```
@@ -92,15 +96,14 @@ __m128 add_4x_floats(__m128 a, __m128 b)
9296

9397
### Individual element access
9498

95-
It is recommended to do it in that way for maximum portability:
9699
```d
97100
__m128i A;
98101
99-
// recommended portable way to set a single SIMD element
100-
A.ptr[0] = 42;
102+
// set a single SIMD element (here, in an int4)
103+
A[0] = 42;
101104
102-
// recommended portable way to get a single SIMD element
103-
int elem = A.array[0];
105+
// get a single SIMD element (here, in an int4)
106+
int elem = A[0];
104107
```
105108

106109

@@ -120,9 +123,17 @@ The problem with introducing new names is that you need hundreds of new identifi
120123

121124
- **Documentation**
122125
There is a convenient online guide provided by Intel:
123-
https://software.intel.com/sites/landingpage/IntrinsicsGuide/
126+
https://www.intel.com/content/www/us/en/docs/intrinsics-guide/
124127
Without that Intel documentation, it's impractical to write sizeable SIMD code.
125128

129+
## Recommended for maximum reach on consumer machines
130+
131+
If you'd like to distribute software to consumers, it's safest to
132+
target SSE3 with `dflags: ["-mattr=+sse3"]`.
133+
- Apple Rosetta supports up to AVX2.
134+
- Microsoft Prism supports up to SSE4.2.
135+
136+
**Hence targeting anything above SSE4.2 limits reach on consumer machines.**
126137

127138
### Who is using it?
128139

source/inteli/avx2intrin.d

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22
* AVX2 intrinsics.
33
* https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#techs=AVX2
44
*
5-
* Copyright: Guillaume Piolat 2022-2024.
5+
* Copyright: Guillaume Piolat 2022-2025.
66
* Johan Engelen 2022.
77
* cet 2024.
88
* License: $(LINK2 http://www.boost.org/LICENSE_1_0.txt, Boost License 1.0)
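
Note on the README change in this commit: the new section recommends `dflags: ["-mattr=+sse3"]`, and the feature table lists `-mattr=+sse3` for LDC and `-msse3` for GDC. A minimal `dub.sdl` sketch of that recommendation might look like the following. The package name `my-app` and the version range are placeholders that do not come from the commit, and scoping the flags per compiler with `platform=` is an assumption about how a project could be set up, not something the README prescribes.

```sdl
// Hypothetical package name, for illustration only.
name "my-app"

// Placeholder version range (assumption); pin whatever release you actually use.
dependency "intel-intrinsics" version="~>1.0"

// SSE3 baseline, using the per-compiler flags listed in the README table.
// Scoping by compiler avoids passing an LDC-only flag to GDC and vice versa.
dflags "-mattr=+sse3" platform="ldc"
dflags "-msse3" platform="gdc"
```

The README table lists no flag for DMD on the SSE3 row, so none is set here. If the project also builds for arm64 with LDC, the x86-specific flag would presumably need a narrower platform filter.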
