.. site/source/docs/porting/simd.rst
Emscripten supports the `WebAssembly SIMD <https://github.com/webassembly/simd/>`_ specification in the following ways:

1. Enable LLVM/Clang SIMD autovectorizer to automatically target WebAssembly SIMD, without requiring changes to C/C++ source code.
2. Write SIMD code using the GCC/Clang SIMD Vector Extensions (``__attribute__((vector_size(16)))``)
3. Write SIMD code using the WebAssembly SIMD intrinsics (``#include <wasm_simd128.h>``)
4. Compile existing SIMD code that uses the x86 SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX or AVX2 intrinsics (``#include <*mmintrin.h>``)
5. Compile existing SIMD code that uses the ARM NEON intrinsics (``#include <arm_neon.h>``)
These techniques can be freely combined in a single program.
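As a small sketch of the second approach (the GCC/Clang vector extension), the following compiles both natively and, with ``-msimd128``, to WebAssembly SIMD. The ``float4`` typedef and ``add4`` helper are illustrative names, not part of any header:

.. code-block:: c

   #include <stdio.h>

   /* 128-bit vector of four floats via the GCC/Clang vector extension. */
   typedef float float4 __attribute__((vector_size(16)));

   /* Element-wise addition; the compiler emits a single SIMD add. */
   float4 add4(float4 a, float4 b) {
       return a + b;
   }

   int main(void) {
       float4 a = {1.0f, 2.0f, 3.0f, 4.0f};
       float4 b = {5.0f, 6.0f, 7.0f, 8.0f};
       float4 c = add4(a, b);
       printf("%g %g %g %g\n", c[0], c[1], c[2], c[3]);
       return 0;
   }

Individual lanes are read with ordinary subscripting (``c[0]``), and the usual arithmetic operators apply element-wise.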

Emscripten supports compiling existing codebases that use x86 SSE instructions by passing the corresponding compiler flag and including the matching header:

* **SSE4.1**: pass ``-msse4.1`` and ``#include <smmintrin.h>``. Use ``#ifdef __SSE4_1__`` to gate code.
* **SSE4.2**: pass ``-msse4.2`` and ``#include <nmmintrin.h>``. Use ``#ifdef __SSE4_2__`` to gate code.
* **AVX**: pass ``-mavx`` and ``#include <immintrin.h>``. Use ``#ifdef __AVX__`` to gate code.
* **AVX2**: pass ``-mavx2`` and ``#include <immintrin.h>``. Use ``#ifdef __AVX2__`` to gate code.
Currently the SSE1, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, and AVX2 instruction sets are supported. Each of these instruction sets adds on top of the previous ones, so e.g. when targeting SSE3, the SSE1 and SSE2 instruction sets are also available.
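The ``#ifdef`` gating pattern described above can be sketched as follows. ``add4`` is a hypothetical helper, not a library function; the SSE path is taken when ``__SSE__`` is defined, i.e. on native x86 builds or under Emscripten with ``-msse -msimd128``, with a portable scalar fallback otherwise:

.. code-block:: c

   #include <stdio.h>

   #ifdef __SSE__
   #include <xmmintrin.h>
   /* SSE1 path: load, add, and store four floats at once. */
   void add4(const float *a, const float *b, float *out) {
       _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
   }
   #else
   /* Portable scalar fallback for targets without SSE. */
   void add4(const float *a, const float *b, float *out) {
       for (int i = 0; i < 4; ++i)
           out[i] = a[i] + b[i];
   }
   #endif

   int main(void) {
       float a[4] = {1, 2, 3, 4}, b[4] = {5, 6, 7, 8}, c[4];
       add4(a, b, c);
       printf("%g %g %g %g\n", c[0], c[1], c[2], c[3]);
       return 0;
   }

Either branch produces the same results, so the program behaves identically with or without SIMD support.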
Only the 128-bit wide instructions from the AVX instruction set are listed. The 256-bit wide AVX instructions are each emulated with two 128-bit wide instructions.
The following table highlights the availability and expected performance of different AVX2 intrinsics. Refer to the `Intel Intrinsics Guide on AVX2 <https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#avxnewtechs=AVX2>`_ for details.
.. list-table:: x86 AVX2 intrinsics available via #include <immintrin.h> and -mavx2
   :widths: 20 30
   :header-rows: 1

   * - Intrinsic name
     - WebAssembly SIMD support
   * - _mm_broadcastss_ps
     - 💡 emulated with a general shuffle
   * - _mm_broadcastsd_pd
     - 💡 emulated with a general shuffle
   * - _mm_blend_epi32
     - 💡 emulated with a general shuffle
   * - _mm_broadcastb_epi8
     - 💡 emulated with a general shuffle
   * - _mm_broadcastw_epi16
     - 💡 emulated with a general shuffle
   * - _mm_broadcastd_epi32
     - 💡 emulated with a general shuffle
   * - _mm_broadcastq_epi64
     - 💡 emulated with a general shuffle
   * - _mm256_permutevar8x32_epi32
     - ❌ scalarized
   * - _mm256_permute4x64_pd
     - 💡 emulated with two general shuffles
   * - _mm256_permutevar8x32_ps
     - ❌ scalarized
   * - _mm256_permute4x64_epi64
     - 💡 emulated with two general shuffles
   * - _mm_maskload_epi32
     - ⚠️ emulated with SIMD load+shift+and
   * - _mm_maskload_epi64
     - ⚠️ emulated with SIMD load+shift+and
   * - _mm_maskstore_epi32
     - ❌ scalarized
   * - _mm_maskstore_epi64
     - ❌ scalarized
   * - _mm_sllv_epi32
     - ❌ scalarized
   * - _mm_sllv_epi64
     - ❌ scalarized
   * - _mm_srav_epi32
     - ❌ scalarized
   * - _mm_srlv_epi32
     - ❌ scalarized
   * - _mm_srlv_epi64
     - ❌ scalarized
   * - _mm_mask_i32gather_pd
     - ❌ scalarized
   * - _mm_mask_i64gather_pd
     - ❌ scalarized
   * - _mm_mask_i32gather_ps
     - ❌ scalarized
   * - _mm_mask_i64gather_ps
     - ❌ scalarized
   * - _mm_mask_i32gather_epi32
     - ❌ scalarized
   * - _mm_mask_i64gather_epi32
     - ❌ scalarized
   * - _mm_mask_i32gather_epi64
     - ❌ scalarized
   * - _mm_mask_i64gather_epi64
     - ❌ scalarized
   * - _mm_i32gather_pd
     - ❌ scalarized
   * - _mm_i64gather_pd
     - ❌ scalarized
   * - _mm_i32gather_ps
     - ❌ scalarized
   * - _mm_i64gather_ps
     - ❌ scalarized
   * - _mm_i32gather_epi32
     - ❌ scalarized
   * - _mm_i64gather_epi32
     - ❌ scalarized
   * - _mm_i32gather_epi64
     - ❌ scalarized
   * - _mm_i64gather_epi64
     - ❌ scalarized
+
1223
+
All of the 128-bit wide instructions from the AVX2 instruction set are listed. Only a small subset of the 256-bit wide AVX2 instructions is listed; most of the 256-bit wide AVX2 instructions are emulated with two 128-bit wide instructions.
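A note on why the per-lane variable shifts in the table are scalarized: WebAssembly SIMD shift instructions take a single scalar shift count shared by all lanes, so an intrinsic such as ``_mm_sllv_epi32`` has no direct lowering. Below is a plain-C model of that intrinsic's semantics (per Intel's definition, a shift count of 32 or more zeroes the lane); the function name is illustrative:

.. code-block:: c

   #include <stdint.h>
   #include <stdio.h>

   /* Scalar model of _mm_sllv_epi32: each 32-bit lane of a is shifted
      left by the count in the matching lane of b; counts >= 32 yield 0. */
   void sllv_epi32_model(const uint32_t a[4], const uint32_t b[4],
                         uint32_t out[4]) {
       for (int i = 0; i < 4; ++i)
           out[i] = (b[i] < 32) ? (a[i] << b[i]) : 0;
   }

   int main(void) {
       uint32_t a[4] = {1, 1, 1, 1}, b[4] = {0, 1, 4, 40}, r[4];
       sllv_epi32_model(a, b, r);
       printf("%u %u %u %u\n", r[0], r[1], r[2], r[3]);
       return 0;
   }

Because each lane needs its own count, the compiled WebAssembly extracts every lane, shifts it as a scalar, and reassembles the vector, which is why these intrinsics carry a performance warning.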