Commit 990c727

Add AVX2 support
1 parent 06cebfc commit 990c727

8 files changed: 3131 additions & 33 deletions

emcc.py

Lines changed: 4 additions & 1 deletion
@@ -77,7 +77,7 @@
   'fetchSettings'
 ]

-SIMD_INTEL_FEATURE_TOWER = ['-msse', '-msse2', '-msse3', '-mssse3', '-msse4.1', '-msse4.2', '-msse4', '-mavx']
+SIMD_INTEL_FEATURE_TOWER = ['-msse', '-msse2', '-msse3', '-mssse3', '-msse4.1', '-msse4.2', '-msse4', '-mavx', '-mavx2']
 SIMD_NEON_FLAGS = ['-mfpu=neon']
 LINK_ONLY_FLAGS = {
   '--bind', '--closure', '--cpuprofiler', '--embed-file',
@@ -493,6 +493,9 @@ def array_contains_any_of(hay, needles):
   if array_contains_any_of(user_args, SIMD_INTEL_FEATURE_TOWER[7:]):
     cflags += ['-D__AVX__=1']

+  if array_contains_any_of(user_args, SIMD_INTEL_FEATURE_TOWER[8:]):
+    cflags += ['-D__AVX2__=1']
+
   if array_contains_any_of(user_args, SIMD_NEON_FLAGS):
     cflags += ['-D__ARM_NEON__=1']
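
The slicing in the emcc.py hunks above means that `-mavx2` turns on both `__AVX__` and `__AVX2__`, because `'-mavx2'` sits above `'-mavx'` in the tower. The sketch below models only that mapping. It is not the emcc.py code path: the body of `array_contains_any_of` is not shown in the diff and is assumed here, and `avx_defines` is a hypothetical helper introduced for illustration.

```python
# Illustrative model of the flag-to-define mapping; not the real emcc.py code.
SIMD_INTEL_FEATURE_TOWER = ['-msse', '-msse2', '-msse3', '-mssse3',
                            '-msse4.1', '-msse4.2', '-msse4', '-mavx', '-mavx2']


def array_contains_any_of(hay, needles):
  # Assumed behavior: the real function body is not part of this diff.
  return any(n in hay for n in needles)


def avx_defines(user_args):
  """Hypothetical helper: return only the AVX-related -D flags."""
  cflags = []
  # Slice [7:] is ['-mavx', '-mavx2'], so either flag enables __AVX__.
  if array_contains_any_of(user_args, SIMD_INTEL_FEATURE_TOWER[7:]):
    cflags += ['-D__AVX__=1']
  # Slice [8:] is ['-mavx2'] only.
  if array_contains_any_of(user_args, SIMD_INTEL_FEATURE_TOWER[8:]):
    cflags += ['-D__AVX2__=1']
  return cflags


print(avx_defines(['-O2', '-mavx2']))  # ['-D__AVX__=1', '-D__AVX2__=1']
print(avx_defines(['-mavx']))          # ['-D__AVX__=1']
```

Under this model, appending a new entry to the end of the tower needs one new slice check and leaves the existing checks untouched.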

site/source/docs/porting/simd.rst

Lines changed: 84 additions & 1 deletion
@@ -12,7 +12,7 @@ Emscripten supports the `WebAssembly SIMD <https://github.com/webassembly/simd/>
 1. Enable LLVM/Clang SIMD autovectorizer to automatically target WebAssembly SIMD, without requiring changes to C/C++ source code.
 2. Write SIMD code using the GCC/Clang SIMD Vector Extensions (``__attribute__((vector_size(16)))``)
 3. Write SIMD code using the WebAssembly SIMD intrinsics (``#include <wasm_simd128.h>``)
-4. Compile existing SIMD code that uses the x86 SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2 or AVX intrinsics (``#include <*mmintrin.h>``)
+4. Compile existing SIMD code that uses the x86 SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX or AVX2 intrinsics (``#include <*mmintrin.h>``)
 5. Compile existing SIMD code that uses the ARM NEON intrinsics (``#include <arm_neon.h>``)

 These techniques can be freely combined in a single program.
@@ -152,6 +152,7 @@ Emscripten supports compiling existing codebases that use x86 SSE instructions b
 * **SSE4.1**: pass ``-msse4.1`` and ``#include <smmintrin.h>``. Use ``#ifdef __SSE4_1__`` to gate code.
 * **SSE4.2**: pass ``-msse4.2`` and ``#include <nmmintrin.h>``. Use ``#ifdef __SSE4_2__`` to gate code.
 * **AVX**: pass ``-mavx`` and ``#include <immintrin.h>``. Use ``#ifdef __AVX__`` to gate code.
+* **AVX2**: pass ``-mavx2`` and ``#include <immintrin.h>``. Use ``#ifdef __AVX2__`` to gate code.

 Currently only the SSE1, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, and AVX instruction sets are supported. Each of these instruction sets add on top of the previous ones, so e.g. when targeting SSE3, the instruction sets SSE1 and SSE2 are also available.

@@ -1138,6 +1139,88 @@ The following table highlights the availability and expected performance of diff

 Only the 128-bit wide instructions from AVX instruction set are listed. The 256-bit wide AVX instructions are emulated by two 128-bit wide instructions.

+The following table highlights the availability and expected performance of different AVX2 intrinsics. Refer to `Intel Intrinsics Guide on AVX2 <https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#avxnewtechs=AVX2>`_.
+
+.. list-table:: x86 AVX2 intrinsics available via #include <immintrin.h> and -mavx2
+   :widths: 20 30
+   :header-rows: 1
+
+   * - Intrinsic name
+     - WebAssembly SIMD support
+   * - _mm_broadcastss_ps
+     - 💡 emulated with a general shuffle
+   * - _mm_broadcastsd_pd
+     - 💡 emulated with a general shuffle
+   * - _mm_blend_epi32
+     - 💡 emulated with a general shuffle
+   * - _mm_broadcastb_epi8
+     - 💡 emulated with a general shuffle
+   * - _mm_broadcastw_epi16
+     - 💡 emulated with a general shuffle
+   * - _mm_broadcastd_epi32
+     - 💡 emulated with a general shuffle
+   * - _mm_broadcastq_epi64
+     - 💡 emulated with a general shuffle
+   * - _mm256_permutevar8x32_epi32
+     - ❌ scalarized
+   * - _mm256_permute4x64_pd
+     - 💡 emulated with two general shuffle
+   * - _mm256_permutevar8x32_ps
+     - ❌ scalarized
+   * - _mm256_permute4x64_epi64
+     - 💡 emulated with two general shuffle
+   * - _mm_maskload_epi32
+     - ⚠️ emulated with SIMD load+shift+and
+   * - _mm_maskload_epi64
+     - ⚠️ emulated with SIMD load+shift+and
+   * - _mm_maskstore_epi32
+     - ❌ scalarized
+   * - _mm_maskstore_epi64
+     - ❌ scalarized
+   * - _mm_sllv_epi32
+     - ❌ scalarized
+   * - _mm_sllv_epi64
+     - ❌ scalarized
+   * - _mm_srav_epi32
+     - ❌ scalarized
+   * - _mm_srlv_epi32
+     - ❌ scalarized
+   * - _mm_srlv_epi64
+     - ❌ scalarized
+   * - _mm_mask_i32gather_pd
+     - ❌ scalarized
+   * - _mm_mask_i64gather_pd
+     - ❌ scalarized
+   * - _mm_mask_i32gather_ps
+     - ❌ scalarized
+   * - _mm_mask_i64gather_ps
+     - ❌ scalarized
+   * - _mm_mask_i32gather_epi32
+     - ❌ scalarized
+   * - _mm_mask_i64gather_epi32
+     - ❌ scalarized
+   * - _mm_mask_i32gather_epi64
+     - ❌ scalarized
+   * - _mm_mask_i64gather_epi64
+     - ❌ scalarized
+   * - _mm_i32gather_pd
+     - ❌ scalarized
+   * - _mm_i64gather_pd
+     - ❌ scalarized
+   * - _mm_i32gather_ps
+     - ❌ scalarized
+   * - _mm_i64gather_ps
+     - ❌ scalarized
+   * - _mm_i32gather_epi32
+     - ❌ scalarized
+   * - _mm_i64gather_epi32
+     - ❌ scalarized
+   * - _mm_i32gather_epi64
+     - ❌ scalarized
+   * - _mm_i64gather_epi64
+     - ❌ scalarized
+
+All the 128-bit wide instructions from AVX2 instruction set are listed. Only a small part of the 256-bit AVX2 instruction set are listed, most of the 256-bit wide AVX2 instructions are emulated by two 128-bit wide instructions.

 ======================================================
 Compiling SIMD code targeting ARM NEON instruction set
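
The AVX2 table above marks the variable shifts (for example `_mm_sllv_epi32`) as scalarized, and the closing paragraph says most 256-bit AVX2 instructions are emulated by two 128-bit instructions. The sketch below is a conceptual model of both ideas, using Python lists of unsigned 32-bit lane values. It is not Emscripten's header code, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sllv_epi32_128(a, count):
  """Per-lane model of a 4 x 32-bit variable left shift (scalarized)."""
  # x86 semantics: a shift count above 31 yields 0 for that lane.
  return [((x << c) & MASK32) if c < 32 else 0 for x, c in zip(a, count)]


def sllv_epi32_256(a, count):
  """Model a 256-bit operation as the same operation on two 128-bit halves."""
  lo = sllv_epi32_128(a[:4], count[:4])
  hi = sllv_epi32_128(a[4:], count[4:])
  return lo + hi


a = [1, 2, 3, 4, 0x80000000, 0xFFFFFFFF, 7, 8]
c = [0, 1, 2, 31, 1, 4, 32, 40]
print(sllv_epi32_256(a, c))  # [1, 4, 12, 0, 0, 4294967280, 0, 0]
```

The per-lane loop is what "scalarized" means in practice: every lane is processed on its own, so these intrinsics are unlikely to gain much from SIMD when compiled to WebAssembly.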

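The table describes `_mm_maskload_epi32` as emulated with "SIMD load+shift+and". One plausible reading is sketched below: load all four lanes unconditionally, expand each mask lane's sign bit into a full-lane mask with an arithmetic right shift, then AND the two. The choice of an arithmetic shift by 31 is an assumption, as are the helper names, and lanes are modelled as unsigned 32-bit Python integers.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
  """Reinterpret an unsigned 32-bit value as a signed one."""
  return x - (1 << 32) if x & 0x80000000 else x


def maskload_epi32(mem, mask):
  """Model of a 4-lane masked load: load everything, then AND with a mask."""
  loaded = list(mem[:4])  # unconditional full-width load
  # An arithmetic shift right by 31 spreads the sign bit across the lane.
  lane_mask = [(to_signed32(m) >> 31) & MASK32 for m in mask]
  return [v & lm for v, lm in zip(loaded, lane_mask)]


print(maskload_epi32([10, 20, 30, 40], [0x80000000, 0, 0xFFFFFFFF, 1]))
# [10, 0, 30, 0]
```

Only the top bit of each mask lane matters, which matches the x86 definition of the intrinsic: lanes whose mask has the high bit clear read as zero.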