Commit ee32d3a

Add AVX2 support (#23035)
Followup to #22430. Each 256-bit AVX2 intrinsic is emulated on top of 128-bit intrinsics that wasm supports directly.
1 parent 2607cbf commit ee32d3a

10 files changed: +3172 -35 lines changed

ChangeLog.md

Lines changed: 3 additions & 0 deletions
@@ -20,6 +20,9 @@ See docs/process.md for more on how version tagging works.
 
 4.0.1 (in development)
 ----------------------
+- Added support for compiling AVX2 intrinsics, 256-bit wide intrinsic is emulated
+  on top of 128-bit Wasm SIMD instruction set. (#23035). Pass `-msimd128 -mavx2`
+  to enable targeting AVX2.
 - The system JS libraries in `src/` were renamed from `library_foo.js` to
   `lib/libfoo.js`. They are still included via the same `-lfoo.js` flag so
   this should not be a user-visible change. (#23348)

emcc.py

Lines changed: 4 additions & 1 deletion
@@ -76,7 +76,7 @@
   'fetchSettings'
 ]
 
-SIMD_INTEL_FEATURE_TOWER = ['-msse', '-msse2', '-msse3', '-mssse3', '-msse4.1', '-msse4.2', '-msse4', '-mavx']
+SIMD_INTEL_FEATURE_TOWER = ['-msse', '-msse2', '-msse3', '-mssse3', '-msse4.1', '-msse4.2', '-msse4', '-mavx', '-mavx2']
 SIMD_NEON_FLAGS = ['-mfpu=neon']
 LINK_ONLY_FLAGS = {
     '--bind', '--closure', '--cpuprofiler', '--embed-file',
@@ -474,6 +474,9 @@ def array_contains_any_of(hay, needles):
   if array_contains_any_of(user_args, SIMD_INTEL_FEATURE_TOWER[7:]):
     cflags += ['-D__AVX__=1']
 
+  if array_contains_any_of(user_args, SIMD_INTEL_FEATURE_TOWER[8:]):
+    cflags += ['-D__AVX2__=1']
+
   if array_contains_any_of(user_args, SIMD_NEON_FLAGS):
     cflags += ['-D__ARM_NEON__=1']
 
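
Note on the emcc.py change: because `-mavx2` sits at index 8 of `SIMD_INTEL_FEATURE_TOWER`, it matches both the `[7:]` and `[8:]` slices, so passing it defines `__AVX__` as well as `__AVX2__`. User code can gate on the new macro as the documentation change in this commit suggests. The following is a minimal sketch of such gating, not code from this commit: `add8_i32` is a hypothetical helper name, and the scalar fallback is an assumption about how a project might handle builds without `-mavx2`.

```c
#include <stdint.h>

#ifdef __AVX2__
#include <immintrin.h>
#endif

/* Hypothetical helper: adds eight 32-bit integers lane by lane. */
static void add8_i32(const int32_t *a, const int32_t *b, int32_t *out) {
#ifdef __AVX2__
  /* AVX2 path: one 256-bit integer add over all eight lanes. */
  __m256i va = _mm256_loadu_si256((const __m256i *)a);
  __m256i vb = _mm256_loadu_si256((const __m256i *)b);
  _mm256_storeu_si256((__m256i *)out, _mm256_add_epi32(va, vb));
#else
  /* Fallback path when __AVX2__ is not defined. */
  for (int i = 0; i < 8; i++) {
    out[i] = a[i] + b[i];
  }
#endif
}
```

Both paths produce the same result for inputs that do not overflow, so the same source can be built with or without `-msimd128 -mavx2`.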

site/source/docs/porting/simd.rst

Lines changed: 86 additions & 1 deletion
@@ -12,7 +12,7 @@ Emscripten supports the `WebAssembly SIMD <https://github.com/webassembly/simd/>
 1. Enable LLVM/Clang SIMD autovectorizer to automatically target WebAssembly SIMD, without requiring changes to C/C++ source code.
 2. Write SIMD code using the GCC/Clang SIMD Vector Extensions (``__attribute__((vector_size(16)))``)
 3. Write SIMD code using the WebAssembly SIMD intrinsics (``#include <wasm_simd128.h>``)
-4. Compile existing SIMD code that uses the x86 SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2 or AVX intrinsics (``#include <*mmintrin.h>``)
+4. Compile existing SIMD code that uses the x86 SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX or AVX2 intrinsics (``#include <*mmintrin.h>``)
 5. Compile existing SIMD code that uses the ARM NEON intrinsics (``#include <arm_neon.h>``)
 
 These techniques can be freely combined in a single program.
@@ -153,6 +153,7 @@ Emscripten supports compiling existing codebases that use x86 SSE instructions b
 * **SSE4.1**: pass ``-msse4.1`` and ``#include <smmintrin.h>``. Use ``#ifdef __SSE4_1__`` to gate code.
 * **SSE4.2**: pass ``-msse4.2`` and ``#include <nmmintrin.h>``. Use ``#ifdef __SSE4_2__`` to gate code.
 * **AVX**: pass ``-mavx`` and ``#include <immintrin.h>``. Use ``#ifdef __AVX__`` to gate code.
+* **AVX2**: pass ``-mavx2`` and ``#include <immintrin.h>``. Use ``#ifdef __AVX2__`` to gate code.
 
 Currently only the SSE1, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, and AVX instruction sets are supported. Each of these instruction sets add on top of the previous ones, so e.g. when targeting SSE3, the instruction sets SSE1 and SSE2 are also available.
 
@@ -1145,6 +1146,90 @@ The following table highlights the availability and expected performance of diff
 
 Only the 128-bit wide instructions from AVX instruction set are listed. The 256-bit wide AVX instructions are emulated by two 128-bit wide instructions.
 
+The following table highlights the availability and expected performance of different AVX2 intrinsics. Refer to `Intel Intrinsics Guide on AVX2 <https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#avxnewtechs=AVX2>`_.
+
+.. list-table:: x86 AVX2 intrinsics available via #include <immintrin.h> and -mavx2
+   :widths: 20 30
+   :header-rows: 1
+
+   * - Intrinsic name
+     - WebAssembly SIMD support
+   * - _mm_broadcastss_ps
+     - 💡 emulated with a general shuffle
+   * - _mm_broadcastsd_pd
+     - 💡 emulated with a general shuffle
+   * - _mm_blend_epi32
+     - 💡 emulated with a general shuffle
+   * - _mm_broadcastb_epi8
+     - 💡 emulated with a general shuffle
+   * - _mm_broadcastw_epi16
+     - 💡 emulated with a general shuffle
+   * - _mm_broadcastd_epi32
+     - 💡 emulated with a general shuffle
+   * - _mm_broadcastq_epi64
+     - 💡 emulated with a general shuffle
+   * - _mm256_permutevar8x32_epi32
+     - ❌ scalarized
+   * - _mm256_permute4x64_pd
+     - 💡 emulated with two general shuffle
+   * - _mm256_permutevar8x32_ps
+     - ❌ scalarized
+   * - _mm256_permute4x64_epi64
+     - 💡 emulated with two general shuffle
+   * - _mm_maskload_epi32
+     - ❌ scalarized
+   * - _mm_maskload_epi64
+     - ❌ scalarized
+   * - _mm_maskstore_epi32
+     - ❌ scalarized
+   * - _mm_maskstore_epi64
+     - ❌ scalarized
+   * - _mm_sllv_epi32
+     - ❌ scalarized
+   * - _mm_sllv_epi64
+     - ❌ scalarized
+   * - _mm_srav_epi32
+     - ❌ scalarized
+   * - _mm_srlv_epi32
+     - ❌ scalarized
+   * - _mm_srlv_epi64
+     - ❌ scalarized
+   * - _mm_mask_i32gather_pd
+     - ❌ scalarized
+   * - _mm_mask_i64gather_pd
+     - ❌ scalarized
+   * - _mm_mask_i32gather_ps
+     - ❌ scalarized
+   * - _mm_mask_i64gather_ps
+     - ❌ scalarized
+   * - _mm_mask_i32gather_epi32
+     - ❌ scalarized
+   * - _mm_mask_i64gather_epi32
+     - ❌ scalarized
+   * - _mm_mask_i32gather_epi64
+     - ❌ scalarized
+   * - _mm_mask_i64gather_epi64
+     - ❌ scalarized
+   * - _mm_i32gather_pd
+     - ❌ scalarized
+   * - _mm_i64gather_pd
+     - ❌ scalarized
+   * - _mm_i32gather_ps
+     - ❌ scalarized
+   * - _mm_i64gather_ps
+     - ❌ scalarized
+   * - _mm_i32gather_epi32
+     - ❌ scalarized
+   * - _mm_i64gather_epi32
+     - ❌ scalarized
+   * - _mm_i32gather_epi64
+     - ❌ scalarized
+   * - _mm_i64gather_epi64
+     - ❌ scalarized
+
+All the 128-bit wide instructions from AVX2 instruction set are listed.
+Only a small part of the 256-bit AVX2 instruction set are listed, most of the
+256-bit wide AVX2 instructions are emulated by two 128-bit wide instructions.
 
 ======================================================
 Compiling SIMD code targeting ARM NEON instruction set
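
Note on the simd.rst change: the added documentation describes two emulation strategies, a 256-bit operation split into two 128-bit operations, and "scalarized" intrinsics that are computed one lane at a time. The sketch below illustrates both ideas in portable C. It is not the implementation from this commit: the types `vec128_u32` and `vec256_u32` and the three function names are hypothetical stand-ins, and plain arrays take the place of real Wasm SIMD vectors.

```c
#include <stdint.h>

/* Hypothetical stand-in for one 128-bit vector holding four 32-bit lanes. */
typedef struct { uint32_t lane[4]; } vec128_u32;

/* Hypothetical 256-bit vector modelled as a low and a high 128-bit half. */
typedef struct { vec128_u32 lo, hi; } vec256_u32;

/* Lane-wise wrapping add on a single 128-bit half. */
static vec128_u32 add128_u32(vec128_u32 a, vec128_u32 b) {
  vec128_u32 r;
  for (int i = 0; i < 4; i++) {
    r.lane[i] = a.lane[i] + b.lane[i];
  }
  return r;
}

/* A 256-bit add expressed as two independent 128-bit adds. */
static vec256_u32 add256_u32(vec256_u32 a, vec256_u32 b) {
  vec256_u32 r;
  r.lo = add128_u32(a.lo, b.lo);
  r.hi = add128_u32(a.hi, b.hi);
  return r;
}

/* Scalarized per-lane variable left shift. Counts above 31 yield 0,
   which is the documented behaviour of _mm_sllv_epi32. */
static vec128_u32 sllv128_u32(vec128_u32 a, vec128_u32 count) {
  vec128_u32 r;
  for (int i = 0; i < 4; i++) {
    r.lane[i] = count.lane[i] > 31 ? 0u : a.lane[i] << count.lane[i];
  }
  return r;
}
```

The split works for any operation whose lanes do not interact across the 128-bit boundary. Cross-lane operations such as the `_mm256_permute4x64_*` rows in the table need more than a straight split, which is consistent with their separate listing.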
