Commit b1caca2

Squashed 'lib/simde/simde/' changes from cbef1c152..0faa907b2

0faa907b2 gcc pedantic: fp16 is not part of ISO C, silence the warning
f53a9cf79 gcc pedantic: also silence this other warning about __int128
59f779845 arm neon: Add float16 multi-vectors to native aliases
4b279d62e https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100927 was fixed in GCC 15.x
5c8f50ec1 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95782 was fixed in GCC 13
677f2cbee Avoid undefined behaviour with signed integer multiplication (#1296)
2096f755e arm64 gcc FRINT: skip native call on GCC
3ea330475 x86 sse2 for loongarch: fix GCC build failure (#1287)
a532a12ca riscv64: Fallback to autovec without mrvv-vector-bits flag. (#1282)
85632ca82 arm neon riscv64: add min.h and max.h RVV implementations. (#1283)
ca1e942d9 neon riscv64: Enable RVV segment load/store only when we have `__riscv_zvlsseg` flag. (#1285)
cf8e6a73d riscv64: Enable V feature when both zve64d and zvl128b are present (#1284)
c7f26b73b x86 avx for loongarch: use vfcmp_clt to save one instruction in `_mm_cmp_{sd,ss}` and `_mm256_cmp_pd`
a8ae10d96 x86 sse2,avx2 loongarch impl: let compiler to generate instructions based on imm8
bb0282e3b x86 misc fixes for AVX512{F,VL}_NATIVE
d458d8fdd x86 sse2,sse3, avx: silence some false-positive warnings about unitialized structs
4184e0d42 start preparing to release SIMDe 0.8.4
87ecd64a5 x86 sse2: fix overflow error detected by clang scan-build in simde_mm_srl_epi{16,32,64} when count is too high
ca9449c1e Fix incorrect UQRSHL implementation.
8d90b0411 arm neon: fix `cmla{_rot{90,180,270},}_lane` with correct test-suite on ARMv8.3 system
500454a2a arm neon: replace use of SIMDE_ARCH_ARM_CHECK(8+) with feature checks.
02ba92220 arm neon gcc-12 FRINT workaround
02d815773 arm neon FCMLA with 16-bit floats, requires the FP16 feature
8caaee795 arm neon: FRINT{32,64}{X,Z} native calls require ARMv8.5
438ddcff6 remove extraneous semicolons from many macro-defined functions
9f73373ff wasm simd128: fix a FAST_NANS error on arm64
0bd19a993 Fix vqdmulhs_s32 native alias.
62f40d4b8 x86 avx2: small fixes for loongarch
d656b4d7e x86 sse2: small fixes for loongarch
8f56d4ff1 Remove incorrect qrdmulh SSE code.
8c421df17 arm neon: define native alias only under the inverse of the conditions of a pass-through
25e70ce71 simde-aes: gcc 13.2+ ignore unused variable warnings
69c9cd5c3 arm neon qdmlal: fix saturation (#1194)
34136823c Fix vqshlud_n_s64 implementation to be 64-bit.
483a4bccf Fix qdmlsl instructions
f275fffd9 arm neon qshl: Fix UQSHL to match hardware. Add extensive test vectors. (#1256)
d95bd9d76 arm neon qdmull: Fix SQDMULL implementation for 32-bit inputs. (#1255)
4b9007046 x86 sse2: fix `_mm_pause` for RISCV systems
0be41ec7c risc64 gcc-14: Disable uninitialized variable warnings for some ARM neon SM3 functions
70fc574b2 arm: Rename ARM ROL/ROR functions with a SIMDE prefix.
a39bd6dde arm neon sli_n: Fix invalid shift warnings (#1253)
7bd2bb70e arm neon `_vext_p6`: reverse logic to avoid GCC14 i586 bug (#1251)
b4bf72e14 x86 clmul simde_x_bitreverse_u64: add loongarch implementation (#1249)
04f9b4ca6 x86 avx: reoptimized simde_mm256_addsub_ps/d with lasx
54d352981 x86 fma: add loongarch lasx optimized implementations
adefb8dcb x86 f16c: add loongarch lasx optimized implementations
b720dcb7d x86 avx512f: added fmaddsub implementation (#1246)
5c9f6aa19 x86 sse4.2: add loongarch lsx optimized implementations
783703714 x86 sse4.1: add loongarch lsx optimized implementations
0bfc2312f x86 ssse3: add loongarch lsx optimized implementations
fcae0eee0 x86 sse3: add loongarch lsx optimized implementations
af6467260 x86 sse: add loongarch lsx optimized implementations
2ad64c9f7 x86 avx2: add loongarch lasx optimized implementations (#1241)
5cae2261b x86 avx: add loongarch lasx optimized implementations (#1239)
484fcce25 x86 avx: use INT64_C when the destination is i64 (#1238)
5e225b1c6 loongarch: add lsx support for sse2.h
665d7f93b fix clang type redef error
b0fcc6176 Whoops, missing comma
fe262fb0e loongarch float16: use a portable version to avoid compilation errors
1a09d3bc9 x86: move definition of 'value' to correct branch in _mm_loadl_epi64
aac583326 x86: some better implementations for MSVC and others without SIMDE_STATEMENT_EXPR_
d1afb3db1 arm crc32: define SIMDE_ARCH_ARM_CRC32 and consistently use it
592f8f0c4 _mm256_storeu_pd and _mm256_loadu_pd using 128 bit lanes
de4337e8d gcc-14 -O3 complained about some possible unitialized values
8b0937a3e neon/cvz z/Arch: stop using deprecated functions.
e18dcd7d0 arm neon: avoid GCC 11 vst1_*_x4 built-in functions
848fb7777 arm neon: fix arm64 gcc11 build excess elements in vector failure
0aaf78298 x86/sse: Fix type convert error for LSX.
29c96207c arm wasm: add vst2_u8 translation to Wasm SIMD
375ad48fd arm wasm: add vshll translations to Wasm SIMD
d5697fa99 arm wasm: add vst4_u8 translation to Wasm SIMD
e235b2eb1 math: typo fix, check SIMDE_MATH_NANF instead of the old-style SIMDE_NANF
cb4b08c47 wasm AltiVec: add u16x8 and u8x16 avgr translations
90237caba wasm NEON: add u16x8 and u8x16 avgr translations
6050906e9 arm neon vminnmv_f16: remove duplicate statement (#1208)
a3d20d145 x86 wasm: Wasm SIMD version of `_mm_sad_epu8`
32650204e msvc: add simde_MemoryBarrier to avoid including <windows.h>
7ca5a3e0b x86/fma: Use 128 bit fnmadd_pd to do 256 bit fnmadd_pd (#1197)
2ec1f51f8 pow: consistently use simde_math_pow
80f655739 x86: remove redundant mm_add_pd translation for WASM (#1190)
249b9dc03 arm/neon riscv64: additional RVV implementations - part 2. (#1189)
408d06a35 arm/neon riscv64: additional RVV implementations - part1 (#1188)
da5cf1f54 Use _Float16 in C++ on aarch64 with GCC 13+
39f436a9e Don't use _Float16 on non-SSE2 x86
985c27100 Don't use _Float16 on s390x
787830467 x86: Apply half tabular method in _mm_crc32 family
d8a0c764f arm: improve performance in vqadd and vmvn in risc-v
99c63a427 neon: avoid warnings when "__ARM_NEON_FP" is not defined.
e98cbcc70 start next development cycle: v0.8.3
3442dbf2d prepare to release 0.8.0
e6afb7bec arm neon: Fully remove the problematic FCVTZS/FCVTMS/FCVTPS/FCVTNS family intrinsics
fb73a3182 arm: improve performance in vabd_xxx for risc-v
8a4ff7a8b arm: improve performance in vhadd_xxx for risc-v
52f1087ad arm: Add neon2rvv support in vand series intrinsics
737e3b33f arm: fix some neon2rvv intrinsic function error
5242a77dc arm: enable more intrinsic function for armv7
8f123e5c0 wasm x86 impl: some were incorrectly marked SSE instead of SSE2
2b9b01269 arm x86 implementations: allow _m128 access from SSE
6679ff018 svml: SSE is good enough for native m128i and m128d types & functions
68aac3b9a sse2 MSVC `_mm_pause` implementaiton for x86
e76f4331e typo fixes from codespell
73160356b x86 xop: fix some native functions
4ecf271be emscripten; use `__builtin_roundeven{f,}` from version 3.1.43 onwards
347e2b699 arm 32 bits: native def fixes; workarounds for gcc
61d1addce apple clang arm64: ignore SHA2
b58359225 arm platform: cleanup feature detection.
e38f25685 arm neon sm3: check constant range
ac2b229a1 arm neon: disable some FCVTZS/FCVTMS/FCVTPS/FCVTNS family intrinsics
1d7848cf9 arm neon clang: skip vrnd native before clang v18
bb11054b5 clang: detect versions 18 & 19
647bb87de Initial Support for the RISC-V Vector Extension in ARM NEON (#1130)
83479bd70 start next development cycle: v0.8.1
22a493c26 arm/neon abs: negating INT_MIN is undefined behavior
453dec209 simde-detect-clang.h: add clang 17 detection (#1132)
e6fab1296 Update simde-detect-clang.h (#1131)
e29a4fab5 typo: XCode -> Xcode (#1129)
8392c69a1 Improve performance of simde_mm512_add_epi32 (#1126)
ddaab3759 neon {u,s}addh apply arm64 windows workaround only on msvc<1938 (#1121)
8e9d432a6 correction of simde_mm256_sign_epi{8,16,32}. (#1123)
43ec909bb avx512 abs: refine GCC compiler checks for `_mm512{,_mask}_abs_pd` (#1118)
24be11d00 gh-actions: test mips64el using qemu on gcc12/clang16
f0bd155cf wasm relaxed: add f{32x4,64x2}_relaxed_{min,max}
459abf9f7 wasm simd128/relaxed: begin MIPS implementations
ffe050ce9 wasm relaxed: updated names; reordered FMA operations
762d7ad22 wasm: detect support for Relaxed SIMD mode
e96949e3f prepare to release 0.8.0
f73d72e4e NEON: implement all bf16-related intrinsics (#1110)
72f6d30fe neon: add enable vmlaq_laneq_f32 and vcvtq_n_f64_u64
d6271b3fe NEON: implement all intrinsics supported by architecture A64-remaining part (#1093)
7904fc3cf sse2 mm_pause: more archs, add a basic test
260adca59 arm neon ld2: silence warnings at -O3 on gcc risc-v
d1578a0ce simde_float16: prefer __fp16 if available
064b80493 svml: don't enable SIMDE_X86_SVML_NATIVE for ClangCl
790e8d6c3 fp16: don't use _Float16 on ClangCL if not supported
87f7d3317 neon: Modified simde_float16 to simde_float16_t (#1100)
2c58d6d05 Reuse unoptimized implementations of vaesimcq_u8 from x86
be7e377cb [NEON] Add AES instructions.
a686efde4 x86 sse4.1 mm_testz_si128: fix backwards short circuit logic
bc206e4fa wasm f{32x4,64x2}_min: add workaround for a gcc<6 issue
f2e82c961 x86 pclmul: fix natives, some require VPCLMULQDQ
0adef454b avx512 gather: add MSVC native fallbacks
ecc469297 avx512 set: add simde_x_mm512_set_m256{,d}
70f702627 NEON: part 1 of implement all intrinsics supported by architecture A64 (#1090)
aefb342e9 avx512 types: avoid using native AVX512 types on MSVC unless required
833a87750 svml: enable SIMDE_X86_SVML_NATIVE for MSVC 2019+
db326c75c sse{,2,4.1}, avx{,2} *_stream_{,load}: use __builtin_nontemporal_{load,store}
3f0321ba2 sse _mm_movemask_ps: remove unused code
4c7c77217 gh-actions: test with clang-16
87cf105ab neon/st1{,q}_*_x{2,3,4}: initial implementation (#1082)
5634cec09 NEON: more fp16 using intrinsics supported by architecture v7 (skip version) (#1081)
cfd917230 arm neon: Complex operations from Armv8.3-a (#1077)
389f360a6 arm aes: add neon implementation using the crypto extension
4adb6591f arm: use SIMDE_ARCH_ARM_FMA
faeb00a70 avx512: fix many native aliases
98eb64b85 sse: implement _mm_movelh_ps for Arm64
57197e8db aes: initial implementation of most aes instructions (#1072)
2fbc63391 NEON: Implement some f16XN types and f16 related intrinsics. (#1071)
64f94c681 avx: simde_mm256_shuffle_pd fix for natural vector size < 128
d41408997 Add workaround for GCC bug 111609
665676042 Extend constant range in simde_vshll_n_XXX intrinsics (#1064)
33c4480de Remove non-working MMX specialization from simde_vmin_s16
6f4afd634 Fix issues related to MXCSR register (#1060)
82be3395b fix SIMDE_ARCH_X86_SSE4_2 define
38580983e riscv64 clang: doesn't support _Float16 or __fp16 properly yet
a39f2c3b3 avx512/shuffle: mm512_{shuffle_epi32,shuffle{hi,lo}_epi16}
a202d0116 avx512/gather: mm512_{mask_,}i64gather_{epi32,epi64,ps,pd)
95a6d0813 avx512 new families started: gather/reduce + other additional funcs
ef8931287 avx512 cmp,cvt,cvts,cvtt,cvtus,gather,kand,permutex,rcp: new ops for intgemm
4a29d21ff avx512: start supporting AVX512FP16 / m512h
f686d38f1 clang wasm: SIMDE_BUG_CLANG_60655 is fixed in the upcoming 17.0 release
7760aabd1 GCC AVX512F: SIMDE_BUG_GCC_95399 was fixed in GCC 9.5, 10.4, 11.4, 12+
436dd4cc1 GCC x86/x64: SIMDE_BUG_GCC_98521 was fixed in 10.3
843112308 GCC x86: SIMDE_BUG_GCC_94482 was fixed in 8.5, 9.4, 10+
e140ac4e2 x86/avx512 fpclass: improve fallback implementation
5950c402c gh-actions: re-order ccache; add old clang/gcc versions
faf228937 avx512/loadu: fix native detection
b3341922b simde-f16: improve _Float16 usage; better INFHF/NANHF defs
5e632b09d avx512: naive implementation of fpclass
b71b58c27 [NEON/A32V7]: Don't trust clang for load multiple on A32V7
c5de4d090 neon: Add qtbl/qtbx polyfills for A32V7
3bda0d7c6 neon/cvtn: vcvtnq_u32_f32 is a V8 function
73910b60c msa neon impl: float64x2_t is not avail in A32V7
0540d7fc2 clang aarch64: optimization bug 45541 was fixed in clang-15
d315aac71 clmul: aarch64 clang has difficulties with poly64x1_t
a2eeb9ef1 sse4.1: use logical OR instead of bitwise OR in neon impl of _mm_testnzc_si128
e676d9982 clang powerpc: vec_bperm bug was fixed in clang-14
0e3290e86 neon/st1: disable last remaining AltiVec implementation
db0649e1d wasm simd128: more powerpc fixes
bbdb2a1f5 sse2,wasm simd128: skip SIMDE_CONVERT_VECTOR_ impementations on PowerPC
c6b6ac500 wasm/simd128: add missing unsigned functions
78faeab11 wasm/simd128: fix altivec_p7 version of wasm_f64x2_pmin
1f359106b We are in a dev period again: v0.7.7
9135bd049 neon/cvtn: vcvtnq_{s32_f32,s64_f64}: add SSE & AVX512 optimized implementations
6a1db3a5a neon/cvtn: basic implementation of a few functions
1cf65cb0a mmx: loogson impl promotions over SIMDE_SHUFFLE_VECTOR_
4ab8749df sse{,2,3,4.1},avx: more WASM shuffle implementations
b49fa29d5 avx512: arghhh: really fix typedef of __mmask64
6244ab92e avx512: typo fix for typedef of __mmask64
20c5200d6 avx512/madd: fix native alias arguments for _mm512_madd_epi16
cc476f364 neon/qabs: restore SSE2 impl for vqabsq_s8
a7682611d neon/abd,ext,cmla{,_rot{180,270,90}}: additional wasm128 implementations
ca523adb7 sse: allow native _mm_loadh_pi on MSVC x64
ac526659e test: appease GCC 5.x & clang
01ea9a8d3 start release process for 0.7.6
28a6001f6 x86/sse*,avx: add additional SIMD128 implementations
aca2f0ae6 neon/shl,rshl: fix avx include to unbreak amalgamated hearders
f60a9d8df neon/mla_lane: initial implementation using mla+dup
f982cfd51 Update clang version detection for 14..16 and add link
b45a14ccc simde-arch: include hedley for setting F16C for MSVC 2022+ with AVX2
3ce91d4cd 0.7.5 dev cycle on the road to 0.7.6/0.8.0
02c7a67ed sse: remove unbalanced HEDLEY_DIAGNOSTIC_PUSH
b0b370a4b x86/sse: Add LoongArch LSX support
2338f175d arch: Add LoongArch LASX/LSX support
90d95fae4 avx512: define __mask64 & __mask32 if not yet defined
42a43fa57 sve/true,whilelt,cmplt,ld1,st1,sel,and: skip AVX512 native implementations on MSVC 2017
20f98da6f sve/whilelt: correct type-o in __mmask32 initialization
47a1500f7 sve/ptest: _BitScanForward64 and __builtin_ctzll is not available in MSVC 2017
cd93fcc9e avx512/knot,kxor: native calls not availabe on MSVC 2017
ba6324b6b avx512/loadu: _mm{,256}_loadu_epi{8,16,32,64} skip native impl on MSVC < 2019
2f6fe9c64 sse2/avx: move some native aliases around to satisfy MSVC 2017 /ARCH:AVX512
91fda2cc9 axv512/insert: unroll SIMDE_CONSTIFY for testing macro implemented functions
a397b74b3 __builtin_signbit: add cast to double for old Clang versions
e016050b2 clmul: _mm512_clmulepi64_epi128 implicitly requires AVX512F
7e353c009 Wasm q15mulr_sat_s: match Wasm spec
ce375861c Wasm f32/f64 nearest: match Wasm spec
96d5e0346 Wasm f32/f64 floor/ceil/trunc/sqrt: match Wasm spec
5676a1ba7 Wasm f32/f64 abs: match Wasm spec
aa299c08b Wasm f32/f64 max: match Wasm spec
433d2b951 Wasm f32/f64 min: match Wasm spec
cf1ac40b8 avx{,2}: some intrinsics are missing from older MSVC versions
bff9b1b3c simd128: move unary minus to appease msvc native arm64
efc512a49 neon/ext: unroll SIMDE_CONSTIFY for testing macro implemented functions
091250e81 neon/addlv: disable SSSE3 impl of _vaddlvq_s16 for MSVC
4b3053606 neon/ext: simde_*{to,from}_m64 reqs MMX_NATIVE
2dedbd9bf skip many mm{,_mask,_maskz}_roundscale_round_{ss,sd} testing on MSVC + AVX
a04ea7bc9 f16c: rounding not yet implemented for simde_mm{256,}_cvtps_ph
e8ee041ab ci appveyor: build tests with AVX{,2}, but don't run them
2188c9728 arm/neon/add{l,}v: SSE2/SSSE3 opts _vadd{lvq_s8, lvq_s16, lvq_u8, vq_u8}
186f12f17 axv512: add simde_mm512_{cvtepi32_ps,extractf32x8_ps,_cmpgt_epi16_mask}
6a40fdeb5 arm/neon/rnd: use correct SVML function for simde_vrndq_f64
9a0705b06 svml: simde_mm256_{clog,csqrt}_ps native reqs AVX not SSE
c298a7ec2 msvc avx512/roundscale_round: quiet a false positive warning
01d9c5def sse: remove errant MMX requirement from simde_mm_movemask_ps
c675aa08d x86/avx{,2}: use SIMDE_FLOAT{32,64}_C to fix warnings from msvc
097af509e msvc 2022: enable F16C if AVX2 present
91cd7b64b avx{,2}: fix maskload illegal mem access
2caa25b85 Fixed simde_mm_prefetch warnings
96bdf5234 Fixed parameters to _mm_clflush
4d560e418 emscripten; don't use __builtin_roundeven{f,} even if defined
511a01e7d avx512/compress: Mitigate poor compressstore performance on AMD Zen 4
a22b63dc9 avx512/{knot,kxor,cmp,cmpeq,compress,cvt,loadu,shuffle,storeu} Additional AVX512{F,BW,VBMI2,VL} ops
3d87469f6 wasm simd128: correct trunc_sat _FAST_CONVERSION_RANGE target type
56ca5bd89 Suppress min/max macro definitions from windows.h
f2cea4d33 arm/neon/qdmulh s390 gcc-12: __builtin_shufflevector is misbehaving
3698cef9b neon/cvt: clang bug 46844 was fixed in clang 12.0
9369cea4a simd128: clang 13 fixed bugs affecting simde_wasm_{v128_load8_lane,i64x2_load32x2}
ce27bd09a gcc power: vec_cpsgn argument reversal fixed in 12.0
20fd5b94b gcc power: bugs 1007[012] fixed in GCC 12.1
5e25de133 gcc sse2: bug 99754 was fixed in GCC 12.1
e69796025 gcc i686 mm*_dpbf16_ps: skip vector ops due to rounding error
359c3ff47 clang wasm simde: add workaround to fix wasm_i64x2_shl bug
b767f5edc arm/neon: workaround on ARM64 windows bug
599b1fbf4 mips/msa: fix for Windows ARM64
c6f4821ed arm64 windows: fix simd128.h build error
782e7c73e prepare to release 0.7.4
6e9ac2457 fix A32V7 version of _mm_test{nz,}c_si128
776f7a699 test with Debian default flags, also for armel
a240d951a x86: fix AVX native → SSE4.2 native
5a73c2ce5 _mm_insert_ps: incorrect handling of the control
597a1c9e4 neon/ld1[q]_*_x2: initial implementation
4550faeac wasm: f32x4 and f64x2 nearest roundeven
5e0686459 Add missing `static const` in simde-math.h. NFC
da02f2cee avx512/setzero: fix native aliases
89762e11b Fixed FMA detection macro on msvc
b0fda5cf2 avx512/load_pd: initial implementation
a61af0778 avx512/load_ps: initial implementation
4126bde01 Properly map __mm functions to __simde_mm
2e76b7a69 neon ld2: gcc-12 fixes
604a53de3 fix wrong size
e5e085ff8 AVX: add native calls for _mm256_insertf128_{pd,ps,si256}
ee3bd005b aarch64 + clang-1[345] fix for "implicit conversion changes signedness"
a060c461a wasm: load lane memcpy instead of cast to address UBSAN issues

git-subtree-dir: lib/simde/simde
git-subtree-split: 0faa907b261001f89ac89becaea20beddd675468

1 parent 845e09b commit b1caca2

366 files changed: +63573 -4013 lines changed

arm/neon.h

Lines changed: 113 additions & 2 deletions
@@ -22,6 +22,7 @@
  *
  * Copyright:
  * 2020 Evan Nemerson <evan@nemerson.com>
+ * 2023 Yi-Yen Chung <eric681@andestech.com> (Copyright owned by Andes Technology)
  */
 
 #if !defined(SIMDE_ARM_NEON_H)
@@ -30,23 +31,32 @@
 #include "neon/types.h"
 
 #include "neon/aba.h"
+#include "neon/abal.h"
+#include "neon/abal_high.h"
 #include "neon/abd.h"
 #include "neon/abdl.h"
+#include "neon/abdl_high.h"
 #include "neon/abs.h"
 #include "neon/add.h"
 #include "neon/addhn.h"
+#include "neon/addhn_high.h"
 #include "neon/addl.h"
 #include "neon/addlv.h"
 #include "neon/addl_high.h"
 #include "neon/addv.h"
 #include "neon/addw.h"
 #include "neon/addw_high.h"
+#include "neon/aes.h"
 #include "neon/and.h"
 #include "neon/bcax.h"
 #include "neon/bic.h"
 #include "neon/bsl.h"
+#include "neon/cadd_rot270.h"
+#include "neon/cadd_rot90.h"
 #include "neon/cage.h"
 #include "neon/cagt.h"
+#include "neon/cale.h"
+#include "neon/calt.h"
 #include "neon/ceq.h"
 #include "neon/ceqz.h"
 #include "neon/cge.h"
@@ -60,13 +70,24 @@
 #include "neon/cltz.h"
 #include "neon/clz.h"
 #include "neon/cmla.h"
-#include "neon/cmla_rot90.h"
+#include "neon/cmla_lane.h"
 #include "neon/cmla_rot180.h"
+#include "neon/cmla_rot180_lane.h"
 #include "neon/cmla_rot270.h"
+#include "neon/cmla_rot270_lane.h"
+#include "neon/cmla_rot90.h"
+#include "neon/cmla_rot90_lane.h"
 #include "neon/cnt.h"
 #include "neon/cvt.h"
+#include "neon/cvt_n.h"
+#include "neon/cvtm.h"
+#include "neon/cvtn.h"
+#include "neon/cvtp.h"
 #include "neon/combine.h"
+#include "neon/copy_lane.h"
+#include "neon/crc32.h"
 #include "neon/create.h"
+#include "neon/div.h"
 #include "neon/dot.h"
 #include "neon/dot_lane.h"
 #include "neon/dup_lane.h"
@@ -76,6 +97,11 @@
 #include "neon/fma.h"
 #include "neon/fma_lane.h"
 #include "neon/fma_n.h"
+#include "neon/fmlal.h"
+#include "neon/fmlsl.h"
+#include "neon/fms.h"
+#include "neon/fms_lane.h"
+#include "neon/fms_n.h"
 #include "neon/get_high.h"
 #include "neon/get_lane.h"
 #include "neon/get_low.h"
@@ -84,30 +110,48 @@
 #include "neon/ld1.h"
 #include "neon/ld1_dup.h"
 #include "neon/ld1_lane.h"
+#include "neon/ld1_x2.h"
+#include "neon/ld1_x3.h"
+#include "neon/ld1_x4.h"
+#include "neon/ld1q_x2.h"
+#include "neon/ld1q_x3.h"
+#include "neon/ld1q_x4.h"
 #include "neon/ld2.h"
+#include "neon/ld2_dup.h"
+#include "neon/ld2_lane.h"
 #include "neon/ld3.h"
+#include "neon/ld3_dup.h"
+#include "neon/ld3_lane.h"
 #include "neon/ld4.h"
+#include "neon/ld4_dup.h"
 #include "neon/ld4_lane.h"
 #include "neon/max.h"
 #include "neon/maxnm.h"
+#include "neon/maxnmv.h"
 #include "neon/maxv.h"
 #include "neon/min.h"
 #include "neon/minnm.h"
+#include "neon/minnmv.h"
 #include "neon/minv.h"
 #include "neon/mla.h"
+#include "neon/mla_lane.h"
 #include "neon/mla_n.h"
 #include "neon/mlal.h"
 #include "neon/mlal_high.h"
+#include "neon/mlal_high_lane.h"
 #include "neon/mlal_high_n.h"
 #include "neon/mlal_lane.h"
 #include "neon/mlal_n.h"
 #include "neon/mls.h"
+#include "neon/mls_lane.h"
 #include "neon/mls_n.h"
 #include "neon/mlsl.h"
 #include "neon/mlsl_high.h"
+#include "neon/mlsl_high_lane.h"
 #include "neon/mlsl_high_n.h"
 #include "neon/mlsl_lane.h"
 #include "neon/mlsl_n.h"
+#include "neon/mmlaq.h"
 #include "neon/movl.h"
 #include "neon/movl_high.h"
 #include "neon/movn.h"
@@ -117,8 +161,13 @@
 #include "neon/mul_n.h"
 #include "neon/mull.h"
 #include "neon/mull_high.h"
+#include "neon/mull_high_lane.h"
+#include "neon/mull_high_n.h"
 #include "neon/mull_lane.h"
 #include "neon/mull_n.h"
+#include "neon/mulx.h"
+#include "neon/mulx_lane.h"
+#include "neon/mulx_n.h"
 #include "neon/mvn.h"
 #include "neon/neg.h"
 #include "neon/orn.h"
@@ -127,59 +176,117 @@
 #include "neon/padd.h"
 #include "neon/paddl.h"
 #include "neon/pmax.h"
+#include "neon/pmaxnm.h"
 #include "neon/pmin.h"
+#include "neon/pminnm.h"
 #include "neon/qabs.h"
 #include "neon/qadd.h"
+#include "neon/qdmlal.h"
+#include "neon/qdmlal_high.h"
+#include "neon/qdmlal_high_lane.h"
+#include "neon/qdmlal_high_n.h"
+#include "neon/qdmlal_lane.h"
+#include "neon/qdmlal_n.h"
+#include "neon/qdmlsl.h"
+#include "neon/qdmlsl_high.h"
+#include "neon/qdmlsl_high_lane.h"
+#include "neon/qdmlsl_high_n.h"
+#include "neon/qdmlsl_lane.h"
+#include "neon/qdmlsl_n.h"
 #include "neon/qdmulh.h"
 #include "neon/qdmulh_lane.h"
 #include "neon/qdmulh_n.h"
 #include "neon/qdmull.h"
+#include "neon/qdmull_high.h"
+#include "neon/qdmull_high_lane.h"
+#include "neon/qdmull_high_n.h"
+#include "neon/qdmull_lane.h"
+#include "neon/qdmull_n.h"
+#include "neon/qrdmlah.h"
+#include "neon/qrdmlah_lane.h"
+#include "neon/qrdmlsh.h"
+#include "neon/qrdmlsh_lane.h"
 #include "neon/qrdmulh.h"
 #include "neon/qrdmulh_lane.h"
 #include "neon/qrdmulh_n.h"
+#include "neon/qrshl.h"
+#include "neon/qrshrn_high_n.h"
 #include "neon/qrshrn_n.h"
+#include "neon/qrshrun_high_n.h"
 #include "neon/qrshrun_n.h"
 #include "neon/qmovn.h"
-#include "neon/qmovun.h"
 #include "neon/qmovn_high.h"
+#include "neon/qmovun.h"
+#include "neon/qmovun_high.h"
 #include "neon/qneg.h"
 #include "neon/qsub.h"
 #include "neon/qshl.h"
+#include "neon/qshl_n.h"
 #include "neon/qshlu_n.h"
+#include "neon/qshrn_high_n.h"
 #include "neon/qshrn_n.h"
+#include "neon/qshrun_high_n.h"
 #include "neon/qshrun_n.h"
 #include "neon/qtbl.h"
 #include "neon/qtbx.h"
+#include "neon/raddhn.h"
+#include "neon/raddhn_high.h"
+#include "neon/rax.h"
 #include "neon/rbit.h"
 #include "neon/recpe.h"
 #include "neon/recps.h"
+#include "neon/recpx.h"
 #include "neon/reinterpret.h"
 #include "neon/rev16.h"
 #include "neon/rev32.h"
 #include "neon/rev64.h"
 #include "neon/rhadd.h"
 #include "neon/rnd.h"
+#include "neon/rnd32x.h"
+#include "neon/rnd32z.h"
+#include "neon/rnd64x.h"
+#include "neon/rnd64z.h"
+#include "neon/rnda.h"
 #include "neon/rndm.h"
 #include "neon/rndi.h"
 #include "neon/rndn.h"
 #include "neon/rndp.h"
+#include "neon/rndx.h"
 #include "neon/rshl.h"
 #include "neon/rshr_n.h"
+#include "neon/rshrn_high_n.h"
 #include "neon/rshrn_n.h"
 #include "neon/rsqrte.h"
 #include "neon/rsqrts.h"
 #include "neon/rsra_n.h"
+#include "neon/rsubhn.h"
+#include "neon/rsubhn_high.h"
 #include "neon/set_lane.h"
+#include "neon/sha1.h"
+#include "neon/sha256.h"
+#include "neon/sha512.h"
 #include "neon/shl.h"
 #include "neon/shl_n.h"
+#include "neon/shll_high_n.h"
 #include "neon/shll_n.h"
 #include "neon/shr_n.h"
+#include "neon/shrn_high_n.h"
 #include "neon/shrn_n.h"
+#include "neon/sli_n.h"
+#include "neon/sm3.h"
+#include "neon/sm4.h"
 #include "neon/sqadd.h"
+#include "neon/sqrt.h"
 #include "neon/sra_n.h"
 #include "neon/sri_n.h"
 #include "neon/st1.h"
 #include "neon/st1_lane.h"
+#include "neon/st1_x2.h"
+#include "neon/st1_x3.h"
+#include "neon/st1_x4.h"
+#include "neon/st1q_x2.h"
+#include "neon/st1q_x3.h"
+#include "neon/st1q_x4.h"
 #include "neon/st2.h"
 #include "neon/st2_lane.h"
 #include "neon/st3.h"
@@ -188,17 +295,21 @@
 #include "neon/st4_lane.h"
 #include "neon/sub.h"
 #include "neon/subhn.h"
+#include "neon/subhn_high.h"
 #include "neon/subl.h"
 #include "neon/subl_high.h"
 #include "neon/subw.h"
 #include "neon/subw_high.h"
+#include "neon/sudot_lane.h"
 #include "neon/tbl.h"
 #include "neon/tbx.h"
 #include "neon/trn.h"
 #include "neon/trn1.h"
 #include "neon/trn2.h"
 #include "neon/tst.h"
 #include "neon/uqadd.h"
+#include "neon/usdot.h"
+#include "neon/usdot_lane.h"
 #include "neon/uzp.h"
 #include "neon/uzp1.h"
 #include "neon/uzp2.h"
