feat: add POWER ISA 3.0 (VSX/Altivec) SIMD support for base64-simd #60
runlevel5 wants to merge 10 commits into Nugine:main
Conversation
Add VSX backend to vsimd and wire it up in base64-simd for SIMD-accelerated base64 encode/decode/check on ppc64le (POWER7+).

vsimd changes:
- VSX ISA type with runtime feature detection
- V64/V128/V256 vector types using vector_unsigned_char
- All unified ops (splat, add, sub, eq, lt, sat, max, min, bitwise) with scalar fallbacks for u64 (Altivec limitation)
- SIMD128 trait: load/store (vec_xl/vec_xst), swizzle (vec_perm), shifts, multiply (vec_mladd/vec_mul), zip/unzip (vec_mergeh/mergel + vec_perm), avgr (vec_avg), bsl, bswap, all_zero/any_zero
- SIMD256: split-to-2x128 path for all 256-bit ops
- Mask/highbit operations following the NEON arm32 pattern
- Table lookup with correct high-bit-only (0x80) OOB masking
- dispatch! macro arms for compile/resolve static+dynamic
- Native runtime detection and dispatch

base64-simd changes:
- VSX added to encode (split_bits) and decode (merge_bits) using the NEON/WASM128 shift-and-mask path
- All 4 dispatch entries (encode/decode/check/find_non_ascii_whitespace)
- powerpc_target_feature nightly feature gate

Gated behind feature="unstable" + target_arch="powerpc64". Tested on POWER9 (ppc64le) with nightly-2026-02-07.
I am setting this to draft for the time being. One thing I want to work on first is to set up benchmarking, to be 100% sure we do not cause any regression.
Restructure the VSX u8x16xn_lookup to reduce instruction count:
- Before: extract high bit, extract low nibble, compare, create OOB mask, combine index, vec_perm (6 ops per V128 lookup)
- After: extract low nibble, vec_perm, signed compare, andnot (4 ops per V128 lookup)

This eliminates 16 vector ops per decode iteration (4 lookups x 2 halves), fixing the decode regression and improving all ALSW-based operations:
- decode: 1662 -> 2111 MB/s (+27%, now 1.15x over scalar)
- check: 3087 -> 4453 MB/s (+44%, now 1.69x over scalar)
- encode: unchanged (does not use table lookup)
Pull request overview
Adds a new POWER/ppc64le VSX (Altivec) SIMD backend to vsimd and wires it into base64-simd’s multiversion dispatch so base64 encode/decode/check/non-ascii scanning can use VSX when available (behind feature="unstable" on powerpc64).
Changes:
- Introduce VSX as a new instruction set with runtime detection and dispatch integration (isa, native, macros).
- Implement VSX vector types and a broad set of unified/SIMD128/SIMD256 operations using core::arch::powerpc64::*.
- Enable base64-simd encode/decode paths to use VSX via the existing NEON/WASM-style shift+mask logic, and add vsx to dispatch target lists.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| crates/vsimd/src/vector.rs | Adds ppc64 VSX-backed V64/V128/V256/V512 vector representations. |
| crates/vsimd/src/unified.rs | Implements VSX branches for unified ops (splat/add/sub/eq/lt/sat/max/min/bitwise). |
| crates/vsimd/src/table.rs | Adds VSX-aware lookup semantics to match the crate’s table-lookup contract. |
| crates/vsimd/src/simd256.rs | Includes VSX in relevant SIMD256 paths and provides VSX-specific unzip behavior. |
| crates/vsimd/src/simd128.rs | Implements many VSX SIMD128 operations (load/store, swizzle, bswap helpers, zip/unzip, avgr, etc.). |
| crates/vsimd/src/native.rs | Adds runtime feature detection + dispatch for VSX. |
| crates/vsimd/src/mask.rs | Extends mask/highbit helpers to work with VSX. |
| crates/vsimd/src/macros.rs | Adds dispatch! compile/resolve support for the "vsx" target. |
| crates/vsimd/src/lib.rs | Adds nightly feature gates required for ppc64 stdarch intrinsics; adjusts portable_simd gating. |
| crates/vsimd/src/isa.rs | Defines VSX ISA type id, enables detection plumbing, and implements InstructionSet for VSX. |
| crates/base64-simd/src/multiversion.rs | Adds "vsx" to multiversion dispatch target lists for encode/decode/check/scan. |
| crates/base64-simd/src/lib.rs | Enables powerpc_target_feature for ppc64 under unstable. |
| crates/base64-simd/src/encode.rs | Treats VSX like NEON/WASM128 for the shift+mask encode implementation. |
| crates/base64-simd/src/decode.rs | Treats VSX like NEON/WASM128 for the shift+mask decode implementation. |
…dices

vec_perm masks indices with & 0x1f, so sentinel values like 0x80 wrap to 0-15 and select bytes from the data vector instead of producing zero. Preprocess the index vector so any lane >= 16 maps to index 16, which selects from the zero vector passed as vec_perm's second argument.

Add a test covering identity, reverse, boundary (16), sentinel (0x80, 0xFF), mixed valid/OOB, and varied OOB values.
SSSE3 _mm_shuffle_epi8 uses idx & 0x0f for indices 16-127 and only zeroes on bit 7, while NEON/WASM zero for any index >= 16. Remove the index-16 boundary test and restrict out-of-range tests to values with bit 7 set (>= 128), which is the common contract all backends agree on and the only range used by callers in this codebase.
Nugine left a comment:
The CI test failed. I'm not familiar with the PowerPC architecture.
Could you take a look?
Sure
@Nugine could you please also apply for a native ppc64le CI runner by lodging a request at https://github.com/IBM/actionspz?
- Split stdarch_powerpc_feature_detection feature gate to require feature="detect" (E0635: unknown feature in no_std builds)
- Replace vec_cmpge with vec_cmpgt for vector_unsigned_char comparison (vec_cmpge only supports vector_float)
- Add powerpc64le-unknown-linux-gnu to CI test matrix and QEMU setup
- Add powerpc mode to testgen.py with VSX rustflags
What's next?
Get feedback and do the VSX work for the other crates.