This directory contains a lightweight C++ wrapper over Google's Highway SIMD library, designed specifically for NumPy's needs.
Note: This directory also contains the legacy C interface for universal intrinsics (under `simd.h`), which is no longer supported. The Highway wrapper described in this document should be used instead for all new SIMD code.
The wrapper simplifies Highway's SIMD interface by eliminating class tags and using lane types directly, which can be deduced from arguments in most cases. This design makes the SIMD code more intuitive and easier to maintain while still leveraging Highway's generic intrinsics.
The wrapper consists of two main headers:
- `simd.hpp`: The main header that defines namespaces and includes configuration macros
- `simd.inc.hpp`: Implementation details included by `simd.hpp` multiple times for different namespaces
Additionally, this directory contains the legacy C interface files for universal intrinsics (`simd.h` and related files), which are deprecated and should not be used in new code. All new SIMD code should use the Highway wrapper.
```cpp
#include "simd/simd.hpp"

// Use np::simd for maximum width SIMD operations
using namespace np::simd;
float *data = /* ... */;
Vec<float> v = LoadU(data);
v = Add(v, v);
StoreU(v, data);

// Use np::simd128 for fixed 128-bit SIMD operations
using namespace np::simd128;
Vec<float> v128 = LoadU(data);
v128 = Add(v128, v128);
StoreU(v128, data);
```

```cpp
#include "simd/simd.hpp"

// Check if SIMD is enabled
#if NPY_SIMDX
    // SIMD code
#else
    // Scalar fallback code
#endif

// Check for float64 support
#if NPY_SIMDX_F64
    // Use float64 SIMD operations
#endif

// Check for FMA support
#if NPY_SIMDX_FMA
    // Use FMA operations
#endif
```

The wrapper provides type constraints to help with SFINAE (Substitution Failure Is Not An Error) and compile-time type checking:
- `kSupportLane<TLane>`: Determines whether the specified lane type is supported by the SIMD extension.

  ```cpp
  // Base template - always defined, even when SIMD is not enabled (for SFINAE)
  template <typename TLane>
  constexpr bool kSupportLane = NPY_SIMDX != 0;

  template <>
  constexpr bool kSupportLane<double> = NPY_SIMDX_F64 != 0;
  ```

- `kMaxLanes<TLane>`: Maximum number of lanes supported by the SIMD extension for the specified lane type.

  ```cpp
  template <typename TLane>
  constexpr size_t kMaxLanes = HWY_MAX_LANES_D(_Tag<TLane>);
  ```
```cpp
#include "simd/simd.hpp"

// Check if float64 operations are supported
if constexpr (np::simd::kSupportLane<double>) {
    // Use float64 operations
}
```

These constraints allow compile-time checking of which lane types are supported, and can be used in SFINAE contexts to enable or disable functions based on type support.
The wrapper provides the following common operations that are used in NumPy:
- Vector creation operations:
  - `Zero`: Returns a vector with all lanes set to zero
  - `Set`: Returns a vector with all lanes set to the given value
  - `Undefined`: Returns an uninitialized vector
- Memory operations:
  - `LoadU`: Unaligned load of a vector from memory
  - `StoreU`: Unaligned store of a vector to memory
- Vector information:
  - `Lanes`: Returns the number of vector lanes based on the lane type
- Type conversion:
  - `BitCast`: Reinterprets a vector as a different type without modifying the underlying data
  - `VecFromMask`: Converts a mask to a vector
- Comparison operations:
  - `Eq`: Element-wise equality comparison
  - `Le`: Element-wise less-than-or-equal comparison
  - `Lt`: Element-wise less-than comparison
  - `Gt`: Element-wise greater-than comparison
  - `Ge`: Element-wise greater-than-or-equal comparison
- Arithmetic operations:
  - `Add`: Element-wise addition
  - `Sub`: Element-wise subtraction
  - `Mul`: Element-wise multiplication
  - `Div`: Element-wise division
  - `Min`: Element-wise minimum
  - `Max`: Element-wise maximum
  - `Abs`: Element-wise absolute value
  - `Sqrt`: Element-wise square root
- Logical operations:
  - `And`: Bitwise AND
  - `Or`: Bitwise OR
  - `Xor`: Bitwise XOR
  - `AndNot`: Bitwise AND NOT (`a & ~b`)
Additional Highway operations can be accessed via the `hn` namespace alias inside the `simd` or `simd128` namespaces.
To add more operations from Highway:
- Import them in the `simd.inc.hpp` file using the `using` directive if they don't require a tag:

  ```cpp
  // For operations that don't require a tag
  using hn::FunctionName;
  ```

- Define wrapper functions for intrinsics that require a class tag:

  ```cpp
  // For operations that require a tag
  template <typename TLane>
  HWY_API ReturnType FunctionName(Args... args)
  {
      return hn::FunctionName(_Tag<TLane>(), args...);
  }
  ```

- Add appropriate documentation and SFINAE constraints if needed.
The SIMD wrapper automatically disables SIMD operations when optimizations are disabled:
- When `NPY_DISABLE_OPTIMIZATION` is defined, SIMD operations are disabled
- SIMD is enabled only when the Highway target is not scalar (`HWY_TARGET != HWY_SCALAR`)
- Why avoid Highway scalar operations?
  - NumPy already provides kernels for scalar operations
  - Compilers can better optimize standard library implementations
  - Not all Highway intrinsics are fully supported in scalar mode
- Legacy Universal Intrinsics
  - The older universal intrinsics C interface (in `simd.h` and accessible via `NPY_SIMD` macros) is deprecated
  - All new SIMD code should use this Highway-based wrapper (accessible via `NPY_SIMDX` macros)
  - The legacy code is maintained for compatibility but will eventually be removed
- Feature Detection Constants vs. Highway Constants
  - NumPy-specific constants (`NPY_SIMDX_F16`, `NPY_SIMDX_F64`, `NPY_SIMDX_FMA`) provide additional safety beyond raw Highway constants
  - Highway constants (e.g., `HWY_HAVE_FLOAT16`) only check platform capabilities but don't consider NumPy's build configuration
  - Our constants combine both checks:

    ```c
    #define NPY_SIMDX_F16 (NPY_SIMDX && HWY_HAVE_FLOAT16)
    ```

  - This ensures SIMD features won't be used when:
    - The platform supports them but NumPy optimization is disabled via the meson option:

      ```meson
      option('disable-optimization', type: 'boolean', value: false,
             description: 'Disable CPU optimized code (dispatch,simd,unroll...)')
      ```

    - The Highway target is scalar (`HWY_TARGET == HWY_SCALAR`)
  - Using these constants ensures consistent behavior across different compilation settings
  - Without this additional layer, code might incorrectly try to use SIMD paths in scalar mode
- Namespace Design
  - `np::simd`: Maximum width SIMD operations (scalable)
  - `np::simd128`: Fixed 128-bit SIMD operations
  - `hn`: Highway namespace alias (available within the SIMD namespaces)
- Why Namespaces and Why Not Just Use Highway Directly?
  - Highway's design uses class tag types as template parameters (e.g., `Vec<ScalableTag<float>>`) when defining vector types
  - Many Highway functions require explicitly passing a tag instance as the first parameter
  - This tag-based approach increases verbosity and complexity in user code
  - Our wrapper eliminates this by managing tags internally through namespaces, letting users directly use types, e.g. `Vec<float>`
  - Simple example with raw Highway:

    ```cpp
    // Highway's approach
    float *data = /* ... */;
    namespace hn = hwy::HWY_NAMESPACE;
    using namespace hn;

    // Full-width operations
    ScalableTag<float> df;                  // Create a tag instance
    Vec<decltype(df)> v = LoadU(df, data);  // LoadU requires a tag instance
    StoreU(v, df, data);                    // StoreU requires a tag instance

    // 128-bit operations
    Full128<float> df128;                            // Create a 128-bit tag instance
    Vec<decltype(df128)> v128 = LoadU(df128, data);  // LoadU requires a tag instance
    StoreU(v128, df128, data);                       // StoreU requires a tag instance
    ```

  - Simple example with our wrapper:

    ```cpp
    // Our wrapper approach
    float *data = /* ... */;

    // Full-width operations
    using namespace np::simd;
    Vec<float> v = LoadU(data);  // Full-width vector load
    StoreU(v, data);

    // 128-bit operations
    using namespace np::simd128;
    Vec<float> v128 = LoadU(data);  // 128-bit vector load
    StoreU(v128, data);
    ```

  - The namespaced approach simplifies code, reduces errors, and provides a more intuitive interface
  - It preserves all the benefits of Highway operations while reducing cognitive overhead
- Why Namespaces Are Essential for This Design?
  - Namespaces allow us to define different internal tag types (`hn::ScalableTag<TLane>` in `np::simd` vs `hn::Full128<TLane>` in `np::simd128`)
  - This provides a consistent type-based interface (`Vec<float>`) without requiring users to manually create tags
  - Enables using the same function names (like `LoadU`) with different implementations based on SIMD width
  - Without namespaces, we'd have to either reintroduce tags (defeating the purpose of the wrapper) or create different function names for each variant (e.g., `LoadU` vs `LoadU128`)
- Template Type Parameters
  - `TLane`: The scalar type for each vector lane (e.g., `uint8_t`, `float`, `double`)
Requirements:

- C++17 or later
- Google Highway library

License: Same as NumPy's license.