Commit a8c06aa

ENH: Initial implementation of Highway wrapper (#12)
* ENH, SIMD: Initial implementation of Highway wrapper

  A thin wrapper over Google's Highway SIMD library to simplify its interface. This commit provides the implementation of that wrapper, consisting of:

  - simd.hpp: Main header defining the SIMD namespaces and configuration
  - simd.inc.hpp: Template header included multiple times with different namespaces

  The wrapper eliminates Highway's class tags by:

  - Using lane types directly which can be deduced from arguments
  - Leveraging namespaces (np::simd and np::simd128) for different register widths

  A README is included to guide usage and document design decisions.

* SIMD: Update wrapper with improved docs and type support

  - Fix hardware/platform terminology in documentation for clarity
  - Add support for long double in template specializations
  - Add kMaxLanes constant to expose maximum vector width information
  - Follows clang formatting style for consistency with NumPy codebase.

* SIMD: Improve isolation and constexpr handling in wrapper

  - Add anonymous namespace around implementation to ensure each translation unit gets its own constants based on local flags
  - Use HWY_LANES_CONSTEXPR for Lanes function to ensure proper constexpr evaluation across platforms

* Update Highway submodule to latest master

* SIMD: Fix compile error by using MaxLanes instead of Lanes for array size

  Replace hn::Lanes(f64) with hn::MaxLanes(f64) when defining the index array size to fix error C2131: "expression did not evaluate to a constant". This error occurs because Lanes() isn't always constexpr compatible, especially with scalable vector extensions. MaxLanes() provides a compile-time constant value suitable for static array allocation and should be used with non-scalable SIMD extensions when defining fixed-size arrays.

---------

Co-authored-by: Sayed Adel <seiko@imavr.com>
1 parent c458e69 commit a8c06aa

6 files changed

+480
-3
lines changed

Lines changed: 263 additions & 0 deletions
@@ -0,0 +1,263 @@
# NumPy SIMD Wrapper for Highway

This directory contains a lightweight C++ wrapper over Google's [Highway](https://github.com/google/highway) SIMD library, designed specifically for NumPy's needs.

> **Note**: This directory also contains the C interface of universal intrinsics (under `simd.h`), which is deprecated. Use the Highway wrapper described in this document for all new SIMD code.

## Overview

The wrapper simplifies Highway's SIMD interface by eliminating class tags and using lane types directly, which can be deduced from arguments in most cases. This design makes SIMD code more intuitive and easier to maintain while still leveraging Highway's generic intrinsics.

## Architecture

The wrapper consists of two main headers:

1. `simd.hpp`: The main header that defines the namespaces and the configuration macros
2. `simd.inc.hpp`: Implementation details included by `simd.hpp` multiple times for different namespaces
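
The multiple-inclusion pattern can be sketched in a single file as follows. This is only an illustration under stated assumptions, not the actual NumPy code: the `demo` namespace, `FixedTag`, `DemoTag`, `kLanes`, and the `DEMO_SIMD_INC_BODY` macro (which stands in for the `#include "simd.inc.hpp"` line) are hypothetical names, and the 256-bit "widest" width is made up.

```cpp
#include <cstddef>

// Hypothetical stand-in for the contents of a template header such as
// simd.inc.hpp: the same text is expanded inside each namespace and picks up
// whichever DemoTag alias is visible at that point.
#define DEMO_SIMD_INC_BODY                                                 \
    template <typename TLane>                                              \
    constexpr std::size_t kLanes = DemoTag<TLane>::kBytes / sizeof(TLane);

namespace demo {

// Illustrative fixed-width tag; a real build would use Highway tag types.
template <typename TLane, std::size_t Bytes>
struct FixedTag {
    static constexpr std::size_t kBytes = Bytes;
};

namespace simd {  // pretend the widest register is 256 bits
template <typename TLane>
using DemoTag = FixedTag<TLane, 32>;
DEMO_SIMD_INC_BODY
}  // namespace simd

namespace simd128 {  // always 128 bits
template <typename TLane>
using DemoTag = FixedTag<TLane, 16>;
DEMO_SIMD_INC_BODY
}  // namespace simd128

}  // namespace demo
```

On platforms where `float` is 4 bytes, `demo::simd::kLanes<float>` is 8 and `demo::simd128::kLanes<float>` is 4, although both come from the same template text.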

Additionally, this directory contains legacy C interface files for universal intrinsics (`simd.h` and related files), which are deprecated and should not be used for new code. All new SIMD code should use the Highway wrapper.


## Usage

### Basic Usage

```cpp
#include "simd/simd.hpp"

float *data = /* ... */;

// Use np::simd for maximum width SIMD operations.
// Each using-directive lives in its own scope: with both namespaces visible
// at once, names such as `Vec` and `LoadU` would be ambiguous.
{
    using namespace np::simd;
    Vec<float> v = LoadU(data);
    v = Add(v, v);
    StoreU(v, data);
}

// Use np::simd128 for fixed 128-bit SIMD operations
{
    using namespace np::simd128;
    Vec<float> v128 = LoadU(data);
    v128 = Add(v128, v128);
    StoreU(v128, data);
}
```

42+
### Checking for SIMD Support
43+
44+
```cpp
45+
#include "simd/simd.hpp"
46+
47+
// Check if SIMD is enabled
48+
#if NPY_SIMDX
49+
// SIMD code
50+
#else
51+
// Scalar fallback code
52+
#endif
53+
54+
// Check for float64 support
55+
#if NPY_SIMDX_F64
56+
// Use float64 SIMD operations
57+
#endif
58+
59+
// Check for FMA support
60+
#if NPY_SIMDX_FMA
61+
// Use FMA operations
62+
#endif
63+
```
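
As a rough sketch of how such a guard might be combined with a scalar fallback, the hypothetical `AddArrays` kernel below vectorizes the bulk of the loop when `NPY_SIMDX` is nonzero and lets a plain loop handle the remainder. It assumes that, in a NumPy build, `simd/simd.hpp` has been included beforehand; the local `#ifndef` default exists only so the sketch also compiles on its own, in which case just the scalar path is built. The SIMD branch uses only the `Vec`, `LoadU`, `Add`, and `StoreU` names shown above and relies on `np::simd128` vectors being 16 bytes wide.

```cpp
#include <cstddef>

// Assumption: when built outside NumPy, "simd/simd.hpp" is not available, so
// NPY_SIMDX defaults to 0 and only the scalar path is compiled.
#ifndef NPY_SIMDX
#define NPY_SIMDX 0
#endif

// Hypothetical kernel: out[i] = a[i] + b[i] for i in [0, n).
inline void AddArrays(const float *a, const float *b, float *out, std::size_t n)
{
    std::size_t i = 0;
#if NPY_SIMDX
    // Fixed 128-bit vectors hold 16 / sizeof(float) lanes.
    using namespace np::simd128;
    constexpr std::size_t kStep = 16 / sizeof(float);
    for (; i + kStep <= n; i += kStep) {
        Vec<float> va = LoadU(a + i);
        Vec<float> vb = LoadU(b + i);
        StoreU(Add(va, vb), out + i);
    }
#endif
    // Scalar loop: handles the tail, or everything when SIMD is disabled.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Keeping the scalar loop outside the `#if` means the same code serves both as the tail handler and as the complete fallback.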
64+
65+
## Type Support and Constraints
66+
67+
The wrapper provides type constraints to help with SFINAE (Substitution Failure Is Not An Error) and compile-time type checking:
68+
69+
- `kSupportLane<TLane>`: Determines whether the specified lane type is supported by the SIMD extension.
70+
```cpp
71+
// Base template - always defined, even when SIMD is not enabled (for SFINAE)
72+
template <typename TLane>
73+
constexpr bool kSupportLane = NPY_SIMDX != 0;
74+
template <>
75+
constexpr bool kSupportLane<double> = NPY_SIMDX_F64 != 0;
76+
```
77+
78+
- `kMaxLanes<TLane>`: Maximum number of lanes supported by the SIMD extension for the specified lane type.
79+
```cpp
80+
template <typename TLane>
81+
constexpr size_t kMaxLanes = HWY_MAX_LANES_D(_Tag<TLane>);
82+
```
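
One plausible use of a compile-time maximum is sizing a fixed buffer when the actual lane count is not a constant expression, which is the situation the commit message describes for `MaxLanes` versus `Lanes`. The sketch below imitates that with stand-ins: `demo::kMaxLanes`, `demo::RuntimeLanes`, and `demo::FillLaneIndices` are hypothetical, and the 64-byte and 16-byte widths are made-up values rather than anything taken from Highway.

```cpp
#include <cstddef>

namespace demo {

// Hypothetical stand-in for np::simd::kMaxLanes<TLane>: a compile-time upper
// bound, here assuming vectors of at most 64 bytes.
template <typename TLane>
constexpr std::size_t kMaxLanes = 64 / sizeof(TLane);

// Hypothetical stand-in for a lane count that is only known at run time (as
// on scalable vector extensions); here it pretends the vector is 16 bytes.
template <typename TLane>
inline std::size_t RuntimeLanes()
{
    return 16 / sizeof(TLane);
}

// Fills buf[0..lanes) with 0, 1, 2, ... and returns the number of lanes used.
// The buffer is sized by the constant upper bound, not by the runtime count.
inline std::size_t FillLaneIndices(double (&buf)[kMaxLanes<double>])
{
    const std::size_t lanes = RuntimeLanes<double>();
    for (std::size_t i = 0; i < lanes; ++i) {
        buf[i] = static_cast<double>(i);
    }
    return lanes;
}

}  // namespace demo
```

A caller would declare `double buf[demo::kMaxLanes<double>];` and use only the first `lanes` entries that the function reports.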

```cpp
#include "simd/simd.hpp"

// Check if float64 operations are supported
if constexpr (np::simd::kSupportLane<double>) {
    // Use float64 operations
}
```

These constraints allow for compile-time checking of which lane types are supported, which can be used in SFINAE contexts to enable or disable functions based on type support.
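
A minimal sketch of the SFINAE use mentioned above, assuming a target with SIMD enabled but without float64 vectors. Everything in the `demo` namespace is hypothetical: `kSimdEnabled` and `kSimdHasF64` stand in for `NPY_SIMDX` and `NPY_SIMDX_F64`, and `UsesSimdKernel` only reports which overload was selected instead of running a kernel.

```cpp
#include <type_traits>

namespace demo {

// Stand-ins for NPY_SIMDX and NPY_SIMDX_F64; the values are hard-coded here
// to imitate a target with SIMD enabled but without float64 vectors.
constexpr bool kSimdEnabled = true;
constexpr bool kSimdHasF64 = false;

// Same shape as the README's kSupportLane, built on the stand-ins above.
template <typename TLane>
constexpr bool kSupportLane = kSimdEnabled;
template <>
constexpr bool kSupportLane<double> = kSimdHasF64;

// Selected only when the lane type is supported.
template <typename TLane, std::enable_if_t<kSupportLane<TLane>, int> = 0>
constexpr bool UsesSimdKernel(TLane)
{
    return true;
}

// Selected otherwise; a real kernel would run scalar code here.
template <typename TLane, std::enable_if_t<!kSupportLane<TLane>, int> = 0>
constexpr bool UsesSimdKernel(TLane)
{
    return false;
}

}  // namespace demo
```

With these stand-in values, a `float` argument selects the first overload and a `double` argument selects the second, and the choice is made entirely at compile time.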
94+
95+
## Available Operations
96+
97+
The wrapper provides the following common operations that are used in NumPy:
98+
99+
- Vector creation operations:
100+
- `Zero`: Returns a vector with all lanes set to zero
101+
- `Set`: Returns a vector with all lanes set to the given value
102+
- `Undefined`: Returns an uninitialized vector
103+
104+
- Memory operations:
105+
- `LoadU`: Unaligned load of a vector from memory
106+
- `StoreU`: Unaligned store of a vector to memory
107+
108+
- Vector information:
109+
- `Lanes`: Returns the number of vector lanes based on the lane type
110+
111+
- Type conversion:
112+
- `BitCast`: Reinterprets a vector to a different type without modifying the underlying data
113+
- `VecFromMask`: Converts a mask to a vector
114+
115+
- Comparison operations:
116+
- `Eq`: Element-wise equality comparison
117+
- `Le`: Element-wise less than or equal comparison
118+
- `Lt`: Element-wise less than comparison
119+
- `Gt`: Element-wise greater than comparison
120+
- `Ge`: Element-wise greater than or equal comparison
121+
122+
- Arithmetic operations:
123+
- `Add`: Element-wise addition
124+
- `Sub`: Element-wise subtraction
125+
- `Mul`: Element-wise multiplication
126+
- `Div`: Element-wise division
127+
- `Min`: Element-wise minimum
128+
- `Max`: Element-wise maximum
129+
- `Abs`: Element-wise absolute value
130+
- `Sqrt`: Element-wise square root
131+
132+
- Logical operations:
133+
- `And`: Bitwise AND
134+
- `Or`: Bitwise OR
135+
- `Xor`: Bitwise XOR
136+
- `AndNot`: Bitwise AND NOT (a & ~b)
137+
138+
Additional Highway operations can be accessed via the `hn` namespace alias inside the `simd` or `simd128` namespaces.

## Extending

To add more operations from Highway:

1. Import them in the `simd.inc.hpp` file with a `using` declaration if they don't require a tag:
   ```cpp
   // For operations that don't require a tag
   using hn::FunctionName;
   ```

2. Define wrapper functions for intrinsics that require a class tag:
   ```cpp
   // For operations that require a tag
   template <typename TLane>
   HWY_API ReturnType FunctionName(Args... args) {
       return hn::FunctionName(_Tag<TLane>(), args...);
   }
   ```

3. Add appropriate documentation and SFINAE constraints if needed
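
The two cases in the steps above can be sketched with a self-contained miniature. Nothing here is Highway: `demo::hn` is a hypothetical stand-in that only imitates the tag-first calling convention with `std::array`, and `demo::simd128` plays the role of a wrapper namespace that re-exports a tag-free function and wraps a tag-based one.

```cpp
#include <array>
#include <cstddef>

namespace demo {

// Hypothetical miniature of a tag-based SIMD API; it only imitates the
// calling convention and is not Highway itself.
namespace hn {
template <typename T>
struct Full128 {
    using Lane = T;
    static constexpr std::size_t kLanes = 16 / sizeof(T);
};
template <class D>
using Vec = std::array<typename D::Lane, D::kLanes>;

// Tag-based operation: the tag is the first argument.
template <class D>
Vec<D> Set(D /*tag*/, typename D::Lane value)
{
    Vec<D> v;
    v.fill(value);
    return v;
}

// Tag-free operation: the vector type is deduced from the arguments.
template <class V>
V Add(V a, V b)
{
    for (std::size_t i = 0; i < a.size(); ++i) {
        a[i] += b[i];
    }
    return a;
}
}  // namespace hn

namespace simd128 {
template <typename TLane>
using Tag = hn::Full128<TLane>;
template <typename TLane>
using Vec = hn::Vec<Tag<TLane>>;

// Step 1: no tag needed, so a using-declaration is enough.
using hn::Add;

// Step 2: the wrapper creates the tag from the lane type.
template <typename TLane>
Vec<TLane> Set(TLane value)
{
    return hn::Set(Tag<TLane>(), value);
}
}  // namespace simd128

}  // namespace demo
```

Callers then write `demo::simd128::Set(1.5f)` and never mention a tag; the lane type is deduced from the argument.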


## Build Configuration

The SIMD wrapper automatically disables SIMD operations when optimizations are disabled:

- When `NPY_DISABLE_OPTIMIZATION` is defined, SIMD operations are disabled
- SIMD is enabled only when the Highway target is not scalar (`HWY_TARGET != HWY_SCALAR`)

## Design Notes

1. **Why avoid Highway scalar operations?**
   - NumPy already provides kernels for scalar operations
   - Compilers can often optimize those kernels better, since they rely on standard library implementations
   - Not all Highway intrinsics are fully supported in scalar mode

2. **Legacy Universal Intrinsics**
   - The older universal intrinsics C interface (in `simd.h` and accessible via `NPY_SIMD` macros) is deprecated
   - All new SIMD code should use this Highway-based wrapper (accessible via `NPY_SIMDX` macros)
   - The legacy code is maintained for compatibility but will eventually be removed

3. **Feature Detection Constants vs. Highway Constants**
   - NumPy-specific constants (`NPY_SIMDX_F16`, `NPY_SIMDX_F64`, `NPY_SIMDX_FMA`) provide additional safety beyond raw Highway constants
   - Highway constants (e.g., `HWY_HAVE_FLOAT16`) only check platform capabilities but don't consider NumPy's build configuration
   - Our constants combine both checks:
     ```cpp
     #define NPY_SIMDX_F16 (NPY_SIMDX && HWY_HAVE_FLOAT16)
     ```
   - This ensures SIMD features won't be used when:
     - The platform supports them but NumPy optimization is disabled via the meson option:
       ```
       option('disable-optimization', type: 'boolean', value: false,
              description: 'Disable CPU optimized code (dispatch,simd,unroll...)')
       ```
     - The Highway target is scalar (`HWY_TARGET == HWY_SCALAR`)
   - Using these constants ensures consistent behavior across different compilation settings
   - Without this additional layer, code might incorrectly try to use SIMD paths in scalar mode

4. **Namespace Design**
   - `np::simd`: Maximum width SIMD operations (scalable)
   - `np::simd128`: Fixed 128-bit SIMD operations
   - `hn`: Highway namespace alias (available within the SIMD namespaces)

5. **Why Namespaces and Why Not Just Use Highway Directly?**
   - Highway's design uses class tag types as template parameters (e.g., `Vec<ScalableTag<float>>`) when defining vector types
   - Many Highway functions require explicitly passing a tag instance as the first parameter
   - This class tag-based approach increases verbosity and complexity in user code
   - Our wrapper eliminates this by internally managing tags through namespaces, letting users directly use types, e.g. `Vec<float>`
   - Simple example with raw Highway:
     ```cpp
     // Highway's approach
     float *data = /* ... */;

     namespace hn = hwy::HWY_NAMESPACE;
     using namespace hn;

     // Full-width operations
     ScalableTag<float> df;  // Create a tag instance
     Vec<decltype(df)> v = LoadU(df, data);  // LoadU requires a tag instance
     StoreU(v, df, data);  // StoreU requires a tag instance

     // 128-bit operations
     Full128<float> df128;  // Create a 128-bit tag instance
     Vec<decltype(df128)> v128 = LoadU(df128, data);  // LoadU requires a tag instance
     StoreU(v128, df128, data);  // StoreU requires a tag instance
     ```

   - Simple example with our wrapper:
     ```cpp
     // Our wrapper approach
     float *data = /* ... */;

     // Full-width operations (scoped so the two namespaces never overlap)
     {
         using namespace np::simd;
         Vec<float> v = LoadU(data);  // Full-width vector load
         StoreU(v, data);
     }

     // 128-bit operations
     {
         using namespace np::simd128;
         Vec<float> v128 = LoadU(data);  // 128-bit vector load
         StoreU(v128, data);
     }
     ```

   - The namespaced approach simplifies code, reduces errors, and provides a more intuitive interface
   - It preserves all the benefits of Highway's operations while reducing cognitive overhead

6. **Why Namespaces Are Essential for This Design?**
   - Namespaces allow us to define different internal tag types (`hn::ScalableTag<TLane>` in `np::simd` vs `hn::Full128<TLane>` in `np::simd128`)
   - This provides a consistent type-based interface (`Vec<float>`) without requiring users to manually create tags
   - Enables using the same function names (like `LoadU`) with different implementations based on SIMD width
   - Without namespaces, we'd have to either reintroduce tags (defeating the purpose of the wrapper) or create different function names for each variant (e.g., `LoadU` vs `LoadU128`)

7. **Template Type Parameters**
   - `TLane`: The scalar type for each vector lane (e.g., uint8_t, float, double)
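
The namespace design in the notes above can be imitated in a few lines. This is a sketch under stated assumptions, not the wrapper itself: `demo::simd` and `demo::simd128` are hypothetical stand-ins built on `std::array`, the 256-bit "widest" width is made up, and `WideLaneCount` and `NarrowLaneCount` are illustrative helpers. It shows the same names (`Vec`, `LoadU`) resolving to different widths depending on which namespace is in scope, with each using-directive confined to one function.

```cpp
#include <array>
#include <cstddef>

namespace demo {

namespace simd {  // stand-in for the widest vectors, pretending 256 bits
template <typename TLane>
using Vec = std::array<TLane, 32 / sizeof(TLane)>;

template <typename TLane>
Vec<TLane> LoadU(const TLane *ptr)
{
    Vec<TLane> v;
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i] = ptr[i];
    }
    return v;
}
}  // namespace simd

namespace simd128 {  // stand-in for fixed 128-bit vectors
template <typename TLane>
using Vec = std::array<TLane, 16 / sizeof(TLane)>;

template <typename TLane>
Vec<TLane> LoadU(const TLane *ptr)
{
    Vec<TLane> v;
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i] = ptr[i];
    }
    return v;
}
}  // namespace simd128

}  // namespace demo

// Identical bodies; only the namespace in scope differs.
inline std::size_t WideLaneCount(const float *data)
{
    using namespace demo::simd;
    Vec<float> v = LoadU(data);
    return v.size();
}

inline std::size_t NarrowLaneCount(const float *data)
{
    using namespace demo::simd128;
    Vec<float> v = LoadU(data);
    return v.size();
}
```

Placing both using-directives in the same scope would make `Vec` and `LoadU` ambiguous, which is why each one is limited to its own function here.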


## Requirements

- C++17 or later
- Google Highway library

## License

Same as NumPy's license

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
#ifndef NUMPY__CORE_SRC_COMMON_SIMD_SIMD_HPP_
#define NUMPY__CORE_SRC_COMMON_SIMD_SIMD_HPP_

/**
 * This header provides a thin wrapper over Google's Highway SIMD library.
 *
 * The wrapper aims to simplify the SIMD interface of Google's Highway by
 * getting rid of its class tags and using lane types directly, which can be
 * deduced from the arguments in most cases.
 */
/**
 * Since `NPY_SIMD` is limited to NumPy's C universal intrinsics,
 * `NPY_SIMDX` is defined to indicate the SIMD availability for Google's Highway
 * C++ code.
 *
 * Highway SIMD is only available when optimization is enabled.
 * When NPY_DISABLE_OPTIMIZATION is defined, SIMD operations are disabled
 * and the code falls back to scalar implementations.
 */
#ifndef NPY_DISABLE_OPTIMIZATION
#include <hwy/highway.h>

/**
 * We avoid using Highway scalar operations for the following reasons:
 * 1. We already provide kernels for scalar operations, so falling back to
 *    the NumPy implementation is more appropriate. Compilers can often
 *    optimize these better since they rely on standard libraries.
 * 2. Not all Highway intrinsics are fully supported in scalar mode.
 *
 * Therefore, we only enable SIMD when the Highway target is not scalar.
 */
#define NPY_SIMDX (HWY_TARGET != HWY_SCALAR)

// Indicates if the SIMD operations are available for float16.
#define NPY_SIMDX_F16 (NPY_SIMDX && HWY_HAVE_FLOAT16)
// Note: Highway requires SIMD extensions with native float32 support, so we don't need
// to check for it.

// Indicates if the SIMD operations are available for float64.
#define NPY_SIMDX_F64 (NPY_SIMDX && HWY_HAVE_FLOAT64)

// Indicates if the SIMD floating-point operations natively support FMA.
#define NPY_SIMDX_FMA (NPY_SIMDX && HWY_NATIVE_FMA)

#else
#define NPY_SIMDX 0
#define NPY_SIMDX_F16 0
#define NPY_SIMDX_F64 0
#define NPY_SIMDX_FMA 0
#endif

namespace np {

/// Represents the max SIMD width supported by the platform.
namespace simd {
#if NPY_SIMDX
/// The highway namespace alias.
/// We cannot import all the symbols from the HWY_NAMESPACE because they would
/// conflict with the existing symbols in the numpy namespace.
namespace hn = hwy::HWY_NAMESPACE;
// Used internally by the template header
template <typename TLane>
using _Tag = hn::ScalableTag<TLane>;
#endif
#include "simd.inc.hpp"
}  // namespace simd

/// Represents the 128-bit SIMD width.
namespace simd128 {
#if NPY_SIMDX
namespace hn = hwy::HWY_NAMESPACE;
template <typename TLane>
using _Tag = hn::Full128<TLane>;
#endif
#include "simd.inc.hpp"
}  // namespace simd128

}  // namespace np

#endif  // NUMPY__CORE_SRC_COMMON_SIMD_SIMD_HPP_
