This adds a guidance table for precision flag selection across common
domains:
- Graphics/Rendering (Blender reference)
- DSP/Audio, Cryptography
- Scientific/Numerical, Game Physics, Machine Learning
README.md (24 additions, 0 deletions)
@@ -125,6 +125,30 @@ Define these macros as `1` before including `sse2neon.h` to enable precise (but
All precision flags are disabled by default to maximize performance.

### Recommended Flag Combinations by Use Case

| Use Case | Flags | Rationale |
|----------|-------|-----------|
| Graphics/Rendering | `MINMAX`, `SQRT` (+`DIV` if using `rcp`) | NaN handling and normalization; [Blender](https://github.com/blender/blender/blob/main/source/blender/blenlib/BLI_simd.hh) enables all three |
| Scientific/Numerical | `MINMAX`, `SQRT`, `DP` | Reduces divergence from x86 results; see caveats below |
| Game Physics | `MINMAX` | Prevents NaN propagation in collision detection |
| Machine Learning | `MINMAX` for inference | NaN handling for determinism; training tolerates defaults |

Flags use the `SSE2NEON_PRECISE_` prefix (e.g., `MINMAX` → `SSE2NEON_PRECISE_MINMAX`).

**Architecture notes**:
- The `DIV` flag affects `_mm_rcp*` (reciprocal approximation), not `_mm_div*`, which uses native IEEE-754 division on ARMv8. Enable `DIV` on ARMv7 or when using reciprocal intrinsics.
- For strict determinism, also define `SSE2NEON_UNDEFINED_ZERO=1`. Some divergences (FTZ/DAZ, NaN payloads) cannot be fully eliminated.
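The gap between a reciprocal approximation and true division, mentioned in the first note, can be illustrated in scalar C. The sketch below assumes a more precise reciprocal is obtained by refining a coarse estimate with Newton-Raphson steps. That mechanism is an assumption for illustration and is not stated in this README; `refine_reciprocal` and `reciprocal_rel_error` are hypothetical helper names.

```c
#include <math.h>

/* One Newton-Raphson step toward 1/d: x' = x * (2 - d * x).
   Assumption: shown only to illustrate how an estimate can be refined. */
static float refine_reciprocal(float d, float x) {
    return x * (2.0f - d * x);
}

/* Hypothetical helper: relative error of x as an estimate of 1/d. */
static float reciprocal_rel_error(float d, float x) {
    return fabsf(x * d - 1.0f);
}
```

Starting from the estimate `0.3f` for `1/3`, each step roughly squares the relative error, which is why an unrefined approximation can differ noticeably from a true division.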

Example configuration for graphics applications:

```c
/* Define precision flags before including the header so they take effect. */
#define SSE2NEON_PRECISE_MINMAX 1
#define SSE2NEON_PRECISE_SQRT 1
#include "sse2neon.h"
```
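The NaN handling that motivates `MINMAX` in the configuration above can be sketched with two scalar models. The first mirrors the per-lane x86 rule, where the second operand is returned whenever the comparison is false. The second assumes a minimum that returns NaN if either input is NaN, used here as a stand-in for an unadjusted NEON minimum. Both function names are hypothetical, and neither is sse2neon code.

```c
#include <math.h>

/* Scalar model of the x86 per-lane minimum: the second operand is
   returned whenever (a < b) is false, which includes any NaN input. */
static float x86_style_min(float a, float b) {
    return (a < b) ? a : b;
}

/* Scalar model of a NaN-propagating minimum (assumed stand-in for a
   plain NEON minimum): NaN if either input is NaN. */
static float nan_propagating_min(float a, float b) {
    if (isnan(a) || isnan(b)) {
        return NAN;
    }
    return (a < b) ? a : b;
}
```

With `a = NAN` and `b = 1.0f`, the first model returns `1.0f` and the second returns NaN. With the operands swapped, both return NaN, so the x86 rule depends on operand order while the propagating model does not.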
### Memory Allocation
Memory from `_mm_malloc()` must be freed with `_mm_free()`, not `free()`. Mixing allocators causes heap corruption on Windows.
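One way to keep the pairing consistent is to wrap both calls in a single place. A minimal sketch, assuming an x86 build gets `_mm_malloc` and `_mm_free` through `<xmmintrin.h>` and other targets get them from `sse2neon.h`; `alloc_aligned_floats` and `free_aligned_floats` are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h> /* assumed to provide _mm_malloc/_mm_free on x86 */
#else
#include "sse2neon.h"
#endif

/* Hypothetical helper: allocate n floats on a 16-byte boundary. */
static float *alloc_aligned_floats(size_t n) {
    return (float *)_mm_malloc(n * sizeof(float), 16);
}

/* Hypothetical helper: release memory obtained from alloc_aligned_floats.
   Always uses _mm_free, never free(). */
static void free_aligned_floats(float *p) {
    _mm_free(p);
}
```

Callers that only go through these two helpers cannot accidentally pass an `_mm_malloc()` pointer to `free()`.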