
Commit 8d1d9f1

Document recommended precision flag by use case
This adds a guidance table for precision flag selection across common domains:
- Graphics/Rendering (Blender reference)
- DSP/Audio, Cryptography
- Scientific/Numerical, Game Physics, Machine Learning
1 parent 454b8ef commit 8d1d9f1


1 file changed (+24, -0 lines)


README.md

Lines changed: 24 additions & 0 deletions
@@ -125,6 +125,30 @@ Define these macros as `1` before including `sse2neon.h` to enable precise (but

All precision flags are disabled by default to maximize performance.

### Recommended Flag Combinations by Use Case
130+
| Use Case | Flags | Rationale |
|----------|-------|-----------|
| Graphics/Rendering | `MINMAX`, `SQRT` (+`DIV` if using `_mm_rcp*`) | NaN handling and accurate normalization; [Blender](https://github.com/blender/blender/blob/main/source/blender/blenlib/BLI_simd.hh) enables all three |
| DSP/Audio | None (defaults) | Throughput matters more than precision; the differences are inaudible |
| Cryptography | None (defaults) | Integer-focused; floating-point precision is irrelevant |
| Scientific/Numerical | `MINMAX`, `SQRT`, `DP` | Reduces divergence from x86 results; see the architecture notes below |
| Game Physics | `MINMAX` | Prevents NaN propagation in collision detection |
| Machine Learning | `MINMAX` (inference) | NaN handling for deterministic inference; training tolerates the defaults |
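The NaN handling that `MINMAX` addresses comes from a semantic difference between the two instruction sets. The sketch below is a scalar model of a single lane, not sse2neon code. `x86_min`/`x86_max` follow the documented x86 `MINPS`/`MAXPS` rule: they return the second operand whenever the comparison is false, which includes any NaN. `neon_fmin`/`neon_fmax` model AArch64 `FMIN`/`FMAX` (the instructions behind `vminq_f32`/`vmaxq_f32`), which return NaN if either input is NaN. All function names, including the `clamp_*` helpers, are hypothetical.

```c
#include <math.h>

/* Hypothetical scalar model of one x86 MINPS/MAXPS lane:
 * dest = (a < b) ? a : b. Any comparison involving NaN is false, so the
 * SECOND operand is returned (NaN in `a` is dropped, NaN in `b` is kept). */
static float x86_min(float a, float b) { return (a < b) ? a : b; }
static float x86_max(float a, float b) { return (a > b) ? a : b; }

/* Hypothetical scalar model of one AArch64 FMIN/FMAX lane:
 * NaN in either operand yields NaN, and -0.0 is ordered below +0.0. */
static float neon_fmin(float a, float b)
{
    if (isnan(a) || isnan(b))
        return NAN;
    if (a == 0.0f && b == 0.0f)        /* both zeros: prefer -0.0 */
        return signbit(a) ? a : b;
    return (a < b) ? a : b;
}

static float neon_fmax(float a, float b)
{
    if (isnan(a) || isnan(b))
        return NAN;
    if (a == 0.0f && b == 0.0f)        /* both zeros: prefer +0.0 */
        return signbit(a) ? b : a;
    return (a > b) ? a : b;
}

/* Hypothetical clamp built as max-then-min, a common pattern in
 * collision and shading code. */
static float clamp_x86(float x, float lo, float hi)
{
    return x86_min(x86_max(x, lo), hi);
}

static float clamp_neon(float x, float lo, float hi)
{
    return neon_fmin(neon_fmax(x, lo), hi);
}
```

Under these models, `clamp_x86(NAN, 0.0f, 1.0f)` returns `0.0f`, while `clamp_neon(NAN, 0.0f, 1.0f)` returns NaN. The x86 rule drops a NaN only from the first operand: `x86_min(1.0f, NAN)` is still NaN.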
139+
Flags use the `SSE2NEON_PRECISE_` prefix (e.g., `MINMAX` → `SSE2NEON_PRECISE_MINMAX`).
141+
**Architecture notes**:
- On ARMv8, the `DIV` flag affects `_mm_rcp*` (reciprocal approximation) but not `_mm_div*`, which uses native IEEE-754 division there. Enable `DIV` on ARMv7 or when using reciprocal intrinsics.
- For strict determinism, also define `SSE2NEON_UNDEFINED_ZERO=1`. Some divergences (FTZ/DAZ, NaN payloads) still cannot be fully eliminated.
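The `DIV` note concerns reciprocal approximation. x86 `rcpps` documents a maximum relative error of 1.5 × 2^-12. NEON's reciprocal estimate is assumed here to be coarser (roughly 8 bits). For that reason, emulations refine the estimate with Newton-Raphson steps r' = r(2 - dr), each of which squares the relative error. The portable scalar sketch below models only that refinement. `coarse_recip`, `nr_recip_step` and `rel_err` are hypothetical names, and truncating mantissa bits is a stand-in for the hardware estimate, not how the hardware computes it. The sketch makes no claim about how many steps sse2neon applies with or without `DIV`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical low-precision reciprocal estimate: compute 1/d, then clear all
 * but the top `bits` mantissa bits (valid range 0..23). This only imitates a
 * hardware estimate such as FRECPE; it is not how the hardware computes one. */
static float coarse_recip(float d, int bits)
{
    float r = 1.0f / d;
    uint32_t u;
    memcpy(&u, &r, sizeof u);                    /* view the float's bits */
    u &= ~((UINT32_C(1) << (23 - bits)) - 1u);   /* truncate low mantissa bits */
    memcpy(&r, &u, sizeof r);
    return r;
}

/* One Newton-Raphson step toward 1/d: r' = r * (2 - d * r).
 * If r = (1 - e) / d, then r' = (1 - e*e) / d, so the relative error squares.
 * On NEON, the (2 - d * r) factor is what vrecpsq_f32(d, r) computes. */
static float nr_recip_step(float r, float d)
{
    return r * (2.0f - d * r);
}

/* Relative error of `approx` against a double-precision reference. */
static double rel_err(float approx, double exact)
{
    double e = ((double)approx - exact) / exact;
    return e < 0.0 ? -e : e;
}
```

For `d = 3.0f` and `bits = 8`, the estimate is exactly `341/1024`, with a relative error of 2^-10. One step reduces that to 2^-20. A second step is limited only by float rounding.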
145+
Example configuration for graphics applications:
```c
/* Precision flags must be defined as 1 before including sse2neon.h. */
#define SSE2NEON_PRECISE_MINMAX 1
#define SSE2NEON_PRECISE_SQRT 1
#include "sse2neon.h"
```
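A configuration for the Scientific/Numerical row, combined with the determinism note, could follow the same pattern. This sketch assumes a project that includes `sse2neon.h` only on ARM targets, detected with the common GCC/Clang/MSVC architecture macros, and uses native SSE4.1 headers on x86. That split is an assumption about project layout, not something the table prescribes.

```c
/* Scientific/Numerical row: MINMAX, SQRT and DP, plus UNDEFINED_ZERO for
 * stricter determinism. All flags must be defined before the include. */
#if defined(__aarch64__) || defined(_M_ARM64) || defined(__arm__) || defined(_M_ARM)
#define SSE2NEON_PRECISE_MINMAX 1
#define SSE2NEON_PRECISE_SQRT 1
#define SSE2NEON_PRECISE_DP 1
#define SSE2NEON_UNDEFINED_ZERO 1
#if defined(__arm__) || defined(_M_ARM)
/* 32-bit ARM (ARMv7): the architecture notes recommend DIV here. */
#define SSE2NEON_PRECISE_DIV 1
#endif
#include "sse2neon.h"
#else
#include <smmintrin.h> /* x86: native SSE4.1 intrinsics, including _mm_dp_ps */
#endif
```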
### Memory Allocation

Memory from `_mm_malloc()` must be freed with `_mm_free()`, not `free()`. Mixing allocators causes heap corruption on Windows.
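A minimal sketch of the required pairing follows. It assumes that `sse2neon.h` supplies `_mm_malloc`/`_mm_free` on ARM, and that the x86 toolchain declares them through `<xmmintrin.h>`, as GCC, Clang and MSVC typically do. `alloc_floats` and `release_floats` are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__aarch64__) || defined(_M_ARM64) || defined(__arm__) || defined(_M_ARM)
#include "sse2neon.h"  /* assumed to provide _mm_malloc/_mm_free on ARM */
#else
#include <xmmintrin.h> /* x86 toolchains declare _mm_malloc/_mm_free here */
#endif

/* Hypothetical helper: n floats, 16-byte aligned for _mm_load_ps. */
static float *alloc_floats(size_t n)
{
    return (float *)_mm_malloc(n * sizeof(float), 16);
}

/* Hypothetical helper: always release with _mm_free(), never free(). */
static void release_floats(float *p)
{
    _mm_free(p);
}
```

Routing every allocation and release through one pair of helpers makes it harder to pass such a pointer to `free()` by accident.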
