Commit 248b6a2
optimization: peel align-head and unroll body to 64 bytes
Benchmarks indicated that obfuscating multiple bytes already gives an order of magnitude speed-up, but:
* GCC still emitted scalar code;
* Clang’s auto-vectorized loop ran on the slow unaligned-load path.
Fix contains:
* peeling the misaligned head enabled the hot loop starting at an 8-byte address;
* `std::assume_aligned<8>` tells the optimizer the promise holds - required to keep Apple Clang happy;
* manually unrolling the body to 64 bytes enabled GCC to auto-vectorize.
Note that `target.size() > KEY_SIZE` condition is just an optimization, the aligned and unaligned loops work without it as well - it's why the alignment calculation still contains `std::min`.
> C++ compiler .......................... GNU 14.2.0
| ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 0.03 | 32,464,658,919.11 | 0.0% | 0.50 | 0.11 | 4.474 | 0.08 | 0.0% | 5.29 | `ObfuscationBench`
> C++ compiler .......................... Clang 20.1.7
| ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 0.02 | 41,231,547,045.17 | 0.0% | 0.30 | 0.09 | 3.463 | 0.02 | 0.0% | 5.47 | `ObfuscationBench`
Co-authored-by: Hodlinator <[email protected]>1 parent e7114fc commit 248b6a2
1 file changed
+21
-3
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
14 | 14 | | |
15 | 15 | | |
16 | 16 | | |
| 17 | + | |
17 | 18 | | |
18 | 19 | | |
19 | 20 | | |
| |||
33 | 34 | | |
34 | 35 | | |
35 | 36 | | |
36 | | - | |
37 | | - | |
38 | | - | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
39 | 57 | | |
40 | 58 | | |
41 | 59 | | |
| |||
0 commit comments