Commit d7a409a
Add AVX-512BW path for ascii_lower, gated on avx512fp16
Adds a 64-byte-wide back-end (`_mm512_cmpgt_epi8_mask` + `_mm512_movm_epi8`)
to `utils::simd::ascii_lower`. The dispatcher only routes to it when
`avx512f`, `avx512bw`, and `avx512fp16` are all detected.
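For readers without AVX-512 hardware, the mask-based idea behind the two intrinsics named above can be modelled in portable scalar Rust. This is only a sketch under assumptions: the function names `lower_chunk_model` and `ascii_lower_model` are hypothetical, and the exact instruction sequence in the commit (how the two compares are combined, whether the case bit is ORed or added, how the tail is handled) is not visible here.

```rust
/// Scalar model of one 64-byte step. Hypothetical helper, not the commit's code.
fn lower_chunk_model(chunk: &mut [u8]) {
    debug_assert_eq!(chunk.len(), 64);

    // Models two `_mm512_cmpgt_epi8_mask` calls: signed per-byte compares
    // that each yield one mask bit per lane, ANDed into a single 64-bit mask.
    let mut is_upper: u64 = 0;
    for (lane, &b) in chunk.iter().enumerate() {
        let v = b as i8; // signed view: bytes >= 0x80 become negative
        let above_lower_bound = v > (b'A' - 1) as i8;
        let below_upper_bound = (b'Z' + 1) as i8 > v;
        if above_lower_bound && below_upper_bound {
            is_upper |= 1u64 << lane;
        }
    }

    // Models `_mm512_movm_epi8`: each mask bit expands to a 0xFF or 0x00 byte,
    // which then selects the ASCII case bit (0x20) for uppercase lanes only.
    for (lane, b) in chunk.iter_mut().enumerate() {
        let lane_mask: u8 = if (is_upper >> lane) & 1 == 1 { 0xFF } else { 0x00 };
        *b |= lane_mask & 0x20;
    }
}

/// Lowercases ASCII letters in place, 64 bytes at a time, leaving all other
/// bytes untouched. The tail handling is an assumption of this sketch.
fn ascii_lower_model(buf: &mut [u8]) {
    let mut chunks = buf.chunks_exact_mut(64);
    for chunk in &mut chunks {
        lower_chunk_model(chunk);
    }
    // Fewer than 64 bytes remain; fall back to the standard library.
    chunks.into_remainder().make_ascii_lowercase();
}
```

Because the compares are signed, non-ASCII bytes (0x80 and above) fail the lower-bound test and pass through unchanged, which is the property the vector kernel relies on as well.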
`avx512fp16` is used as a proxy for "AVX-512 without meaningful license-mode
downclock":
- present on Intel Sapphire Rapids / Emerald Rapids / Granite Rapids
- absent on AMD Zen 4 (Ryzen 7000 / EPYC Genoa) and Zen 5 (Turin), which support AVX-512 but not AVX512-FP16, so the gate also keeps them on the AVX2 path
- absent on Skylake-X, Cascade Lake, Cooper Lake, Ice Lake-SP, Rocket
Lake — exactly the generations where 512-bit ops cause measurable
frequency throttling
Older AVX-512-capable hardware therefore stays on the AVX2 path, where
the 256-bit work is already memory-bandwidth-bound on long buffers.
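The gating rule described above might look roughly like the following. This is a minimal sketch, not the commit's dispatcher: `CpuFeatures`, `Backend`, and `select_backend` are hypothetical names, the scalar fallback is an assumption, and the flags are plain booleans so the logic can be exercised on any host (real code would fill them from runtime detection such as `std::arch::is_x86_feature_detected!`).

```rust
/// Feature flags relevant to the gate. Hypothetical type; in real code each
/// field would be populated by runtime CPU feature detection.
#[derive(Clone, Copy, Debug, Default)]
struct CpuFeatures {
    avx2: bool,
    avx512f: bool,
    avx512bw: bool,
    avx512fp16: bool,
}

/// Back-ends the dispatcher can choose from. `Scalar` is assumed here as the
/// last-resort fallback; the commit message does not describe it.
#[derive(Debug, PartialEq, Eq)]
enum Backend {
    Scalar,
    Avx2,
    Avx512Bw,
}

/// Picks the widest back-end the gate allows.
fn select_backend(f: CpuFeatures) -> Backend {
    // All three flags are required: avx512f and avx512bw for the kernel
    // itself, avx512fp16 purely as the "no meaningful downclock" proxy.
    if f.avx512f && f.avx512bw && f.avx512fp16 {
        Backend::Avx512Bw
    } else if f.avx2 {
        Backend::Avx2
    } else {
        Backend::Scalar
    }
}
```

A host that reports `avx512f` and `avx512bw` but not `avx512fp16` resolves to `Backend::Avx2` under this rule, which matches the intent of keeping such hardware off the 512-bit path.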
Verified with `cargo check --target x86_64-unknown-linux-gnu` plus the
existing 207-test lib suite on aarch64. The AVX-512 path itself is
exercised at runtime only on capable hosts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 906ceca commit d7a409a
1 file changed
Lines changed: 41 additions & 0 deletions
@@ -12,6 +12,18 @@ (12 lines added: new lines 15-26)
@@ -38,6 +50,35 @@ (29 lines added: new lines 53-81)