Commit ccf3a19
feat[fastlanes]: add AVX-512 VBMI transpose with 7.5x speedup
Add an AVX-512 VBMI-optimized transpose implementation that uses
vpermi2b/vpermb for the vectorized gather and scatter phases.
Performance improvements:
- VBMI: 13.6 cycles/call (7.5x faster than avx512_gfni at 102.6 cycles)
- VBMI: 240x faster than baseline (3276 cycles)
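The ratios quoted above follow directly from the cycle counts. As a quick sanity check, the helper below (a hypothetical `speedup` function, not part of this commit) divides a reference cost by the kernel cost:

```rust
/// Speedup of a kernel relative to a slower reference.
/// Both arguments are costs in cycles per call; `speedup` is an
/// illustrative helper and does not exist in the commit.
pub fn speedup(reference_cycles: f64, kernel_cycles: f64) -> f64 {
    reference_cycles / kernel_cycles
}

// Cycle counts as reported in the commit message.
pub const VBMI_CYCLES: f64 = 13.6;
pub const GFNI_CYCLES: f64 = 102.6;
pub const BASELINE_CYCLES: f64 = 3276.0;
```

Calling `speedup(GFNI_CYCLES, VBMI_CYCLES)` gives roughly 7.5, and `speedup(BASELINE_CYCLES, VBMI_CYCLES)` gives a little over 240, which matches the figures in the list.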
Key optimizations:
- Use vpermi2b to gather 8 bytes at stride-16 in parallel
- Use vpermb for 8x8 byte transpose during scatter phase
- Static permutation tables to avoid stack allocation
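To make the three points above concrete, here is a minimal scalar sketch of what the two instructions compute and how static index tables can drive them. It is a model only, not the code in this commit: the function and table names (`vpermb_model`, `vpermi2b_model`, `STRIDE16_GATHER`, `TRANSPOSE_8X8`) and the exact table layouts are assumptions chosen for illustration.

```rust
/// Scalar model of `vpermb` (`_mm512_permutexvar_epi8`):
/// output byte i is src[idx[i] & 63].
pub fn vpermb_model(idx: &[u8; 64], src: &[u8; 64]) -> [u8; 64] {
    let mut out = [0u8; 64];
    for i in 0..64 {
        out[i] = src[(idx[i] & 63) as usize];
    }
    out
}

/// Scalar model of `vpermi2b` (`_mm512_permutex2var_epi8`):
/// bit 6 of each index selects table `a` (0) or `b` (1),
/// and the low 6 bits select the byte within that table.
pub fn vpermi2b_model(a: &[u8; 64], idx: &[u8; 64], b: &[u8; 64]) -> [u8; 64] {
    let mut out = [0u8; 64];
    for i in 0..64 {
        let table = if idx[i] & 64 != 0 { b } else { a };
        out[i] = table[(idx[i] & 63) as usize];
    }
    out
}

/// Assumed gather layout: output group k (8 bytes) collects the bytes at
/// offsets k, k + 16, ..., k + 112 of a 128-byte input held in two registers.
const fn stride16_gather_table() -> [u8; 64] {
    let mut t = [0u8; 64];
    let mut i = 0;
    while i < 64 {
        t[i] = (16 * (i % 8) + i / 8) as u8;
        i += 1;
    }
    t
}

/// 8x8 byte transpose: output (row, col) reads input (col, row).
const fn transpose_8x8_table() -> [u8; 64] {
    let mut t = [0u8; 64];
    let mut i = 0;
    while i < 64 {
        t[i] = ((i % 8) * 8 + i / 8) as u8;
        i += 1;
    }
    t
}

// Built at compile time and stored in static memory, so no table is
// constructed on the stack per call.
pub static STRIDE16_GATHER: [u8; 64] = stride16_gather_table();
pub static TRANSPOSE_8X8: [u8; 64] = transpose_8x8_table();
```

With inputs whose byte values equal their positions (0..63 in `a`, 64..127 in `b`), `vpermi2b_model(&a, &STRIDE16_GATHER, &b)` starts with 0, 16, 32, ..., 112, and applying `vpermb_model` with `TRANSPOSE_8X8` twice returns the original register. In a real kernel each model call would correspond to a single instruction operating on a 512-bit register.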
Also adds:
- Dual-block transpose_1024x2_avx512 for batch processing
- VBMI detection via has_vbmi() function
- Updated dispatch to prefer VBMI when available
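The detection and dispatch items above might look roughly like the sketch below. Only the name `has_vbmi()` comes from this commit message; its body, the `TransposeKernel` enum, and `select_kernel` are assumptions for illustration, and the real dispatch may differ (for example by caching the detection result).

```rust
/// Kernels the dispatcher can choose from (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TransposeKernel {
    Avx512Vbmi,
    Avx512Gfni,
    Scalar,
}

/// Runtime check for AVX-512 VBMI on x86 targets.
/// Assumed implementation; the commit only names the function.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn has_vbmi() -> bool {
    is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vbmi")
}

/// On other architectures VBMI is never available.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn has_vbmi() -> bool {
    false
}

/// Prefer VBMI when available, then the GFNI path, then the scalar baseline.
/// Feature flags are passed in so the policy can be tested on any machine.
pub fn select_kernel(vbmi: bool, gfni: bool) -> TransposeKernel {
    if vbmi {
        TransposeKernel::Avx512Vbmi
    } else if gfni {
        TransposeKernel::Avx512Gfni
    } else {
        TransposeKernel::Scalar
    }
}
```

A caller would pass `has_vbmi()` as the first argument. Keeping the policy separate from detection means the preference order can be unit-tested without VBMI hardware.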
Signed-off-by: Claude <noreply@anthropic.com>

1 parent 7282427 commit ccf3a19
2 files changed (+645, -0):
- encodings/fastlanes
  - examples
  - src/transpose