Commit 7a38744
### Rationale for this change
Refactor the unpack function code generator and dispatcher to accommodate more use cases:
- New sizes: ~`uint16_t`~ and `uint64_t`
- ~A *dispatch once* function returning a function pointer to the correct bit width~
*Update 2025-10-13*: the dispatch-once and `uint16_t` implementations were removed because they turned out to be slower.
### What changes are included in this PR?
The diff is hard to read, but the important changes are:
- The two Python files for code generation accommodate new sizes (~with the exception of `uint16` on AVX512, for which the algorithm's assumptions break~);
- The two code generators have a uniform structure for the "batch unpackers" they generate: each one is a specialization of a struct template, e.g. `unpack29_32` -> `Unpacker<uint32_t, 29>::unpack`
- Using specializations instead of hard-coded numbers in function names makes it possible to use them in more generic code
- Wrapping the functions in a struct makes it possible to carry information alongside the function (such as the number of values that the function unpacks) and leaves the door open to partial specialization for future improvements.
- The public functions in `bpacking_internal.h` are also template specializations, e.g. `unpack32` -> `unpack<uint32_t>`
- The large `switch` statements and `for` loops used to dispatch the bit width to its appropriate implementation are now all generic, using a `constexpr`-generated jump table. The logic is in `bpacking_dispatch_internal.h`
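
To illustrate the structure described above, here is a minimal sketch of a struct-template unpacker combined with a `constexpr` jump table. It is not the code from this PR: the scalar bit-by-bit body, the `kValuesUnpacked` member, the `MakeJumpTable` helper, the `unpack` signature and the LSB-first bit layout are all assumptions made for the example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical primary template: one instantiation per (type, bit width).
template <typename Uint, int kBitWidth>
struct Unpacker {
  // Information carried alongside the function: values produced per call.
  static constexpr int kValuesUnpacked = 8;

  // Scalar fallback (assumed layout: values packed LSB first).
  // Returns the input pointer advanced past the consumed bytes.
  static const uint8_t* unpack(const uint8_t* in, Uint* out) {
    std::size_t bit_pos = 0;
    for (int i = 0; i < kValuesUnpacked; ++i) {
      Uint value = 0;
      for (int b = 0; b < kBitWidth; ++b, ++bit_pos) {
        const Uint bit = (in[bit_pos / 8] >> (bit_pos % 8)) & 1u;
        value |= static_cast<Uint>(bit << b);
      }
      out[i] = value;
    }
    return in + (kValuesUnpacked * kBitWidth) / 8;
  }
};

// Example explicit specialization: width 0 reads no input, all values are 0.
template <>
struct Unpacker<uint32_t, 0> {
  static constexpr int kValuesUnpacked = 8;
  static const uint8_t* unpack(const uint8_t* in, uint32_t* out) {
    for (int i = 0; i < kValuesUnpacked; ++i) out[i] = 0;
    return in;
  }
};

template <typename Uint>
using UnpackFn = const uint8_t* (*)(const uint8_t*, Uint*);

// Builds a table whose entry i points to Unpacker<Uint, i>::unpack.
template <typename Uint, int... kWidths>
constexpr std::array<UnpackFn<Uint>, sizeof...(kWidths)> MakeJumpTable(
    std::integer_sequence<int, kWidths...>) {
  return {{&Unpacker<Uint, kWidths>::unpack...}};
}

// Generic entry point (hypothetical signature): replaces a hand-written
// switch over bit_width. Returns the number of values written; a trailing
// partial batch is left to the caller.
template <typename Uint>
int unpack(const uint8_t* in, Uint* out, int batch_size, int bit_width) {
  constexpr int kMaxWidth = static_cast<int>(8 * sizeof(Uint));
  static constexpr auto kTable =
      MakeJumpTable<Uint>(std::make_integer_sequence<int, kMaxWidth + 1>{});
  if (bit_width < 0 || bit_width > kMaxWidth) return 0;
  // Assumes every width unpacks the same number of values per call.
  constexpr int kStep = Unpacker<Uint, 0>::kValuesUnpacked;
  const UnpackFn<Uint> fn = kTable[bit_width];
  int done = 0;
  for (; done + kStep <= batch_size; done += kStep) {
    in = fn(in, out + done);
  }
  return done;
}
```

With this shape, adding a new output type only requires instantiating `unpack<T>`; the table is rebuilt at compile time from whatever `Unpacker<T, N>` specializations exist.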
From a performance perspective:
- There is no improvement to the individual "batch unpackers" generated
- The SIMD code actually performs worse than scalar on SSE4.2 or for `uint16_t`
- There are new sizes that can bring improvements
- `unpack<uint64_t>` has a SIMD implementation that should benefit `DeltaBitPackDecoder`
- ~Novel `unpack<uint16_t>` should benefit the level decoders~
- ~The dispatch once mechanism should benefit all repeated calls to unpack functions (still need mixing with dynamic dispatch, but see `get_unpack_fn` for the building block).~
*Update 2025-10-13*:
- Added an unpack implementation for `uint8_t` and `uint16_t` that calls the `uint32_t` version on a local buffer, combined with a `static_cast` loop (which is what was previously done at the call site).
- Performance should remain on par with the previous implementations. The PR as it stands mainly changes the interface of the unpack functions, for future iterations and cleaner use.
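
The narrow-type fallback described in this update can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `Unpack32Stub` is a hypothetical scalar stand-in for the real `uint32_t` unpacker, `UnpackNarrow` and the buffer size of 64 are made up for the example, and values are assumed to be packed LSB first.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the 32-bit unpacker (scalar, LSB-first layout).
inline int Unpack32Stub(const uint8_t* in, uint32_t* out, int batch_size,
                        int bit_width) {
  std::size_t bit_pos = 0;
  for (int i = 0; i < batch_size; ++i) {
    uint32_t value = 0;
    for (int b = 0; b < bit_width; ++b, ++bit_pos) {
      const uint32_t bit = (in[bit_pos / 8] >> (bit_pos % 8)) & 1u;
      value |= bit << b;
    }
    out[i] = value;
  }
  return batch_size;
}

// Unpacks into a narrower type by going through a local uint32_t buffer
// and narrowing each value with static_cast.
template <typename Narrow>
int UnpackNarrow(const uint8_t* in, Narrow* out, int batch_size,
                 int bit_width) {
  static_assert(sizeof(Narrow) < sizeof(uint32_t),
                "intended for uint8_t / uint16_t outputs");
  // A multiple of 8 values, so every full chunk ends on a byte boundary.
  constexpr int kBufferSize = 64;
  uint32_t buffer[kBufferSize];
  int done = 0;
  while (done < batch_size) {
    const int chunk = std::min(kBufferSize, batch_size - done);
    Unpack32Stub(in, buffer, chunk, bit_width);
    for (int i = 0; i < chunk; ++i) {
      out[done + i] = static_cast<Narrow>(buffer[i]);
    }
    in += (chunk * bit_width) / 8;  // exact for full chunks
    done += chunk;
  }
  return done;
}
```

Keeping the buffer small and on the stack means the extra copy stays in cache, which is consistent with the expectation that performance stays on par with doing the same loop at the call site.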
### Are these changes tested?
Very much.
### Are there any user-facing changes?
* GitHub Issue: #47572
Lead-authored-by: AntoinePrv <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
1 parent fc5fd48 commit 7a38744
File tree: 26 files changed, +58822 -14978 lines changed
- cpp
- apidoc
- src/arrow
- util