
[C++][Parquet] Unpack function epilog #47895

@AntoinePrv


Describe the enhancement requested

Right now, the unpack family of functions can extract fewer elements than requested.
This is because they rely on batch extraction, which must process a fixed-size group of inputs at once.
Instead, BitReader::GetBatch is responsible for handling the values before (prolog) and after (epilog) the call to unpack.
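The current split of responsibilities can be sketched as follows. This is a simplified scalar model, not Arrow's actual code: ReadPackedValue, UnpackBatchesOnly and GetBatchSketch are hypothetical names, values are assumed to be stored LSB-first (as in Parquet's bit-packed encoding), and the batch kernel is reduced to a plain loop that only accepts whole groups of 32 values (the group size is an assumption here).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar helper: reads one `bit_width`-bit value stored LSB-first
// starting at absolute bit position `bit_pos`.
inline uint32_t ReadPackedValue(const uint8_t* data, int64_t bit_pos, int bit_width) {
  uint32_t value = 0;
  for (int b = 0; b < bit_width; ++b) {
    const int64_t pos = bit_pos + b;
    value |= static_cast<uint32_t>((data[pos / 8] >> (pos % 8)) & 1) << b;
  }
  return value;
}

// Hypothetical stand-in for today's unpack: only handles whole groups of 32
// values from a byte-aligned pointer and returns how many it extracted,
// which may be fewer than `num_values`.
inline int UnpackBatchesOnly(const uint8_t* in, uint32_t* out, int num_values, int bit_width) {
  const int batched = num_values / 32 * 32;
  for (int k = 0; k < batched; ++k) {
    out[k] = ReadPackedValue(in, static_cast<int64_t>(k) * bit_width, bit_width);
  }
  return batched;
}

// Hypothetical caller playing the role of BitReader::GetBatch: it must wrap
// the unpack call with its own prolog and epilog to return `num_values` values.
inline int GetBatchSketch(const uint8_t* data, int64_t bit_pos, uint32_t* out,
                          int num_values, int bit_width) {
  assert(bit_width >= 0 && bit_width <= 32);
  int i = 0;
  // Prolog: decode one value at a time until the read position is byte aligned.
  for (; i < num_values && bit_pos % 8 != 0; ++i, bit_pos += bit_width) {
    out[i] = ReadPackedValue(data, bit_pos, bit_width);
  }
  // Bulk: delegate to the batch-only unpack, which may stop early.
  const int unpacked =
      UnpackBatchesOnly(data + bit_pos / 8, out + i, num_values - i, bit_width);
  i += unpacked;
  bit_pos += static_cast<int64_t>(unpacked) * bit_width;
  // Epilog: decode the leftover (< 32) values one at a time.
  for (; i < num_values; ++i, bit_pos += bit_width) {
    out[i] = ReadPackedValue(data, bit_pos, bit_width);
  }
  return i;
}
```

In this model, asking UnpackBatchesOnly for 70 values yields only 64; the remaining 6 are produced by the caller's epilog.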

This has two downsides:

  • It makes the general Parquet C++ logic harder to understand, as related logic is spread across several functions;
  • It makes unpack harder to (re)use, since it does not extract everything that is requested. In particular, it makes these functions hard to iterate on, because tests and benchmarks have to adapt to the number of elements each function can actually handle.

The prolog and epilog should be moved into the unpack functions, so that a single function is fully responsible for unpacking integers and callers need no extra handling.
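A minimal sketch of the proposed shape, under the same simplifying assumptions (scalar stand-in for the batch kernel, LSB-first bit layout, groups of 32 values). UnpackAll, its signature and the sketch namespace are hypothetical, not an existing Arrow API: the function takes a starting bit offset and always writes exactly num_values outputs, so the prolog and epilog live next to the bulk loop instead of in the caller.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Scalar fallback: reads one `bit_width`-bit value stored LSB-first at
// absolute bit position `bit_pos`.
inline uint32_t ReadOne(const uint8_t* in, int64_t bit_pos, int bit_width) {
  uint32_t value = 0;
  for (int b = 0; b < bit_width; ++b) {
    const int64_t pos = bit_pos + b;
    value |= static_cast<uint32_t>((in[pos / 8] >> (pos % 8)) & 1) << b;
  }
  return value;
}

// Stand-in for an optimized kernel: decodes exactly 32 values from a
// byte-aligned pointer (a real kernel would be unrolled or vectorized).
inline void Unpack32Values(const uint8_t* in, uint32_t* out, int bit_width) {
  for (int k = 0; k < 32; ++k) {
    out[k] = ReadOne(in, int64_t{k} * bit_width, bit_width);
  }
}

}  // namespace sketch

// Hypothetical complete unpack: always writes exactly `num_values` outputs,
// starting `bit_offset` bits into `in`.
inline void UnpackAll(const uint8_t* in, int64_t bit_offset, uint32_t* out,
                      int num_values, int bit_width) {
  assert(bit_width >= 0 && bit_width <= 32);
  int64_t pos = bit_offset;
  int i = 0;
  // Prolog: scalar decoding until the read position is byte aligned.
  for (; i < num_values && pos % 8 != 0; ++i, pos += bit_width) {
    out[i] = sketch::ReadOne(in, pos, bit_width);
  }
  // Bulk: whole groups of 32 values; 32 * bit_width bits keeps `pos` aligned.
  for (; num_values - i >= 32; i += 32, pos += int64_t{32} * bit_width) {
    sketch::Unpack32Values(in + pos / 8, out + i, bit_width);
  }
  // Epilog: fewer than 32 values remain; decode them one at a time.
  for (; i < num_values; ++i, pos += bit_width) {
    out[i] = sketch::ReadOne(in, pos, bit_width);
  }
}
```

Because a function of this shape always produces exactly the requested count, a test or benchmark can sweep any number of values, including counts below the batch size and unaligned start offsets, without knowing how the kernel groups its work.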

Component(s)

C++, Parquet
