GH-49266: [C++][Parquet] Optimize delta bit-packed decoding when bit-width = 0#49296

Open
pitrou wants to merge 1 commit into apache:main from pitrou:delta-zero-opt

pitrou commented Feb 16, 2026

Rationale for this change

DELTA_BINARY_PACKED decoding has limited performance due to a back-to-back dependency between the computations of value N and value N+1: each value is obtained by adding a delta to the previous one.

However, we can do better if we know that all deltas are 0 in a miniblock. This happens when a miniblock's delta bit width is 0.

What changes are included in this PR?

Avoid reading and accumulating deltas when the delta bit width is 0. Instead, use a closed-form formula that computes each value directly, without waiting for the previous one.
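To illustrate the idea, here is a minimal sketch and not the code from this PR. It assumes, per the Parquet format, that the stored deltas in a miniblock are relative to the block's minimum delta, so a bit width of 0 means every step equals `min_delta`. The names `DecodeSequential` and `DecodeZeroWidth` are hypothetical and do not come from the Arrow sources. Arithmetic is done on the unsigned counterpart of the value type so that overflow wraps instead of being undefined behaviour.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical helpers for illustration only; not taken from the Arrow sources.

// General path: every output needs the previous accumulator value, which
// creates a loop-carried dependency between iterations.
template <typename T>
void DecodeSequential(T last_value, T min_delta,
                      const std::vector<T>& stored_deltas, T* out) {
  using U = std::make_unsigned_t<T>;
  U acc = static_cast<U>(last_value);
  for (std::size_t i = 0; i < stored_deltas.size(); ++i) {
    acc += static_cast<U>(min_delta) + static_cast<U>(stored_deltas[i]);
    out[i] = static_cast<T>(acc);
  }
}

// Bit width 0 path: all stored deltas are 0, so the i-th output is
// last_value + (i + 1) * min_delta. Iterations are independent of each
// other, which lets the CPU (or the compiler's vectorizer) overlap them.
template <typename T>
void DecodeZeroWidth(T last_value, T min_delta, std::size_t n, T* out) {
  using U = std::make_unsigned_t<T>;
  const U base = static_cast<U>(last_value);
  const U step = static_cast<U>(min_delta);
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = static_cast<T>(base + static_cast<U>(i + 1) * step);
  }
}
```

Both functions produce the same output when every stored delta is 0; only the second one avoids the serial chain of additions. After a miniblock of `n` values, a caller would carry `out[n - 1]` forward as the next `last_value`.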

Benchmark results on constant ranges of integers (on my local machine, AMD Zen 2 CPU):

benchmark                                   baseline       contender       change %  counters
BM_DeltaBitPackingDecode_Int32_Fixed/4096   3.821 GiB/sec  12.164 GiB/sec  218.323   {'family_index': 11, 'per_family_instance_index': 1, 'run_name': 'BM_DeltaBitPackingDecode_Int32_Fixed/4096', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 70160}
BM_DeltaBitPackingDecode_Int32_Fixed/65536  3.897 GiB/sec  12.378 GiB/sec  217.678   {'family_index': 11, 'per_family_instance_index': 3, 'run_name': 'BM_DeltaBitPackingDecode_Int32_Fixed/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 4487}
BM_DeltaBitPackingDecode_Int32_Fixed/32768  3.909 GiB/sec  12.325 GiB/sec  215.309   {'family_index': 11, 'per_family_instance_index': 2, 'run_name': 'BM_DeltaBitPackingDecode_Int32_Fixed/32768', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 9004}
BM_DeltaBitPackingDecode_Int32_Fixed/1024   3.542 GiB/sec  10.468 GiB/sec  195.538   {'family_index': 11, 'per_family_instance_index': 0, 'run_name': 'BM_DeltaBitPackingDecode_Int32_Fixed/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 259535}
BM_DeltaBitPackingDecode_Int64_Fixed/32768  9.761 GiB/sec  14.040 GiB/sec   43.847   {'family_index': 12, 'per_family_instance_index': 2, 'run_name': 'BM_DeltaBitPackingDecode_Int64_Fixed/32768', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 11192}
BM_DeltaBitPackingDecode_Int64_Fixed/65536  9.814 GiB/sec  14.056 GiB/sec   43.222   {'family_index': 12, 'per_family_instance_index': 3, 'run_name': 'BM_DeltaBitPackingDecode_Int64_Fixed/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 5617}
BM_DeltaBitPackingDecode_Int64_Fixed/4096   9.672 GiB/sec  13.543 GiB/sec   40.014   {'family_index': 12, 'per_family_instance_index': 1, 'run_name': 'BM_DeltaBitPackingDecode_Int64_Fixed/4096', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 88891}
BM_DeltaBitPackingDecode_Int64_Fixed/1024   8.850 GiB/sec  12.207 GiB/sec   37.923   {'family_index': 12, 'per_family_instance_index': 0, 'run_name': 'BM_DeltaBitPackingDecode_Int64_Fixed/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 323720}

Are these changes tested?

Yes, by an additional test meant to stress this specific situation.

Are there any user-facing changes?

No.

pitrou marked this pull request as ready for review February 16, 2026 10:09
pitrou requested a review from wgtmac as a code owner February 16, 2026 10:09
pitrou requested review from mapleFU and rok February 16, 2026 10:09
pitrou commented Feb 16, 2026

FTR @AntoinePrv :)
