Preprocess parquet repetition and definition levels #21139
base: main
Conversation
… improve/refactor common code with string offset memory accounting
BEFORE benchmarks: parquet_read_decode[0], parquet_read_chunks[0] (NVIDIA RTX A5000)
AFTER benchmarks: parquet_read_decode[0], parquet_read_chunks[0] (NVIDIA RTX A5000)
return max_depth_valid_count;
}

// is the page marked nullable or not
Moved to a common header
(s->input_row_count <= last_row)) {
int next_valid_count;
block.sync();
processed_count += min(rolling_buf_size, s->page.num_input_values - processed_count);
Same for all cases
}
block.sync();

if (!t) {
This is still present; it is just lower in the diff.
// the core loop. decode batches of level stream data using rle_stream objects
// and pass the results to update_page_sizes
int processed = 0;
while (processed < s->page.num_input_values) {
We no longer need to loop for the rep/def buffers; update_page_sizes() can be called in one shot.
return total_len;
}

/**
The old code path is finally no longer needed; it has been superseded by rle_stream.
__syncthreads();

// do something with the level data
while (start_val < processed) {
Most of the changes in here are due to nuking this inner loop (we no longer need to buffer the level decode) and, of course, removing the decode itself. I highly recommend hiding whitespace diffs when reviewing.
// Fixed length byte array: Offsets are fixed, no need to allocate offset buffer
if (chunk.physical_type == Type::FIXED_LEN_BYTE_ARRAY) { return 0; }

// Estimate number of offsets based on page.num_input_values
Optimize how the number of string offsets is determined, combining the logic with the new level decode logic.
mhaseeb123 left a comment
Quick glance, will do a detailed review a bit later. Couple of questions and minor comments.
rmm::exec_policy_nosync(stream),
iter,
iter + pages.size(),
We can just iterate over pages here since the functor only uses the page_idx to access pages[page_idx] anyway. In that case, we can also remove the pages struct member.
No, we want to utilize the GPU parallelism. If we loop over pages here then we get no parallelism.
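To make the parallelism point concrete, here is a minimal standalone sketch of the pattern (ToyPage and page_offset_size_functor are hypothetical stand-ins, not the real cuDF types): thrust::transform parallelizes across the indices produced by the counting iterator, so each pages[page_idx] lookup runs as its own unit of parallel work, whereas looping over all pages inside a single invocation would serialize everything.

```cpp
// Sketch only: a per-page functor driven by a counting iterator, assuming toy
// types rather than cudf::io::parquet's PageInfo and device_span members.
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/transform.h>

#include <cstddef>
#include <cstdint>

struct ToyPage {            // stand-in for PageInfo
  int32_t num_input_values;
};

struct page_offset_size_functor {
  ToyPage const* pages;     // device pointer to all pages

  __device__ std::size_t operator()(std::size_t page_idx) const
  {
    // each invocation handles exactly one page, selected by page_idx
    return static_cast<std::size_t>(pages[page_idx].num_input_values) * sizeof(int32_t);
  }
};

int main()
{
  thrust::device_vector<ToyPage> pages(4, ToyPage{100});
  thrust::device_vector<std::size_t> sizes(pages.size());

  auto iter = thrust::make_counting_iterator<std::size_t>(0);
  // one parallel work item per page index; a host-side loop over pages
  // (or a loop inside the functor) would give up this parallelism
  thrust::transform(thrust::device,
                    iter,
                    iter + pages.size(),
                    sizes.begin(),
                    page_offset_size_functor{thrust::raw_pointer_cast(pages.data())});
  return 0;
}
```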
s, pp, chunks, min_row, num_rows, all_types_filter{}, page_processing_stage::PREPROCESS)) {
  return;
}
We might need code to skip pages based on subpass_page_mask here and in the other kernels?
No code is needed here. We won't use the def/rep levels for these pages at all so there's nothing to set.
I have updated allocate_level_decode_space() to skip allocating memory for rep/def levels, though, since we don't need it.
struct compute_page_string_offset_size {
  device_span<PageInfo const> pages;
  device_span<ColumnChunkDesc const> chunks;
  size_t skip_rows;
We can also update this functor's operator() to iterate directly over pages instead of page_idx.
No, this is executed by thrust::transform. If we loop over pages within the operator then we get no parallelism.
The parquet repetition and definition levels are decoded multiple times throughout the total decoding process: not just during the decode itself but also during setup in compute_page_sizes_kernel() and compute_string_page_bounds_kernel(). And during chunked reads even these setup steps are run multiple times, exploding the cost of re-decoding them.

Instead, we decode the levels just once per subpass into a temporary buffer and simply read those results wherever they are needed. This dramatically speeds up the list and chunked cuDF benchmarks, as highlighted below.
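As a rough, self-contained illustration of the approach (decode_levels_once, count_nonzero_levels, and fake_decode below are hypothetical stand-ins, not the actual cuDF kernels): the decoded levels are written to a device buffer in one pass, and later passes simply read that buffer instead of re-running the level decode.

```cpp
// Sketch of "decode once, read many times"; the decode here is a trivial
// placeholder for the real RLE/bit-packed level decode.
#include <cuda_runtime.h>
#include <cstdio>

// stand-in for an expensive per-value level decode
__device__ int fake_decode(int encoded) { return encoded & 0x7; }

// pass 1: decode every level value exactly once into a reusable buffer
__global__ void decode_levels_once(int const* encoded, int* levels, int n)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) { levels[i] = fake_decode(encoded[i]); }
}

// later passes read the preprocessed buffer instead of re-decoding
__global__ void count_nonzero_levels(int const* levels, int n, int* out)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n && levels[i] != 0) { atomicAdd(out, 1); }
}

int main()
{
  constexpr int n = 1024;
  int *encoded, *levels, *count;
  cudaMallocManaged(&encoded, n * sizeof(int));
  cudaMallocManaged(&levels, n * sizeof(int));
  cudaMallocManaged(&count, sizeof(int));
  for (int i = 0; i < n; ++i) { encoded[i] = i; }
  *count = 0;

  decode_levels_once<<<(n + 255) / 256, 256>>>(encoded, levels, n);
  // every consumer reuses the same decoded buffer
  count_nonzero_levels<<<(n + 255) / 256, 256>>>(levels, n, count);
  cudaDeviceSynchronize();
  std::printf("non-zero levels: %d\n", *count);

  cudaFree(encoded); cudaFree(levels); cudaFree(count);
  return 0;
}
```

In the PR itself, the single decode happens once per subpass and the results feed the setup kernels (compute_page_sizes_kernel(), compute_string_page_bounds_kernel()) and the decode kernels that previously re-decoded the levels themselves.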
Centralizing this grants several advantages. First, the old (non-rle_stream) rep/def decode is now ripped entirely out of decode_split_page_data_kernel(), decode_page_data(), and the delta decode kernels, simplifying maintenance. Less shared memory is needed in the decode kernels for the rle_run and result buffers. As the decode kernel complexity decreases, unnecessary buffer loops are removed and the register count drops. And future improvements to rle_stream decode can be studied in their own isolated environment (although dictionary and bool decode still need it).

Benchmarks
Checklist