perf(parquet): add intra-row-group column parallelism to arrow-rs reader#6423
desmondcheongzx wants to merge 7 commits into main from
Conversation
Parallelize column decoding within each row group across all four read paths. Opens separate readers per column with ProjectionMask::roots, decodes independently, and hconcats results. Supports two-phase decode when predicates are pushed (serial predicate phase, parallel data phase with refined RowSelection). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Column-parallel decode opened the file independently for each column task, adding 16x open() syscall overhead on a 16-column file. Read the file into a bytes::Bytes buffer once and share it across column tasks via cheap Bytes::clone() (atomic refcount, zero-copy). Each column reader gets its own independent cursor over the shared buffer. This fixes the CodSpeed regression in test_show[1 Small File] where the per-column file opens added ~2.6ms overhead on a small 1024-row file. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
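The shared-buffer approach can be sketched with std types only — here `Arc<[u8]>` stands in for `bytes::Bytes`, and `share_buffer` is an illustrative analogue, not the PR's code:

```rust
use std::io::{Cursor, Read};
use std::sync::Arc;

// Hypothetical sketch: read the file once, then hand each column task its
// own cursor over the shared allocation. Cloning an `Arc<[u8]>` (like
// `Bytes::clone`) bumps an atomic refcount; the data is never copied.
fn share_buffer(file_contents: Vec<u8>, num_cols: usize) -> Vec<Cursor<Arc<[u8]>>> {
    let shared: Arc<[u8]> = file_contents.into();
    (0..num_cols)
        // Each cursor has an independent seek position over the same bytes.
        .map(|_| Cursor::new(Arc::clone(&shared)))
        .collect()
}

fn main() {
    let mut readers = share_buffer(vec![1, 2, 3, 4], 16);
    assert_eq!(readers.len(), 16);
    // Cursors advance independently: reading from one does not move another.
    let mut buf = [0u8; 2];
    readers[0].read_exact(&mut buf).unwrap();
    assert_eq!(buf, [1, 2]);
    readers[1].read_exact(&mut buf).unwrap();
    assert_eq!(buf, [1, 2]);
}
```

One allocation shared by 16 column tasks replaces 16 `open()` syscalls on a 16-column file.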
Add MIN_RG_BYTES_FOR_COL_PARALLELISM threshold (16 MiB uncompressed) to fall back to decode_single_rg for small row groups where per-column reader overhead (metadata clones, buffer setup, hconcat) exceeds the benefit of parallel decode. Applied to both local streaming (Path 2) and local bulk (Path 1) read paths. The CodSpeed benchmark file (1024 rows, 16 cols, ~880KB uncompressed) now takes the single-reader fast path instead of spawning 16 column tasks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ndle I/O

The previous approach read the entire file into a bytes::Bytes buffer upfront, which added ~380ms of overhead for a 728MB file before any decode work started. To fix this, each column task now opens its own file handle via File::open (~microsecond syscall, independent seek position). The OS page cache serves subsequent reads from memory, so there is no redundant I/O. This eliminated the upfront read bottleneck and brought all_cols 8RG from 1440ms to 990ms (parity with parquet2's 996ms).

Additional threshold tuning:
- MIN_COLS_FOR_COL_PARALLELISM = 3: routes 1-2 column reads to the simpler per-RG fallback path where column splitting overhead isn't justified.
- RG count check (rg_tasks < num_cpus * 2): when row groups already saturate cores (e.g. 64 RGs on 8 cores), per-RG decode is more efficient than column splitting with its per-builder overhead.
- Async paths use MIN_COLS_FOR_COL_PARALLELISM consistently.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
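The gating described in this commit can be summarized as a single predicate. The constant names mirror the PR's, but the function itself is a hypothetical sketch, not the PR's exact code:

```rust
// Illustrative gating sketch: column parallelism only pays off when the
// table is wide, the row group is large, and row groups alone do not
// already saturate the available cores.
const MIN_COLS_FOR_COL_PARALLELISM: usize = 3;
const MIN_RG_BYTES_FOR_COL_PARALLELISM: u64 = 16 * 1024 * 1024; // 16 MiB uncompressed

fn use_col_parallelism(num_cols: usize, rg_bytes: u64, rg_tasks: usize, num_cpus: usize) -> bool {
    num_cols >= MIN_COLS_FOR_COL_PARALLELISM
        && rg_bytes >= MIN_RG_BYTES_FOR_COL_PARALLELISM
        // If row groups already saturate cores, per-RG decode wins.
        && rg_tasks < num_cpus * 2
}

fn main() {
    // 1-2 columns: fall back to per-RG decode.
    assert!(!use_col_parallelism(2, 1 << 30, 1, 8));
    // Small row group (~880 KB benchmark file): fall back.
    assert!(!use_col_parallelism(16, 880 * 1024, 1, 8));
    // 64 RGs on 8 cores: row groups already saturate cores.
    assert!(!use_col_parallelism(16, 1 << 30, 64, 8));
    // Wide table, big RG, few RGs: split columns.
    assert!(use_col_parallelism(16, 1 << 30, 8, 8));
}
```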
Greptile Summary

This PR adds intra-row-group column parallelism to the arrow-rs parquet reader across all four read paths (local bulk, local stream, remote bulk, remote stream), targeting wide tables with few row groups. When a table has ≥ 3 columns and a row group exceeds 16 MiB, columns are decoded in parallel via separate per-column readers and merged with hconcat_record_batches.

Key issues:
Confidence Score: 2/5
Important Files Changed
Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Read Parquet Request] --> B{Local or Remote?}
B -- Local sync --> C[local_parquet_read_arrowrs]
B -- Local stream --> D[local_parquet_stream_arrowrs]
B -- Remote bulk --> E[read_parquet_single_arrowrs]
B -- Remote stream --> F[stream_parquet_single_arrowrs]
C --> G{use_col_parallelism?}
G -- No --> H[decode_single_rg per RG\nrayon par_iter over RGs]
G -- Yes --> I{predicate_pushed?}
I -- Yes --> J[Phase 1: decode pred cols per RG\nrayon par_iter]
I -- No --> K[flat col_tasks par_iter\nover RG×col pairs]
J --> L[Phase 2: decode data cols\nrayon par_iter over RG×col]
L --> M[hconcat per RG\nfinalize_batch]
K --> M
D --> N[For each RG task]
N --> O[decode_single_rg_col_parallel\nrayon col split inside]
E --> P{all_col_indices < 3?}
P -- Yes fallback --> Q[Original single stream\nwith_offset + RowFilter]
P -- No --> R{predicate_pushed?}
R -- Yes --> S[Phase 1: concurrent pred col decode\ntry_join_all]
R -- No --> T[concurrent col decode\ntry_join_all RG×col]
S --> U[Phase 2: concurrent data col decode\nper RG semaphore-gated]
U --> V[hconcat per RG\nfinalize_batch + limit]
T --> V
F --> W{all_col_indices < 3?}
W -- Yes fallback --> X[Original stream mapped\nwith_offset + RowFilter]
W -- No --> Y[Stream per RG\nasync then closure]
Y --> Z[col_futs try_join_all per RG\nhconcat + finalize_batch]
Z --> AA[scan-based limit applied\nto output stream]
Last reviewed commit: "fix(parquet): addres..."
```rust
let mut all_batches = vec![pred_batch];
all_batches.extend(col_batches);
let merged = hconcat_record_batches(&all_batches)?;
let daft_batch = RecordBatch::try_from(&merged)?;
finalize_batch(
    daft_batch,
    None,
    true,
    &setup.read_daft_schema,
    &setup.return_daft_schema,
)
```
Column ordering corruption when predicate columns are returned
When predicate_pushed=true and column parallelism is active, pred_batch (predicate columns) is prepended to data_batches in hconcat_record_batches. This produces a merged schema of [pred_cols..., data_cols...] instead of file-schema order.
finalize_batch only reorders via get_columns when read_schema.len() != return_schema.len(). If the user requests all columns (or their requested set already contains the predicate columns), both lengths are equal and no reordering occurs — the output columns land in the wrong order.
Concrete example: file columns [a, b, c], predicate on c, user reads all.
- `pred_batch` schema: `[c]`
- `data_batches` schemas: `[a]`, `[b]`
- merged schema: `[c, a, b]` ← predicate column first
- `finalize_batch`: `read_schema.len() == return_schema.len()` → skips reorder
- Output: `[c, a, b]` instead of `[a, b, c]`
The same bug exists in all four paths that call finalize_batch(…, None, true, …) after hconcat_record_batches:
- `decode_single_rg_col_parallel` (line ~731)
- `local_parquet_read_arrowrs` (line ~1895)
- `read_parquet_single_arrowrs` (line ~1279)
- `stream_parquet_single_arrowrs` (line ~2410)
The fix is to unconditionally reorder in finalize_batch (look up every return column by name from the table, not only when lengths differ), or to sort the hconcatenated batch columns to match return_daft_schema order before calling finalize_batch.
```rust
// In finalize_batch, change:
if read_schema.len() != return_schema.len() {
// to:
if read_schema != return_schema {
```

This will also reorder when lengths happen to match but ordering differs.
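The name-based reordering the fix describes can be sketched with std types. `Column` and `reorder_to_schema` are hypothetical stand-ins for the arrow batch and the lookup inside `finalize_batch`:

```rust
use std::collections::HashMap;

// A "column" is a name plus its values, standing in for an arrow column.
type Column = (String, Vec<i64>);

// Look up every return-schema column by name, regardless of whether the
// lengths of the two schemas happen to match. Returns None if a requested
// column is missing.
fn reorder_to_schema(columns: Vec<Column>, return_order: &[&str]) -> Option<Vec<Column>> {
    let mut by_name: HashMap<String, Vec<i64>> = columns.into_iter().collect();
    return_order
        .iter()
        .map(|name| by_name.remove(*name).map(|v| (name.to_string(), v)))
        .collect()
}

fn main() {
    // Merged batch came out as [c, a, b] (predicate column first).
    let merged = vec![
        ("c".to_string(), vec![7]),
        ("a".to_string(), vec![1]),
        ("b".to_string(), vec![3]),
    ];
    let fixed = reorder_to_schema(merged, &["a", "b", "c"]).unwrap();
    let names: Vec<_> = fixed.iter().map(|(n, _)| n.as_str()).collect();
    assert_eq!(names, ["a", "b", "c"]);
}
```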
```rust
if all_col_indices.len() < 3 || rg_byte_size < MIN_RG_BYTES_FOR_COL_PARALLELISM {
    return decode_single_rg(path, setup, task, predicate, None);
```
Magic number `3` instead of constant
This hardcodes 3 instead of using MIN_COLS_FOR_COL_PARALLELISM. If the threshold is ever changed, this call site would be missed. Additionally, the comment just above (line 685) says "even 2 columns benefit from splitting", which contradicts the < 3 check — the comment should be corrected to align with the actual threshold.
```diff
-if all_col_indices.len() < 3 || rg_byte_size < MIN_RG_BYTES_FOR_COL_PARALLELISM {
+if all_col_indices.len() < MIN_COLS_FOR_COL_PARALLELISM || rg_byte_size < MIN_RG_BYTES_FOR_COL_PARALLELISM {
     return decode_single_rg(path, setup, task, predicate, None);
```
```rust
fn bool_array_to_row_selection(mask: &arrow::array::BooleanArray) -> RowSelection {
    use arrow::array::Array;
```
Inline import inside function body
use arrow::array::Array is placed inside the function rather than at the top of the file. Per project convention, imports should be at the top of the file.
```diff
 fn bool_array_to_row_selection(mask: &arrow::array::BooleanArray) -> RowSelection {
-    use arrow::array::Array;
     let mut selectors = Vec::new();
```
And add use arrow::array::Array; to the top-level use block at the top of the file.
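For context, the conversion this helper performs — collapsing a boolean mask into alternating skip/select runs, as parquet's `RowSelection` represents them — can be sketched without arrow. The `Selector` enum here is a stand-in for `parquet::arrow::arrow_reader::RowSelector`, and the function is illustrative, not the PR's exact code:

```rust
// Run-length representation of which rows to decode.
#[derive(Debug, PartialEq)]
enum Selector {
    Select(usize),
    Skip(usize),
}

// Collapse a per-row boolean mask into maximal runs: extend the last run
// when the flag matches, otherwise start a new one.
fn mask_to_selectors(mask: &[bool]) -> Vec<Selector> {
    let mut selectors: Vec<Selector> = Vec::new();
    for &keep in mask {
        let extended = match selectors.last_mut() {
            Some(Selector::Select(n)) if keep => { *n += 1; true }
            Some(Selector::Skip(n)) if !keep => { *n += 1; true }
            _ => false,
        };
        if !extended {
            selectors.push(if keep { Selector::Select(1) } else { Selector::Skip(1) });
        }
    }
    selectors
}

fn main() {
    let sel = mask_to_selectors(&[true, true, false, false, false, true]);
    assert_eq!(
        sel,
        vec![Selector::Select(2), Selector::Skip(3), Selector::Select(1)]
    );
}
```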
Rule Used: Import statements should be placed at the top of t... (source)
Learnt From
Eventual-Inc/Daft#5078
```rust
#[allow(clippy::too_many_arguments, clippy::ref_option)]
async fn decode_rg_predicate_phase_async(
```
Suppressing clippy warnings instead of fixing them
#[allow(clippy::too_many_arguments, clippy::ref_option)] silences two clippy warnings rather than fixing the underlying issues. Per project convention, clippy warnings should be resolved rather than suppressed.
- `clippy::ref_option`: `&Option<IOStatsRef>` should be `Option<&IOStatsRef>` (avoids double indirection). The same attribute also appears on `decode_rg_column_async` at line 856.
- `clippy::too_many_arguments`: consider grouping related parameters (e.g., `io_client`, `io_stats`, `parquet_metadata`, `arrow_schema`) into a shared context struct — this pattern is already used elsewhere in this file with `LocalParquetSetup`.
Rule Used: Fix clippy warnings instead of suppressing them wi... (source)
Learnt From
Eventual-Inc/Daft#5207
- Fix column ordering bug: finalize_batch now reorders when schema order differs (not just length), fixing predicate column misordering after hconcat
- Use MIN_COLS_FOR_COL_PARALLELISM constant instead of magic number 3
- Move inline `use arrow::array::Array` to top-level imports

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@greptile
```rust
#[allow(clippy::too_many_arguments, clippy::ref_option)]
async fn decode_rg_column_async(
    uri: &str,
    io_client: &Arc<IOClient>,
    io_stats: &Option<IOStatsRef>,
    parquet_metadata: &Arc<ParquetMetaData>,
    arrow_schema: &Arc<arrow::datatypes::Schema>,
    col_root_index: usize,
    rg_idx: usize,
    row_selection: Option<RowSelection>,
) -> DaftResult<arrow::array::RecordBatch> {
    let reader = DaftAsyncFileReader::new(
        uri.to_string(),
        io_client.clone(),
        io_stats.clone(),
        Some(parquet_metadata.clone()),
        None,
    );
    let options = ArrowReaderOptions::new().with_schema(arrow_schema.clone());
    let arrow_reader_metadata =
        ArrowReaderMetadata::try_new(parquet_metadata.clone(), options).map_err(parquet_err)?;
    let mut builder =
        ParquetRecordBatchStreamBuilder::new_with_metadata(reader, arrow_reader_metadata);

    let mask = ProjectionMask::roots(builder.parquet_schema(), std::iter::once(col_root_index));
    builder = builder
        .with_projection(mask)
        .with_row_groups(vec![rg_idx])
        .with_batch_size(DEFAULT_BATCH_SIZE);

    if let Some(sel) = row_selection {
        builder = builder.with_row_selection(sel);
    }

    let stream = builder.build().map_err(parquet_err)?;
    let arrow_batches: Vec<arrow::array::RecordBatch> =
        stream.try_collect().await.map_err(parquet_err)?;

    if arrow_batches.is_empty() {
        let field = arrow_schema.field(col_root_index).clone();
        let schema = Arc::new(arrow::datatypes::Schema::new(vec![field]));
        Ok(arrow::array::RecordBatch::new_empty(schema))
    } else if arrow_batches.len() == 1 {
        Ok(arrow_batches.into_iter().next().unwrap())
    } else {
        arrow::compute::concat_batches(&arrow_batches[0].schema(), &arrow_batches)
            .map_err(|e| parquet_err(e).into())
    }
}
```
batch_size silently ignored in async column-parallel decode helpers
decode_rg_column_async and decode_rg_predicate_phase_async both hardcode DEFAULT_BATCH_SIZE (line 801 and line 885). The batch_size parameter passed to read_parquet_single_arrowrs and stream_parquet_single_arrowrs is correctly forwarded in the fallback single-stream path (e.g. line 1138 let batch_size = batch_size.unwrap_or(DEFAULT_BATCH_SIZE); builder2 = builder2.with_batch_size(batch_size);), but completely ignored in the column-parallel code paths that call these two helpers.
This breaks the API contract: users who set a custom batch_size to control memory usage will see their setting respected only when column parallelism is disabled (few columns / small RGs), but silently overridden with DEFAULT_BATCH_SIZE on the hot path.
Both functions should accept a batch_size: usize parameter (or Option<usize>) and thread it through with_batch_size(...):
```rust
// In decode_rg_column_async signature:
async fn decode_rg_column_async(
    ...
    batch_size: usize,
) -> DaftResult<arrow::array::RecordBatch> {
    ...
    builder = builder.with_batch_size(batch_size);
}

// In decode_rg_predicate_phase_async signature:
async fn decode_rg_predicate_phase_async(
    ...
    batch_size: usize,
) -> DaftResult<(arrow::array::RecordBatch, RowSelection)> {
    ...
    builder = builder.with_batch_size(batch_size);
}
```

And thread `batch_size.unwrap_or(DEFAULT_BATCH_SIZE)` through all call sites in `read_parquet_single_arrowrs` and `stream_parquet_single_arrowrs`.
```rust
let rg_rows = setup.parquet_metadata.row_group(task.rg_idx).num_rows() as usize;
let mut sel = build_base_row_selection(&setup, task, rg_rows);
if !setup.predicate_pushed && task.local_offset > 0 {
    let rg_rows = setup.parquet_metadata.row_group(task.rg_idx).num_rows() as usize;
    let offset_sel = build_offset_row_selection(task.local_offset, rg_rows);
```
rg_rows is computed on line 1922 and then recomputed with an identical expression on line 1925 inside the inner if block. The inner declaration shadows the outer one unnecessarily.
```diff
 let rg_rows = setup.parquet_metadata.row_group(task.rg_idx).num_rows() as usize;
 let mut sel = build_base_row_selection(&setup, task, rg_rows);
 if !setup.predicate_pushed && task.local_offset > 0 {
-    let rg_rows = setup.parquet_metadata.row_group(task.rg_idx).num_rows() as usize;
     let offset_sel = build_offset_row_selection(task.local_offset, rg_rows);
     sel = combine_selections(sel, Some(offset_sel));
```
```rust
if predicate.is_some()
    && let Some(limit) = num_rows
{
    table = table.head(limit)?;
}
if predicate.is_none()
    && let Some(limit) = num_rows
{
    table = table.head(limit)?;
}
```
Redundant mutually-exclusive limit checks
The two if branches are mutually exclusive (predicate.is_some() vs predicate.is_none()) and together they always apply table.head(limit)?. They can be collapsed into a single unconditional check. The same pattern appears in local_parquet_read_arrowrs at lines 1973–1981.
```diff
-if predicate.is_some()
-    && let Some(limit) = num_rows
-{
-    table = table.head(limit)?;
-}
-if predicate.is_none()
-    && let Some(limit) = num_rows
-{
-    table = table.head(limit)?;
-}
+if let Some(limit) = num_rows {
+    table = table.head(limit)?;
+}
 Ok(table)
```
- Remove redundant rg_rows shadow variable
- Collapse mutually-exclusive limit checks into single unconditional check
- Remove unused has_predicate variable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@greptile
```rust
let offset_sel = if use_offset_selection || (!predicate_pushed && global_start > 0) {
    let local_offset = global_start.saturating_sub(rg_start_in_stream);
    if local_offset > 0 {
        Some(build_offset_row_selection(local_offset, rg_rows))
    } else {
        None
    }
} else {
    None
};

let delete_sel = if let Some(deletes) = delete_rows
    && !deletes.is_empty()
{
    Some(build_single_rg_delete_selection(
        deletes,
        rg_global_starts[rg_idx],
        rg_rows,
    ))
} else {
    None
};

let offset_selection = if use_offset_selection {
    Some(build_offset_row_selection(
        start_offset.unwrap(),
        total_selected_rows,
    ))
} else {
    None
};
combine_selections(offset_sel, delete_sel)
```
Offset + delete semantics diverge between fallback and column-parallel paths
When !predicate_pushed, start_offset > 0, and delete_rows are present, the fallback path and the column-parallel path produce different results.
Fallback path (all_col_indices.len() < MIN_COLS_FOR_COL_PARALLELISM): applies with_offset(start_offset) after with_row_selection(delete_sel). Because arrow-rs applies with_offset over the post-selection row stream, this skips the first N non-deleted rows.
Column-parallel path (here): combine_selections(offset_sel, delete_sel) produces an intersection — rows that are both at position ≥ N AND not deleted. This skips the first N rows by physical position regardless of deletion status.
Concrete example: rows 0–9, delete_row=2, start_offset=3.
- Fallback: non-deleted rows are {0,1,3,4,5,6,7,8,9}; skip first 3 → output starts at row 4.
- Column-parallel: rows ≥ 3 AND not deleted = {3,4,5,6,7,8,9}; output starts at row 3.
This is a behavioral regression for users who rely on Iceberg positional deletes combined with a non-predicate-pushed scan with an offset. The same inconsistency exists in stream_parquet_single_arrowrs at line 2202.
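The divergence is easy to reproduce with a toy model of the two semantics. Both functions below are illustrative sketches, not the PR's code, using the review's concrete example (rows 0–9, delete_row = 2, start_offset = 3):

```rust
// Fallback path: with_offset runs over the post-selection stream, so it
// skips the first N *non-deleted* rows.
fn fallback_semantics(rows: usize, deleted: &[usize], offset: usize) -> Vec<usize> {
    (0..rows)
        .filter(|r| !deleted.contains(r))
        .skip(offset)
        .collect()
}

// Column-parallel path: intersection of selections, i.e. physical
// position >= N AND not deleted.
fn col_parallel_semantics(rows: usize, deleted: &[usize], offset: usize) -> Vec<usize> {
    (0..rows)
        .filter(|r| *r >= offset && !deleted.contains(r))
        .collect()
}

fn main() {
    let fb = fallback_semantics(10, &[2], 3);
    let cp = col_parallel_semantics(10, &[2], 3);
    assert_eq!(fb[0], 4); // fallback output starts at row 4
    assert_eq!(cp[0], 3); // column-parallel output starts at row 3
    assert_ne!(fb, cp); // the two paths disagree
}
```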
```rust
// Non-pushed offset for predicate phase.
if !setup.predicate_pushed {
    builder = builder.with_offset(task.local_offset);
}
```
Dead code branch in `decode_rg_predicate_phase`
decode_rg_predicate_phase is only ever called when setup.predicate_pushed == true (in both decode_single_rg_col_parallel and local_parquet_read_arrowrs, the call sites are inside if setup.predicate_pushed { ... }). The if !setup.predicate_pushed branch at line 545 is therefore unreachable dead code. More importantly, it is also misleading — it suggests the function is designed for use in the non-pushed case, which could cause a correctness problem if a future caller invokes it without pushdown (the offset would be double-applied: once via build_base_row_selection for the delete selection, and again via with_offset).
The branch should be removed to accurately reflect the function's invariant, or the function should be documented with a // Only called when predicate_pushed == true assertion.
```diff
-// Non-pushed offset for predicate phase.
-if !setup.predicate_pushed {
-    builder = builder.with_offset(task.local_offset);
-}
 // Apply base row selection (offset + deletes).
 let base_selection = build_base_row_selection(setup, task, rg_rows);
 if let Some(ref sel) = base_selection {
     builder = builder.with_row_selection(sel.clone());
 }
```
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@ Coverage Diff @@
## main #6423 +/- ##
==========================================
- Coverage 74.81% 74.44% -0.38%
==========================================
Files 1021 1021
Lines 136570 137547 +977
==========================================
+ Hits 102172 102394 +222
- Misses 34398 35153 +755
```
…dead code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
desmondcheongzx left a comment
@greptileai review
The arrow-rs parquet reader parallelizes across row groups but decodes all columns serially within each RG. For wide tables with few row groups, this leaves CPU cores idle. This PR adds intra-RG column parallelism by opening separate per-column readers with `ProjectionMask::roots`, decoding independently, and hconcat-ing results.

All four read paths are modified:
- `local_parquet_read_arrowrs` (sync bulk): `par_iter` over `(RG, col)` pairs
- `local_parquet_stream_arrowrs` (sync stream)
- `read_parquet_single_arrowrs` (async bulk): `try_join_all` over `(RG, col)` async tasks
- `stream_parquet_single_arrowrs` (async stream)

When a predicate is pushed, decode uses a two-phase approach: predicate columns are decoded first (serial per RG) to compute a `RowSelection`, then data columns are decoded in parallel using the refined selection. Results are reassembled via `hconcat_record_batches`.

Column parallelism is gated by `MIN_COLS_FOR_COL_PARALLELISM` (3 columns) and `MIN_RG_BYTES_FOR_COL_PARALLELISM` (16 MiB uncompressed), falling back to the single-reader path when overhead would exceed benefit. Single-column reads use the existing `decode_single_rg` path with no overhead.

Shared helpers added: `bool_array_to_row_selection`, `refine_selection`, `hconcat_record_batches`, `build_base_row_selection`, `compute_root_indices`, and sync/async per-column decode functions.

This is part 1 of #6353, split for reviewability. Part 2 (S3 byte range prefetching + async scheduling) builds on this.
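The two-phase refinement can be modeled with plain boolean masks. `refine_selection` here is an illustrative sketch of the idea, not the PR's implementation: the base selection is a flag per physical row, while the predicate mask was evaluated only over the rows that base selection kept.

```rust
// Narrow a base selection by a predicate mask evaluated over the
// *selected* rows only: each selected row consumes the next predicate
// flag; unselected rows stay unselected.
fn refine_selection(base: &[bool], predicate_over_selected: &[bool]) -> Vec<bool> {
    let mut pred = predicate_over_selected.iter();
    base.iter()
        // `&&` short-circuits, so only selected rows consume a flag.
        .map(|&selected| selected && *pred.next().expect("mask shorter than selection"))
        .collect()
}

fn main() {
    // 6 physical rows; base selection keeps rows 1, 2, 4, 5.
    let base = [false, true, true, false, true, true];
    // Predicate over those 4 selected rows keeps the 1st and 4th of them.
    let pred = [true, false, false, true];
    let refined = refine_selection(&base, &pred);
    assert_eq!(refined, [false, true, false, false, false, true]);
}
```

The data-column phase then decodes only the rows the refined selection keeps.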