docs: Add detailed diagrams to contributor guide for all Parquet scan implementations #2681
base: main
Conversation
@mbutrovich @parthchandra This is low priority, but could you review when you get a chance?
│ 2. For each column:           │
│    - Get ColumnDescriptor     │
│    - Read pages via           │
│      PageReadStore            │
│    - Create CometVector       │
│      from native data         │
│ 3. Return ColumnarBatch       │
└───────────┬───────────────────┘
            │
            │ Uses JNI to access native decoders
            │ (not for page reading, only for
            │  specialized operations if needed)
Between steps 2 and 3 there is a set of JNI calls to decode the individual columns.
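For illustration, a hedged sketch of that call pattern; `decodeColumn` and the handle layout are hypothetical and do not match Comet's actual native bindings:

```java
// Hypothetical sketch of the per-column JNI decode loop described above.
// The method name, handle, and address layout are illustrative only.
public class ColumnDecodeSketch {
  // Assumed JNI binding: decode the next `batchSize` values for one column of
  // the native reader identified by `nativeHandle`, returning the address of
  // the decoded native (Arrow-compatible) data.
  private static native long decodeColumn(long nativeHandle, int columnIndex, int batchSize);

  static long[] decodeAllColumns(long nativeHandle, int numColumns, int batchSize) {
    long[] columnAddresses = new long[numColumns];
    for (int i = 0; i < numColumns; i++) {
      // One JNI call per column; the JVM side then wraps each address in a
      // CometVector before assembling the ColumnarBatch (step 3).
      columnAddresses[i] = decodeColumn(nativeHandle, i, batchSize);
    }
    return columnAddresses;
  }
}
```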
│ via CometBatchIterator        │
│                               │
│ Key operations:               │
│ ├─ next_batch()               │
There is one more JNI call here: ScanExec.get_next makes a call back to CometBatchIterator, which exports the batch back to native (so an FFI call).
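A minimal sketch of that callback shape, assuming the Arrow Java C Data Interface (`org.apache.arrow.c.Data`); the class and method names here are illustrative stand-ins, not Comet's exact CometBatchIterator API:

```java
import org.apache.arrow.c.ArrowArray;
import org.apache.arrow.c.ArrowSchema;
import org.apache.arrow.c.Data;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.vector.VectorSchemaRoot;

import java.util.Iterator;

// Illustrative stand-in for the JVM-side batch iterator. Native ScanExec.get_next
// calls back into this class over JNI, passing addresses of ArrowArray/ArrowSchema
// structs it allocated; the JVM exports the next batch into them (the FFI call).
class BatchIteratorSketch {
  private final BufferAllocator allocator;
  private final Iterator<VectorSchemaRoot> batches;

  BatchIteratorSketch(BufferAllocator allocator, Iterator<VectorSchemaRoot> batches) {
    this.allocator = allocator;
    this.batches = batches;
  }

  // Returns the row count of the exported batch, or -1 when no batches remain.
  int next(long arrayAddress, long schemaAddress) {
    if (!batches.hasNext()) {
      return -1;
    }
    VectorSchemaRoot root = batches.next();
    ArrowArray array = ArrowArray.wrap(arrayAddress);
    ArrowSchema schema = ArrowSchema.wrap(schemaAddress);
    // Export via the Arrow C Data Interface; the native caller takes ownership
    // of the exported buffers through the release callback.
    Data.exportVectorSchemaRoot(allocator, root, /* dictionaryProvider */ null, array, schema);
    return root.getRowCount();
  }
}
```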
│ Key method:                   │
│ init(AbstractColumnReader[])  │  Iceberg provides column readers
│                               │
│ Purpose:                      │
This is not correct (this is how the current integration of native_comet and Iceberg works). native_iceberg_compat uses the same init_datasource_exec call to create a DataFusion DataSourceExec and wraps the native batch and native columns in corresponding classes on the JVM side (NativeBatchReader and NativeColumnReader).
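A hedged sketch of that flow; the method names and signatures below are assumptions for illustration, not the real NativeBatchReader/NativeColumnReader or init_datasource_exec bindings:

```java
// Illustrative-only sketch of the native_iceberg_compat flow: the JVM asks the
// native side to build a DataFusion DataSourceExec once, then repeatedly pulls
// batches from it and wraps the resulting native Arrow data in JVM-side reader
// classes. All names and signatures here are hypothetical.
class NativeIcebergCompatSketch {
  // Assumed JNI bindings (not Comet's real signatures).
  private static native long initDatasourceExec(byte[] serializedScan);
  private static native int nextBatch(long execHandle, long arrowArrayAddr, long arrowSchemaAddr);

  private long execHandle;

  void init(byte[] serializedScan) {
    // Same entry point as the regular native scan: build the DataSourceExec
    // natively instead of accepting externally supplied column readers.
    execHandle = initDatasourceExec(serializedScan);
  }

  // Pull one batch; the exported native data would be wrapped in classes that
  // play the role of NativeBatchReader / NativeColumnReader on the JVM side.
  int loadNextBatch(long arrowArrayAddr, long arrowSchemaAddr) {
    return nextBatch(execHandle, arrowArrayAddr, arrowSchemaAddr);
  }
}
```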
            │
            ↓
┌───────────────────────────────┐
│ AbstractColumnReader[]        │  Iceberg-managed column readers
For Iceberg, we will use IcebergCometNativeBatchReader, which does not pass in column readers.
The Iceberg integration is not done (and may not be done this way) anyway, so there must be a way for native_iceberg_compat to work without Iceberg.
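Purely for illustration, the difference between the two initialization styles might look like the following pair of entry points; neither signature is taken from the actual Comet or Iceberg code:

```java
// Hypothetical contrast between the two initialization styles discussed above.
abstract class BatchReaderSketch {
  // native_comet-style integration: Iceberg supplies per-column readers.
  abstract void init(AbstractColumnReaderSketch[] icebergColumnReaders);

  // IcebergCometNativeBatchReader-style integration: no column readers are
  // passed in; the native DataSourceExec produces whole batches that are
  // wrapped on the JVM side.
  abstract void init(byte[] serializedScanPlan);
}

// Placeholder type so the sketch is self-contained.
abstract class AbstractColumnReaderSketch {}
```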
Which issue does this PR close?
Addresses PR review feedback in #2674
Rationale for this change
Add more detailed documentation explaining how scans are implemented.
What changes are included in this PR?
How are these changes tested?