Skip to content

Conversation

@andygrove
Copy link
Member

Which issue does this PR close?

Addresses PR review feedback in #2674

Rationale for this change

Add more detailed documentation explaining how scans are implemented.

What changes are included in this PR?

How are these changes tested?

@andygrove andygrove marked this pull request as ready for review November 6, 2025 15:19
@andygrove
Copy link
Member Author

@mbutrovich @parthchandra This is low priority, but could you review when you get a chance?

Comment on lines +199 to +210
│ 2. For each column: │
│ - Get ColumnDescriptor │
│ - Read pages via │
│ PageReadStore │
│ - Create CometVector │
│ from native data │
│ 3. Return ColumnarBatch │
└───────────┬───────────────────┘
│ Uses JNI to access native decoders
│ (not for page reading, only for
│ specialized operations if needed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Between steps 2-3 is where there is a set of JNI calls to decode individual columns.

│ via CometBatchIterator │
│ │
│ Key operations: │
│ ├─ next_batch() │
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one more JNI call here. ScanExec.get_next makes a call back to CometBatchIterator which exports the batch back to native (so an FFI call)

│ Key method: │
│ init(AbstractColumnReader[]) │ Iceberg provides column readers
│ │
│ Purpose: │
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct. (This is how the current integration of native_comet and Iceberg works). native_iceberg_compat uses the same init_datasource_exec call to create a DataFusion DataSourceExec and wraps the natve batch and native columns in corresponding classes on the JVM side (NativeBatchReader and NativeColumnReader)

┌───────────────────────────────┐
│ AbstractColumnReader[] │ Iceberg-managed column readers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Iceberg, we will use IcebergCometNativeBatchReader which does not pass in column readers.
Iceberg integration is not done (and may not be done this way) anyway, so there must be a way that native_iceberg_compat works without Iceberg.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants