Add split size session config support in SparkScanBuilder.configureSplitPlanning() #230
Closed
shanthoosh wants to merge 1 commit into openhouse-1.5.2 from
Conversation
@@ -194,7 +194,13 @@ public int orcBatchSize() {
  }

  public Long splitSizeOption() {
Contributor
The name implies that it reads from the option alone.
Collaborator
Author
In SparkStagedScan and SparkMicroBatchStream, the readConf.splitSize() method is used. Here is the relevant usage:
sumedhsakdeo (Contributor) requested changes on Jan 20, 2026 and left a comment:
We should take a deeper look at why only options / table properties are being looked at in the alternate split-size path.
Overriding a method whose name says it looks at Spark options alone, so that it also consults session conf / table properties, etc., seems incorrect.
Problem
The session config spark.sql.iceberg.split-size was honored in some Spark read paths but ignored in others. SparkReadConf.splitSizeOption() checked only the read option (SparkReadOptions.SPLIT_SIZE) and not the session config, which caused inconsistent behavior: SparkStagedScan and SparkMicroBatchStream use the session config (SparkSQLProperties.SPLIT_SIZE), while SparkScanBuilder.configureSplitPlanning() did not respect it.
Fix
This PR fixes the inconsistency by updating splitSizeOption() to also consider the session configuration (SparkSQLProperties.SPLIT_SIZE), ensuring consistent split-size handling across all Spark reader control flows.
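For illustration, the lookup order described above can be sketched roughly as follows. This is not the code from the PR: plain maps stand in for the Spark read options and the session conf, the class and method names (SplitSizeSketch, optionOnly, optionThenSession) are hypothetical, and the read-option key "split-size" is an assumption about the value of SparkReadOptions.SPLIT_SIZE. Only the session key spark.sql.iceberg.split-size comes from the description.

```java
import java.util.Map;

// Illustrative sketch only: plain maps model the read options and the session conf.
final class SplitSizeSketch {
    // Assumed value of SparkReadOptions.SPLIT_SIZE (not confirmed by this PR).
    static final String READ_OPTION_KEY = "split-size";
    // Session config key named in the PR description.
    static final String SESSION_CONF_KEY = "spark.sql.iceberg.split-size";

    // Behavior described under "Problem": only the read option is consulted.
    static Long optionOnly(Map<String, String> readOptions) {
        String value = readOptions.get(READ_OPTION_KEY);
        return value == null ? null : Long.valueOf(value);
    }

    // Behavior described under "Fix": read option first, then session conf.
    // Returns null when neither is set, so the caller can apply its own default.
    static Long optionThenSession(Map<String, String> readOptions,
                                  Map<String, String> sessionConf) {
        String value = readOptions.get(READ_OPTION_KEY);
        if (value == null) {
            value = sessionConf.get(SESSION_CONF_KEY);
        }
        return value == null ? null : Long.valueOf(value);
    }

    public static void main(String[] args) {
        Map<String, String> noOptions = Map.of();
        Map<String, String> session = Map.of(SESSION_CONF_KEY, "134217728");

        // With only the session conf set, the two lookups disagree.
        System.out.println("option only:         " + optionOnly(noOptions));
        System.out.println("option then session: " + optionThenSession(noOptions, session));
    }
}
```

In this sketch an explicit read option still wins over the session conf, so per-read overrides keep working while session-level settings are no longer dropped.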
Testing