Adding SparkSQLProperty for split-size#228

Merged
shanthoosh merged 1 commit into linkedin:openhouse-1.5.2 from shanthoosh:add_split_size_configuration
Jan 8, 2026

Conversation

@shanthoosh
Collaborator

Description

In Hive, we relied on the session-level config spark.sql.files.maxPartitionBytes to control file split sizes, and this was widely adopted across many flows at LinkedIn. As we transition to Iceberg, the equivalent setting is read.split.target-size, but it can't be set via session configs. This creates a gap: our current options are either to update thousands of jobs to pass this value through OPTIONS, or to set the table property spark.sql.iceberg.split-size directly on the tables, which applies the config to all the jobs that read them. Both options are suboptimal.
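To make the gap concrete, here is a minimal Python sketch of how a split size could be resolved once a session-level key exists. It is a model for illustration, not the code in this PR: the function name `resolve_split_size` is hypothetical, the lookup order (read option, then session config, then table property, then default) is an assumption, and the read-option key `split-size` does not appear in the description above. The 128 MiB fallback matches Iceberg's documented default for read.split.target-size.

```python
# Illustrative model only; the real resolution happens inside Iceberg's Spark integration.
DEFAULT_SPLIT_SIZE = 128 * 1024 * 1024  # 128 MiB, documented default of read.split.target-size


def resolve_split_size(read_options, session_conf, table_props):
    """Return the first split size found, checking the most specific source first.

    The precedence used here (read option > session config > table property)
    is an assumption made for this sketch.
    """
    candidates = (
        read_options.get("split-size"),                   # per-read OPTIONS
        session_conf.get("spark.sql.iceberg.split-size"),  # session-level key named in this PR
        table_props.get("read.split.target-size"),         # Iceberg table property
    )
    for value in candidates:
        if value is not None:
            return int(value)
    return DEFAULT_SPLIT_SIZE


# A job that only sets the session config, as existing Hive flows do today.
session_only = resolve_split_size({}, {"spark.sql.iceberg.split-size": "268435456"}, {})
```

Under this model, a job that sets only the session key gets its value applied without any per-read OPTIONS or per-table changes, which is the behavior the description asks for.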

Back-port PR: apache/iceberg#13677
The same changes were made in spark-3.1 for the openhouse-1.2.0 branch: #190

Testing

./gradlew clean && ./gradlew build

@github-actions github-actions bot added the SPARK label Jan 8, 2026
@shanthoosh shanthoosh force-pushed the add_split_size_configuration branch from 53524f8 to 095afa4 Compare January 8, 2026 18:33
Member

@abhisheknath2011 abhisheknath2011 left a comment

Thanks for the quick PR.

@shanthoosh shanthoosh merged commit efb0922 into linkedin:openhouse-1.5.2 Jan 8, 2026
13 checks passed

3 participants