@rahil-c rahil-c commented Jan 1, 2026

Describe the issue this Pull Request addresses

Introduces a new config, PLAN_STRATEGY_FLOOR_FILE_LIMIT, which restricts clustering to files larger than the specified limit. (Note: this is meant to be used in conjunction with the existing PLAN_STRATEGY_SMALL_FILE_LIMIT, which restricts clustering to files smaller than the small file limit and therefore acts as a ceiling.)

Summary and Changelog

  • Added new config PLAN_STRATEGY_FLOOR_FILE_LIMIT
  • Updated logic in getFileSlicesEligibleForClustering to use new config
  • Added Spark functional test TestSparkSizeBasedClusteringWithFloorLimit to cover this config
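
For illustration, the floor/ceiling filter described above could look roughly like the sketch below. This is not the code from this PR: the class name `FloorLimitSketch`, the method `eligibleForClustering`, and its parameters are hypothetical, and file slices are reduced to plain byte sizes. It also assumes both bounds are exclusive, following the "greater than" / "less than" wording in the description; the actual comparison in getFileSlicesEligibleForClustering may differ.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of a floor/ceiling size filter; not Hudi's implementation.
final class FloorLimitSketch {

  // Keeps only the sizes strictly above floorBytes and strictly below ceilingBytes.
  // Exclusive bounds are an assumption taken from the PR description's wording.
  static List<Long> eligibleForClustering(List<Long> fileSizesBytes,
                                          long floorBytes,
                                          long ceilingBytes) {
    return fileSizesBytes.stream()
        .filter(size -> size > floorBytes && size < ceilingBytes)
        .collect(Collectors.toList());
  }
}
```

With a floor of 10 MB and a ceiling of 300 MB, for example, a 1 MB file and a 700 MB file would both be skipped, while 50 MB and 200 MB files would remain candidates for clustering.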

Impact

none

Risk Level

low

Documentation Update

new config update PLAN_STRATEGY_FLOOR_FILE_LIMIT

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@rahil-c rahil-c requested a review from nsivabalan January 1, 2026 18:06
@rahil-c rahil-c assigned rahil-c and unassigned nsivabalan Jan 1, 2026
@github-actions github-actions bot added the size:M PR with lines of changes in (100, 300] label Jan 1, 2026

hudi-bot commented Jan 1, 2026

CI report:

Bot commands

@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@rahil-c rahil-c removed their assignment Jan 2, 2026

@nsivabalan nsivabalan left a comment


1 minor comment. LGTM otherwise.


public static final ConfigProperty<String> PLAN_STRATEGY_SMALL_FILE_FLOOR_LIMIT = ConfigProperty
    .key(CLUSTERING_STRATEGY_PARAM_PREFIX + "small.file.floor.limit")
    .defaultValue(String.valueOf(0L))

1.2.0

