fix(storage): split oversized compact blocks during recluster #19577
zhyass wants to merge 2 commits into databendlabs:main
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 608736989d
833cf48 to c4b8f26
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c4b8f26ca7
src/query/pipeline/transforms/src/processors/transforms/transform_compact_block.rs
fix fix fix fix fix fix fix fix fix
@codex review
Codex Review: Didn't find any major issues. Breezy!
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
This change adds split handling for compacted blocks in the recluster path to prevent oversized blocks from being
produced after sort and compaction. Previously, the output of recluster could retain blocks larger than expected,
which increased memory and I/O pressure.
The implementation reuses the common compact block pipeline and splits blocks when they exceed the configured
threshold. It also adds validation for the upper bound of block_size_threshold to avoid overly large settings. The
goal is to keep block size under control during recluster and reduce the impact of oversized blocks on serialization,
write path behavior, and resource usage.
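As a rough illustration of the idea in this summary (not the code in this PR), the sketch below splits a block into row ranges once its estimated size exceeds a threshold, and rejects a threshold above an upper bound. The names `split_row_ranges`, `validate_block_size_threshold`, and `MAX_BLOCK_SIZE_THRESHOLD`, the 1 GiB limit, and the assumption that rows are roughly uniform in size are all hypothetical and chosen only for the example.

```rust
use std::ops::Range;

/// Hypothetical upper bound, picked only for this illustration.
const MAX_BLOCK_SIZE_THRESHOLD: usize = 1024 * 1024 * 1024;

/// Rejects a zero threshold or one above the (assumed) upper bound.
fn validate_block_size_threshold(value: usize) -> Result<usize, String> {
    if value == 0 || value > MAX_BLOCK_SIZE_THRESHOLD {
        return Err(format!(
            "block_size_threshold must be in 1..={}, got {}",
            MAX_BLOCK_SIZE_THRESHOLD, value
        ));
    }
    Ok(value)
}

/// Returns the row ranges a block should be cut into so that each piece is
/// expected to stay under `threshold` bytes, assuming rows of similar size.
fn split_row_ranges(num_rows: usize, total_bytes: usize, threshold: usize) -> Vec<Range<usize>> {
    if num_rows == 0 {
        return Vec::new();
    }
    // Guard against division by zero if the caller skipped validation.
    let threshold = threshold.max(1);
    if total_bytes <= threshold {
        // Small enough already: keep the block whole.
        return vec![0..num_rows];
    }
    // Number of pieces needed, capped so every piece has at least one row.
    let parts = total_bytes.div_ceil(threshold).min(num_rows);
    let base = num_rows / parts;
    let extra = num_rows % parts;

    let mut ranges = Vec::with_capacity(parts);
    let mut start = 0;
    for i in 0..parts {
        // The first `extra` pieces take one additional row each.
        let len = base + usize::from(i < extra);
        ranges.push(start..start + len);
        start += len;
    }
    ranges
}
```

With these assumptions, a 10-row block estimated at 250 bytes and a 100-byte threshold would be cut into three contiguous ranges of 4, 3, and 3 rows. Splitting by row count keeps the pieces balanced instead of leaving one small trailing block.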
Tests
Type of change