-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[fix][broker] Fix compaction horizon might be reset to an old position when phase two is interrupted #25119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fix][broker] Fix compaction horizon might be reset to an old position when phase two is interrupted #25119
Conversation
…n when phase two is interrupted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a critical bug where the compaction horizon could be reset to an old position when phase two of compaction is interrupted by broker shutdown, causing excessive re-reading of already-compacted entries in subsequent compactions.
Key Changes:
- Modified
ManagedCursorImpl.internalResetCursorto preserve the mark-delete position for the__compactioncursor instead of resetting it to the previous position of the new read position - Added test infrastructure to inject failures during compaction phase two and verify the fix
- Refactored test lifecycle from
@BeforeMethod/@AfterMethodto@BeforeClass/@AfterClasswith batch read size configuration to improve test performance
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| pulsar-broker/src/test/java/org/apache/pulsar/compaction/CompactionTest.java | Adds new test testPhaseTwoInterruption to reproduce the bug, refactors test lifecycle for performance, adds unique topic names to prevent test interference, and fixes testCompactorReadsCompacted logic |
| pulsar-broker/src/main/java/org/apache/pulsar/compaction/AbstractTwoPhaseCompactor.java | Introduces static volatile injection point for testing phase two interruption scenarios |
| managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java | Implements the fix by preventing mark-delete position modification for compaction cursor during reset operations |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
pulsar-broker/src/main/java/org/apache/pulsar/compaction/AbstractTwoPhaseCompactor.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/test/java/org/apache/pulsar/compaction/CompactionTest.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/test/java/org/apache/pulsar/compaction/CompactionTest.java
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/compaction/AbstractTwoPhaseCompactor.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
pulsar-broker/src/test/java/org/apache/pulsar/compaction/CompactionTest.java
Show resolved
Hide resolved
pulsar-broker/src/test/java/org/apache/pulsar/compaction/CompactionTest.java
Show resolved
Hide resolved
pulsar-broker/src/test/java/org/apache/pulsar/compaction/CompactionTest.java
Show resolved
Hide resolved
pulsar-broker/src/test/java/org/apache/pulsar/compaction/CompactionTest.java
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/compaction/AbstractTwoPhaseCompactor.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java
Show resolved
Hide resolved
lhotari
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #25119 +/- ##
============================================
- Coverage 74.82% 74.44% -0.39%
+ Complexity 33836 33690 -146
============================================
Files 1899 1899
Lines 149656 149708 +52
Branches 17393 17402 +9
============================================
- Hits 111979 111447 -532
- Misses 28892 29404 +512
- Partials 8785 8857 +72
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
…n when phase two is interrupted (#25119)
Fixes #25118
Motivation
The steps of compaction phase two are:
However, if broker is closing during phase two, the pending
readNextAsynccall could fail withCancellationException. In this case, when the managed ledger is closed, the position of the 1st retained entry will be persisted as the__compactioncursor's mark-delete position.As a result, during the next compaction, the horizon will be very early, and phase one will read too many entries from original ledgers. This issue could get into a bad state when the original ledgers are not deleted, typically due to improper retention policy or unexpected large backlog from another durable subscription.
Modifications
Specially for
__compactioncursor, do not modify the mark-delete position ininternalResetCursor(triggered by client seek API).Add
testPhaseTwoInterruptionto reproduce this issue by injecting topic close in compaction phase two. Additionally, speed up the wholeCompactionTest.It should be noted that the original behavior also makes
testCompactorReadsCompactedincorrect. This test sends two messages to two ledgers and compacted them. Then it sends the third message and trigger the compaction. However, it assumes the 2nd ledger will be opened. Technically, the 2nd ledger has only 1 entry that has been compacted, so it should not be opened. The entries from first two ledgers should be read directly from the compacted ledger. This test is fixed by adding another more message before creating the 3rd ledger.Documentation
docdoc-requireddoc-not-neededdoc-completeMatching PR in forked repository
PR in forked repository: