@BewareMyPower
Contributor

Fixes #25118

Motivation

The steps of compaction phase two are:

  1. Seek the compacted reader to the position of the 1st retained entry
  2. Read to latest
  3. Acknowledge the last compacted message id (horizon) as well as the compacted ledger's id in properties
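The three steps above can be sketched with a toy model. This is only an illustration: `CompactionCursor`, `runPhaseTwo`, and the use of plain `long` values as positions are hypothetical stand-ins, not Pulsar's actual classes or method signatures.

```java
import java.util.ArrayList;
import java.util.List;

public class PhaseTwoSketch {

    /** Hypothetical stand-in for the __compaction cursor; not Pulsar's real API. */
    static final class CompactionCursor {
        long readPosition = 0;
        long horizon = -1;            // last compacted position acknowledged so far
        long compactedLedgerId = -1;  // stored together with the horizon

        void seek(long position) {
            readPosition = position;
        }

        void acknowledge(long position, long ledgerId) {
            horizon = position;
            compactedLedgerId = ledgerId;
        }
    }

    /** Runs the three steps and returns the positions that were read. */
    static List<Long> runPhaseTwo(CompactionCursor cursor, long firstRetained,
                                  long latest, long newCompactedLedgerId) {
        // Step 1: seek the reader to the first retained entry.
        cursor.seek(firstRetained);

        // Step 2: read until the latest entry.
        List<Long> read = new ArrayList<>();
        while (cursor.readPosition <= latest) {
            read.add(cursor.readPosition);
            cursor.readPosition++;
        }

        // Step 3: acknowledge the horizon along with the compacted ledger's id.
        cursor.acknowledge(latest, newCompactedLedgerId);
        return read;
    }

    public static void main(String[] args) {
        CompactionCursor cursor = new CompactionCursor();
        List<Long> read = runPhaseTwo(cursor, 3, 7, 42);
        System.out.println("read=" + read + " horizon=" + cursor.horizon
                + " compactedLedgerId=" + cursor.compactedLedgerId);
    }
}
```

In this model the horizon only advances in step 3, so anything that stops the run between step 1 and step 3 leaves the cursor in whatever state the seek produced.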

However, if the broker is closing during phase two, the pending readNextAsync call can fail with a CancellationException. Step 3 then never runs, and when the managed ledger is closed, the position of the 1st retained entry is persisted as the __compaction cursor's mark-delete position.

As a result, the horizon of the next compaction is far too early, and phase one reads many more entries from the original ledgers than necessary. The problem becomes severe when the original ledgers have not been deleted, typically because of an improper retention policy or an unexpectedly large backlog on another durable subscription.

Modifications

Specifically for the __compaction cursor, do not modify the mark-delete position in internalResetCursor (triggered by the client seek API).
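A minimal model of this change, assuming positions are plain `long` values and that a reset normally moves the mark-delete position to just before the new read position. The `Cursor` class and the `preserveForCompaction` flag are hypothetical and exist only to compare the old and new behavior; the real logic lives in ManagedCursorImpl and is not reproduced here.

```java
public class ResetCursorSketch {

    static final String COMPACTION_CURSOR_NAME = "__compaction";

    /** Simplified cursor model; not ManagedCursorImpl. */
    static final class Cursor {
        final String name;
        long markDeletePosition;
        long readPosition;

        Cursor(String name, long markDeletePosition) {
            this.name = name;
            this.markDeletePosition = markDeletePosition;
            this.readPosition = markDeletePosition + 1;
        }

        /** Models a reset triggered by a client seek. */
        void resetCursor(long newReadPosition, boolean preserveForCompaction) {
            boolean isCompactionCursor = COMPACTION_CURSOR_NAME.equals(name);
            if (!(preserveForCompaction && isCompactionCursor)) {
                // Old behavior (and still the behavior for ordinary cursors):
                // mark-delete follows the new read position.
                markDeletePosition = newReadPosition - 1;
            }
            readPosition = newReadPosition;
        }
    }

    /** Mark-delete position that would be persisted if the run stops right after the seek. */
    static long markDeleteAfterInterruptedSeek(String cursorName, long horizon,
                                               long firstRetained, boolean withFix) {
        Cursor cursor = new Cursor(cursorName, horizon);
        cursor.resetCursor(firstRetained, withFix);
        return cursor.markDeletePosition;
    }

    public static void main(String[] args) {
        System.out.println("without fix: "
                + markDeleteAfterInterruptedSeek(COMPACTION_CURSOR_NAME, 100, 3, false));
        System.out.println("with fix:    "
                + markDeleteAfterInterruptedSeek(COMPACTION_CURSOR_NAME, 100, 3, true));
    }
}
```

With a previous horizon of 100 and a first retained entry at 3, the unguarded reset drags the mark-delete position back to 2, while the guarded reset keeps it at 100. Other cursors are unaffected by the guard.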

Add testPhaseTwoInterruption to reproduce this issue by injecting a topic close during compaction phase two. Additionally, speed up the whole CompactionTest.
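One way such an injection point might look is a static volatile hook that production code leaves unset and a test assigns before triggering compaction. The names `phaseTwoInjection` and `phaseTwo` below are hypothetical; this sketch does not reproduce the actual field added to AbstractTwoPhaseCompactor.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

public class InjectionPointSketch {

    /** Hypothetical test hook: null in production, set by a test to inject a failure. */
    static volatile Supplier<CompletableFuture<Void>> phaseTwoInjection = null;

    /** Stand-in for phase two: runs the hook (if any) before doing the real work. */
    static CompletableFuture<String> phaseTwo() {
        // Read the volatile field once so the null check and the call see the same value.
        Supplier<CompletableFuture<Void>> injection = phaseTwoInjection;
        CompletableFuture<Void> before = injection == null
                ? CompletableFuture.completedFuture(null)
                : injection.get();
        return before.thenApply(ignored -> "phase two completed");
    }

    public static void main(String[] args) {
        System.out.println(phaseTwo().join());

        // A test would install the hook, trigger compaction, then clear the hook again.
        phaseTwoInjection = () ->
                CompletableFuture.failedFuture(new CancellationException("injected"));
        try {
            System.out.println("failed=" + phaseTwo().isCompletedExceptionally());
        } finally {
            phaseTwoInjection = null;
        }
    }
}
```

Because the hook is static, a test that sets it should clear it in a `finally` block so that later tests in the same JVM are not affected.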

It should be noted that the original behavior also makes testCompactorReadsCompacted incorrect. This test sends two messages to two ledgers and compacts them. Then it sends a third message and triggers compaction again. However, it assumes the 2nd ledger will be opened. Technically, the 2nd ledger has only 1 entry, which has already been compacted, so it should not be opened. The entries from the first two ledgers should be read directly from the compacted ledger. The test is fixed by sending one more message before creating the 3rd ledger.
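The ledger reasoning can be checked with a small model. It assumes that a compaction run only needs to open original ledgers holding at least one entry after the horizon, and that the extra message in the fixed test lands in the 2nd ledger after the first compaction. Both points are interpretations of the description above, and `ledgersToOpen` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LedgersToOpenSketch {

    /**
     * Returns the ids of ledgers that hold at least one entry after the horizon,
     * where the horizon is the position (horizonLedgerId, horizonEntryId).
     * entriesPerLedger maps ledger id to the number of entries in that ledger.
     */
    static List<Long> ledgersToOpen(Map<Long, Integer> entriesPerLedger,
                                    long horizonLedgerId, long horizonEntryId) {
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, Integer> e : new TreeMap<>(entriesPerLedger).entrySet()) {
            long ledgerId = e.getKey();
            long lastEntryId = e.getValue() - 1; // -1 means the ledger is empty
            boolean hasEntryAfterHorizon = ledgerId > horizonLedgerId
                    || (ledgerId == horizonLedgerId && lastEntryId > horizonEntryId);
            if (lastEntryId >= 0 && hasEntryAfterHorizon) {
                result.add(ledgerId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Original test: ledgers 1 and 2 hold one compacted entry each, ledger 3 holds the new message.
        Map<Long, Integer> original = new TreeMap<>(Map.of(1L, 1, 2L, 1, 3L, 1));
        System.out.println("original test opens " + ledgersToOpen(original, 2, 0));

        // Fixed test: one more message in ledger 2 after the first compaction.
        Map<Long, Integer> fixed = new TreeMap<>(Map.of(1L, 1, 2L, 2, 3L, 1));
        System.out.println("fixed test opens " + ledgersToOpen(fixed, 2, 0));
    }
}
```

Under these assumptions only the 3rd ledger is opened in the original test layout, whereas the extra message makes the 2nd ledger hold an uncompacted entry, so it is opened as the test expects.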

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@github-actions github-actions bot added the doc-not-needed label (Your PR changes do not impact docs) Dec 29, 2025
@BewareMyPower BewareMyPower self-assigned this Dec 29, 2025
@BewareMyPower BewareMyPower added the type/bug (The PR fixed a bug or issue reported a bug), release/4.0.9, and release/4.1.3 labels Dec 29, 2025

Copilot AI left a comment

Pull request overview

This PR fixes a critical bug where the compaction horizon could be reset to an old position when phase two of compaction is interrupted by broker shutdown, causing excessive re-reading of already-compacted entries in subsequent compactions.

Key Changes:

  • Modified ManagedCursorImpl.internalResetCursor to preserve the mark-delete position for the __compaction cursor instead of resetting it to the previous position of the new read position
  • Added test infrastructure to inject failures during compaction phase two and verify the fix
  • Refactored test lifecycle from @BeforeMethod/@AfterMethod to @BeforeClass/@AfterClass with batch read size configuration to improve test performance

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File | Description
pulsar-broker/src/test/java/org/apache/pulsar/compaction/CompactionTest.java | Adds new test testPhaseTwoInterruption to reproduce the bug, refactors test lifecycle for performance, adds unique topic names to prevent test interference, and fixes testCompactorReadsCompacted logic
pulsar-broker/src/main/java/org/apache/pulsar/compaction/AbstractTwoPhaseCompactor.java | Introduces static volatile injection point for testing phase two interruption scenarios
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java | Implements the fix by preventing mark-delete position modification for compaction cursor during reset operations


Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.


Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Member

@lhotari lhotari left a comment

LGTM

@lhotari lhotari closed this Jan 2, 2026
@lhotari lhotari reopened this Jan 2, 2026

codecov-commenter commented Jan 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.44%. Comparing base (ff0d0eb) to head (87906cd).
⚠️ Report is 8 commits behind head on master.

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #25119      +/-   ##
============================================
- Coverage     74.82%   74.44%   -0.39%     
+ Complexity    33836    33690     -146     
============================================
  Files          1899     1899              
  Lines        149656   149708      +52     
  Branches      17393    17402       +9     
============================================
- Hits         111979   111447     -532     
- Misses        28892    29404     +512     
- Partials       8785     8857      +72     
Flag | Coverage Δ
inttests | 26.36% <40.00%> (-0.53%) ⬇️
systests | 22.98% <0.00%> (-0.22%) ⬇️
unittests | 73.98% <100.00%> (-0.37%) ⬇️

Flags with carried forward coverage won't be shown.

Files with missing lines | Coverage Δ
...che/bookkeeper/mledger/impl/ManagedCursorImpl.java | 78.19% <100.00%> (-1.69%) ⬇️
...e/pulsar/compaction/AbstractTwoPhaseCompactor.java | 78.87% <100.00%> (+0.20%) ⬆️

... and 127 files with indirect coverage changes

@BewareMyPower BewareMyPower merged commit f101811 into apache:master Jan 4, 2026
100 of 102 checks passed
@BewareMyPower BewareMyPower deleted the bewaremypower/compaction-interruption branch January 4, 2026 11:04
Technoboy- pushed a commit that referenced this pull request Jan 7, 2026
lhotari pushed a commit that referenced this pull request Jan 8, 2026
…n when phase two is interrupted (#25119)

(cherry picked from commit f101811)
Development

Successfully merging this pull request may close these issues.

[Bug] Compaction subscription could be reset back to a very early position

4 participants