[fix][broker] Fix topic compaction is failed after compactedLedger's all quorum is being recover #21552
codelipenghui
left a comment
@TakaHiR07 But won't the new solution introduce a data inconsistency issue?
As you described in the issue, it seems the ledger was not replicated successfully to the new ensemble. The metadata is still the old one:
Read of ledger entry failed: L1705537 E0-E0, Sent to [bookie-1, bookie-2, bookie-3], Heard from [] : bitset = {}, Error = 'Bookie handle is not available'. First unread entry is (-1, rc = null)
It is still trying to read from the old ensemble.
asyncOpenLedgerNoRecovery is not recommended because it can receive different responses from different bookies.
For example, suppose you write to 3 bookies and need 2 acks but receive only one. With this API, a reader can read that entry from the one bookie that stored it but not from the bookies where the write failed. The write client knows it was a failed write, yet the read client can still read the entry.
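The scenario above can be sketched with a small in-memory model. This is not BookKeeper code: `FakeBookie`, `quorumWrite`, and the `acceptsWrites` flag are hypothetical names used only to illustrate how a write that misses its ack quorum can still be visible to a reader that does not run recovery.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class QuorumReadSketch {

    /** Hypothetical stand-in for a bookie: keeps entries in memory and may reject writes. */
    static final class FakeBookie {
        final String name;
        final boolean acceptsWrites;
        final Map<Long, String> entries = new HashMap<>();

        FakeBookie(String name, boolean acceptsWrites) {
            this.name = name;
            this.acceptsWrites = acceptsWrites;
        }

        /** Returns true when the bookie stored the entry, i.e. it would send an ack. */
        boolean write(long entryId, String payload) {
            if (!acceptsWrites) {
                return false;
            }
            entries.put(entryId, payload);
            return true;
        }

        Optional<String> read(long entryId) {
            return Optional.ofNullable(entries.get(entryId));
        }
    }

    /** Writer side: the add counts as successful only if at least ackQuorum bookies ack it. */
    static boolean quorumWrite(List<FakeBookie> ensemble, int ackQuorum, long entryId, String payload) {
        int acks = 0;
        for (FakeBookie bookie : ensemble) {
            if (bookie.write(entryId, payload)) {
                acks++;
            }
        }
        return acks >= ackQuorum;
    }

    public static void main(String[] args) {
        // Write quorum of 3, ack quorum of 2, but only bookie-1 stores the entry.
        List<FakeBookie> ensemble = List.of(
                new FakeBookie("bookie-1", true),
                new FakeBookie("bookie-2", false),
                new FakeBookie("bookie-3", false));

        boolean writerSawSuccess = quorumWrite(ensemble, 2, 0L, "payload");
        System.out.println("writer saw success: " + writerSawSuccess);

        // A reader that skips recovery simply asks bookies for the entry.
        for (FakeBookie bookie : ensemble) {
            System.out.println(bookie.name + " has entry 0: " + bookie.read(0L).isPresent());
        }
    }
}
```

In this model the writer treats the add as failed, while a reader that happens to ask `bookie-1` still gets the payload. Whether that can occur for a ledger that is already closed is exactly what the replies below dispute.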
Auto-recovery replicated the ledger successfully and the metadata in ZooKeeper is the new one, but the metadata held by CompactTopicContext#ReadLedgerHandle is still the old one. I can also read the ledger directly with bookie-shell. As I understand it, the compactedLedger is a closed ledger, so it should never be written and read at the same time; in that case the read client should not be able to see a failed write entry, should it? I also have a question: why doesn't asyncOpenLedger register a ZooKeeper listener to pick up metadata changes?
openLedger is a frequent operation. If every openLedger call registered a listener, the number of listeners would be huge.
I have the same problem. Is there any new progress?
This problem has not been fixed, and I hope the review of this PR can continue. Firstly, I do not think the data inconsistency issue with asyncOpenLedgerNoRecovery exists here.
pulsar/pulsar-broker/src/main/java/org/apache/pulsar/compaction/AbstractTwoPhaseCompactor.java Lines 184 to 212 in 8a40b30
Secondly, we only register a listener for the compactedLedger, so the number of listeners is not huge.
Possibly related: apache/bookkeeper#4613
Fixes #21551
Motivation
Fix the topic compaction failure that occurs after all of the compactedLedger's quorum has been recovered, as described in the issue.
Modifications
When CompactedTopicImpl tries to open the compactedLedger, use asyncOpenLedgerNoRecovery instead of asyncOpenLedger. The compactedLedger handle can then watch the ZooKeeper node for changes and update the ledger metadata in the broker.
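The difference this change relies on can be modelled in a few lines. The sketch below is an in-memory illustration, not the BookKeeper client: `FakeMetadataStore`, `FakeReadHandle`, the `watch` flag, and the replacement bookie names are all hypothetical. It assumes, as stated above, that only the no-recovery open path subscribes to metadata changes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class MetadataWatchSketch {

    /** Hypothetical stand-in for the ledger metadata node kept in ZooKeeper. */
    static final class FakeMetadataStore {
        private List<String> ensemble;
        private final List<Consumer<List<String>>> listeners = new ArrayList<>();

        FakeMetadataStore(List<String> ensemble) {
            this.ensemble = List.copyOf(ensemble);
        }

        List<String> read() {
            return ensemble;
        }

        void watch(Consumer<List<String>> listener) {
            listeners.add(listener);
        }

        /** Models auto-recovery rewriting the ensemble after re-replication. */
        void update(List<String> newEnsemble) {
            ensemble = List.copyOf(newEnsemble);
            listeners.forEach(listener -> listener.accept(ensemble));
        }
    }

    /** Hypothetical read handle that caches the ensemble it saw when it was opened. */
    static final class FakeReadHandle {
        private volatile List<String> ensemble;

        FakeReadHandle(FakeMetadataStore store, boolean watch) {
            this.ensemble = store.read();
            if (watch) {
                // Only a watching handle refreshes its cached ensemble.
                store.watch(updated -> this.ensemble = updated);
            }
        }

        List<String> ensemble() {
            return ensemble;
        }
    }

    public static void main(String[] args) {
        FakeMetadataStore store =
                new FakeMetadataStore(List.of("bookie-1", "bookie-2", "bookie-3"));
        FakeReadHandle snapshotHandle = new FakeReadHandle(store, false);
        FakeReadHandle watchingHandle = new FakeReadHandle(store, true);

        // Every bookie of the original ensemble is replaced.
        store.update(List.of("bookie-4", "bookie-5", "bookie-6"));

        System.out.println("snapshot handle reads from: " + snapshotHandle.ensemble());
        System.out.println("watching handle reads from: " + watchingHandle.ensemble());
    }
}
```

The handle opened without a watch keeps sending reads to the original three bookies, which matches the "Bookie handle is not available" error quoted in the conversation, while the watching handle follows the updated ensemble.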
Matching PR in forked repository
PR in forked repository: TakaHiR07#19