Do not skip opened ledger in repair not adhering placement ledger #3977
@horizonzy can you take a look at this?
"For ledger: {}, Segment starting at entry: {}, with ensemble: {} having "
        + "writeQuorumSize: {} and ackQuorumSize: {} is not adhering to "
        + "EnsemblePlacementPolicy",
if (!metadata.isClosed()) {
We'd better skip checking the last segment if the ledger is OPEN; otherwise, the replication worker will fence a lot of ledgers, which can lead clients to recreate new ledgers.
+1
If there is a rack failure and enforceMinNumRacksPerWriteQuorum=false, the open segment will never satisfy the rack distribution, and the auditor will cause every open ledger in the cluster to be fenced. We can try to repair closed ledgers or segments, but for open segments, frequently fencing ledgers during rack failures is undesirable.
We should be consistent with the behavior of the bookie client write side, especially for open segments.
@horizonzy Please help take a look, thanks.
I checked the code and found that the ReplicationWorker does not fence the ledger. So if we skip the open ledger, this ledger will be written simultaneously by two clients: the original client that created the ledger, and a client replicating the data.
Example:
E:W:A = 3:2:2, and there are 3 bookies [bk0, bk1, bk2].
Rack info: bk0 -> rack1, bk1 -> rack1, bk2 -> rack2.
The client creates ledger 1 and writes entries to bk0 and bk1.
At the same time, the ReplicationWorker finds that the ledger 1 ensemble [bk0, bk1] does not adhere to the placement policy.
The ReplicationWorker opens ledger 1 without recovery and replicates the data to bk2, while the original client is still writing data to ledger 1.
The final result is very messy.
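The rack layout in the example above can be checked with a small sketch. This is only an illustration, assuming a simplified rule that a set of bookies must span at least a given number of racks; `RackCheckSketch`, `distinctRacks`, `adheres`, and the `minRacks` threshold are hypothetical names and are not BookKeeper's EnsemblePlacementPolicy API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for a rack-aware placement check (hypothetical helper,
// not the real EnsemblePlacementPolicy implementation).
final class RackCheckSketch {
    private RackCheckSketch() {}

    /** Counts the distinct racks covered by the given bookies. */
    static int distinctRacks(List<String> bookies, Map<String, String> rackOf) {
        Set<String> racks = new HashSet<>();
        for (String bookie : bookies) {
            racks.add(rackOf.get(bookie));
        }
        return racks.size();
    }

    /** True when the bookies span at least minRacks racks (assumed threshold). */
    static boolean adheres(List<String> bookies, Map<String, String> rackOf, int minRacks) {
        return distinctRacks(bookies, rackOf) >= minRacks;
    }
}
```

With the rack info from the example, [bk0, bk1] covers only rack1, so a two-rack requirement is not met, while [bk0, bk2] would meet it.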
I didn't notice this logic, can you help describe it?
During rereplication, if foundOpenFragments = true, it goes to deferLedgerLockRelease(). The ledger will be fenced, and the worker then waits openLedgerRereplicationGracePeriod (default 30s) before triggering rereplication again. See bookkeeper/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/ReplicationWorker.java, lines 461 to 463 in c8c593e.
We'd better skip the last ensemble of the current ledger, as data may still be being written to it. Let's only copy the ensembles that were fully written before it.
+1
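The suggestion to skip the last ensemble of an open ledger could look roughly like the sketch below. It assumes ensembles are keyed by their first entry id in a sorted map and uses plain strings for bookie ids; `SegmentSelectionSketch` and `segmentsToCheck` are hypothetical names, not the code in this PR.

```java
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper: picks which segments a placement check should look at.
final class SegmentSelectionSketch {
    private SegmentSelectionSketch() {}

    /**
     * Returns every segment for a closed ledger. For an open ledger, drops the
     * last segment, because the writer may still be appending to it.
     */
    static NavigableMap<Long, List<String>> segmentsToCheck(
            NavigableMap<Long, List<String>> ensembles, boolean closed) {
        if (closed || ensembles.isEmpty()) {
            return new TreeMap<>(ensembles);
        }
        // headMap(lastKey, false) excludes the segment that starts at lastKey.
        return new TreeMap<>(ensembles.headMap(ensembles.lastKey(), false));
    }
}
```

Returning a copy keeps the caller from mutating the ledger's own ensemble map through the view.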
@horizonzy @hangc0276 @wenbingshen The code has been modified to skip the last ensemble of the current ledger. Can you help review again?
All ledgers are equal in the bookie. For the
+1.
I feel that this change is quite complicated. Can I elaborate on it here, or do I need to write a BP?
You can.
Fencing is not exactly a cheap operation for applications using BK. There is a config parameter: bookkeeper/conf/bk_server.conf, lines 1063 to 1065 in b8cc1fb.
Motivation
To completely fix ledgers that do not adhere to the placement policy, also repair open ledgers in the feature "auto recover support repaired not adhering placement ledger".
Changes
Remove the "if (!metadata.isClosed())" check in two functions.
It is OK to recover an open ledger because the ledger becomes fenced during recovery.
Master Issue: #3971