should not waitBackOff when replicating open fragment#4593
SongOf wants to merge 1 commit into apache:master
Conversation
zymap
left a comment
Skipping the backoff for this one case seems a little dangerous. When you hit that case, it is possible that many ledgers are in that state, and auto recovery will then try to recover them without any break. One way to control that might be to collect the ledgers that are open and release them at a later time. That way you won't block normal ledger replication, and the open ledgers still get a backoff.
When replicating a ledger in the open state, ReplicationWorker already waits for a while before recovering it: it waits openLedgerRereplicationGracePeriod ms in the function "deferLedgerLockRelease".
But there are still metadata operations before deferring it. If you don't give them a backoff, the metadata service will receive a huge number of requests in a short time.
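To make the reviewer's suggestion concrete, here is a minimal sketch of "collect the open ledgers and release them later". It is not BookKeeper code: the class `DeferredOpenLedgers`, its methods, and the explicit `nowMs` parameter are hypothetical names chosen for illustration. The idea is that a skipped open ledger is parked with a retry time instead of stalling the whole worker, so each open ledger still gets its own backoff.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical holder for ledgers that were skipped because they still have an open fragment. */
class DeferredOpenLedgers {
    private static final class Entry {
        final long ledgerId;
        final long retryAtMs;

        Entry(long ledgerId, long retryAtMs) {
            this.ledgerId = ledgerId;
            this.retryAtMs = retryAtMs;
        }
    }

    // Entries ordered by the time at which they may be retried.
    private final PriorityQueue<Entry> queue =
            new PriorityQueue<>(Comparator.comparingLong((Entry e) -> e.retryAtMs));
    private final long backoffMs;

    DeferredOpenLedgers(long backoffMs) {
        this.backoffMs = backoffMs;
    }

    /** Park an open ledger; it becomes eligible again after backoffMs. */
    void defer(long ledgerId, long nowMs) {
        queue.add(new Entry(ledgerId, nowMs + backoffMs));
    }

    /** Return (and remove) every parked ledger whose backoff has elapsed. */
    List<Long> pollDue(long nowMs) {
        List<Long> due = new ArrayList<>();
        while (!queue.isEmpty() && queue.peek().retryAtMs <= nowMs) {
            due.add(queue.poll().ledgerId);
        }
        return due;
    }
}
```

A worker loop could call `pollDue` between normal ledgers, so open ledgers are retried at a bounded rate while other ledgers continue to be replicated.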
Descriptions of the changes in this PR:
When replicating a ledger with an open fragment, the ReplicationWorker.rereplicate(long ledgerIdToReplicate) method returns false.
bookkeeper/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/ReplicationWorker.java
Lines 495 to 499 in 25326dc
This will cause the replication worker to wait for rwRereplicateBackoffMs ms and be unable to replicate other ledgers. This seriously affects the efficiency of the replication worker.
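The cost described above can be illustrated with a simplified model of the worker loop. This is a sketch, not the code at the referenced lines: `BackoffLoopSketch`, its `rereplicate` stand-in, and the sample ledger IDs are assumptions, and the backoff is added up rather than actually slept.

```java
import java.util.List;
import java.util.Set;

/** Simplified model of the worker loop; not the actual ReplicationWorker code. */
class BackoffLoopSketch {
    /** Hypothetical stand-in for rereplicate(): returns false when the ledger has an open fragment. */
    static boolean rereplicate(long ledgerId, Set<Long> openLedgers) {
        return !openLedgers.contains(ledgerId);
    }

    /** Total simulated backoff (ms) spent while walking the under-replicated list once. */
    static long totalBackoffMs(List<Long> underReplicated, Set<Long> openLedgers,
                               long rwRereplicateBackoffMs) {
        long waited = 0;
        for (long ledgerId : underReplicated) {
            if (!rereplicate(ledgerId, openLedgers)) {
                // The real worker would wait here; the sketch only accumulates the time.
                waited += rwRereplicateBackoffMs;
            }
        }
        return waited;
    }

    public static void main(String[] args) {
        // Two of the four ledgers are open, so two backoff periods are spent.
        long waited = totalBackoffMs(List.of(1L, 2L, 3L, 4L), Set.of(2L, 3L), 5000L);
        System.out.println("simulated backoff ms: " + waited);
    }
}
```

In this model every open ledger in the list delays the ledgers behind it by one full backoff period.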
bookkeeper/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/ReplicationWorker.java
Lines 247 to 252 in 25326dc
Motivation
When replicating a ledger in the open state, ReplicationWorker will choose to wait for a while before releasing the lock.
bookkeeper/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/ReplicationWorker.java
Lines 496 to 497 in 25326dc
Therefore, ReplicationWorker does not need to call waitBackOffTime again after skipping the replication of an open ledger.
Changes
When replicating an open ledger, the worker no longer calls waitBackOffTime but immediately continues with the next ledger.
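One way to picture the changed behavior is to distinguish "skipped because the ledger is open" from other unsuccessful outcomes. The sketch below is an assumption-laden illustration, not the PR's diff: the `Outcome` enum, the `FAILED` case, and all names in `SkipOpenLedgerSketch` are hypothetical.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical model of the loop after the change; names are illustrative only. */
class SkipOpenLedgerSketch {
    enum Outcome { REPLICATED, SKIPPED_OPEN, FAILED }

    /** Stand-in for rereplicate() that reports why a ledger was not replicated. */
    static Outcome rereplicate(long ledgerId, Set<Long> open, Set<Long> failing) {
        if (open.contains(ledgerId)) {
            return Outcome.SKIPPED_OPEN;
        }
        if (failing.contains(ledgerId)) {
            return Outcome.FAILED;
        }
        return Outcome.REPLICATED;
    }

    /** Only FAILED outcomes add backoff; open ledgers are skipped without waiting. */
    static long totalBackoffMs(List<Long> ledgers, Set<Long> open, Set<Long> failing,
                               long backoffMs) {
        long waited = 0;
        for (long id : ledgers) {
            if (rereplicate(id, open, failing) == Outcome.FAILED) {
                waited += backoffMs;
            }
        }
        return waited;
    }
}
```

With ledgers 2 and 3 open and ledger 4 failing, only one backoff period is accumulated, compared with three if every unsuccessful outcome triggered a wait.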