[POC - do not review yet] chore: almost mutex free replication #5758
Conversation
src/server/server_family.cc (outdated)
};
fb2::LockGuard lk(replicaof_mu_);
// Deep copy because tl_replica might be overwritten in between.
auto replica = tl_replica;
Bread and butter no. 1 of this PR: goodbye to the INFO command blocking on the mutex.
TODO: worth adding a test.
// If we are called by "Replicate", tx will be null but we do not need
// to flush anything.
util::fb2::LockGuard lk(replicaof_mu_);
Bread and butter no. 2: we no longer enter the loading state prematurely. We only get into loading_state if we are doing a full sync; otherwise, there is no state change at all.
TODO: move this to
Force-pushed from 38640b3 to 76598f2
Signed-off-by: kostas <[email protected]>
logging.info(f"successes: {num_successes}")
assert COMMANDS_TO_ISSUE > num_successes, "At least one REPLICAOF must be cancelled"
assert COMMANDS_TO_ISSUE == num_successes
The new algorithm does not use two-phase locking, so the following interleaving is no longer possible:
- Client 1 -> REPLICAOF -> locks the mutex -> updates replica_ to new_replica -> releases the mutex -> calls replica_->Start().
- Client 2 -> REPLICAOF -> same as (1), but first calls replica_->Stop() -> releases the mutex.
- Client 1's REPLICAOF command grabs the mutex a second time, observes that the context got cancelled because of step (2), and boom: returns "replication cancelled".

This cannot happen anymore because we lock only once and atomically update everything, including stopping the previous replica. So by the time client 2 grabs the lock in the example above, the previous REPLICAOF command is not in some intermediate state. To observe that we indeed cancelled, we should read the logs and see "Stopping replication" COMMANDS_TO_ISSUE - 1 times, plus one more because of the Shutdown() at the end.
A bonus is that I suspect we might also be able to solve #4685, but I will need to follow up on that.
Signed-off-by: kostas <[email protected]>
Do not review.