chore: lock free info command with replicaof v2 #5864
Open
kostasrim wants to merge 6 commits into main from kpr2
+170
−60
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -64,7 +64,7 @@ ABSL_DECLARE_FLAG(uint16_t, announce_port); | |
ABSL_FLAG( | ||
int, replica_priority, 100, | ||
"Published by info command for sentinel to pick replica based on score during a failover"); | ||
ABSL_FLAG(bool, experimental_replicaof_v2, true, | ||
ABSL_FLAG(bool, experimental_replicaof_v2, false, | ||
Review comment: Will revert this back to true. I just want to make sure I did not break anything in case we want to switch back to the old implementation. |
||
"Use ReplicaOfV2 algorithm for initiating replication"); | ||
|
||
namespace dfly { | ||
|
@@ -152,6 +152,8 @@ void Replica::StartMainReplicationFiber(std::optional<LastMasterSyncData> last_m | |
void Replica::EnableReplication() { | ||
VLOG(1) << "Enabling replication"; | ||
|
||
socket_thread_ = ProactorBase::me(); | ||
|
||
state_mask_ = R_ENABLED; // set replica state to enabled | ||
sync_fb_ = MakeFiber(&Replica::MainReplicationFb, this, nullopt); // call replication fiber | ||
} | ||
|
@@ -170,9 +172,17 @@ std::optional<Replica::LastMasterSyncData> Replica::Stop() { | |
sync_fb_.JoinIfNeeded(); | ||
DVLOG(1) << "MainReplicationFb stopped " << this; | ||
acks_fb_.JoinIfNeeded(); | ||
for (auto& flow : shard_flows_) { | ||
flow.reset(); | ||
} | ||
|
||
proactor_->Await([this]() { | ||
// Destructor is blocking, so other fibers can observe partial state | ||
// of flows during clean up. To avoid this, we move them and clear the | ||
// member before the preemption point | ||
auto shard_flows = std::move(shard_flows_); | ||
shard_flows_.clear(); | ||
for (auto& flow : shard_flows) { | ||
flow.reset(); | ||
} | ||
}); | ||
|
||
if (last_journal_LSNs_.has_value()) { | ||
return LastMasterSyncData{master_context_.master_repl_id, last_journal_LSNs_.value()}; | ||
|
@@ -501,18 +511,12 @@ error_code Replica::InitiatePSync() { | |
return error_code{}; | ||
} | ||
|
||
// Initialize and start sub-replica for each flow. | ||
error_code Replica::InitiateDflySync(std::optional<LastMasterSyncData> last_master_sync_data) { | ||
auto start_time = absl::Now(); | ||
|
||
// Initialize MultiShardExecution. | ||
multi_shard_exe_.reset(new MultiShardExecution()); | ||
|
||
// Initialize shard flows. | ||
void Replica::InitializeShardFlows() { | ||
shard_flows_.resize(master_context_.num_flows); | ||
DCHECK(!shard_flows_.empty()); | ||
for (unsigned i = 0; i < shard_flows_.size(); ++i) { | ||
// Transfer LSN state for partial sync | ||
thread_flow_map_ = Partition(shard_flows_.size()); | ||
|
||
for (size_t i = 0; i < shard_flows_.size(); ++i) { | ||
uint64_t partial_sync_lsn = 0; | ||
if (shard_flows_[i]) { | ||
partial_sync_lsn = shard_flows_[i]->JournalExecutedCount(); | ||
|
@@ -523,7 +527,19 @@ error_code Replica::InitiateDflySync(std::optional<LastMasterSyncData> last_mast | |
shard_flows_[i]->SetRecordsExecuted(partial_sync_lsn); | ||
} | ||
} | ||
thread_flow_map_ = Partition(shard_flows_.size()); | ||
} | ||
|
||
// Initialize and start sub-replica for each flow. | ||
error_code Replica::InitiateDflySync(std::optional<LastMasterSyncData> last_master_sync_data) { | ||
auto start_time = absl::Now(); | ||
|
||
// Initialize MultiShardExecution. | ||
multi_shard_exe_.reset(new MultiShardExecution()); | ||
|
||
// Initialize shard flows. The update to the shard_flows_ should be done by this thread. | ||
// Otherwise, there is a race condition between GetSummary() and the shard_flows_[i].reset() | ||
// below. | ||
InitializeShardFlows(); | ||
|
||
// Blocked on until all flows got full sync cut. | ||
BlockingCounter sync_block{unsigned(shard_flows_.size())}; | ||
|
@@ -754,11 +770,6 @@ error_code Replica::ConsumeDflyStream() { | |
}; | ||
RETURN_ON_ERR(exec_st_.SwitchErrorHandler(std::move(err_handler))); | ||
|
||
size_t total_flows_to_finish_partial = 0; | ||
for (const auto& flow : thread_flow_map_) { | ||
total_flows_to_finish_partial += flow.size(); | ||
} | ||
|
||
LOG(INFO) << "Transitioned into stable sync"; | ||
// Transition flows into stable sync. | ||
{ | ||
|
@@ -1210,11 +1221,12 @@ error_code Replica::ParseReplicationHeader(base::IoBuf* io_buf, PSyncResponse* d | |
|
||
auto Replica::GetSummary() const -> Summary { | ||
auto f = [this]() { | ||
DCHECK(this); | ||
auto last_io_time = LastIoTime(); | ||
|
||
// Note: we access LastIoTime from foreigh thread in unsafe manner. However, specifically here | ||
// it's unlikely to cause a real bug. | ||
for (const auto& flow : shard_flows_) { // Get last io time from all sub flows. | ||
for (const auto& flow : shard_flows_) { | ||
DCHECK(Proactor() == ProactorBase::me()); | ||
DCHECK(flow); | ||
last_io_time = std::max(last_io_time, flow->LastIoTime()); | ||
} | ||
|
||
|
@@ -1246,25 +1258,14 @@ auto Replica::GetSummary() const -> Summary { | |
return res; | ||
}; | ||
|
||
if (Sock()) | ||
return Proactor()->AwaitBrief(f); | ||
|
||
/** | ||
* when this branch happens: there is a very short grace period | ||
* where Sock() is not initialized, yet the server can | ||
* receive ROLE/INFO commands. That period happens when launching | ||
* an instance with '--replicaof' and then immediately | ||
* sending a command. | ||
* | ||
* In that instance, we have to run f() on the current fiber. | ||
*/ | ||
return f(); | ||
return Proactor()->AwaitBrief(f); | ||
} | ||
|
||
std::vector<uint64_t> Replica::GetReplicaOffset() const { | ||
std::vector<uint64_t> flow_rec_count; | ||
flow_rec_count.resize(shard_flows_.size()); | ||
for (const auto& flow : shard_flows_) { | ||
DCHECK(flow.get()); | ||
uint32_t flow_id = flow->FlowId(); | ||
uint64_t rec_count = flow->JournalExecutedCount(); | ||
DCHECK_LT(flow_id, shard_flows_.size()); | ||
|
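The `Replica::Stop()` hunk above moves `shard_flows_` into a local and clears the member before resetting each flow. A minimal sketch of why that ordering matters, assuming a destructor that can yield to other code: `Flow`, `Owner`, `on_destroy`, and `ObservedPartial` are hypothetical names for illustration and are not part of the PR, and the callback only simulates a preemption point.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a shard flow whose destructor may yield.
struct Flow {
  std::function<void()> on_destroy;  // simulates a preemption point in the destructor
  ~Flow() {
    if (on_destroy) on_destroy();
  }
};

struct Owner {
  std::vector<std::unique_ptr<Flow>> flows;

  // What another fiber could see if it ran during cleanup:
  // true when the container holds a mix of null and live entries.
  bool LooksPartial() const {
    std::size_t nulls = 0;
    for (const auto& f : flows) nulls += (f == nullptr) ? 1 : 0;
    return nulls != 0 && nulls != flows.size();
  }

  // Resets in place: each destructor runs while the member is half cleared.
  void StopNaive() {
    for (auto& f : flows) f.reset();
  }

  // Detaches the member first, so observers only ever see an empty container.
  void StopMoved() {
    auto local = std::move(flows);
    flows.clear();  // a moved-from vector is valid but unspecified, so clear it explicitly
    assert(flows.empty());
    for (auto& f : local) f.reset();
  }
};

// Builds n flows and reports whether any destructor observed a partial container.
bool ObservedPartial(std::size_t n, bool use_move) {
  Owner owner;
  bool partial = false;
  for (std::size_t i = 0; i < n; ++i) {
    auto flow = std::make_unique<Flow>();
    flow->on_destroy = [&owner, &partial] { partial = partial || owner.LooksPartial(); };
    owner.flows.push_back(std::move(flow));
  }
  if (use_move) {
    owner.StopMoved();
  } else {
    owner.StopNaive();
  }
  return partial;
}
```

With three flows, the in-place variant lets a destructor see one null entry next to two live ones, while the moved variant only exposes an empty vector. In the real code the observer would be another fiber such as the one running `GetSummary()`, not a callback.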
Review comment: After spending a few hours on this, I still don't understand why, if we keep this code, we get a segfault in the destructor of `std::shared_ptr<Replica>`. It seems to happen during the preemption, but the `sock_` resources are already deallocated, so `Close()` should return early because `fd_ < 0`.

What is more, the core dump shows that `tl_replica` and its copy have different ref-counted objects: one shows as expired and the other has a ref count of 7. I added a `CHECK()` before the crash to make sure that both copies of the `shared_ptr` point to the exact same control block. The checks passed, yet the core dump showed otherwise, which makes me think that this is somehow a memory corruption error.

The good thing is that we don't need this code anymore, as we now handle closing the socket outside of the destructor.

While writing this, the only case I can think of is that the last instance of `tl_replica` gets destructed but needs to preempt, and an `info` command comes in and grabs a copy while the `shared_ptr` is destructing, which could lead to a race condition. I will verify this theory once I am back from the holidays.

PS: the test that failed is test_cancel_replication_immediately (it fails roughly once every 300 runs, so it's kind of time-consuming to reproduce).
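The comment does not show how the control-block `CHECK()` was written. One way to express such a check with the standard library is `owner_before`, which compares ownership and ignores the stored pointer. This is a sketch under that assumption: `SharesControlBlock` and `CheckSameOwner` are hypothetical names, and plain `assert` stands in for `CHECK()`.

```cpp
#include <cassert>
#include <memory>

// Hypothetical helper, not taken from the PR: true when both pointers share
// ownership (the same control block), or when both are empty.
template <typename T, typename U>
bool SharesControlBlock(const std::shared_ptr<T>& a, const std::shared_ptr<U>& b) {
  // Two owners are equivalent when neither orders before the other.
  return !a.owner_before(b) && !b.owner_before(a);
}

// Stand-in for the CHECK() described in the comment.
template <typename T>
void CheckSameOwner(const std::shared_ptr<T>& original, const std::shared_ptr<T>& copy) {
  assert(SharesControlBlock(original, copy));
}
```

A check like this only validates the two handles at the moment it runs. If one handle is being destroyed while the other is being copied, the check can pass and the state can still diverge afterwards, which is consistent with the theory in the comment but does not prove it.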