Skip to content

Commit 3c2161e

Browse files
bill-scalesaainscow
authored andcommitted
osd: Optimized EC choose_async_recovery_ec must use auth_shard
Optimized EC pools modify how GetLog and choose_acting work, if the auth_shard is a non-primary shard and the (new) primary is behind the auth_shard then we cannot just get the log from the non-primary shard because it will be missing entries for partial writes. Instead we need to get the log from a shard that has the full log first and then repeat GetLog to get the log from the auth_shard. choose_acting was modifying auth_shard in the case where we need to get the log from another shard first. This is wrong - the remainder of the logic in choose_acting and in particular choose_async_recovery_ec needs to use the auth_shard to calculate what the acting set will be. Using a different shard occasional can cause a different acting set to be selected (because of thresholds about the number of log entries behind a shard needs to be to perform async recovery) and this can lead to two shards flip/flopping with different opinions about what the acting set should be. Fix is to separate out which shard will be returned to GetLog from the auth_shard which will be used for acting set calculations. Signed-off-by: Bill Scales <[email protected]>
1 parent 645cdf9 commit 3c2161e

File tree

1 file changed

+5
-4
lines changed

1 file changed

+5
-4
lines changed

src/osd/PeeringState.cc

Lines changed: 5 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -2503,6 +2503,7 @@ bool PeeringState::choose_acting(pg_shard_t &auth_log_shard_id,
25032503

25042504
auto auth_log_shard = find_best_info(all_info, restrict_to_up_acting,
25052505
false, history_les_bound);
2506+
auto get_log_shard = auth_log_shard;
25062507

25072508
if ((repeat_getlog != nullptr) &&
25082509
auth_log_shard != all_info.end() &&
@@ -2516,16 +2517,16 @@ bool PeeringState::choose_acting(pg_shard_t &auth_log_shard_id,
25162517
// has to be rolled backwards
25172518
psdout(10) << "auth_log_shard " << auth_log_shard->first
25182519
<< " is ahead but is a non_primary shard" << dendl;
2519-
auth_log_shard = find_best_info(all_info, restrict_to_up_acting,
2520+
get_log_shard = find_best_info(all_info, restrict_to_up_acting,
25202521
true, history_les_bound);
2521-
if (auth_log_shard != all_info.end()) {
2522+
if (get_log_shard != all_info.end()) {
25222523
psdout(10) << "auth_log_shard " << auth_log_shard->first
25232524
<< " selected instead" << dendl;
25242525
*repeat_getlog = true;
25252526
}
25262527
}
25272528

2528-
if (auth_log_shard == all_info.end()) {
2529+
if (get_log_shard == all_info.end()) {
25292530
if (up != acting) {
25302531
psdout(10) << "no suitable info found (incomplete backfills?),"
25312532
<< " reverting to up" << dendl;
@@ -2540,7 +2541,7 @@ bool PeeringState::choose_acting(pg_shard_t &auth_log_shard_id,
25402541
}
25412542

25422543
ceph_assert(!auth_log_shard->second.is_incomplete());
2543-
auth_log_shard_id = auth_log_shard->first;
2544+
auth_log_shard_id = get_log_shard->first;
25442545

25452546
set<pg_shard_t> want_backfill, want_acting_backfill;
25462547
vector<int> want;

0 commit comments

Comments
 (0)