Commit 3c2161e
osd: Optimized EC choose_async_recovery_ec must use auth_shard
Optimized EC pools modify how GetLog and choose_acting work.
If the auth_shard is a non-primary shard and the (new) primary
is behind the auth_shard, then we cannot just get the log from
the non-primary shard because it will be missing entries for
partial writes. Instead we need to get the log from a shard
that has the full log first and then repeat GetLog to get
the log from the auth_shard.
choose_acting was modifying auth_shard in the case where
we need to get the log from another shard first. This is
wrong - the remainder of the logic in choose_acting and
in particular choose_async_recovery_ec needs to use the
auth_shard to calculate what the acting set will be.
Using a different shard can occasionally cause a
different acting set to be selected (because of
thresholds on how many log entries behind a shard
needs to be to perform async recovery), and this
can lead to two shards flip-flopping with
different opinions about what the acting set should be.
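The threshold sensitivity described above can be illustrated with a minimal sketch. It is not the Ceph implementation: the function `choose_async`, the single-number `Version` and the threshold value are all hypothetical simplifications, and the real cost calculation in choose_async_recovery_ec is more involved.

```cpp
#include <cstdint>
#include <map>
#include <set>

// Hypothetical simplification: a shard's log position is one version number.
using Version = std::uint64_t;

// Select shards for async recovery: those more than `threshold` log
// entries behind the reference shard. All names here are illustrative.
std::set<int> choose_async(const std::map<int, Version>& last_update,
                           Version reference, Version threshold) {
  std::set<int> async;
  for (const auto& entry : last_update) {
    const Version v = entry.second;
    if (reference > v && reference - v > threshold) {
      async.insert(entry.first);
    }
  }
  return async;
}
```

With shards at versions 150, 120 and 40 and a threshold of 100, using 150 as the reference selects the shard at version 40 for async recovery, while using 120 as the reference selects nothing. Two peers that disagree about the reference shard can therefore disagree about the acting set.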
Fix is to separate out which shard will be returned
to GetLog from the auth_shard which will be used
for acting set calculations.
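As a rough sketch of the separation described above (not the actual patch): the types `Shard` and `Choice`, the function `choose`, and the rule of picking the most advanced full-log shard are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <vector>

using Version = std::uint64_t;

// Hypothetical model of a shard's peering info.
struct Shard {
  int id;
  Version last_update;
  bool has_full_log;  // false models a non-primary shard missing partial-write entries
};

struct Choice {
  int auth_shard;     // used for all acting set calculations
  int get_log_shard;  // reported back to GetLog
};

// Assumes `shards` is non-empty. The auth shard is simplified to "highest
// last_update"; the real selection considers more than this.
Choice choose(const std::vector<Shard>& shards, Version primary_last_update) {
  const Shard* auth = &shards.front();
  const Shard* full = nullptr;
  for (const Shard& s : shards) {
    if (s.last_update > auth->last_update) auth = &s;
    if (s.has_full_log && (!full || s.last_update > full->last_update)) full = &s;
  }
  Choice c{auth->id, auth->id};
  // Only the GetLog target is redirected; auth_shard is left untouched.
  if (!auth->has_full_log && primary_last_update < auth->last_update && full) {
    c.get_log_shard = full->id;
  }
  return c;
}
```

The point of the design is that redirecting GetLog to a full-log shard never changes the value used for the acting set calculation, so every peer evaluates thresholds against the same auth_shard.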
Signed-off-by: Bill Scales <[email protected]>
1 parent 645cdf9 commit 3c2161e
1 file changed, +5 -4 lines changed