Commit 72d55ee
bill-scales authored and aainscow committed
osd: Optimized EC calculate_maxles_and_minlua needs to use exclude_nonprimary_shards

When an optimized EC pool is searching for the best shard that isn't a non-primary shard, the calculation of maxles and minlua needs to exclude non-primary shards.

This bug was seen in a test run where activating a PG was interrupted by a new epoch, and only a couple of non-primary shards became active and updated les. In the next epoch a new primary (without a log) failed to find a shard that wasn't non-primary with the latest les.

The les of non-primary shards should be ignored when looking for an appropriate shard to get the full log from.

This is safe because an epoch cannot start I/O without at least K shards that have updated les, and there are always K-1 non-primary shards. If I/O has started, then we will find the latest les even if we skip non-primary shards. If I/O has not started, then the latest les ignoring non-primary shards is the last epoch in which I/O was started, and a shard with that les has a good enough log+missing list.

Signed-off-by: Bill Scales <[email protected]>
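The failure scenario described above can be reproduced with a small standalone sketch. This is not the Ceph code: `ShardInfo`, `max_les`, the `nonprimary` set (a stand-in for `pool.info.is_nonprimary_shard()`), and the shard numbering are all hypothetical simplifications chosen for illustration. It assumes K=3 data shards, of which shards 1 and 2 are non-primary, and an activation at epoch 12 that only those two shards completed.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

using epoch_t = std::uint32_t;  // stand-in for Ceph's epoch_t
using shard_t = int;            // stand-in for shard_id_t

// Hypothetical, simplified per-shard info: only last_epoch_started (les).
struct ShardInfo {
  epoch_t last_epoch_started;
};

// Returns the maximum les over `infos`. When exclude_nonprimary_shards is
// true, shards listed in `nonprimary` are skipped, mirroring the `continue`
// added by this commit.
epoch_t max_les(const std::map<shard_t, ShardInfo>& infos,
                const std::set<shard_t>& nonprimary,
                bool exclude_nonprimary_shards) {
  epoch_t max_last_epoch_started = 0;
  for (const auto& kv : infos) {
    if (exclude_nonprimary_shards && nonprimary.count(kv.first))
      continue;
    if (max_last_epoch_started < kv.second.last_epoch_started)
      max_last_epoch_started = kv.second.last_epoch_started;
  }
  return max_last_epoch_started;
}

// Walks through the interrupted-activation scenario.
void demo() {
  // Assumed layout: shards 1 and 2 are non-primary (K-1 = 2 of them).
  const std::set<shard_t> nonprimary = {1, 2};
  // Activation at epoch 12 was interrupted: only shards 1 and 2 updated les.
  const std::map<shard_t, ShardInfo> infos = {
      {0, {10}}, {1, {12}}, {2, {12}}, {3, {10}}, {4, {10}}};

  // Without the exclusion, max les is 12, which no primary-capable shard has.
  assert(max_les(infos, nonprimary, false) == 12);
  // With the exclusion, max les is 10, which shards 0, 3 and 4 all have.
  assert(max_les(infos, nonprimary, true) == 10);
}
```

In this sketch, fewer than K shards reached les 12, so I/O cannot have started in that epoch, and falling back to les 10 matches the safety argument in the commit message.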
1 parent 3c2161e commit 72d55ee

2 files changed: +6 -0 lines changed

src/osd/PeeringState.cc

Lines changed: 5 additions & 0 deletions
@@ -1615,13 +1615,17 @@ void PeeringState::reject_reservation()
 void PeeringState::calculate_maxles_and_minlua( const map<pg_shard_t, pg_info_t> &infos,
                                    epoch_t& max_last_epoch_started,
                                    eversion_t& min_last_update_acceptable,
+                                   bool exclude_nonprimary_shards,
                                    bool *history_les_bound) const
 {
   /* See doc/dev/osd_internals/last_epoch_started.rst before attempting
    * to make changes to this process. Also, make sure to update it
    * when you find bugs! */
   max_last_epoch_started = 0;
   for (auto i = infos.begin(); i != infos.end(); ++i) {
+    if (exclude_nonprimary_shards &&
+        pool.info.is_nonprimary_shard(shard_id_t(i->first.shard)))
+      continue;
     if (!cct->_conf->osd_find_best_info_ignore_history_les &&
         max_last_epoch_started < i->second.history.last_epoch_started) {
       if (history_les_bound) {
@@ -1665,6 +1669,7 @@ map<pg_shard_t, pg_info_t>::const_iterator PeeringState::find_best_info(
   calculate_maxles_and_minlua( infos,
                                max_last_epoch_started,
                                min_last_update_acceptable,
+                               exclude_nonprimary_shards,
                                history_les_bound);
 
   if (min_last_update_acceptable == eversion_t::max())

src/osd/PeeringState.h

Lines changed: 1 addition & 0 deletions
@@ -1691,6 +1691,7 @@ class PeeringState : public MissingLoc::MappingInfo {
   void calculate_maxles_and_minlua( const std::map<pg_shard_t, pg_info_t> &infos,
                                     epoch_t& max_last_epoch_started,
                                     eversion_t& min_last_update_acceptable,
+                                    bool exclude_nonprimary_shards = false,
                                     bool *history_les_bound = nullptr) const;
 
   // acting std::set