Commit 282558c

mgr/Mgr.cc: clear daemon health metrics instead of removing down/out osd from daemon state
Reverts the change from ceph#53993 and directly clears daemon health metrics for down and out OSDs. The former approach of removing down/out OSDs from the daemon state has undesirable consequences for stat output, including the prometheus exporter.

Fixes: https://tracker.ceph.com/issues/66168
Signed-off-by: Cory Snyder <[email protected]>
1 parent abadee6 commit 282558c

1 file changed: +9 -2 lines changed

src/mgr/Mgr.cc

Lines changed: 9 additions & 2 deletions
@@ -498,7 +498,7 @@ void Mgr::handle_osd_map()
   cluster_state.with_osdmap_and_pgmap([this, &names_exist](const OSDMap &osd_map,
                                                            const PGMap &pg_map) {
     for (int osd_id = 0; osd_id < osd_map.get_max_osd(); ++osd_id) {
-      if (!osd_map.exists(osd_id) || (osd_map.is_out(osd_id) && osd_map.is_down(osd_id))) {
+      if (!osd_map.exists(osd_id)) {
         continue;
       }
 
@@ -510,9 +510,16 @@ void Mgr::handle_osd_map()
       if (daemon_state.is_updating(k)) {
         continue;
       }
+
+      DaemonStatePtr daemon = daemon_state.get(k);
+
+      if (daemon && osd_map.is_out(osd_id) && osd_map.is_down(osd_id)) {
+        std::lock_guard l(daemon->lock);
+        daemon->daemon_health_metrics.clear();
+      }
 
       bool update_meta = false;
-      if (daemon_state.exists(k)) {
+      if (daemon) {
         if (osd_map.get_up_from(osd_id) == osd_map.get_epoch()) {
           dout(4) << "Mgr::handle_osd_map: osd." << osd_id
                   << " joined cluster at " << "e" << osd_map.get_epoch()
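
For readers who want to try the pattern outside of Ceph, the following is a minimal standalone sketch of what the second hunk does: look the daemon up once, and if its OSD is both down and out, clear only its health metrics under the daemon's lock while leaving the daemon entry (and its other stats) in place. Everything here is a hypothetical, simplified stand-in: `HealthMetric`, `DaemonState`, `OsdStatus`, and `clear_if_down_and_out` are illustrative names, not Ceph's actual types or API.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a single daemon health metric.
struct HealthMetric {
  std::string name;
  long value;
};

// Hypothetical, heavily simplified stand-in for the mgr's per-daemon state.
struct DaemonState {
  std::mutex lock;                           // guards the fields below
  std::vector<HealthMetric> health_metrics;  // plays the role of daemon_health_metrics
  long perf_counter = 0;                     // stands in for stats an exporter might read
};
using DaemonStatePtr = std::shared_ptr<DaemonState>;

// Hypothetical summary of what the OSD map says about one OSD.
struct OsdStatus {
  bool exists;
  bool is_out;
  bool is_down;
};

// Clears health metrics for an OSD that is both down and out.
// The daemon entry itself is never removed from the map.
// Returns true only if metrics were cleared.
bool clear_if_down_and_out(const std::map<int, DaemonStatePtr>& daemons,
                           int osd_id, const OsdStatus& status) {
  assert(osd_id >= 0);
  if (!status.exists) {
    return false;  // mirrors the `continue` for non-existent OSDs
  }

  // Single lookup; a null pointer means the daemon is unknown.
  auto it = daemons.find(osd_id);
  DaemonStatePtr daemon = (it == daemons.end()) ? nullptr : it->second;

  if (daemon && status.is_out && status.is_down) {
    std::lock_guard<std::mutex> l(daemon->lock);
    daemon->health_metrics.clear();
    return true;
  }
  return false;
}
```

Calling `clear_if_down_and_out` on a daemon whose OSD is down and out empties `health_metrics` but leaves the map entry and `perf_counter` untouched, which is the property the commit message relies on for stat output. In the sketch, as in the diff, the single lookup result is reused for the later existence check rather than querying the state map twice.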
