Commit adb448b

mds: fix rank 0 marked damaged if stopping fails after Elid flush and log trimmed
Steps to reproduce:

  ../src/vstart.sh --debug --new -x --localhost --bluestore
  ./bin/ceph tell mds.<rank 0> config set mds_kill_shutdown_at 10
  ./bin/ceph fs set <fs name> down true

Wait a few seconds. The take-over MDS then logs the following, and rank 0 is marked damaged:

  2025-09-11T16:47:24.591+0800 785dabeaa6c0 -1 log_channel(cluster) log [ERR] : No subtrees found for root MDS rank!
  2025-09-11T16:47:24.591+0800 785dabeaa6c0 5 mds.beacon.b set_want_state: up:rejoin -> down:damaged

During shutdown_pass, after submitting ELid and trimming the mdlog, the MDS log contains only the ELid event, which does nothing at replay. After replay, no subtree is found.

Fix this by checking whether the MDLog contains only one event. If so, skip the subtree check for rank 0 and allow it to request STATE_STOPPED, just like the other ranks.

Fixes: https://tracker.ceph.com/issues/72983
Signed-off-by: ethanwu <[email protected]>
1 parent 098432f commit adb448b

1 file changed, +3 -2 lines changed

src/mds/MDSRank.cc

Lines changed: 3 additions & 2 deletions
@@ -2032,8 +2032,9 @@ void MDSRank::rejoin_done()
 
   // funny case: is our cache empty? no subtrees?
   if (!mdcache->is_subtrees()) {
-    if (whoami == 0) {
-      // The root should always have a subtree!
+    if (whoami == 0 && mdlog->get_num_events() > 1) {
+      // The root should always have a subtree except when
+      // the mdlog contains only the ELid event
       clog->error() << "No subtrees found for root MDS rank!";
       damaged();
       ceph_assert(mdcache->is_subtrees());
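
The branch changed above can be modeled in isolation to see which inputs now lead to which outcome. The following is a minimal sketch, not Ceph code: `RejoinOutcome`, `decide_rejoin_outcome`, and `run_examples` are hypothetical names introduced for illustration, and the three parameters stand in for `whoami`, `mdcache->is_subtrees()`, and `mdlog->get_num_events()` from the diff. It assumes, as the commit message states, that a rank with no subtrees that is not flagged as damaged goes on to request STATE_STOPPED.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical outcome labels for the empty-cache branch in rejoin_done().
enum class RejoinOutcome { Continue, Damaged, RequestStopped };

// Toy model of the patched condition (an illustration, not the real MDSRank).
//   whoami         -> rank number (0 is the root rank)
//   has_subtrees   -> stands in for mdcache->is_subtrees()
//   num_log_events -> stands in for mdlog->get_num_events()
constexpr RejoinOutcome decide_rejoin_outcome(int whoami,
                                              bool has_subtrees,
                                              std::size_t num_log_events) {
  if (has_subtrees) {
    return RejoinOutcome::Continue;  // normal case: cache is not empty
  }
  // Rank 0 with more than one journal event and no subtrees is unexpected.
  if (whoami == 0 && num_log_events > 1) {
    return RejoinOutcome::Damaged;
  }
  // Either a non-root rank, or rank 0 whose journal holds a single event
  // (per the commit message, only ELid): allowed to stop.
  return RejoinOutcome::RequestStopped;
}

// A few cases exercising the model.
inline void run_examples() {
  // Rank 0, journal trimmed down to one event: no longer marked damaged.
  assert(decide_rejoin_outcome(0, false, 1) == RejoinOutcome::RequestStopped);
  // Rank 0, several events but no subtrees: still treated as damaged.
  assert(decide_rejoin_outcome(0, false, 5) == RejoinOutcome::Damaged);
  // Non-root rank without subtrees: behavior unchanged by the patch.
  assert(decide_rejoin_outcome(1, false, 5) == RejoinOutcome::RequestStopped);
}
```

Under this model the only input whose outcome changes is rank 0 with an empty cache and a one-event journal, which matches the scenario in the reproduction steps.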
