Commit fba50e6

Merge pull request ceph#64832 from zdover23/wip-doc-2025-08-05-cephfs-troubleshooting-stuck-during-recovery
doc/cephfs: edit troubleshooting.rst

Reviewed-by: Anthony D'Atri <[email protected]>
2 parents 2b65f4a + 969c01f commit fba50e6

1 file changed, 16 additions, 15 deletions

doc/cephfs/troubleshooting.rst

@@ -27,34 +27,35 @@ Stuck during recovery
 Stuck in up:replay
 ------------------

-If your MDS is stuck in ``up:replay`` then it is likely that the journal is
-very long. Did you see ``MDS_HEALTH_TRIM`` cluster warnings saying the MDS is
-behind on trimming its journal? If the journal has grown very large, it can
-take hours to read the journal. There is no working around this but there
-are things you can do to speed things along:
+If your MDS is stuck in the ``up:replay`` state, then it is likely that the
+journal is very long. Did you see ``MDS_HEALTH_TRIM`` cluster warnings saying
+the MDS is behind on trimming its journal? Very large journals can take hours
+to read. There is no working around this, but there are things you can do to
+speed things along:

-Reduce MDS debugging to 0. Even at the default settings, the MDS logs some
-messages to memory for dumping if a fatal error is encountered. You can avoid
-this:
+Reduce MDS debugging to 0. Even with the default settings, the MDS logs a few
+messages to memory for dumping in case a fatal error is encountered. You can
+turn off all logging by running the following commands:

-.. code:: bash
+.. prompt:: bash #

    ceph config set mds debug_mds 0
    ceph config set mds debug_ms 0
    ceph config set mds debug_monc 0

-Note if the MDS fails then there will be virtually no information to determine
-why. If you can calculate when ``up:replay`` will complete, you should restore
-these configs just prior to entering the next state:
+Remember that when you set ``debug_mds``, ``debug_ms``, and ``debug_monc`` to
+``0``, there will be no information to determine why fatal errors occurred if
+the MDS fails. If you can calculate when ``up:replay`` will complete, you
+should restore these configs just prior to entering the next state:

 .. code:: bash

    ceph config rm mds debug_mds
    ceph config rm mds debug_ms
    ceph config rm mds debug_monc

-Once you've got replay moving along faster, you can calculate when the MDS will
-complete. This is done by examining the journal replay status:
+After replay has been sped up, calculate when the MDS will complete the
+replay by examining the journal replay status:

 .. code:: bash

@@ -68,7 +69,7 @@ complete. This is done by examining the journal replay status:
    }

 Replay completes when the ``journal_read_pos`` reaches the
-``journal_write_pos``. The write position will not change during replay. Track
+``journal_write_pos``. The write position does not change during replay. Track
 the progression of the read position to compute the expected time to complete.


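The changed paragraph says to track the progression of the read position to compute the expected time for replay to complete. That arithmetic can be sketched as a small shell function. This is a minimal sketch, not part of the Ceph tooling: the function name ``replay_eta_seconds`` is hypothetical, and it assumes you have noted two values of ``journal_read_pos`` taken a known number of seconds apart, plus ``journal_write_pos`` (which does not change during replay), and that the replay rate stays roughly constant.

```shell
#!/bin/sh
# Hypothetical helper (not a Ceph command): estimate the seconds remaining
# until MDS journal replay finishes, assuming a roughly constant replay rate.
replay_eta_seconds() {
    read_pos_1=$1   # journal_read_pos at the first sample
    read_pos_2=$2   # journal_read_pos at the second sample
    write_pos=$3    # journal_write_pos (fixed during replay)
    interval=$4     # seconds elapsed between the two samples

    progress=$((read_pos_2 - read_pos_1))
    if [ "$progress" -le 0 ]; then
        echo "no replay progress between samples" >&2
        return 1
    fi

    remaining=$((write_pos - read_pos_2))
    # remaining / (progress / interval), rounded up to a whole second
    echo $(( (remaining * interval + progress - 1) / progress ))
}

# Example: the read position advanced 1048576 in 60 s, with 10485760 left.
replay_eta_seconds 4194304 5242880 15728640 60   # prints 600
```

The position unit cancels out, so the result is in seconds whatever unit the positions use. The function fails instead of dividing by zero if the read position did not advance between the two samples.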