@@ -27,34 +27,35 @@ Stuck during recovery
 Stuck in up:replay
 ------------------

-If your MDS is stuck in ``up:replay`` then it is likely that the journal is
-very long. Did you see ``MDS_HEALTH_TRIM`` cluster warnings saying the MDS is
-behind on trimming its journal? If the journal has grown very large, it can
-take hours to read the journal. There is no working around this but there
-are things you can do to speed things along:
+If your MDS is stuck in the ``up:replay`` state, then it is likely that the
+journal is very long. Did you see ``MDS_HEALTH_TRIM`` cluster warnings saying
+the MDS is behind on trimming its journal? Very large journals can take hours
+to read. There is no way around this, but there are things you can do to
+speed things along:

-Reduce MDS debugging to 0. Even at the default settings, the MDS logs some
-messages to memory for dumping if a fatal error is encountered. You can avoid
-this:
+Reduce MDS debugging to 0. Even with the default settings, the MDS logs a few
+messages to memory for dumping in case a fatal error is encountered. You can
+avoid this by running the following commands:

-.. code:: bash
+.. prompt:: bash #

    ceph config set mds debug_mds 0
    ceph config set mds debug_ms 0
    ceph config set mds debug_monc 0

-Note if the MDS fails then there will be virtually no information to determine
-why. If you can calculate when ``up:replay`` will complete, you should restore
-these configs just prior to entering the next state:
+Remember that when you set ``debug_mds``, ``debug_ms``, and ``debug_monc`` to
+``0``, there will be virtually no information to determine why the MDS failed
+if a fatal error occurs. If you can calculate when ``up:replay`` will complete,
+restore these settings just before the MDS enters the next state:

 .. code:: bash

    ceph config rm mds debug_mds
    ceph config rm mds debug_ms
    ceph config rm mds debug_monc

-Once you've got replay moving along faster, you can calculate when the MDS will
-complete. This is done by examining the journal replay status:
+Once replay has sped up, calculate when the MDS will finish replaying by
+examining the journal replay status:

 .. code:: bash

@@ -68,7 +69,7 @@ complete. This is done by examining the journal replay status:
    }

 Replay completes when the ``journal_read_pos`` reaches the
-``journal_write_pos``. The write position will not change during replay. Track
+``journal_write_pos``. The write position does not change during replay. Track
 the progression of the read position to compute the expected time to complete.


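The time estimate described in the final paragraph of the diff is simple arithmetic: sample ``journal_read_pos`` twice, a known number of seconds apart, derive the read rate, and divide the distance still left to ``journal_write_pos`` by that rate. The following is a minimal sketch under the assumption that the read rate stays roughly constant. The function name ``estimate_replay_seconds``, its argument order, and the sample numbers are hypothetical illustrations, not part of Ceph; you supply the positions yourself from the journal replay status output.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Ceph): estimate seconds until up:replay ends.
# Arguments are integers copied from two journal replay status snapshots:
#   $1  journal_read_pos from the first snapshot
#   $2  journal_read_pos from the second snapshot
#   $3  journal_write_pos (it does not change during replay)
#   $4  seconds elapsed between the two snapshots
# Assumption: the read position keeps advancing at the rate observed.
estimate_replay_seconds() {
  local read_before=$1 read_after=$2 write_pos=$3 interval=$4
  local remaining=$((write_pos - read_after))
  if [ "$remaining" -le 0 ]; then
    echo 0   # read position has reached the write position: replay is done
    return 0
  fi
  local progressed=$((read_after - read_before))
  if [ "$progressed" -le 0 ]; then
    echo "no progress observed; cannot estimate" >&2
    return 1
  fi
  # remaining / (progressed / interval), rearranged to stay in integer math
  echo $((remaining * interval / progressed))
}

# Made-up example: 600000 read in 60 seconds, 5400000 still to go.
estimate_replay_seconds 1000000 1600000 7000000 60
```

Because the sketch uses shell integer arithmetic, the result is rounded down to whole seconds. Sampling over a longer interval gives a steadier rate if replay speed varies.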