Commit c55eb8a

doc/cephfs: edit troubleshooting.rst

Edit "Avoiding Recovery Roadblocks" in the "Stuck During Recovery" section of
doc/cephfs/troubleshooting.rst.

This commit follows ceph#64854.

Signed-off-by: Zac Dover <[email protected]>
1 parent ebf66bf commit c55eb8a

1 file changed (+19, -16 lines)

doc/cephfs/troubleshooting.rst

Lines changed: 19 additions & 16 deletions
@@ -91,13 +91,14 @@ Do the following when restoring your file system:
 ``refuse_client_session`` file-system setting to prevent new sessions from
 connecting to the CephFS.
 
-* **Extend the MDS heartbeat grace period.** This avoids replacing an MDS that
-appears "stuck" during some operation. Sometimes recovery of an MDS may
-involve an operation that takes longer than expected (from the programmer's
-perspective). This is more likely when recovery is already taking longer than
-normal to complete (indicated by your reading this document). Avoid
-unnecessary replacement loops by running the following command and extending
-the heartbeat grace period:
+* **Extend the MDS heartbeat grace period.** Doing this causes the system to
+avoid replacing an MDS that becomes "stuck" during an operation. Sometimes
+recovery of an MDS may involve operations that take longer than expected
+(from the programmer's perspective). This is more likely when recovery has
+already taken longer than normal to complete (which, if you're reading this
+document, is likely the situation you find yourself in). Avoid unnecessary
+replacement loops by running the following command and extending the
+heartbeat grace period:
 
 .. prompt:: bash #
 
@@ -111,19 +112,21 @@ Do the following when restoring your file system:
 * **Disable open-file-table prefetch.** Under normal circumstances, the MDS
 prefetches directory contents during recovery as a way of heating up its
 cache. During a long recovery, the cache is probably already hot **and
-large**. So this behavior is unnecessary and can be undesirable. Disable
-open-file-table prefetching by running the following command:
+large**. If the cache is already hot and large, this prefetching is
+unnecessary and can be undesirable. Disable open-file-table prefetching by
+running the following command:
 
 .. prompt:: bash #
 
 ceph config set mds mds_oft_prefetch_dirfrags false
 
 * **Turn off clients.** Clients that reconnect to the newly ``up:active`` MDS
 can create new load on the file system just as it is becoming operational.
-Maintenance is often necessary before allowing clients to connect to the file
-system and resuming a regular workload. For example, expediting the trimming
-of journals may be advisable if the recovery took a long time because replay
-was reading a very large journal.
+This is often undesirable. Maintenance is often necessary before allowing
+clients to connect to the file system and before resuming a regular workload.
+For example, expediting the trimming of journals may be advisable if the
+recovery took a long time due to the amount of time replay spent in reading a
+very large journal.
 
 Client sessions can be refused manually, or by using the
 ``refuse_client_session`` tunable as in the following command:
@@ -135,9 +138,9 @@ Do the following when restoring your file system:
 This command has the effect of preventing clients from establishing new
 sessions with the MDS.
 
-* **Do not tweak max_mds.** Modifying the file system setting variable
-``max_mds`` is sometimes thought to be good step during troubleshooting or
-recovery. But modifying ``max_mds`` might have the effect of further
+* **Do not tweak max_mds.** Modifying the file-system setting variable
+``max_mds`` may seem like a good idea during troubleshooting and recovery,
+but it probably isn't. Modifying ``max_mds`` might have the effect of further
 destabilizing the cluster. If ``max_mds`` must be changed in such
 circumstances, run the command to change ``max_mds`` with the confirmation
 flag (``--yes-i-really-mean-it``).
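
For operators who script these recovery steps, here is a minimal Python sketch. It is not part of this commit. It assumes Python 3.8+ and the `ceph` CLI on `PATH`. The helper names (`recovery_roadblock_commands`, `allow_clients_command`, `set_max_mds_command`, `run`) and the file-system name `cephfs` are hypothetical. The `ceph fs set <fs_name> refuse_client_session true` form is an assumption, because the diff above omits that command. The heartbeat-grace command is left out for the same reason.

```python
"""Hypothetical helper that assembles the recovery commands described above.

Dry run by default: it prints each command instead of executing it.
"""
from __future__ import annotations

import shlex
import subprocess


def recovery_roadblock_commands(fs_name: str) -> list[list[str]]:
    """Return the ceph invocations to run during a long MDS recovery."""
    return [
        # Skip open-file-table prefetching; after a long recovery the MDS
        # cache is probably already hot and large.
        ["ceph", "config", "set", "mds", "mds_oft_prefetch_dirfrags", "false"],
        # Assumed command form: refuse new client sessions while you do
        # maintenance (for example, journal trimming).
        ["ceph", "fs", "set", fs_name, "refuse_client_session", "true"],
    ]


def allow_clients_command(fs_name: str) -> list[str]:
    """Assumed inverse of the refusal above, for use after maintenance."""
    return ["ceph", "fs", "set", fs_name, "refuse_client_session", "false"]


def set_max_mds_command(fs_name: str, max_mds: int) -> list[str]:
    """Build a max_mds change; discouraged during recovery, so always confirmed."""
    if max_mds < 1:
        # Guard against an obviously invalid value before calling the CLI.
        raise ValueError("max_mds must be at least 1")
    return ["ceph", "fs", "set", fs_name, "max_mds", str(max_mds),
            "--yes-i-really-mean-it"]


def run(commands: list[list[str]], dry_run: bool = True) -> list[str]:
    """Print (dry run) or execute each command; return their shell forms."""
    rendered = []
    for argv in commands:
        line = shlex.join(argv)
        rendered.append(line)
        if dry_run:
            print("would run:", line)
        else:
            # check=True stops at the first command that exits non-zero.
            subprocess.run(argv, check=True)
    return rendered


if __name__ == "__main__":
    # "cephfs" is a placeholder file-system name.
    run(recovery_roadblock_commands("cephfs"))
```

Running the file as a script only prints the commands; pass `dry_run=False` to `run` to execute them. The `max_mds` change is kept in its own explicit helper, rather than in the default list, to reflect the advice not to tweak `max_mds` during recovery. When it must change, the helper always appends the confirmation flag.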
