Commit a78281e

Merge pull request ceph#64876 from zdover23/wip-doc-2025-08-07-cephfs-troubleshooting-2

doc/cephfs: edit troubleshooting.rst

Reviewed-by: Venky Shankar <[email protected]>
2 parents 16d593e + c55eb8a commit a78281e

1 file changed, +19 -16 lines changed

doc/cephfs/troubleshooting.rst

Lines changed: 19 additions & 16 deletions
@@ -105,13 +105,14 @@ Do the following when restoring your file system:
   ``refuse_client_session`` file-system setting to prevent new sessions from
   connecting to the CephFS.
 
-* **Extend the MDS heartbeat grace period.** This avoids replacing an MDS that
-  appears "stuck" during some operation. Sometimes recovery of an MDS may
-  involve an operation that takes longer than expected (from the programmer's
-  perspective). This is more likely when recovery is already taking longer than
-  normal to complete (indicated by your reading this document). Avoid
-  unnecessary replacement loops by running the following command and extending
-  the heartbeat grace period:
+* **Extend the MDS heartbeat grace period.** Doing this causes the system to
+  avoid replacing an MDS that becomes "stuck" during an operation. Sometimes
+  recovery of an MDS may involve operations that take longer than expected
+  (from the programmer's perspective). This is more likely when recovery has
+  already taken longer than normal to complete (which, if you're reading this
+  document, is likely the situation you find yourself in). Avoid unnecessary
+  replacement loops by running the following command and extending the
+  heartbeat grace period:
 
   .. prompt:: bash #
 
@@ -125,19 +126,21 @@ Do the following when restoring your file system:
 * **Disable open-file-table prefetch.** Under normal circumstances, the MDS
   prefetches directory contents during recovery as a way of heating up its
   cache. During a long recovery, the cache is probably already hot **and
-  large**. So this behavior is unnecessary and can be undesirable. Disable
-  open-file-table prefetching by running the following command:
+  large**. If the cache is already hot and large, this prefetching is
+  unnecessary and can be undesirable. Disable open-file-table prefetching by
+  running the following command:
 
   .. prompt:: bash #
 
      ceph config set mds mds_oft_prefetch_dirfrags false
 
 * **Turn off clients.** Clients that reconnect to the newly ``up:active`` MDS
   can create new load on the file system just as it is becoming operational.
-  Maintenance is often necessary before allowing clients to connect to the file
-  system and resuming a regular workload. For example, expediting the trimming
-  of journals may be advisable if the recovery took a long time because replay
-  was reading a very large journal.
+  This is often undesirable. Maintenance is often necessary before allowing
+  clients to connect to the file system and before resuming a regular workload.
+  For example, expediting the trimming of journals may be advisable if the
+  recovery took a long time due to the amount of time replay spent in reading a
+  very large journal.
 
   Client sessions can be refused manually, or by using the
   ``refuse_client_session`` tunable as in the following command:
@@ -149,9 +152,9 @@ Do the following when restoring your file system:
   This command has the effect of preventing clients from establishing new
   sessions with the MDS.
 
-* **Do not tweak max_mds.** Modifying the file system setting variable
-  ``max_mds`` is sometimes thought to be good step during troubleshooting or
-  recovery. But modifying ``max_mds`` might have the effect of further
+* **Do not tweak max_mds.** Modifying the file-system setting variable
+  ``max_mds`` may seem like a good idea during troubleshooting and recovery,
+  but it probably isn't. Modifying ``max_mds`` might have the effect of further
   destabilizing the cluster. If ``max_mds`` must be changed in such
   circumstances, run the command to change ``max_mds`` with the confirmation
   flag (``--yes-i-really-mean-it``).
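
For orientation, the recovery-time settings touched by this diff can be sketched as one small script. Only the ``mds_oft_prefetch_dirfrags`` command appears verbatim in the hunks above. The heartbeat-grace and ``refuse_client_session`` invocations fall outside the diff context, so the forms used here are assumptions based on the general ``ceph config set`` and ``ceph fs set`` syntax. ``FS_NAME``, ``GRACE_SECONDS``, and ``DRY_RUN`` are hypothetical names introduced for illustration. The script prints the commands by default instead of running them.

```shell
#!/usr/bin/env bash
# Sketch only: collects the recovery-time settings discussed in the diff.
# FS_NAME, GRACE_SECONDS and DRY_RUN are hypothetical placeholders, not
# values taken from the commit.
set -euo pipefail

FS_NAME="${FS_NAME:-cephfs}"            # assumed file system name
GRACE_SECONDS="${GRACE_SECONDS:-3600}"  # assumed grace period, in seconds
DRY_RUN="${DRY_RUN:-1}"                 # 1 = print commands, 0 = execute them

# Print the command in dry-run mode; otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

recovery_settings() {
    # Assumed form: extend the MDS heartbeat grace period.
    run ceph config set mds mds_heartbeat_grace "$GRACE_SECONDS"
    # From the diff: disable open-file-table prefetch.
    run ceph config set mds mds_oft_prefetch_dirfrags false
    # Assumed form: refuse new client sessions on the file system.
    run ceph fs set "$FS_NAME" refuse_client_session true
}

recovery_settings
```

Running the script as-is only echoes the three commands. Setting ``DRY_RUN=0`` would execute them against a live cluster, so review the printed output first. ``max_mds`` is deliberately left out, in line with the advice above not to change it during recovery.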
