Commit 1733d95

Merge pull request ceph#64940 from zdover23/wip-doc-2025-08-11-cephfs-troubleshooting-2
doc/cephfs: edit troubleshooting.rst

Reviewed-by: Anthony D'Atri <[email protected]>
2 parents efadd26 + c897107 commit 1733d95

1 file changed: +15 / -13 lines changed


doc/cephfs/troubleshooting.rst

Lines changed: 15 additions & 13 deletions
@@ -369,16 +369,17 @@ will switch to doing writes synchronously. Synchronous writes are quite slow.

 Disconnected+Remounted FS
 =========================
-Because CephFS has a "consistent cache", if your network connection is
-disrupted for a long enough time, the client will be forcibly
-disconnected from the system. At this point, the kernel client is in
-a bind: it cannot safely write back dirty data, and many applications
-do not handle IO errors correctly on close().
-At the moment, the kernel client will remount the FS, but outstanding file system
-IO may or may not be satisfied. In these cases, you may need to reboot your
+
+Because CephFS has a "consistent cache", your client is forcibly disconnected
+from the cluster when the network connection has been disrupted for a long
+time. When this happens, the kernel client cannot safely write back dirty data
+and many applications will not handle IO errors correctly on ``close()``.
+Currently, the kernel client will remount the file system, but any outstanding
+file-system IO may not be properly handled. If this is the case, reboot the
 client system.

-You can identify you are in this situation if dmesg/kern.log report something like::
+You are in this situation if the output of ``dmesg/kern.log`` contains
+something like the following::

   Jul 20 08:14:38 teuthology kernel: [3677601.123718] ceph: mds0 closed our session
   Jul 20 08:14:38 teuthology kernel: [3677601.128019] ceph: mds0 reconnect start
@@ -389,11 +390,12 @@ You can identify you are in this situation if dmesg/kern.log report something li
   Jul 20 08:14:40 teuthology kernel: [3677603.126214] libceph: mds0 172.21.5.114:6812 connection reset
   Jul 20 08:14:40 teuthology kernel: [3677603.132176] libceph: reset on mds0

-This is an area of ongoing work to improve the behavior. Kernels will soon
-be reliably issuing error codes to in-progress IO, although your application(s)
-may not deal with them well. In the longer-term, we hope to allow reconnect
-and reclaim of data in cases where it won't violate POSIX semantics (generally,
-data which hasn't been accessed or modified by other clients).
+This is an area of ongoing work to improve the behavior. Kernels will soon be
+reliably issuing error codes to in-progress IO, although your application(s)
+may not deal with them well. In the longer term, we hope to allow reconnection
+and reclamation of data in cases where doing so does not violate POSIX
+semantics (generally, data which hasn't been accessed or modified by other
+clients).

 Mounting
 ========
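A quick way to check a client for the condition described in the updated text is to scan the kernel log for the messages quoted in the excerpt. The snippet below is a minimal Python sketch, not part of this patch: it assumes ``dmesg`` is readable on the client and only looks for the two messages shown above::

import re
import subprocess

# Messages from the kernel log excerpt above that indicate the client was
# forcibly disconnected and its session reset.
PATTERNS = [
    r"ceph: mds\d+ closed our session",
    r"libceph: reset on mds\d+",
]


def cephfs_client_was_disconnected(log_text: str) -> bool:
    """Return True if any of the disconnect messages appear in the log text."""
    return any(re.search(pattern, log_text) for pattern in PATTERNS)


if __name__ == "__main__":
    # Reading the kernel ring buffer usually requires root (or
    # kernel.dmesg_restrict=0); /var/log/kern.log could be scanned instead.
    log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    if cephfs_client_was_disconnected(log):
        print("CephFS disconnect messages found; the client may need a reboot.")
    else:
        print("No CephFS disconnect messages found.")

Run with sufficient privileges on the suspect client; a positive result corresponds to the "reboot the client system" advice in the revised paragraph.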
