Investigating a Failed Ceph Drive
---------------------------------

- After deployment, when a drive fails it may cause OSD crashes in Ceph.
- If Ceph detects crashed OSDs, it will go into `HEALTH_WARN` state.
+ A failing drive in a Ceph cluster will cause its OSD daemon to crash.
+ In this case Ceph will go into the `HEALTH_WARN` state.

Ceph can report details about failed OSDs by running:

- .. ifconfig:: deployment['cephadm']
+ .. code-block:: console

-    .. note::
+    ceph# ceph health detail

-       Remember to run ceph/rbd commands after issuing ``cephadm shell`` or
-       installing ceph clients.
-       It is also important to run the commands on the hosts with _admin label
-       (Ceph monitors by default).
+ .. ifconfig:: deployment['cephadm']

-    .. code-block:: console
+    .. note::

-       ceph# ceph health detail
+       Remember to run ceph/rbd commands from within ``cephadm shell``
+       (preferred method) or after installing the Ceph client. Details are in
+       the official `documentation <https://docs.ceph.com/en/quincy/cephadm/install/#enable-ceph-cli>`__.
+       It is also required that the host where the commands are executed has an
+       admin Ceph keyring present, which is easiest to achieve by applying the
+       `_admin <https://docs.ceph.com/en/quincy/cephadm/host-management/#special-host-labels>`__
+       label (Ceph MON servers have it by default when using the
+       `StackHPC Cephadm collection <https://github.com/stackhpc/ansible-collection-cephadm>`__).

A failed OSD will also be reported as down by running:
@@ -26,7 +30,7 @@ A failed OSD will also be reported as down by running:
Note the ID of the failed OSD.

- The failed hardware device is logged by the Linux kernel:
+ The failed disk is usually also logged by the Linux kernel:

.. code-block:: console
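For context on the note added above: both approaches it describes rely on
standard cephadm commands. The following sketch is illustrative only and is
not part of the diff; the hostname ``storage-0`` is a placeholder.

.. code-block:: console

   # Preferred: open a containerised shell that bundles the Ceph CLI and
   # the cluster keyring, then run ceph/rbd commands from inside it.
   storage-0# cephadm shell
   ceph# ceph health detail

   # Alternative: apply the _admin label so cephadm distributes the admin
   # keyring and ceph.conf to the host, allowing direct use of the CLI there.
   ceph# ceph orch host label add storage-0 _admin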
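The diff is cut off after the final ``code-block`` directive, so the kernel
log excerpt it introduces is not shown here. As a supplementary sketch, also
not part of the diff, the OSD ID noted earlier can usually be mapped back to
its host and physical device with Ceph's own tooling; ``osd.5`` below is a
placeholder ID.

.. code-block:: console

   # Locate the host that carries the failed OSD
   ceph# ceph osd find 5

   # List the physical device(s) backing that OSD daemon
   ceph# ceph device ls-by-daemon osd.5

   # The OSD metadata also records the hostname and device names
   ceph# ceph osd metadata 5 | grep -e hostname -e devices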