Commit 0206421 (parent e0216a6)

Add ceph management doc

1 file changed: 123 additions, 0 deletions

==========================
Managing Ceph with Cephadm
==========================

cephadm configuration location
==============================

Cephadm configuration lives in the kayobe-config repository, under
``etc/kayobe/cephadm.yml`` (or, when using multiple Kayobe environments, in a
specific environment, e.g. ``etc/kayobe/environments/production/cephadm.yml``).

StackHPC's cephadm Ansible collection relies on multiple inventory groups:

- ``mons``
- ``mgrs``
- ``osds``
- ``rgws`` (optional)

Those groups are usually defined in ``etc/kayobe/inventory/groups``.

Running cephadm playbooks
=========================

In the kayobe-config repository, under ``etc/kayobe/ansible``, there is a set
of cephadm-based playbooks utilising the ``stackhpc.cephadm`` Ansible Galaxy
collection (an example invocation follows the list):

- ``cephadm.yml`` - runs the end-to-end process, starting with deployment and
  then defining EC profiles, CRUSH rules, pools and users
- ``cephadm-crush-rules.yml`` - defines Ceph CRUSH rules
- ``cephadm-deploy.yml`` - runs the bootstrap/deploy playbook without the
  additional playbooks
- ``cephadm-ec-profiles.yml`` - defines Ceph EC profiles
- ``cephadm-gather-keys.yml`` - gathers Ceph configuration and keys and
  populates kayobe-config
- ``cephadm-keys.yml`` - defines Ceph users/keys
- ``cephadm-pools.yml`` - defines Ceph pools
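
For example, a playbook can be run with ``kayobe playbook run``. A minimal
sketch, assuming the usual ``kayobe-env`` setup so that ``$KAYOBE_CONFIG_PATH``
points at this kayobe-config checkout:

.. code-block:: console

   # On the Ansible control host: run the full end-to-end Ceph workflow
   kayobe# kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/cephadm.yml

   # Or run only the bootstrap/deploy step
   kayobe# kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/cephadm-deploy.yml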

Running Ceph commands
=====================

Ceph commands are usually run inside a ``cephadm shell`` utility container:

.. code-block:: console

   # From a node that runs Ceph
   ceph# sudo cephadm shell

Operating a cluster requires a keyring with admin access to be available for
Ceph commands. Cephadm copies such a keyring to the nodes carrying the
`_admin <https://docs.ceph.com/en/quincy/cephadm/host-management/#special-host-labels>`__
label, which is present on MON servers by default when using the
`StackHPC Cephadm collection <https://github.com/stackhpc/ansible-collection-cephadm>`__.
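
Once inside the shell, the usual Ceph administration commands are available.
A few examples for checking cluster state:

.. code-block:: console

   ceph# sudo cephadm shell
   # Inside the shell: overall status, health detail and daemon placement
   ceph# ceph -s
   ceph# ceph health detail
   ceph# ceph orch ps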

Adding a new storage node
=========================

Add the node to the respective group (e.g. ``osds``) and run the
``cephadm-deploy.yml`` playbook, as sketched below.

.. note::
   To add node types other than OSDs (mons, mgrs, etc.), you need to specify
   ``-e cephadm_bootstrap=True`` when running the playbook.
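
A minimal sketch of the workflow, assuming the new host has already been added
to the ``osds`` group and the Kayobe overcloud inventory:

.. code-block:: console

   # On the Ansible control host: enrol the new host into the Ceph cluster
   kayobe# kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/cephadm-deploy.yml

   # Verify that the host and its OSDs have appeared
   ceph# sudo cephadm shell
   ceph# ceph orch host ls
   ceph# ceph osd tree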

Removing a storage node
=======================

First, drain the node:

.. code-block:: console

   ceph# cephadm shell
   ceph# ceph orch host drain <host>
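
Draining stops and removes all daemons from the host. OSD removal can take a
while; its progress can be followed with:

.. code-block:: console

   ceph# ceph orch osd rm status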

Once all daemons are removed, you can remove the host:

.. code-block:: console

   ceph# cephadm shell
   ceph# ceph orch host rm <host>

Then remove the host from the Kayobe inventory (usually in
``etc/kayobe/inventory/overcloud``).

Additional options/commands may be found in the
`Host management <https://docs.ceph.com/en/latest/cephadm/host-management/>`_
documentation.

Replacing a Failed Ceph Drive
=============================

Once an OSD has been identified as having a hardware failure,
the affected drive will need to be replaced.
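
As an illustrative starting point, failed or failing OSDs are usually visible
in the cluster health output and can be listed by state:

.. code-block:: console

   ceph# sudo cephadm shell
   ceph# ceph health detail
   ceph# ceph osd tree down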

If rebooting a Ceph node, first set ``noout`` to prevent excess data
movement:

.. code-block:: console

   ceph# cephadm shell
   ceph# ceph osd set noout

Reboot the node and replace the drive.

Unset ``noout`` after the node is back online:

.. code-block:: console

   ceph# cephadm shell
   ceph# ceph osd unset noout

Remove the OSD using the Ceph orchestrator command:

.. code-block:: console

   ceph# cephadm shell
   ceph# ceph orch osd rm <ID> --replace
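
The ``--replace`` flag keeps the OSD ID reserved so that the replacement drive
can reuse it. Removal progress, and visibility of the new drive once it has
been fitted, can be checked with, for example:

.. code-block:: console

   ceph# ceph orch osd rm status
   ceph# ceph orch device ls <host>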

After removing OSDs, if the drives the OSDs were deployed on once again become
available, cephadm may automatically try to deploy more OSDs on these drives if
they match an existing drivegroup spec.
If this is not the desired behaviour, it is best to modify the drivegroup spec
beforehand (the ``cephadm_osd_spec`` variable in ``etc/kayobe/cephadm.yml``).
Either set ``unmanaged: true`` to stop cephadm from picking up new disks, or
modify the spec so that it no longer matches the drives you want to remove.
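
To review the drivegroup spec that the orchestrator currently has applied
before editing ``cephadm_osd_spec``, the active OSD service specs can be
exported; a quick example:

.. code-block:: console

   ceph# sudo cephadm shell
   ceph# ceph orch ls osd --export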
