docs: Add instructions for Ceph shutdown

Alex-Welsh · MoteHue · cityofships · Alex-Welsh · commit cfe92da2a031 · 2025-08-29T10:31:19.000+01:00
Co-authored-by: Matt Crees <mattc@stackhpc.com>
Co-authored-by: Piotr Parczewski <19818516+cityofships@users.noreply.github.com>
diff --git a/doc/source/operations/control-plane-operation.rst b/doc/source/operations/control-plane-operation.rst
@@ -174,8 +174,30 @@ is advisable to migrate all of the instances to another machine. See
 Ceph
 ----
 
-The following guide provides a good overview:
-https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/8/html/director_installation_and_usage/sect-rebooting-ceph
+#. Check that the cluster is healthy (e.g. with ``ceph -s``). Where possible,
+   resolve or isolate any issues before the shutdown, for example by marking
+   unhealthy OSDs as 'out' in the cluster.
+
+#. Stop all clients. This includes:
+
+   * **All** OpenStack VMs (if their storage is RBD-backed).
+
+   * CephFS mounts.
+
+   * Ceph-backed OpenStack services such as Glance, Cinder, Manila, and RGW/S3/Swift.
+
+#. Set the ``noout`` flag, so that the cluster does not attempt to redistribute
+   data when OSDs go down. Use the following command on a MON node:
+
+   .. code-block:: console
+
+      sudo cephadm shell -- ceph osd set noout
+
+#. Shut down all the nodes, with those holding MON services last.
+
+Note that if Ceph services should not start automatically with the operating
+system on the next boot, extra steps are required. These are not described
+here.
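+
+As a rough sketch of the health check in step 1, assuming a cephadm-managed
+cluster as in the ``noout`` command above, the following could be run on a MON
+node. ``<osd-id>`` is a placeholder for the numeric ID of an unhealthy OSD and
+is not a value taken from this guide:
+
+.. code-block:: console
+
+   # Show overall cluster status, then details of any warnings
+   sudo cephadm shell -- ceph -s
+   sudo cephadm shell -- ceph health detail
+   # Mark an unhealthy OSD as out (<osd-id> is a placeholder)
+   sudo cephadm shell -- ceph osd out <osd-id>
+   # Once noout has been set, it should appear in the flags line
+   sudo cephadm shell -- ceph osd dump | grep flags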
 
 Shutting down the seed VM
 -------------------------
@@ -201,6 +223,24 @@ following order:
 * Shut down seed VM
 * Shut down Ansible control host
 
+Full startup
+------------
+
+If the entire control plane is powered down, it is best to bring the nodes up
+in the reverse order of shutdown:
+
+* Power on Ansible control host
+* Power on seed VM (and other service VMs)
+* Power on Ceph nodes (if applicable)
+
+  * Where possible, start the nodes running MON services first.
+  * Make sure that all OSD services are back up and running. At this point
+    it is safe to unset the ``noout`` cluster flag.
+
+* Power on controllers
+* Power on network nodes (if separate from controllers)
+* Power on monitoring node (if separate from controllers)
+* Power on compute nodes
+* Power on virtual machines
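+
+Assuming a cephadm-managed cluster, checking the OSDs and clearing the
+``noout`` flag during the Ceph step could be done on a MON node roughly as
+follows. This is a sketch rather than a verified procedure for every
+deployment:
+
+.. code-block:: console
+
+   # Confirm that all OSDs are reported as up and in
+   sudo cephadm shell -- ceph osd stat
+   # Clear the noout flag that was set before shutdown
+   sudo cephadm shell -- ceph osd unset noout
+   # Check overall cluster health afterwards
+   sudo cephadm shell -- ceph -s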
+
 Rebooting a node
 ----------------