@@ -174,8 +174,30 @@ is advisable to migrate all of the instances to another machine. See
Ceph
----

- The following guide provides a good overview:
- https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/8/html/director_installation_and_usage/sect-rebooting-ceph
+ #. Check that the cluster is healthy (e.g. with ``ceph -s``). Where possible,
+    solve or isolate any issues before the shutdown, e.g. by marking unhealthy
+    OSDs as 'out' in the cluster.
+
+ #. Stop all clients. This includes:
+
+    * **All** OpenStack VMs (if their storage is RBD-backed).
+
+    * CephFS mounts.
+
+    * Ceph-backed OpenStack services such as Glance, Cinder, Manila, and RGW/S3/Swift.
+
+ #. Set the ``noout`` flag, so that the cluster does not attempt to redistribute
+    data when OSDs go down. Use the following command on a MON node:
+
+    .. code-block:: console
+
+       sudo cephadm shell -- ceph osd set noout
+
+ #. Shut down all the nodes, with those holding MON services last.
+
+ Note that if Ceph services should not start automatically with the operating
+ system on the next boot, extra steps are needed; these are not described
+ here.
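+
+ As an illustration of the first step, the health check and the isolation of a
+ failing OSD might look as follows. This is only a sketch: it assumes the same
+ ``cephadm``-based deployment as the ``noout`` command above, and the OSD ID
+ ``7`` is hypothetical.
+
+ .. code-block:: console
+
+    sudo cephadm shell -- ceph -s
+    sudo cephadm shell -- ceph health detail
+    sudo cephadm shell -- ceph osd out 7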

Shutting down the seed VM
-------------------------
@@ -201,6 +223,24 @@ following order:
* Shut down seed VM
* Shut down Ansible control host

+ Full startup
+ -------------
+
+ If the entire control plane is powered down, it is best to bring the nodes up
+ in the reverse order of shutdown:
+
+ * Power on Ansible control host
+ * Power on seed VM (and other service VMs)
+ * Power on Ceph nodes (if applicable)
+   * Where possible, start the nodes running MON services first.
+   * Make sure that all OSD services are back up and running. At this point
+     it is safe to unset the ``noout`` cluster flag.
+ * Power on controllers
+ * Power on network nodes (if separate from controllers)
+ * Power on monitoring node (if separate from controllers)
+ * Power on compute nodes
+ * Power on virtual machines
+
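+ A minimal sketch of clearing the ``noout`` flag, assuming the same
+ ``cephadm``-based deployment as in the Ceph shutdown steps. Run the status
+ command first and confirm that all OSDs are reported as up before removing
+ the flag:
+
+ .. code-block:: console
+
+    sudo cephadm shell -- ceph -s
+    sudo cephadm shell -- ceph osd unset noout
+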
Rebooting a node
----------------