Commit 1ce5bae

Full shutdown document added
1 parent 86a01a4 commit 1ce5bae

3 files changed: +209 / -14 lines


source/full_shutdown.rst

Lines changed: 206 additions & 0 deletions
@@ -0,0 +1,206 @@
.. include:: vars.rst


=======================
Full Shutdown Procedure
=======================

In case a full shutdown of the system is required, we advise using the
following order:

* Perform a graceful shutdown of all virtual machine instances
* Stop Ceph (if applicable)
* Put all nodes into maintenance mode in Bifrost
* Shut down compute nodes
* Shut down monitoring node
* Shut down network nodes (if separate from controllers)
* Shut down controllers
* Shut down Ceph nodes (if applicable)
* Shut down seed VM
* Shut down Ansible control host

Virtual Machines shutdown
-------------------------

Contact OpenStack users and ask them to stop their virtual machines gracefully.
If that is not possible, shut down the VMs using the OpenStack CLI as the admin
user:

.. code-block:: bash

   for i in `openstack server list --all-projects -c ID -f value` ; \
   do openstack server stop $i ; done
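
Before moving on, it may be worth confirming that no instances are still
running. A minimal check, assuming admin credentials are loaded in the
environment:

.. code-block:: bash

   # List any instances that are still ACTIVE; the output should be empty
   openstack server list --all-projects --status ACTIVE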

.. ifconfig:: deployment['ceph_managed']

Stop Ceph
---------

Procedure based on the `Red Hat documentation <https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/administration_guide/understanding-process-management-for-ceph#powering-down-and-rebooting-a-red-hat-ceph-storage-cluster_admin>`__.

- Stop the Ceph clients from using any Ceph resources (RBD, RADOS Gateway, CephFS)
- Check that the cluster is in a healthy state

.. code-block:: bash

   ceph status

- Stop CephFS (if applicable)

Stop the CephFS cluster by reducing the number of ranks to 1, setting the
cluster_down flag, and then failing the last rank.

.. code-block:: bash

   ceph fs set FS_NAME max_mds 1
   ceph mds deactivate FS_NAME:1 # rank 2 of 2
   ceph status # wait for rank 1 to finish stopping
   ceph fs set FS_NAME cluster_down true
   ceph mds fail FS_NAME:0

Setting the cluster_down flag prevents standbys from taking over the failed rank.
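
To confirm that the MDS daemons have stopped before moving on to the OSDs, the
filesystem state can be inspected, for example:

.. code-block:: bash

   # Both commands should show no active MDS ranks for the filesystem
   ceph mds stat
   ceph fs status FS_NAME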

- Set the noout, norecover, norebalance, nobackfill, nodown and pause flags.

.. code-block:: bash

   ceph osd set noout
   ceph osd set norecover
   ceph osd set norebalance
   ceph osd set nobackfill
   ceph osd set nodown
   ceph osd set pause

- Shut down the OSD nodes one by one:

.. code-block:: bash

   systemctl stop ceph-osd.target

- Shut down the monitor/manager nodes one by one:

.. code-block:: bash

   systemctl stop ceph.target

Set Bifrost maintenance mode
----------------------------

Set maintenance mode in Bifrost to prevent nodes from automatically powering
back on:

.. code-block:: bash

   bifrost# for i in `openstack --os-cloud bifrost baremetal node list -c UUID -f value` ; \
   do openstack --os-cloud bifrost baremetal node maintenance set --reason full-shutdown $i ; done
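
It may be useful to confirm that every node now reports maintenance mode before
powering anything off; one possible check:

.. code-block:: bash

   # Every node should show Maintenance = True
   bifrost# openstack --os-cloud bifrost baremetal node list -c Name -c Maintenance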

Shut down nodes
---------------

Shut down nodes one at a time gracefully using:

.. code-block:: bash

   systemctl poweroff
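
Once the nodes have gone down, their power state can be confirmed from the
seed, for example:

.. code-block:: bash

   # All overcloud nodes should report "power off"
   bifrost# openstack --os-cloud bifrost baremetal node list -c Name -c "Power State"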

Shut down the seed VM
---------------------

Shut down the seed VM on the Ansible control host gracefully using:

.. code-block:: bash
   :substitutions:

   ssh stack@|seed_name| sudo systemctl poweroff
   virsh shutdown |seed_name|
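
A quick way to confirm that the VM has stopped is to list the libvirt domains
on the seed hypervisor, for example:

.. code-block:: bash

   # The seed VM should be reported as "shut off"
   virsh list --all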

.. _full-power-on:

Full Power on Procedure
-----------------------

* Start the Ansible control host and seed VM
* Remove nodes from maintenance mode in Bifrost
* Recover the MariaDB cluster
* Start Ceph (if applicable)
* Check that all Docker containers are running (see the example after this list)
* Check Kibana for any messages with log level ERROR or equivalent
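
For the container check, one possible approach on each overcloud host is to
look for containers that are not in the running state (this assumes the
services are deployed as Docker containers, as with Kolla Ansible):

.. code-block:: bash

   # Any container listed here is not running and may need attention
   docker ps -a --filter "status=exited" --filter "status=restarting"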

Start Ansible Control Host
--------------------------

The Ansible control host is not enrolled in Bifrost and will have to be powered
on manually.
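
If the control host has a BMC, it can also be powered on out of band, for
example with ipmitool; the address and credentials below are placeholders:

.. code-block:: bash

   # Power on the control host via its BMC (replace the placeholders)
   ipmitool -I lanplus -H <bmc-address> -U <username> -P <password> chassis power on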

Start Seed VM
-------------

The seed VM (and any other service VM) should start automatically when the seed
hypervisor is powered on. If it does not, it can be started with:

.. code-block:: bash

   virsh start seed-0
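
If the VM did not come up on its own, it may be worth checking whether the
libvirt autostart flag is set; a possible check and fix, using the same domain
name as above:

.. code-block:: bash

   # "Autostart: enable" means the VM starts with the hypervisor
   virsh dominfo seed-0 | grep Autostart
   # Enable autostart if it is disabled
   virsh autostart seed-0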

Unset Bifrost maintenance mode
------------------------------

Unsetting maintenance mode in Bifrost should automatically power the nodes back
on:

.. code-block:: bash

   bifrost# for i in `openstack --os-cloud bifrost baremetal node list -c UUID -f value` ; \
   do openstack --os-cloud bifrost baremetal node maintenance unset $i ; done
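
The nodes' progress can then be followed from the seed, for example:

.. code-block:: bash

   # Nodes should leave maintenance and move to "power on"
   bifrost# openstack --os-cloud bifrost baremetal node list -c Name -c "Power State" -c Maintenance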

Recover MariaDB cluster
-----------------------

If all of the servers were shut down at the same time, it is necessary to run a
script to recover the database once they have all started up. This can be done
with the following command:

.. code-block:: bash

   kayobe# kayobe overcloud database recover
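
Once recovery has finished, the Galera cluster size can be checked from any
controller. This is a sketch that assumes a Kolla Ansible deployment with the
database container named ``mariadb`` and the database root password to hand:

.. code-block:: bash

   # wsrep_cluster_size should equal the number of controllers
   docker exec mariadb mysql -u root -p<database-password> \
       -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'"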

.. ifconfig:: deployment['ceph_managed']

Start Ceph
----------

Procedure based on the `Red Hat documentation <https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/administration_guide/understanding-process-management-for-ceph#powering-down-and-rebooting-a-red-hat-ceph-storage-cluster_admin>`__.

- Start monitor/manager nodes:

.. code-block:: bash

   systemctl start ceph.target

- Start the OSD nodes:

.. code-block:: bash

   systemctl start ceph-osd.target

- Wait for all the nodes to come up

- Unset the noout, norecover, norebalance, nobackfill, nodown and pause flags

.. code-block:: bash

   ceph osd unset noout
   ceph osd unset norecover
   ceph osd unset norebalance
   ceph osd unset nobackfill
   ceph osd unset nodown
   ceph osd unset pause

- Start CephFS (if applicable)

The CephFS cluster must be brought back up by setting the cluster_down flag to
false:

.. code-block:: bash

   ceph fs set FS_NAME cluster_down false

- Verify the Ceph cluster status:

.. code-block:: bash

   ceph status

source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ Contents
 ceph_storage
 managing_users_and_projects
 operations_and_monitoring
+full_shutdown
 customising_deployment
 gpus_in_openstack

source/operations_and_monitoring.rst

Lines changed: 2 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -502,22 +502,10 @@ Shutting down the seed VM
502502
kayobe# ssh stack@|seed_name| sudo systemctl poweroff
503503
kayobe# virsh shutdown |seed_name|
504504
505-
.. _full-shutdown:
506-
507505
Full shutdown
508506
-------------
509507

510-
In case a full shutdown of the system is required, we advise to use the
511-
following order:
512-
513-
* Perform a graceful shutdown of all virtual machine instances
514-
* Shut down compute nodes
515-
* Shut down monitoring node
516-
* Shut down network nodes (if separate from controllers)
517-
* Shut down controllers
518-
* Shut down Ceph nodes (if applicable)
519-
* Shut down seed VM
520-
* Shut down Ansible control host
508+
Follow separate :doc:`document <full_shutdown>`.
521509

522510
Rebooting a node
523511
----------------
@@ -575,7 +563,7 @@ hypervisor is powered on. If it does not, it can be started with:
 Full power on
 -------------

-Follow the order in :ref:`full-shutdown`, but in reverse order.
+Follow the separate :ref:`document <full-power-on>`.

 Shutting Down / Restarting Monitoring Services
 ----------------------------------------------
