Commit 29f8fa4

Merge pull request #4326 from ovh/YC-Nutanix-cluster-firmware-update
Nutanix - Updating your cluster firmware
2 parents 83b7c27 + ca73b00 commit 29f8fa4

19 files changed (+231 -0 lines)

Lines changed in this file: 229 additions & 0 deletions
@@ -0,0 +1,229 @@
---
title: Updating your Nutanix cluster firmware
slug: nutanix-cluster-firmware-update
excerpt: Find out how to update your Nutanix cluster firmware
section: Upgrade
order: 01
updated: 2023-03-08
---

**Last updated 8th March 2023**

## Objective

This guide explains how to update the firmware of your Nutanix cluster by putting each node into maintenance mode and then rebooting it in rescue mode, one node at a time.

Our support teams then take over to apply the firmware updates and restart the node once they are done.

> [!warning]
> Before taking any action, log in to your [OVHcloud Control Panel](https://www.ovh.com/auth/?action=gotomanager&from=https://www.ovh.co.uk/&ovhSubsidiary=GB) and create a support ticket to request a firmware update, providing the OVHcloud support teams with the technical details of your cluster.

**Find out how to update your Nutanix cluster firmware.**

## Requirements

- A Nutanix cluster in your OVHcloud account
- Access to the [OVHcloud Control Panel](https://www.ovh.com/auth/?action=gotomanager&from=https://www.ovh.co.uk/&ovhSubsidiary=GB)
- Having read the guide [First steps to use the OVHcloud API](https://docs.ovh.com/gb/en/api/first-steps-with-ovh-api/) to familiarise yourself with the OVHcloud API

## Instructions

Before taking any action, log in to your Prism Element interface and perform the following tasks:

- Check that the cluster's "**Data Resiliency Status**" is `OK`

This can be verified on the main dashboard of your Prism Element (PE) interface:

![Prism element - Data Resiliency Status](images/nutanix-cluster-fw-update-01.png){.thumbnail}

- Run an NCC check

In the Prism Element interface, click `Health`{.action} in the main menu.

![Prism element - health](images/nutanix-cluster-fw-update-02.png){.thumbnail}

Then click `Actions`{.action} on the right and click `Run NCC Checks`{.action}.

![Prism element - Run NCC checks](images/nutanix-cluster-fw-update-03.png){.thumbnail}

Select `All checks`{.action} and click `Run`{.action}.

![Prism element - run checks](images/nutanix-cluster-fw-update-03b.png){.thumbnail}

When the checks complete, a log file called `/home/nutanix/data/logs/ncc-output-latest.log` is generated.

Analyse it carefully. If it reports any errors or failures regarding the cluster or service state, do not continue and contact OVHcloud support.

> [!primary]
> You can also run the NCC checks from a CVM terminal with the following command:

```bash
ncc health_checks run_all
```
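
To review the NCC log from a CVM terminal, a minimal sketch such as the one below lists the lines flagged as failures or errors. It assumes that NCC prefixes problem lines with `FAIL:` or `ERR:`; compare this with your own `ncc-output-latest.log` before relying on it. The function name `ncc_problems` is hypothetical, and a clean result does not replace reading the full log.

```bash
# Hypothetical helper: list NCC log lines flagged as failures or errors.
# ASSUMPTION: NCC prefixes problem lines with "FAIL:" or "ERR:".
ncc_problems() {
  local log="${1:-/home/nutanix/data/logs/ncc-output-latest.log}"
  if [ ! -r "$log" ]; then
    echo "Cannot read $log" >&2
    return 2
  fi
  # Print matching lines with their line numbers (grep -n).
  if grep -nE '^[[:space:]]*(FAIL|ERR):' "$log"; then
    return 1  # problems found: do not continue, contact OVHcloud support
  fi
  echo "No FAIL:/ERR: lines found in $log"
  return 0
}
```

For example, `ncc_problems || echo "Do not continue: contact OVHcloud support"`.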

### Enabling maintenance mode

Nodes are updated one at a time, so the Nutanix cluster continues to operate throughout the procedure.

To log in to a CVM, you can launch an IPMI session from your OVHcloud Control Panel or use a terminal.

> [!primary]
> Before putting a host into maintenance mode, make sure the remaining hosts have enough resources (CPU, memory, storage) to run the VMs migrated from it.

#### Connect to a CVM

At the login prompt, log in with the root credentials to access the host terminal.<br>
Then open an SSH connection to any CVM with the Nutanix credentials to access the CVM terminal.

![CVM connection](images/nutanix-cluster-fw-update-04.png){.thumbnail}

#### Check the state of the nodes

Once logged in, check that, for all nodes:

- the `Node state` column is set to `AcropolisNormal`;
- the `Schedulable` column is set to `True`.

To do so, run the following command:

```bash
acli host.list
```

![Checking nodes state](images/nutanix-cluster-fw-update-05.png){.thumbnail}

If all checks are OK, verify that the host can enter maintenance mode with the following command, replacing `<Hypervisor_IP>` with the IP address of the host's hypervisor:

```bash
acli host.enter_maintenance_mode_check <Hypervisor_IP>
```

![Checking nodes state](images/nutanix-cluster-fw-update-06.png){.thumbnail}

#### Put a node in maintenance mode

> [!primary]
> VMs with specific policies (such as affinity rules or CPU passthrough) cannot be migrated and must be shut down manually before you enable maintenance mode.

If the host is eligible for maintenance mode, put this first host into maintenance mode with the following command, replacing `192.168.0.1` with the IP address of its hypervisor:

```bash
acli host.enter_maintenance_mode 192.168.0.1 wait=true
```
![maintenance mode](images/nutanix-cluster-fw-update-07.png){.thumbnail}
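
If you prefer to chain the eligibility check and the maintenance command, a sketch such as the following shows the check output and asks for explicit confirmation before calling `acli host.enter_maintenance_mode`. The function name `enter_maintenance_safely` is hypothetical; it only uses the two `acli` commands shown in this guide and does not interpret their output, so you still decide whether the host is eligible.

```bash
# Hypothetical wrapper around the two acli commands used in this guide.
# It does not parse the check output: you read it and confirm manually.
enter_maintenance_safely() {
  local host_ip="$1" answer
  if [ -z "$host_ip" ]; then
    echo "Usage: enter_maintenance_safely <Hypervisor_IP>" >&2
    return 2
  fi
  # 1. Show whether the host can enter maintenance mode
  #    (stop here if the acli command itself fails).
  acli host.enter_maintenance_mode_check "$host_ip" || return 1
  # 2. Ask for explicit confirmation (prompt on stderr) before migrating VMs.
  printf 'Put %s into maintenance mode? [y/N] ' "$host_ip" >&2
  read -r answer
  case "$answer" in
    y|Y) ;;
    *) echo "Aborted."; return 1 ;;
  esac
  # 3. Enter maintenance mode and wait until the VMs have been migrated.
  acli host.enter_maintenance_mode "$host_ip" wait=true
}
```

For example: `enter_maintenance_safely 192.168.0.1`. The check output is deliberately not parsed, as its exact wording may vary between AOS versions.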

> [!warning]
> When a host enters maintenance mode, all of its VMs are migrated to other hosts without any interruption.

#### Shut down the CVM

Once the host is in maintenance mode, shut down its CVM by running the following command from that CVM:

```bash
cvm_shutdown -P now
```

![shutdown CVM](images/nutanix-cluster-fw-update-08.png){.thumbnail}

With root credentials, open a terminal on the node that hosts the CVM and confirm that the CVM is stopped:

```bash
virsh list --all
```
![shutdown CVM](images/nutanix-cluster-fw-update-09.png){.thumbnail}
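
Rather than re-running `virsh list --all` by hand, the following sketch polls it until the CVM is reported as `shut off`. It assumes that the CVM's libvirt domain name contains `CVM` (as in the usual `NTNX-...-CVM` naming) and that `virsh` prints the state on the same line as the domain name; the function name and the default timeout values are illustrative only.

```bash
# Hypothetical helper, run as root on the AHV host.
# ASSUMPTION: the CVM domain name contains "CVM" (e.g. NTNX-XXXX-CVM).
wait_for_cvm_shutdown() {
  local timeout="${1:-300}" interval="${2:-10}" elapsed=0 line
  while [ "$elapsed" -lt "$timeout" ]; do
    # Keep only the virsh line describing the CVM (empty if not found).
    line=$(virsh list --all | grep 'CVM' || true)
    case "$line" in
      *"shut off"*) echo "CVM is shut off: $line"; return 0 ;;
    esac
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "CVM still not shut off after ${timeout}s: $line" >&2
  return 1
}
```

For example, `wait_for_cvm_shutdown 300 10` waits up to five minutes, checking every 10 seconds.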

On the main dashboard, the "**Data Resiliency Status**" becomes `Critical`, as the cluster is now running on its remaining nodes (2 nodes in this example).

![shutdown CVM](images/nutanix-cluster-fw-update-10.png){.thumbnail}

The CVM is now shut down.

### Reboot to rescue mode

Log in to the [OVHcloud Control Panel](https://www.ovh.com/auth/?action=gotomanager&from=https://www.ovh.co.uk/&ovhSubsidiary=GB), go to the `Hosted Private Cloud`{.action} section, select the `Nutanix`{.action} solution and choose your cluster.

![OVHcloud Control Panel - cluster access](images/nutanix-cluster-fw-update-11.png){.thumbnail}

Identify the node to reboot in rescue mode using the following OVHcloud API call:

> [!api]
>
> @api {GET} /nutanix/{serviceName}
>

- `serviceName`: the name of your cluster

In the response, identify the name of your node:

![OVHcloud API - node name](images/nutanix-cluster-fw-update-12.png){.thumbnail}
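
If you prefer to call the API from a terminal rather than from the API console, the sketch below signs a `GET /nutanix/{serviceName}` request with the OVHcloud signature scheme: `$1$` followed by the SHA-1 hex digest of the application secret, consumer key, HTTP method, full URL, body and timestamp, joined with `+`. It assumes the `https://eu.api.ovh.com/1.0` endpoint and that you have already created an application key, an application secret and a consumer key as described in the first-steps guide. The variable names `OVH_AK`, `OVH_AS`, `OVH_CK` and `OVH_ENDPOINT` and the function names are hypothetical.

```bash
# Hypothetical helpers for a signed OVHcloud API call.
# ASSUMPTIONS: EU endpoint; credentials exported as OVH_AK / OVH_AS / OVH_CK.
OVH_ENDPOINT="${OVH_ENDPOINT:-https://eu.api.ovh.com/1.0}"

# Compute the X-Ovh-Signature header value.
# Args: app_secret consumer_key method full_url body timestamp
ovh_sign() {
  local hash
  hash=$(printf '%s+%s+%s+%s+%s+%s' "$1" "$2" "$3" "$4" "$5" "$6" \
    | sha1sum | cut -d' ' -f1)
  printf '$1$%s\n' "$hash"
}

# Send a signed GET request for an API path, e.g. /nutanix/<serviceName>.
ovh_get() {
  local url="$OVH_ENDPOINT$1" ts sig
  # Use the API server time to avoid clock-skew errors.
  ts=$(curl -s "$OVH_ENDPOINT/auth/time")
  sig=$(ovh_sign "$OVH_AS" "$OVH_CK" "GET" "$url" "" "$ts")
  curl -s "$url" \
    -H "X-Ovh-Application: $OVH_AK" \
    -H "X-Ovh-Consumer: $OVH_CK" \
    -H "X-Ovh-Timestamp: $ts" \
    -H "X-Ovh-Signature: $sig"
}
```

For example, after exporting your credentials, run `ovh_get "/nutanix/<serviceName>"`, replacing `<serviceName>` with your cluster name, and look for your node names (such as `ns123456`) in the JSON response. Your consumer key must be allowed to call `GET` on this route.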

Once you have retrieved the name of the node to reboot in rescue mode, select this node in your OVHcloud Control Panel.

In the `Boot` section, click the `...`{.action} button, then click `Edit`{.action}.

![OVHcloud Control Panel - Boot](images/nutanix-cluster-fw-update-13.png){.thumbnail}

Change the netboot to `rescue mode`{.action}, select the `rescue-customer`{.action} version and click `Next`{.action}.

![OVHcloud Control Panel - Boot](images/nutanix-cluster-fw-update-14.png){.thumbnail}

Confirm your choice.

![OVHcloud Control Panel - Boot](images/nutanix-cluster-fw-update-15.png){.thumbnail}

A green message then confirms that the netboot has been updated.

Click the `...`{.action} button again and click `Restart`{.action}.

![OVHcloud Control Panel - Boot](images/nutanix-cluster-fw-update-13.png){.thumbnail}

The server reboots. Optionally, you can open an IPMI session to follow the reboot of your node.

When the node has booted on `rescue-customer`, update your support ticket with this information to notify the OVHcloud support teams that they can proceed with the firmware update.

Once the updates are complete, our support teams will:

- restart the node from its local disk, which automatically starts the Nutanix system and the CVM;
- update the ticket to let you know that you can take the node out of maintenance mode.

At this point, the node is up and running. Follow the next step to take it out of maintenance mode.

### Exit maintenance mode

After the update, our teams reboot the node from its local disk. The Nutanix software then loads AOS and the CVM starts automatically.

Once the system is up and running, log in to the CVM and run the following command:

```bash
acli host.list
```

As shown in the output below, the first node is still in maintenance mode.

![maintenance mode exit](images/nutanix-cluster-fw-update-07.png){.thumbnail}

To take the node out of maintenance mode, run the following command:

```bash
acli host.exit_maintenance_mode 192.168.0.1
```

![maintenance mode exit](images/nutanix-cluster-fw-update-16.png){.thumbnail}

The host leaves the `maintenance` state and returns to the `Normal` state.

The VMs that were migrated away from this node automatically move back to it.

On the main dashboard, the "**Data Resiliency Status**" reverts to `OK` and the cluster returns to its nominal state.

![Data Resiliency Status](images/nutanix-cluster-fw-update-01.png){.thumbnail}

Repeat the same steps for each remaining node, one at a time.

Do not open a new ticket: for each node, add a comment to the same ticket specifying the server name (e.g. `ns123456`).

## Go further <a name="gofurther"></a>

Join our community of users on <https://community.ovh.com/en/>.