Commit 24c52e5

Merge pull request #62506 from apinnick/bz2219552-node-health-check-ha
BZ#2219552: HA and node health checks
2 parents c418084 + 6dcec36 commit 24c52e5

11 files changed (+78 -44 lines changed)

_topic_maps/_topic_map.yml

Lines changed: 4 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -3642,8 +3642,6 @@ Topics:
36423642
File: virt-accessing-vm-consoles
36433643
- Name: Automating Windows installation with sysprep
36443644
File: virt-automating-windows-sysprep
3645-
- Name: Triggering virtual machine failover by resolving a failed node
3646-
File: virt-triggering-vm-failover-resolving-failed-node
36473645
- Name: Installing the QEMU guest agent and VirtIO drivers
36483646
File: virt-installing-qemu-guest-agent
36493647
- Name: Viewing the QEMU guest agent information for virtual machines
@@ -3682,6 +3680,8 @@ Topics:
36823680
File: virt-configuring-mediated-devices
36833681
- Name: Enabling descheduler evictions on virtual machines
36843682
File: virt-enabling-descheduler-evictions
3683+
- Name: About high availability for virtual machines
3684+
File: virt-high-availability-for-vms
36853685
# Importing virtual machines
36863686
- Name: Importing virtual machines
36873687
Dir: importing_vms
@@ -3817,6 +3817,8 @@ Topics:
38173817
File: virt-managing-node-labeling-obsolete-cpu-models
38183818
- Name: Preventing node reconciliation
38193819
File: virt-preventing-node-reconciliation
3820+
- Name: Deleting a failed node to trigger virtual machine failover
3821+
File: virt-triggering-vm-failover-resolving-failed-node
38203822
- Name: Monitoring
38213823
Dir: monitoring
38223824
Topics:

modules/virt-about-node-maintenance.adoc

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -5,16 +5,18 @@
55
[id="virt-about-node-maintenance_{context}"]
66
= About node maintenance mode
77

8-
Nodes can be placed into maintenance mode using the `oc adm` utility, or using `NodeMaintenance` custom resources (CRs).
8+
Nodes can be placed into maintenance mode by using the `oc adm` utility or `NodeMaintenance` custom resources (CRs).
99

1010
[NOTE]
1111
====
12-
The `node-maintenance-operator` (NMO) is no longer shipped with {VirtProductName}. It is now available to deploy as a standalone Operator from the *OperatorHub* in the {product-title} web console, or by using the OpenShift CLI (`oc`).
12+
The `node-maintenance-operator` (NMO) is no longer shipped with {VirtProductName}. It is deployed as a standalone Operator from the *OperatorHub* in the {product-title} web console or by using the OpenShift CLI (`oc`).
13+
14+
For more information on remediation, fencing, and maintaining nodes, see the link:https://access.redhat.com/documentation/en-us/workload_availability_for_red_hat_openshift/23.2/html-single/remediation_fencing_and_maintenance/index#about-remediation-fencing-maintenance[Workload Availability for Red Hat OpenShift] documentation.
1315
====
1416

1517
Placing a node into maintenance marks the node as unschedulable and drains all the virtual machines and pods from it. Virtual machine instances that have a `LiveMigrate` eviction strategy are live migrated to another node without loss of service. This eviction strategy is configured by default in virtual machine created from common templates but must be configured manually for custom virtual machines.
1618

17-
Virtual machine instances without an eviction strategy are shut down. Virtual machines with a `RunStrategy` of `Running` or `RerunOnFailure` are recreated on another node. Virtual machines with a `RunStrategy` of `Manual` are not automatically restarted.
19+
Virtual machine instances without an eviction strategy are shut down. Virtual machines with a `runStrategy` of `Running` or `RerunOnFailure` are recreated on another node. Virtual machines with a `runStrategy` of `Manual` are not automatically restarted.
1820

1921
[IMPORTANT]
2022
====
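
The drain behavior described in this module can be summarized as a small decision function. This is a minimal sketch, not code from {VirtProductName}: the function name `drain_outcome` and its return strings are hypothetical, and it covers only the cases the module text mentions. The text names a `runStrategy` of `Running`; the sketch assumes `Always` is meant, because that is the value listed in the run strategies module.

```python
from typing import Optional

# Run strategies that the module text says lead to re-creation on another node.
# Assumption: the text's `Running` corresponds to the `Always` value.
RECREATED = {"Always", "RerunOnFailure"}


def drain_outcome(eviction_strategy: Optional[str], run_strategy: str) -> str:
    """Describe what happens to a VMI when its node enters maintenance."""
    if eviction_strategy == "LiveMigrate":
        return "live migrated to another node"
    # Without a LiveMigrate eviction strategy, the VMI is shut down first.
    if run_strategy in RECREATED:
        return "shut down, then recreated on another node"
    if run_strategy == "Manual":
        return "shut down, not automatically restarted"
    # Cases the module text does not cover fall through to a plain shutdown.
    return "shut down"
```

For example, `drain_outcome(None, "Manual")` describes a VM that stays down until someone starts it again.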

modules/virt-about-runstrategies-vms.adoc

Lines changed: 37 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -1,31 +1,55 @@
11
// Module included in the following assemblies:
22
//
3-
// * virt/virtual_machines/virt-create-vms.adoc
3+
// * virt/node_maintenance/virt-about-node-maintenance.adoc
44

55
:_content-type: CONCEPT
66
[id="virt-about-runstrategies-vms_{context}"]
7-
= About RunStrategies for virtual machines
7+
= About run strategies for virtual machines
88

9-
A `RunStrategy` for virtual machines determines a virtual machine instance's (VMI) behavior, depending on a series of conditions. The `spec.runStrategy` setting exists in the virtual machine configuration process as an alternative to the `spec.running` setting.
10-
The `spec.runStrategy` setting allows greater flexibility for how VMIs are created and managed, in contrast to the `spec.running` setting with only `true` or `false` responses. However, the two settings are mutually exclusive. Only either `spec.running` or `spec.runStrategy` can be used. An error occurs if both are used.
9+
Run strategies for virtual machines (VMs) determine how virtual machine instances (VMIs) behave under certain conditions.
1110

12-
There are four defined RunStrategies.
11+
You configure a run strategy by assigning a value to the `runStrategy` key in the `VirtualMachine` manifest as in the following example:
12+
13+
.Example run strategy
14+
[source,yaml]
15+
----
16+
apiVersion: kubevirt.io/v1
17+
kind: VirtualMachine
18+
spec:
19+
runStrategy: Always
20+
template:
21+
# ...
22+
----
23+
24+
[IMPORTANT]
25+
====
26+
The `runStrategy` and the `running` keys are mutually exclusive. Only one of them can be used.
27+
====
28+
29+
The `runStrategy` key gives you more flexibility because it has four values, unlike the `running` key, which has a Boolean value.
30+
31+
.`runStrategy` key values
1332

1433
`Always`::
15-
A VMI is always present when a virtual machine is created. A new VMI is created if the original stops for any reason, which is the same behavior as `spec.running: true`.
34+
The VMI is always present when a virtual machine is created. A new VMI is created if the original stops for any reason. This is the same behavior as `running: true`.
35+
1636
`RerunOnFailure`::
17-
A VMI is re-created if the previous instance fails due to an error. The instance is not re-created if the virtual machine stops successfully, such as when it shuts down.
37+
The VMI is re-created if the previous instance fails. The instance is not re-created if the virtual machine stops successfully, such as when it is shut down.
38+
1839
`Manual`::
19-
The `start`, `stop`, and `restart` virtctl client commands can be used to control the VMI's state and existence.
40+
You control the VMI state manually with the `start`, `stop`, and `restart` virtctl client commands.
41+
2042
`Halted`::
21-
No VMI is present when a virtual machine is created, which is the same behavior as `spec.running: false`.
43+
No VMI is present when a virtual machine is created. This is the same behavior as `running: false`.
2244

23-
Different combinations of the `start`, `stop` and `restart` virtctl commands affect which `RunStrategy` is used.
45+
Different combinations of the `start`, `stop` and `restart` virtctl commands affect the run strategy.
2446

25-
The following table follows a VM's transition from different states. The first column shows the VM's initial `RunStrategy`. Each additional column shows a virtctl command and the new `RunStrategy` after that command is run.
47+
The following table describes a VM's transition from different states. The first column shows the VM's initial run strategy. The remaining columns show a virtctl command and the new run strategy after that command is run.
2648

49+
.Run strategy before and after `virtctl` commands
50+
[options="header"]
2751
|===
28-
|Initial RunStrategy |start |stop |restart
52+
|Initial run strategy |Start |Stop |Restart
2953

3054
|Always
3155
|-
@@ -50,16 +74,6 @@ The following table follows a VM's transition from different states. The first c
5074

5175
[NOTE]
5276
====
53-
In {VirtProductName} clusters installed using installer-provisioned infrastructure, when a node fails the MachineHealthCheck and becomes unavailable to the cluster, VMs with a RunStrategy of `Always` or `RerunOnFailure` are rescheduled on a new node.
77+
If a node in a cluster installed by using installer-provisioned infrastructure fails the machine health check and is unavailable, VMs with `runStrategy: Always` or `runStrategy: RerunOnFailure` are rescheduled on a new node.
5478
====
5579

56-
[source,yaml]
57-
----
58-
apiVersion: kubevirt.io/v1
59-
kind: VirtualMachine
60-
spec:
61-
RunStrategy: Always <1>
62-
template:
63-
# ...
64-
----
65-
<1> The VMI's current `RunStrategy` setting.
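
The two rules that the revised module states (the `runStrategy` and `running` keys are mutually exclusive, and `runStrategy` takes one of four values) could be checked before a manifest is applied. The following is a hedged sketch only: `check_vm_spec` is a hypothetical helper, not part of any KubeVirt or `oc` tooling, and it takes the `spec` mapping of a `VirtualMachine` manifest as a plain Python dictionary.

```python
# The four values listed in the module; matching is case-sensitive.
RUN_STRATEGY_VALUES = {"Always", "RerunOnFailure", "Manual", "Halted"}


def check_vm_spec(spec: dict) -> None:
    """Raise ValueError if the spec breaks the rules described in the module."""
    if "runStrategy" in spec and "running" in spec:
        raise ValueError("runStrategy and running are mutually exclusive")
    if "runStrategy" in spec and spec["runStrategy"] not in RUN_STRATEGY_VALUES:
        raise ValueError(f"unknown runStrategy: {spec['runStrategy']!r}")
    if "running" in spec and not isinstance(spec["running"], bool):
        raise ValueError("running must be a Boolean")
```

Because the comparison is case-sensitive, a check like this would also flag the lowercase `always` spelling that this commit corrects in the workload update modules and the runbook.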

modules/virt-about-workload-updates.adoc

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -31,7 +31,7 @@ If you enable both `LiveMigrate` and `Evict`:
3131

3232
* VMIs that support live migration use the `LiveMigrate` update strategy.
3333
34-
* VMIs that do not support live migration use the `Evict` update strategy. If a VMI is controlled by a `VirtualMachine` object that has a `runStrategy` value of `always`, a new VMI is created in a new pod with updated components.
34+
* VMIs that do not support live migration use the `Evict` update strategy. If a VMI is controlled by a `VirtualMachine` object that has `runStrategy: Always` set, a new VMI is created in a new pod with updated components.
3535
3636
[discrete]
3737
[id="migration-attempts-timeouts_{context}"]

modules/virt-configuring-workload-update-methods.adoc

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -46,7 +46,7 @@ spec:
4646
<1> The methods that can be used to perform automated workload updates. The available values are `LiveMigrate` and `Evict`. If you enable both options as shown in this example, updates use `LiveMigrate` for VMIs that support live migration and `Evict` for any VMIs that do not support live migration. To disable automatic workload updates, you can either remove the `workloadUpdateStrategy` stanza or set `workloadUpdateMethods: []` to leave the array empty.
4747
//NOTE: in 4.10, removing the stanza will not disable the feature.
4848
<2> The least disruptive update method. VMIs that support live migration are updated by migrating the virtual machine (VM) guest into a new pod with the updated components enabled. If `LiveMigrate` is the only workload update method listed, VMIs that do not support live migration are not disrupted or updated.
49-
<3> A disruptive method that shuts down VMI pods during upgrade. `Evict` is the only update method available if live migration is not enabled in the cluster. If a VMI is controlled by a `VirtualMachine` object that has `runStrategy: always` configured, a new VMI is created in a new pod with updated components.
49+
<3> A disruptive method that shuts down VMI pods during upgrade. `Evict` is the only update method available if live migration is not enabled in the cluster. If a VMI is controlled by a `VirtualMachine` object that has `runStrategy: Always` configured, a new VMI is created in a new pod with updated components.
5050
<4> The number of VMIs that can be forced to be updated at a time by using the `Evict` method. This does not apply to the `LiveMigrate` method.
5151
<5> The interval to wait before evicting the next batch of workloads. This does not apply to the `LiveMigrate` method.
5252
+

modules/virt-runbook-outdatedvirtualmachineinstanceworkloads.adoc

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -85,7 +85,7 @@ Update the `HyperConverged` CR to enable automatic workload updates.
8585
[id="stopping-a-vm-associated-with-a-non-live-migratable-vmi-outdatedvirtualmachineinstanceworkloads"]
8686
=== Stopping a VM associated with a non-live-migratable VMI
8787

88-
* If a VMI is not live-migratable and if `runStrategy: always` is
88+
* If a VMI is not live-migratable and if `runStrategy: Always` is
8989
set in the corresponding `VirtualMachine` object, you can update the
9090
VMI by manually stopping the virtual machine (VM):
9191
+

virt/install/preparing-cluster-for-virt.adoc

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -143,7 +143,7 @@ You can configure one of the following high-availability (HA) options for your c
143143
+
144144
[NOTE]
145145
====
146-
In {product-title} clusters installed using installer-provisioned infrastructure and with MachineHealthCheck properly configured, if a node fails the MachineHealthCheck and becomes unavailable to the cluster, it is recycled. What happens next with VMs that ran on the failed node depends on a series of conditions. See xref:../../virt/virtual_machines/virt-create-vms.adoc#virt-about-runstrategies-vms_virt-create-vms[About RunStrategies for virtual machines] for more detailed information about the potential outcomes and how RunStrategies affect those outcomes.
146+
In {product-title} clusters installed using installer-provisioned infrastructure and with MachineHealthCheck properly configured, if a node fails the MachineHealthCheck and becomes unavailable to the cluster, it is recycled. What happens next with VMs that ran on the failed node depends on a series of conditions. See xref:../../virt/node_maintenance/virt-about-node-maintenance.adoc#virt-about-runstrategies-vms_virt-about-node-maintenance[About RunStrategies for virtual machines] for more detailed information about the potential outcomes and how RunStrategies affect those outcomes.
147147
====
148148

149149
* Automatic high availability for both IPI and non-IPI is available by using the *Node Health Check Operator* on the {product-title} cluster to deploy the `NodeHealthCheck` controller. The controller identifies unhealthy nodes and uses the Self Node Remediation Operator to remediate the unhealthy nodes. For more information on remediation, fencing, and maintaining nodes, see the link:https://access.redhat.com/documentation/en-us/workload_availability_for_red_hat_openshift/23.2/html-single/remediation_fencing_and_maintenance/index#about-remediation-fencing-maintenance[Workload Availability for Red Hat OpenShift] documentation.

virt/node_maintenance/virt-about-node-maintenance.adoc

Lines changed: 2 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -8,14 +8,12 @@ toc::[]
88

99
include::modules/virt-about-node-maintenance.adoc[leveloffset=+1]
1010

11+
include::modules/virt-about-runstrategies-vms.adoc[leveloffset=+1]
12+
1113
include::modules/virt-maintaining-bare-metal-nodes.adoc[leveloffset=+1]
1214

1315
[role="_additional-resources"]
1416
[id="additional-resources_virt-about-node-maintenance"]
1517
== Additional resources
16-
* xref:../../nodes/nodes/nodes-remediating-fencing-maintaining-rhwa.adoc#nodes-remediating-fencing-maintaining-rhwa[Installing the Node Maintenance Operator by using the CLI]
17-
* xref:../../nodes/nodes/nodes-remediating-fencing-maintaining-rhwa.adoc#nodes-remediating-fencing-maintaining-rhwa[Setting a node to maintenance mode]
18-
* xref:../../nodes/nodes/nodes-remediating-fencing-maintaining-rhwa.adoc#nodes-remediating-fencing-maintaining-rhwa[Resuming a node from maintenance mode]
19-
* xref:../../virt/virtual_machines/virt-create-vms.adoc#virt-about-runstrategies-vms_virt-create-vms[About RunStrategies for virtual machines]
2018
* xref:../../virt/live_migration/virt-live-migration.adoc#virt-live-migration[Virtual machine live migration]
2119
* xref:../../virt/live_migration/virt-configuring-vmi-eviction-strategy.adoc#virt-configuring-vmi-eviction-strategy[Configuring virtual machine eviction strategy]

virt/virtual_machines/virt-triggering-vm-failover-resolving-failed-node.adoc renamed to virt/node_maintenance/virt-triggering-vm-failover-resolving-failed-node.adoc

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,26 +1,26 @@
11
:_content-type: ASSEMBLY
22
[id="virt-triggering-vm-failover-resolving-failed-node"]
3-
= Triggering virtual machine failover by resolving a failed node
3+
= Deleting a failed node to trigger virtual machine failover
44
include::_attributes/common-attributes.adoc[]
55
:context: virt-triggering-vm-failover-resolving-failed-node
66

77
toc::[]
88

9-
If a node fails and xref:../../machine_management/deploying-machine-health-checks.adoc#machine-health-checks-about_deploying-machine-health-checks[machine health checks] are not deployed on your cluster, virtual machines (VMs) with `RunStrategy: Always` configured are not automatically relocated to healthy nodes. To trigger VM failover, you must manually delete the `Node` object.
9+
If a node fails and xref:../../machine_management/deploying-machine-health-checks.adoc#machine-health-checks-about_deploying-machine-health-checks[machine health checks] are not deployed on your cluster, virtual machines (VMs) with `runStrategy: Always` configured are not automatically relocated to healthy nodes. To trigger VM failover, you must manually delete the `Node` object.
1010

1111
[NOTE]
1212
====
13-
If you installed your cluster by using xref:../../installing/installing_bare_metal_ipi/ipi-install-overview.adoc#ipi-install-overview[installer-provisioned infrastructure] and you properly configured machine health checks:
13+
If you installed your cluster by using xref:../../installing/installing_bare_metal_ipi/ipi-install-overview.adoc#ipi-install-overview[installer-provisioned infrastructure] and you properly configured machine health checks, the following events occur:
1414
1515
* Failed nodes are automatically recycled.
16-
* Virtual machines with xref:../../virt/virtual_machines/virt-create-vms.adoc#virt-about-runstrategies-vms_virt-create-vms[`RunStrategy`] set to `Always` or `RerunOnFailure` are automatically scheduled on healthy nodes.
16+
* Virtual machines with xref:../../virt/node_maintenance/virt-about-node-maintenance.adoc#virt-about-runstrategies-vms_virt-about-node-maintenance[`runStrategy`] set to `Always` or `RerunOnFailure` are automatically scheduled on healthy nodes.
1717
====
1818

1919
[id="prerequisites_{context}"]
2020
== Prerequisites
2121

2222
* A node where a virtual machine was running has the `NotReady` xref:../../nodes/nodes/nodes-nodes-viewing.adoc#nodes-nodes-viewing-listing_nodes-nodes-viewing[condition].
23-
* The virtual machine that was running on the failed node has `RunStrategy` set to `Always`.
23+
* The virtual machine that was running on the failed node has `runStrategy` set to `Always`.
2424
* You have installed the OpenShift CLI (`oc`).
2525

2626
include::modules/nodes-nodes-working-deleting-bare-metal.adoc[leveloffset=+1]
Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
:_content-type: ASSEMBLY
2+
[id="virt-high-availability-for-vms"]
3+
= About high availability for virtual machines
4+
include::_attributes/common-attributes.adoc[]
5+
:context: virt-high-availability-for-vms
6+
7+
toc::[]
8+
9+
You can enable high availability for virtual machines (VMs) by manually deleting a failed node to trigger VM failover or by configuring remediating nodes.
10+
11+
.Manually deleting a failed node
12+
13+
If a node fails and machine health checks are not deployed on your cluster, virtual machines with `runStrategy: Always` configured are not automatically relocated to healthy nodes. To trigger VM failover, you must manually delete the `Node` object.
14+
15+
See xref:../../../virt/node_maintenance/virt-triggering-vm-failover-resolving-failed-node.adoc#virt-triggering-vm-failover-resolving-failed-node[Deleting a failed node to trigger virtual machine failover].
16+
17+
.Configuring remediating nodes
18+
19+
You can configure remediating nodes by installing the Self Node Remediation Operator from the OperatorHub and enabling machine health checks or node remediation checks.
20+
21+
For more information on remediation, fencing, and maintaining nodes, see the link:https://access.redhat.com/documentation/en-us/workload_availability_for_red_hat_openshift/23.2/html-single/remediation_fencing_and_maintenance/index#about-remediation-fencing-maintenance[Workload Availability for Red Hat OpenShift] documentation.
