// Module included in the following assemblies:
//
// * updating/preparing_for_updates/updating-cluster-prepare.adoc

:_mod-docs-content-type: REFERENCE
[id="update-best-practices_{context}"]
= Best practices for cluster updates

{product-title} provides a robust update experience that minimizes workload disruptions during an update.
Updates do not begin unless the cluster is in an upgradeable state at the time of the update request.

This design enforces some key conditions before initiating an update, but there are several actions that you can take to increase the likelihood of a successful cluster update.

[id="recommended-versions_{context}"]
== Choose versions recommended by the OpenShift Update Service

The OpenShift Update Service (OSUS) provides update recommendations based on cluster characteristics such as the cluster's subscribed channel.
The Cluster Version Operator saves these recommendations as either recommended or conditional updates.
While it is possible to attempt an update to a version that is not recommended by OSUS, following a recommended update path protects users from encountering known issues or unintended consequences on the cluster.

Choose only update targets that are recommended by OSUS to ensure a successful update.
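
You can check which update targets OSUS currently recommends for your cluster from the command line. The following commands are a minimal sketch that assumes you are logged in with `cluster-admin` permissions; depending on your `oc` client version, the `--include-not-recommended` flag also lists conditional updates together with their known risks.

[source,terminal]
----
# List the update targets that OSUS currently recommends for this cluster
$ oc adm upgrade

# Also show conditional updates and the risks associated with them
$ oc adm upgrade --include-not-recommended
----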

[id="critical-alerts_{context}"]
== Address all critical alerts on the cluster

Critical alerts must always be addressed as soon as possible, but it is especially important to address these alerts and resolve any problems before initiating a cluster update.
Failing to address critical alerts before beginning an update can cause problematic conditions for the cluster.

In the *Administrator* perspective of the web console, navigate to *Observe* -> *Alerting* to find critical alerts.
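
If you prefer the CLI, one possible way to list the firing critical alerts is to query the `ALERTS` metric through the Thanos Querier route. The following is a sketch that assumes a user with permission to query cluster metrics, such as `cluster-admin`, and that `curl` and `jq` are available on your workstation; the `-k` flag skips certificate verification in case your ingress certificate is not trusted locally.

[source,terminal]
----
$ TOKEN=$(oc whoami -t)
$ HOST=$(oc -n openshift-monitoring get route thanos-querier -o jsonpath='{.spec.host}')

# Print the names of all alerts that are firing with critical severity
$ curl -k -G -s -H "Authorization: Bearer $TOKEN" \
    --data-urlencode 'query=ALERTS{severity="critical",alertstate="firing"}' \
    "https://$HOST/api/v1/query" \
  | jq -r '.data.result[].metric.alertname'
----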

[id="cluster-upgradeable_{context}"]
== Ensure that the cluster is in an Upgradeable state

When one or more Operators have not reported their `Upgradeable` condition as `True` for more than an hour, the `ClusterNotUpgradeable` warning alert is triggered in the cluster.
In most cases this alert does not block patch updates, but you cannot perform a minor version update until you resolve this alert and all Operators report `Upgradeable` as `True`.

For more information about the `Upgradeable` condition, see "Understanding cluster Operator condition types" in the additional resources section.
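
To see which Operators are blocking minor version updates, and why, you can inspect the `Upgradeable` condition of each cluster Operator directly. The following command is a sketch that assumes `jq` is available on your workstation; running `oc adm upgrade` typically reports the same aggregated reasons when the cluster is not upgradeable.

[source,terminal]
----
# Print each cluster Operator that reports Upgradeable=False, with its reason
$ oc get clusteroperators -o json \
  | jq -r '.items[] | .metadata.name as $name
      | .status.conditions[]?
      | select(.type == "Upgradeable" and .status == "False")
      | "\($name): \(.message)"'
----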

[id="nodes-ready_{context}"]
== Ensure that enough spare nodes are available

// Completely guessing the explanation in this section just to have something to start with when this is reviewed by an SME.
A cluster should not run with little to no spare node capacity, especially when initiating a cluster update.
Nodes that are not running and available might limit a cluster's ability to perform an update with minimal disruption to cluster workloads.

Depending on the configured value of the `maxUnavailable` spec for the cluster's machine config pools, the cluster might not be able to apply machine configuration changes to nodes if there is an unavailable node.
Additionally, if compute nodes do not have enough spare capacity, workloads might not be able to temporarily shift to another node while the first node is taken offline for an update.

Make sure that you have enough available nodes in each worker pool, as well as enough spare capacity on your compute nodes, to increase the chance of successful node updates.
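
As a quick pre-update check, you can confirm that all nodes are ready and that each machine config pool has its full complement of machines available. The following commands are a sketch; the `worker` pool name is an example, and `maxUnavailable` defaults to `1` when it is not set.

[source,terminal]
----
# All nodes should report a Ready status
$ oc get nodes

# Compare MACHINECOUNT with READYMACHINECOUNT and check for degraded pools
$ oc get machineconfigpools

# Show how many nodes in the worker pool can be unavailable at the same time
$ oc get machineconfigpool worker -o jsonpath='{.spec.maxUnavailable}{"\n"}'
----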

[id="pod-disruption-budget_{context}"]
== Ensure that the cluster's PodDisruptionBudget is properly configured

You can use the `PodDisruptionBudget` object to define the minimum number or percentage of pod replicas that must be available at any given time.
This configuration protects workloads from disruptions during maintenance tasks such as cluster updates.

However, it is possible to configure the `PodDisruptionBudget` for a given topology in a way that prevents nodes from being drained and updated during a cluster update.

When planning a cluster update, check the configuration of the `PodDisruptionBudget` object for the following factors:

* For highly available workloads, make sure there are replicas that can be temporarily taken offline without being prohibited by the `PodDisruptionBudget`.

* For workloads that are not highly available, make sure they are either not protected by a `PodDisruptionBudget` or have some alternative mechanism for draining these workloads eventually, such as a periodic restart or guaranteed eventual termination.
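
For example, the following `PodDisruptionBudget` object is a minimal sketch for a hypothetical three-replica deployment labeled `app: my-app` in a namespace named `my-namespace`. Because `minAvailable` is set to `2` rather than `3`, one replica at a time can be evicted during a node drain, so nodes can still be updated while the workload remains available.

[source,yaml]
----
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
  namespace: my-namespace
spec:
  minAvailable: 2 # with 3 replicas, allows 1 pod to be evicted at a time
  selector:
    matchLabels:
      app: my-app
----

By contrast, setting `minAvailable` equal to the replica count, or setting `maxUnavailable: 0`, prevents any voluntary eviction and blocks node drains until the budget is changed.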