|
| 1 | +--- |
| 2 | +title: Use Azure Spot Virtual Machines in an Azure Red Hat OpenShift (ARO) cluster |
| 3 | +description: Discover how to utilize Azure Spot Virtual Machines in Azure Red Hat OpenShift (ARO) |
| 4 | +author: nilsanderselde |
| 5 | +ms.author: suvetriv |
| 6 | +ms.service: azure-redhat-openshift |
| 7 | +keywords: spot, nodes, aro, deploy, openshift, red hat |
| 8 | +ms.topic: how-to #Required; leave this attribute/value as-is. |
| 9 | +ms.date: 10/21/2021 |
| 10 | +ms.custom: template-how-to #Required; leave this attribute/value as-is. |
| 11 | +--- |
| 12 | + |
| 13 | +# Use Azure Spot Virtual Machines in an Azure Red Hat OpenShift (ARO) cluster
| 14 | + |
| 15 | +This article explains how to configure your Azure Red Hat OpenShift (ARO) cluster to use Azure Spot Virtual Machines.
| 16 | + |
| 17 | +## Before you begin |
| 18 | + |
| 19 | +This article assumes you've already created a new cluster or have an existing cluster with the latest updates applied. If you need an ARO cluster, see the [ARO quickstart](tutorial-create-cluster.md) for a public cluster, or the [private cluster tutorial](howto-create-private-cluster-4x.md) for a private cluster. The steps to configure your cluster to use Spot VMs are the same for both private and public clusters.
| 20 | + |
| 21 | +This article also assumes you understand [how Spot VMs work](../virtual-machines/spot-vms.md).
| 22 | + |
| 23 | + |
| 24 | +## Add Spot VMs |
| 25 | + |
| 26 | +To use Spot VMs, add the `spotVMOptions` field to the template spec of a MachineSet.
| 27 | +
| 28 | +The easiest way to create a Spot MachineSet in ARO is to use an existing worker MachineSet as a template. That way, you only need to change a few fields rather than start from scratch.
| 29 | + |
| 30 | +The YAML fields you need to change when creating a Spot MachineSet based on a worker MachineSet are: |
| 31 | + |
| 33 | +* `metadata.name`
| 34 | +* `spec.selector.matchLabels.machine.openshift.io/cluster-api-machineset`
| 35 | +* `spec.template.metadata.labels.machine.openshift.io/cluster-api-machineset`
| 36 | +* `spec.template.spec.providerSpec.value.spotVMOptions` (add this field, set it to `{}`)
| 38 | + |
| 39 | +The following abridged example of a Spot MachineSet highlights the key changes you need to make when basing a new Spot MachineSet on an existing worker MachineSet, and includes some additional fields for context. It is not a complete, functional MachineSet; many fields are omitted.
| 40 | + |
| 41 | +```yaml
| 42 | +apiVersion: machine.openshift.io/v1beta1
| 43 | +kind: MachineSet
| 44 | +metadata:
| 45 | +  name: aro-cluster-abcd1-spot-eastus  # new, unique MachineSet name
| 46 | +spec:
| 47 | +  replicas: 2
| 48 | +  selector:
| 49 | +    matchLabels:
| 50 | +      machine.openshift.io/cluster-api-cluster: aro-cluster-abcd1
| 51 | +      machine.openshift.io/cluster-api-machineset: aro-cluster-abcd1-spot-eastus  # must match metadata.name
| 52 | +  template:
| 53 | +    metadata:
| 54 | +      labels:
| 55 | +        machine.openshift.io/cluster-api-machineset: aro-cluster-abcd1-spot-eastus  # must match metadata.name
| 56 | +    spec:
| 57 | +      providerSpec:
| 58 | +        value:
| 59 | +          spotVMOptions: {}  # add this field to request Spot VMs
| 60 | +          image:
| 61 | +            offer: aro4
| 62 | +            publisher: azureopenshift
| 63 | +            resourceID: ''
| 64 | +            sku: aro_47
| 65 | +            version: 47.83.20210522
| 66 | +      taints:  # optional; see "Schedule interruptible workloads"
| 67 | +        - effect: NoExecute
| 68 | +          key: spot
| 69 | +          value: 'true'
| 70 | +```
|
| 71 | +After you create the MachineSet, OpenShift creates the number of machines that you specified in `replicas`. Each machine is first provisioned and then provisioned as a node. After a machine is provisioned as a node, pods can be scheduled on it.
| 72 | +
|
| 73 | +## Schedule interruptible workloads |
| 74 | +
|
| 75 | +We recommend adding a taint to the Spot nodes to prevent noninterruptible workloads from being scheduled on them, and adding a toleration for this taint to any pods that you want scheduled on them. You can taint the nodes via the MachineSet spec.
| 76 | +
|
| 77 | +For example, you can add the following YAML to `spec.template.spec`: |
| 78 | + |
| 79 | +```yaml
| 80 | +      taints:
| 81 | +        - effect: NoExecute
| 82 | +          key: spot
| 83 | +          value: 'true'
| 84 | +```
| 85 | +
|
| 86 | +This taint prevents pods from being scheduled on the resulting nodes unless they tolerate the `spot='true'` taint, and it evicts any running pods that lack that toleration.
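
On the pod side, a matching toleration might look like the following minimal sketch. It assumes the `spot='true'` taint with the `NoExecute` effect shown above; the pod name and container image are hypothetical placeholders, not values from this article.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spot-tolerant-example  # hypothetical name
spec:
  tolerations:
    # Matches the taint key, value, and effect set in the MachineSet.
    - key: spot
      operator: Equal
      value: 'true'
      effect: NoExecute
  containers:
    - name: app
      image: registry.example.com/app:latest  # hypothetical image
```

A toleration only permits scheduling on the tainted nodes; it doesn't force the pod onto them. For workloads managed by a Deployment or similar controller, the same `tolerations` list goes in the pod template spec.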
| 87 | +
|
| 88 | +To learn more about applying taints and tolerations, see [Controlling pod placement using node taints](https://docs.openshift.com/container-platform/4.7/nodes/scheduling/nodes-scheduler-taints-tolerations.html).
| 89 | +
|
| 90 | +## Quota |
| 91 | +
|
| 92 | +Machines can go into a Failed state if the quota for the machine type you are using is momentarily too low, even if it would eventually be sufficient (for example, when one node is still being deleted while another is being created). To avoid this, set the quota for the machine type you use for Spot instances slightly higher than strictly needed, for example by 2*n cores, where n is the number of cores used by a machine (8 extra cores for 4-core machines). This headroom avoids having to remedy failed machines, which is relatively simple but still requires manual intervention (see the Troubleshooting section).
| 93 | +
|
| 94 | +## Node readiness |
| 95 | +
|
| 96 | +As explained in the Spot VM documentation linked earlier, VMs go into the Deallocated provisioning state when they are no longer available, or no longer available at the maximum price specified.
| 97 | +
|
| 98 | +In OpenShift, the affected nodes appear as Not Ready. The machines remain healthy, in the phase "Provisioned as node".
| 99 | +
|
| 100 | +The nodes return to Ready once the VMs are available again.
| 101 | +
|
| 102 | +## Troubleshooting |
| 103 | +
|
| 104 | +### Node stuck in Not Ready state, underlying VM deallocated |
| 105 | +
|
| 106 | +If a node stays in the Not Ready state for a long time after its VM was deallocated, try deleting the node or its corresponding OpenShift machine object.
| 107 | +
|
| 108 | +### Spot Machine stuck in Failed state |
| 109 | +
|
| 110 | +If a machine (OpenShift object) that uses a Spot VM is stuck in a Failed state, try deleting it manually. If it can't be deleted because of a 403 error caused by the VM no longer existing, edit the machine and remove its finalizers.
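
As a rough sketch of what to look for when editing a stuck machine, the fragment below shows where finalizers sit in the object's metadata. It assumes the machine lives in the `openshift-machine-api` namespace and carries the usual Machine API finalizer; the machine name is a hypothetical placeholder, and the finalizer entries on your machine may differ.

```yaml
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  name: aro-cluster-abcd1-spot-eastus-xxxxx  # hypothetical machine name
  namespace: openshift-machine-api           # assumed namespace
  # Removing this key and its entries lets the pending deletion complete.
  finalizers:
    - machine.machine.openshift.io           # assumed finalizer name
```

Remove finalizers only after confirming that the underlying VM is really gone, because the finalizer is what normally ensures the cloud resources are cleaned up.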
| 111 | +
|
| 112 | +## Support |
| 113 | +
|
| 114 | +Due to the dynamic nature of Spot workers, issues with Spot workers must be raised through a support case.