Commit a96551d

Merge pull request #215163 from ju-shim/wang-1
VMSS Instance Repairs - WANG edit
2 parents e6970bd + 39b66c1 commit a96551d

1 file changed: +31 -27 lines changed

articles/virtual-machine-scale-sets/virtual-machine-scale-sets-automatic-instance-repairs.md

Lines changed: 31 additions & 27 deletions
@@ -1,33 +1,33 @@
 ---
-title: Automatic instance repairs with Azure virtual machine scale sets
+title: Automatic instance repairs with Azure Virtual Machine Scale Sets
 description: Learn how to configure automatic repairs policy for VM instances in a scale set
-author: mamccrea
-ms.author: mamccrea
+author: ju-shim
+ms.author: jushiman
 ms.topic: conceptual
 ms.service: virtual-machine-scale-sets
 ms.subservice: instance-protection
-ms.date: 02/28/2020
-ms.reviewer: jushiman
-ms.custom: avverma, devx-track-azurecli, devx-track-azurepowershell
+ms.date: 10/19/2022
+ms.reviewer: mimckitt
+ms.custom: devx-track-azurecli, devx-track-azurepowershell
 
 ---
-# Automatic instance repairs for Azure virtual machine scale sets
+# Automatic instance repairs for Azure Virtual Machine Scale Sets
 
-Enabling automatic instance repairs for Azure virtual machine scale sets helps achieve high availability for applications by maintaining a set of healthy instances. If an instance in the scale set is found to be unhealthy as reported by [Application Health extension](./virtual-machine-scale-sets-health-extension.md) or [Load balancer health probes](../load-balancer/load-balancer-custom-probe-overview.md), then this feature automatically performs instance repair by deleting the unhealthy instance and creating a new one to replace it.
+Enabling automatic instance repairs for Azure Virtual Machine Scale Sets helps achieve high availability for applications by maintaining a set of healthy instances. The [Application Health extension](./virtual-machine-scale-sets-health-extension.md) or [Load balancer health probes](../load-balancer/load-balancer-custom-probe-overview.md) may find that an instance is unhealthy. Automatic instance repairs will automatically perform instance repairs by deleting the unhealthy instance and creating a new one to replace it.
 
 ## Requirements for using automatic instance repairs
 
 **Enable application health monitoring for scale set**
 
-The scale set should have application health monitoring for instances enabled. This can be done using either [Application Health extension](./virtual-machine-scale-sets-health-extension.md) or [Load balancer health probes](../load-balancer/load-balancer-custom-probe-overview.md). Only one of these can be enabled at a time. The application health extension or the load balancer probes ping the application endpoint configured on virtual machine instances to determine the application health status. This health status is used by the scale set orchestrator to monitor instance health and perform repairs when required.
+The scale set should have application health monitoring for instances enabled. Health monitoring can be done using either [Application Health extension](./virtual-machine-scale-sets-health-extension.md) or [Load balancer health probes](../load-balancer/load-balancer-custom-probe-overview.md), where only one can be enabled at a time. The application health extension or the load balancer probes ping the application endpoint configured on virtual machine instances to determine the application health status. This health status is used by the scale set orchestrator to monitor instance health and perform repairs when required.
 
 **Configure endpoint to provide health status**
 
 Before enabling automatic instance repairs policy, ensure that the scale set instances have application endpoint configured to emit the application health status. When an instance returns status 200 (OK) on this application endpoint, then the instance is marked as "Healthy". In all other cases, the instance is marked "Unhealthy", including the following scenarios:
 
-- When there is no application endpoint configured inside the virtual machine instances to provide application health status
+- When there's no application endpoint configured inside the virtual machine instances to provide application health status
 - When the application endpoint is incorrectly configured
-- When the application endpoint is not reachable
+- When the application endpoint isn't reachable
 
 For instances marked as "Unhealthy", automatic repairs are triggered by the scale set. Ensure the application endpoint is correctly configured before enabling the automatic repairs policy in order to avoid unintended instance repairs, while the endpoint is getting configured.
 
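The health rule in the lines above can be checked by hand from inside an instance before you enable the policy. Under that rule, status 200 (OK) means "Healthy", and anything else, including an unreachable endpoint, means "Unhealthy". The following is a minimal POSIX shell sketch that assumes curl is installed. The function names and the 5-second timeout are hypothetical. This is not how the Application Health extension or load balancer probes are implemented.

```shell
#!/bin/sh
# Sketch of the rule stated in the article: HTTP 200 (OK) from the
# application endpoint means "Healthy"; every other result, including no
# response at all, means "Unhealthy".

# classify_status STATUS_CODE -> prints Healthy or Unhealthy
classify_status() {
    if [ "$1" = "200" ]; then
        echo "Healthy"
    else
        echo "Unhealthy"
    fi
}

# probe_endpoint URL -> prints the health state for one endpoint.
# curl prints the HTTP status code through --write-out; when the endpoint
# cannot be reached (or the 5-second limit expires), curl reports "000",
# which classify_status treats as Unhealthy.
probe_endpoint() {
    code=$(curl --silent --output /dev/null --max-time 5 --write-out '%{http_code}' "$1")
    classify_status "$code"
}
```

For example, `probe_endpoint http://localhost:8080/health` prints `Unhealthy` if nothing is listening on that port, because curl reports status code `000` for a failed connection. The port and path in this example are hypothetical.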
@@ -43,45 +43,49 @@ Resource or subscription moves are currently not supported for scale sets when a
 
 This feature is currently not supported for service fabric scale sets.
 
+**Restriction for VMs with provisioning errors**
+
+Automatic repairs doesn't currently support scenarios where a VM instance is marked *Unhealthy* due to a provisioning failure. VMs must be successfully initialized to enable health monitoring and automatic repair capabilities.
+
 ## How do automatic instance repairs work?
 
 Automatic instance repair feature relies on health monitoring of individual instances in a scale set. VM instances in a scale set can be configured to emit application health status using either the [Application Health extension](./virtual-machine-scale-sets-health-extension.md) or [Load balancer health probes](../load-balancer/load-balancer-custom-probe-overview.md). If an instance is found to be unhealthy, then the scale set performs repair action by deleting the unhealthy instance and creating a new one to replace it. The latest virtual machine scale set model is used to create the new instance. This feature can be enabled in the virtual machine scale set model by using the *automaticRepairsPolicy* object.
 
 ### Batching
 
-The automatic instance repair operations are performed in batches. At any given time, no more than 5% of the instances in the scale set are repaired through the automatic repairs policy. This helps avoid simultaneous deletion and re-creation of a large number of instances if found unhealthy at the same time.
+The automatic instance repair operations are performed in batches. At any given time, no more than 5% of the instances in the scale set are repaired through the automatic repairs policy. This process helps avoid simultaneous deletion and re-creation of a large number of instances if found unhealthy at the same time.
 
 ### Grace period
 
-When an instance goes through a state change operation because of a PUT, PATCH or POST action performed on the scale set (for example reimage, redeploy, update, etc.), then any repair action on that instance is performed only after waiting for the grace period. Grace period is the amount of time to allow the instance to return to healthy state. The grace period starts after the state change has completed. This helps avoid any premature or accidental repair operations. The grace period is honored for any newly created instance in the scale set (including the one created as a result of repair operation). Grace period is specified in minutes in ISO 8601 format and can be set using the property *automaticRepairsPolicy.gracePeriod*. Grace period can range between 10 minutes and 90 minutes, and has a default value of 30 minutes.
+When an instance goes through a state change operation because of a PUT, PATCH, or POST action performed on the scale set, then any repair action on that instance is performed only after the grace period ends. Grace period is the amount of time to allow the instance to return to healthy state. The grace period starts after the state change has completed, which helps avoid any premature or accidental repair operations. The grace period is honored for any newly created instance in the scale set, including the one created as a result of repair operation. Grace period is specified in minutes in ISO 8601 format and can be set using the property *automaticRepairsPolicy.gracePeriod*. Grace period can range between 10 minutes and 90 minutes, and has a default value of 30 minutes.
 
 ### Suspension of Repairs
 
-Virtual machine scale sets provide the capability to temporarily suspend automatic instance repairs if needed. The *serviceState* for automatic repairs under the property *orchestrationServices* in instance view of virtual machine scale set shows the current state of the automatic repairs. When a scale set is opted into automatic repairs, the value of parameter *serviceState* is set to *Running*. When the automatic repairs are suspended for a scale set, the parameter *serviceState* is set to *Suspended*. If *automaticRepairsPolicy* is defined on a scale set but the automatic repairs feature is not enabled, then the parameter *serviceState* is set to *Not Running*.
+Virtual Machine Scale Sets provide the capability to temporarily suspend automatic instance repairs if needed. The *serviceState* for automatic repairs under the property *orchestrationServices* in instance view of virtual machine scale set shows the current state of the automatic repairs. When a scale set is opted into automatic repairs, the value of parameter *serviceState* is set to *Running*. When the automatic repairs are suspended for a scale set, the parameter *serviceState* is set to *Suspended*. If *automaticRepairsPolicy* is defined on a scale set but the automatic repairs feature isn't enabled, then the parameter *serviceState* is set to *Not Running*.
 
 If newly created instances for replacing the unhealthy ones in a scale set continue to remain unhealthy even after repeatedly performing repair operations, then as a safety measure the platform updates the *serviceState* for automatic repairs to *Suspended*. You can resume the automatic repairs again by setting the value of *serviceState* for automatic repairs to *Running*. Detailed instructions are provided in the section on [viewing and updating the service state of automatic repairs policy](#viewing-and-updating-the-service-state-of-automatic-instance-repairs-policy) for your scale set.
 
 The automatic instance repairs process works as follows:
 
 1. [Application Health extension](./virtual-machine-scale-sets-health-extension.md) or [Load balancer health probes](../load-balancer/load-balancer-custom-probe-overview.md) ping the application endpoint inside each virtual machine in the scale set to get application health status for each instance.
 2. If the endpoint responds with a status 200 (OK), then the instance is marked as "Healthy". In all the other cases (including if the endpoint is unreachable), the instance is marked "Unhealthy".
-3. When an instance is found to be unhealthy, the scale set triggers a repair action by deleting the unhealthy instance and creating a new one to replace it.
+3. When an instance is found to be unhealthy, the scale set triggers a repair action by deleting the unhealthy instance, and creating a new one to replace it.
 4. Instance repairs are performed in batches. At any given time, no more than 5% of the total instances in the scale set are repaired. If a scale set has fewer than 20 instances, the repairs are done for one unhealthy instance at a time.
 5. The above process continues until all unhealthy instance in the scale set are repaired.
 
 ## Instance protection and automatic repairs
 
-If an instance in a scale set is protected by applying one of the [protection policies](./virtual-machine-scale-sets-instance-protection.md), then automatic repairs are not performed on that instance. This applies to both the protection policies: *Protect from scale-in* and *Protect from scale-set* actions.
+If an instance in a scale set is protected by applying one of the [protection policies](./virtual-machine-scale-sets-instance-protection.md), then automatic repairs aren't performed on that instance. This behavior applies to both the protection policies: *Protect from scale-in* and *Protect from scale-set* actions.
 
 ## Terminate notification and automatic repairs
 
-If the [terminate notification](./virtual-machine-scale-sets-terminate-notification.md) feature is enabled on a scale set, then during automatic repair operation, the deletion of an unhealthy instance follows the terminate notification configuration. A terminate notification is sent through Azure metadata service – scheduled events – and instance deletion is delayed for the duration of the configured delay timeout. However, the creation of a new instance to replace the unhealthy one does not wait for the delay timeout to complete.
+If the [terminate notification](./virtual-machine-scale-sets-terminate-notification.md) feature is enabled on a scale set, then during automatic repair operation, the deletion of an unhealthy instance follows the terminate notification configuration. A terminate notification is sent through Azure metadata service – scheduled events – and instance deletion is delayed during the configured delay timeout. However, the creation of a new instance to replace the unhealthy one doesn't wait for the delay timeout to complete.
 
 ## Enabling automatic repairs policy when creating a new scale set
 
-For enabling automatic repairs policy while creating a new scale set, ensure that all the [requirements](#requirements-for-using-automatic-instance-repairs) for opting in to this feature are met. The application endpoint should be correctly configured for scale set instances to avoid triggering unintended repairs while the endpoint is getting configured. For newly created scale sets, any instance repairs are performed only after waiting for the duration of grace period. To enable the automatic instance repair in a scale set, use *automaticRepairsPolicy* object in the virtual machine scale set model.
+For enabling automatic repairs policy while creating a new scale set, ensure that all the [requirements](#requirements-for-using-automatic-instance-repairs) for opting in to this feature are met. The application endpoint should be correctly configured for scale set instances to avoid triggering unintended repairs while the endpoint is getting configured. For newly created scale sets, any instance repairs are performed only after the grace period completes. To enable the automatic instance repair in a scale set, use *automaticRepairsPolicy* object in the virtual machine scale set model.
 
-You can also use this [quickstart template](https://github.com/Azure/azure-quickstart-templates/tree/master/quickstarts/microsoft.compute/vmss-automatic-repairs-slb-health-probe) to deploy a virtual machine scale set with load balancer health probe and automatic instance repairs enabled with a grace period of 30 minutes.
+You can also use this [quickstart template](https://github.com/Azure/azure-quickstart-templates/tree/master/quickstarts/microsoft.compute/vmss-automatic-repairs-slb-health-probe) to deploy a virtual machine scale set. The scale set has a load balancer health probe and automatic instance repairs enabled with a grace period of 30 minutes.
 
 ### Azure portal
 
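The batching rule in the hunk above reduces to a small calculation. It repairs no more than 5% of instances at a time, and repairs one at a time when the scale set has fewer than 20 instances. This sketch assumes the 5% figure is rounded down, with a minimum of one instance. The article doesn't state how the platform rounds, and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of the batching limit described in the article:
# at most 5% of the scale set's instances are repaired at a time, and
# scale sets with fewer than 20 instances are repaired one at a time.
# Assumption: 5% is rounded down, with a minimum batch of one instance.

# repair_batch_size TOTAL_INSTANCES -> prints the maximum batch size
repair_batch_size() {
    total=$1
    if [ "$total" -le 0 ]; then
        echo 0
        return
    fi
    size=$(( total * 5 / 100 ))   # integer division rounds down
    if [ "$size" -lt 1 ]; then
        size=1                    # fewer than 20 instances: one at a time
    fi
    echo "$size"
}

# first_repair_batch TOTAL_INSTANCES UNHEALTHY_ID... -> prints, space-separated
# and in the order given, the unhealthy IDs that fit into the next batch
first_repair_batch() {
    limit=$(repair_batch_size "$1")
    shift
    picked=""
    count=0
    for id in "$@"; do
        [ "$count" -ge "$limit" ] && break
        picked="$picked${picked:+ }$id"
        count=$(( count + 1 ))
    done
    echo "$picked"
}
```

Under these assumptions, a 100-instance scale set repairs at most 5 instances per batch and a 40-instance set at most 2. Any set under 20 instances repairs one instance at a time, as step 4 of the process list states.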
@@ -95,7 +99,7 @@ The following steps enabling automatic repairs policy when creating a new scale
 1. Locate the **Automatic repair policy** section.
 1. Turn **On** the **Automatic repairs** option.
 1. In **Grace period (min)**, specify the grace period in minutes, allowed values are between 30 and 90 minutes.
-1. When you are done creating the new scale set, select **Review + create** button.
+1. When you're done creating the new scale set, select **Review + create** button.
 
 ### REST API
 
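The article says *automaticRepairsPolicy.gracePeriod* is expressed in minutes in ISO 8601 format, with a 10 to 90 minute range. The portal step in this hunk, however, allows 30 to 90 minutes, so the sketch below takes the lower bound as an optional argument. It handles only the `PT<minutes>M` duration form, which is an assumption; other ISO 8601 forms such as `PT1H30M` are rejected rather than converted. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch for preparing a value for automaticRepairsPolicy.gracePeriod.
# Assumption: only the "PT<minutes>M" form of an ISO 8601 duration is
# handled; forms such as "PT1H30M" are rejected rather than converted.

# minutes_to_iso MINUTES -> prints e.g. PT30M
minutes_to_iso() {
    printf 'PT%sM\n' "$1"
}

# iso_to_minutes DURATION -> prints the minutes, or returns 1 for other forms
iso_to_minutes() {
    case "$1" in
        PT*M)
            mins=${1#PT}
            mins=${mins%M}
            case "$mins" in
                ''|*[!0-9]*) return 1 ;;   # not a plain number of minutes
            esac
            echo "$mins"
            ;;
        *) return 1 ;;
    esac
}

# grace_period_in_range MINUTES [LOWER_BOUND] -> exit status 0 when MINUTES
# lies between LOWER_BOUND (default 10) and 90, inclusive
grace_period_in_range() {
    lower=${2:-10}
    [ "$1" -ge "$lower" ] && [ "$1" -le 90 ]
}
```

For example, the 30-minute default becomes `PT30M`. `grace_period_in_range 20 30` fails because 20 is below the portal's lower bound.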
@@ -147,7 +151,7 @@ az vmss create \
 --automatic-repairs-grace-period 30
 ```
 
-The above example uses an existing load balancer and health probe for monitoring application health status of instances. If you prefer to use an application health extension for monitoring instead, you can create a scale set, configure the application health extension and then enable the automatic instance repairs policy using the *az vmss update*, as explained in the next section.
+The above example uses an existing load balancer and health probe for monitoring application health status of instances. If you prefer using an application health extension for monitoring, you can do the following instead: create a scale set, configure the application health extension, and enable the automatic instance repairs policy. You can enable that policy by using the *az vmss update*, as explained in the next section.
 
 ## Enabling automatic repairs policy when updating an existing scale set
 
@@ -165,7 +169,7 @@ You can modify the automatic repairs policy of an existing scale set through the
 1. Locate the **Automatic repair policy** section.
 1. Turn **On** the **Automatic repairs** option.
 1. In **Grace period (min)**, specify the grace period in minutes, allowed values are between 30 and 90 minutes.
-1. When you are done, select **Save**.
+1. When you're done, select **Save**.
 
 ### REST API
 
@@ -200,7 +204,7 @@ Update-AzVmss `
 
 ### Azure CLI 2.0
 
-The following is an example for updating the automatic instance repairs policy of an existing scale set, using *[az vmss update](/cli/azure/vmss#az-vmss-update)*.
+The following example demonstrates how to update the automatic instance repairs policy of an existing scale set, using *[az vmss update](/cli/azure/vmss#az-vmss-update)*.
 
 ```azurecli-interactive
 az vmss update \
@@ -258,7 +262,7 @@ az vmss get-instance-view \
 --resource-group MyResourceGroup
 ```
 
-Use [set-orchestration-service-state](/cli/azure/vmss#az-vmss-set-orchestration-service-state) cmdlet to update the *serviceState* for automatic instance repairs. Once the scale set is opted into the automatic repair feature, then you can use this cmdlet to suspend or resume automatic repairs for you scale set.
+Use [set-orchestration-service-state](/cli/azure/vmss#az-vmss-set-orchestration-service-state) cmdlet to update the *serviceState* for automatic instance repairs. Once the scale set is opted into the automatic repair feature, then you can use this cmdlet to suspend or resume automatic repairs for your scale set.
 
 ```azurecli-interactive
 az vmss set-orchestration-service-state \
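Under the article's *Suspension of Repairs* rules, each *serviceState* value maps to at most one useful action for the command shown in this hunk. This hypothetical dry-run helper prints the `az vmss set-orchestration-service-state` invocation instead of running it. `--service-name AutomaticRepairs` and `--action Suspend|Resume` are parameters of the Azure CLI command itself. The helper names are hypothetical. Accepting both `NotRunning` and the article's `Not Running` spelling is also an assumption.

```shell
#!/bin/sh
# Hypothetical dry-run helper: maps the serviceState of the AutomaticRepairs
# orchestration service (from the scale set instance view) to the --action
# that would toggle it, and prints the command instead of running it.

# toggle_action SERVICE_STATE -> prints Suspend or Resume, or returns 1
toggle_action() {
    case "$1" in
        Running) echo "Suspend" ;;
        Suspended) echo "Resume" ;;
        NotRunning|"Not Running") return 1 ;;  # policy defined, repairs not enabled
        *) return 1 ;;                         # unknown value: suggest nothing
    esac
}

# print_toggle_command SERVICE_STATE SCALE_SET RESOURCE_GROUP
print_toggle_command() {
    action=$(toggle_action "$1") || return 1
    printf 'az vmss set-orchestration-service-state --service-name AutomaticRepairs --action %s --name %s --resource-group %s\n' \
        "$action" "$2" "$3"
}
```

Pass in the *serviceState* value read from the *orchestrationServices* section of the `az vmss get-instance-view` output. For a scale set the platform has suspended, `print_toggle_command Suspended MyScaleSet MyResourceGroup` prints the Resume command. `MyScaleSet` is a placeholder name.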
@@ -278,7 +282,7 @@ Get-AzVmss `
 -InstanceView
 ```
 
-Use Set-AzVmssOrchestrationServiceState cmdlet to update the *serviceState* for automatic instance repairs. Once the scale set is opted into the automatic repair feature, you can use this cmdlet to suspend or resume automatic repairs for you scale set.
+Use Set-AzVmssOrchestrationServiceState cmdlet to update the *serviceState* for automatic instance repairs. Once the scale set is opted into the automatic repair feature, you can use this cmdlet to suspend or resume automatic repairs for your scale set.
 
 ```azurepowershell-interactive
 Set-AzVmssOrchestrationServiceState `
@@ -292,11 +296,11 @@ Set-AzVmssOrchestrationServiceState `
 
 **Failure to enable automatic repairs policy**
 
-If you get a 'BadRequest' error with a message stating "Could not find member 'automaticRepairsPolicy' on object of type 'properties'", then check the API version used for virtual machine scale set. API version 2018-10-01 or higher is required for this feature.
+If you get a 'BadRequest' error with a message stating "Couldn't find member 'automaticRepairsPolicy' on object of type 'properties'", then check the API version used for virtual machine scale set. API version 2018-10-01 or higher is required for this feature.
 
 **Instance not getting repaired even when policy is enabled**
 
-The instance could be in grace period. This is the amount of time to wait after any state change on the instance before performing repairs. This is to avoid any premature or accidental repairs. The repair action should happen once the grace period is completed for the instance.
+The instance could be in grace period. This period is the amount of time to wait after any state change on the instance before performing repairs, which helps avoid any premature or accidental repairs. The repair action should happen once the grace period is completed for the instance.
 
 **Viewing application health status for scale set instances**
 
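The troubleshooting entry above gives the grace period as one reason an enabled policy doesn't repair an instance. The article states several others: instance protection policies, a suspended service state, and the restriction on provisioning failures. The sketch below combines those rules into one hypothetical checklist. Its arguments stand in for values you would read from the instance view and aren't real API field names. The provisioning value `Succeeded` is an assumption, and the order of the checks is arbitrary.

```shell
#!/bin/sh
# Hypothetical checklist of reasons an instance is not repaired, each based
# on a rule stated in the article. Arguments are illustrative stand-ins,
# not fields of a real API.
#
# repair_blocker HEALTH PROTECTED MINUTES_SINCE_CHANGE GRACE_MINUTES \
#                SERVICE_STATE PROVISIONING
#   HEALTH        Healthy | Unhealthy
#   PROTECTED     yes if either instance protection policy is applied
#   SERVICE_STATE serviceState of automatic repairs (Running, Suspended, ...)
#   PROVISIONING  Succeeded, or anything else for a failed provisioning
# Prints the first rule that blocks a repair, or "eligible".
repair_blocker() {
    if [ "$1" != "Unhealthy" ]; then
        echo "instance is not unhealthy"
    elif [ "$6" != "Succeeded" ]; then
        echo "provisioning failures are not repaired"
    elif [ "$2" = "yes" ]; then
        echo "instance protection policy applied"
    elif [ "$5" != "Running" ]; then
        echo "automatic repairs service is not running"
    elif [ "$3" -lt "$4" ]; then
        echo "still within grace period"
    else
        echo "eligible"
    fi
}
```

For example, `repair_blocker Unhealthy no 10 30 Running Succeeded` reports `still within grace period`. That message matches the first explanation in the troubleshooting entry.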