Commit 20afa6e

Merge pull request #258524 from hilaryw29/patch-17
Create alert-rules-automatic-repairs-service-state.md
2 parents a006fe3 + 31dbc09 commit 20afa6e

10 files changed, +49 -7 lines changed

articles/virtual-machine-scale-sets/TOC.yml

Lines changed: 2 additions & 0 deletions
@@ -198,6 +198,8 @@
   maintainContext: true
 - name: Terminate notifications
   href: virtual-machine-scale-sets-terminate-notification.md
+- name: Monitor automatic repairs service state
+  href: alert-rules-automatic-repairs-service-state.md
 - name: Instance Metadata service
   items:
   - name: CLI
Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+---
+title: Use Azure Alert Rules to monitor changes in Automatic Instance Repairs ServiceState
+description: Learn how to use Azure Alert Rules to get notified of changes to Automatic Instance Repairs ServiceState.
+author: hilaryw29
+ms.author: hilarywang
+ms.topic: how-to
+ms.service: virtual-machine-scale-sets
+ms.date: 11/14/2023
+---
+
+# Use Azure Alert Rules to monitor changes in Automatic Instance Repairs ServiceState
+
+This article shows you how to use [Alert Rules from Azure Monitor](../azure-monitor/alerts/alerts-overview.md) to receive custom notifications every time the ServiceState for Automatic Repairs is updated on your scale set. This will help track if Automatic Repairs become _Suspended_ due to VM instances remaining unhealthy after multiple repair operations. To learn more about Azure Monitor alerts, see the [alerts overview](../azure-monitor/alerts/alerts-overview.md).
+
+To follow this tutorial, ensure that you have a Virtual Machine scale set with [Automatic Repairs](./virtual-machine-scale-sets-automatic-instance-repairs.md) enabled.
+
+## Azure portal
+1. In the [portal](https://portal.azure.com/), navigate to your VM scale set resource
+2. Select **Alerts** from the left pane, and then select **+ Create > Alert rule**. :::image type="content" source="media/alert-rules-automatic-repairs-service-state/picture-1.png" alt-text="Create monitoring alert in the Azure portal":::
+3. Under the **Condition** tab, select **See all signals** and choose the signal name called “Sets the state of an orchestration service in a Virtual Machine Scale set”. Select **Apply**. :::image type="content" source="media/alert-rules-automatic-repairs-service-state/picture-2.png" alt-text="Select alert signal to monitor scale set orchestration service state":::
+4. Set **Event Level** to “Informational” and **Status** to “Succeeded”. :::image type="content" source="media/alert-rules-automatic-repairs-service-state/picture-3.png" alt-text="Configure event level and status for alert rule":::
+5. Under the **Actions** tab, select an existing action group or see [Create action group](#creating-an-action-group)
+6. Under the **Details** tab > **Alert rule name**, set a name for your alert. Then select **Review + create** > **Create** to create your alert.
+:::image type="content" source="media/alert-rules-automatic-repairs-service-state/picture-4.png" alt-text="Review and create alert rule":::
+
+Once the alert is created and enabled on your scale set, you'll receive a notification every time a change to the ServiceState is detected on your scale set.
+
+### Sample email notification from alert rule
+Below is an example of an email notification created from a configured alert rule.
+:::image type="content" source="media/alert-rules-automatic-repairs-service-state/picture-5.png" alt-text="Sample email notification from alert rule":::
+
+## Creating an action group
+1. Under the **Actions** tab, select **Create action group**.
+:::image type="content" source="media/alert-rules-automatic-repairs-service-state/picture-6.png" alt-text="Create action group on portal":::
+2. In the **Basics** tab, provide an **Action group name** and **Display name**.
+3. Under the **Notifications** tab **> Notification type**, select “Email/SMS message/Push/Voice”. Select the **edit** button to configure how you’d like to be notified.
+:::image type="content" source="media/alert-rules-automatic-repairs-service-state/picture-7.png" alt-text="Configure notification type for action group":::
+4. Select **Review + Create > Create**
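
Reviewer note: the portal steps in the new article amount to an activity log alert scoped to the scale set, filtered on Event Level "Informational" and Status "Succeeded", and wired to an action group. The Python sketch below only assembles a request body of that shape as a plain dictionary; it does not call Azure. The operation name, the `Administrative` category, the `Global` location, and every resource ID and name are assumptions made for illustration and should be checked against the Azure Monitor documentation before use.

```python
import json

# Assumed operation name behind the portal signal
# "Sets the state of an orchestration service in a Virtual Machine Scale set".
ASSUMED_OPERATION = (
    "Microsoft.Compute/virtualMachineScaleSets/setOrchestrationServiceState/action"
)


def build_service_state_alert_rule(scale_set_id, action_group_id, rule_name):
    """Assemble an activity-log-alert style body (a sketch, not validated by Azure)."""
    return {
        "type": "Microsoft.Insights/activityLogAlerts",
        "name": rule_name,
        "location": "Global",  # assumption: activity log alerts are global resources
        "properties": {
            "enabled": True,
            # The rule watches a single scale set.
            "scopes": [scale_set_id],
            "condition": {
                "allOf": [
                    {"field": "category", "equals": "Administrative"},  # assumption
                    {"field": "operationName", "equals": ASSUMED_OPERATION},
                    # Step 4 of the article: Event Level and Status.
                    {"field": "level", "equals": "Informational"},
                    {"field": "status", "equals": "Succeeded"},
                ]
            },
            # Step 5 of the article: attach an action group.
            "actions": {"actionGroups": [{"actionGroupId": action_group_id}]},
        },
    }


if __name__ == "__main__":
    # Hypothetical placeholder IDs; substitute real resource IDs.
    subscription = "/subscriptions/00000000-0000-0000-0000-000000000000"
    rule = build_service_state_alert_rule(
        scale_set_id=subscription
        + "/resourceGroups/example-rg/providers/Microsoft.Compute"
        + "/virtualMachineScaleSets/example-vmss",
        action_group_id=subscription
        + "/resourceGroups/example-rg/providers/Microsoft.Insights"
        + "/actionGroups/example-ag",
        rule_name="example-service-state-alert",
    )
    print(json.dumps(rule, indent=2))
```

Keeping the body as data makes the four filter conditions easy to compare against the portal screenshots, whichever deployment tool eventually submits it.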

articles/virtual-machine-scale-sets/virtual-machine-scale-sets-automatic-instance-repairs.md

Lines changed: 9 additions & 7 deletions
@@ -54,6 +54,14 @@ Automatic repairs currently do not support scenarios where a VM instance is mark
 
 Automatic instance repair feature relies on health monitoring of individual instances in a scale set. VM instances in a scale set can be configured to emit application health status using either the [Application Health extension](./virtual-machine-scale-sets-health-extension.md) or [Load balancer health probes](../load-balancer/load-balancer-custom-probe-overview.md). If an instance is found to be unhealthy, the scale set will perform a preconfigured repair action on the unhealthy instance. Automatic instance repairs can be enabled in the Virtual Machine Scale Set model by using the `automaticRepairsPolicy` object.
 
+The automatic instance repairs process goes as follows:
+
+1. [Application Health extension](./virtual-machine-scale-sets-health-extension.md) or [Load balancer health probes](../load-balancer/load-balancer-custom-probe-overview.md) ping the application endpoint inside each virtual machine in the scale set to get application health status for each instance.
+2. If the endpoint responds with a status 200 (OK), then the instance is marked as "Healthy". In all the other cases (including if the endpoint is unreachable), the instance is marked "Unhealthy".
+3. When an instance is found to be unhealthy, the scale set applies the configured repair action (default is *Replace*) to the unhealthy instance.
+4. Instance repairs are performed in batches. At any given time, no more than 5% of the total instances in the scale set are repaired. If a scale set has fewer than 20 instances, the repairs are done for one unhealthy instance at a time.
+5. The above process continues until all unhealthy instance in the scale set are repaired.
+
 ### Available repair actions
 
 > [!CAUTION]
@@ -92,13 +100,7 @@ Virtual Machine Scale Sets provide the capability to temporarily suspend automat
 
 If newly created instances for replacing the unhealthy ones in a scale set continue to remain unhealthy even after repeatedly performing repair operations, then as a safety measure the platform updates the *serviceState* for automatic repairs to *Suspended*. You can resume the automatic repairs again by setting the value of *serviceState* for automatic repairs to *Running*. Detailed instructions are provided in the section on [viewing and updating the service state of automatic repairs policy](#viewing-and-updating-the-service-state-of-automatic-instance-repairs-policy) for your scale set.
 
-The automatic instance repairs process works as follows:
-
-1. [Application Health extension](./virtual-machine-scale-sets-health-extension.md) or [Load balancer health probes](../load-balancer/load-balancer-custom-probe-overview.md) ping the application endpoint inside each virtual machine in the scale set to get application health status for each instance.
-2. If the endpoint responds with a status 200 (OK), then the instance is marked as "Healthy". In all the other cases (including if the endpoint is unreachable), the instance is marked "Unhealthy".
-3. When an instance is found to be unhealthy, the scale set applies the configured repair action (default is *Replace*) to the unhealthy instance.
-4. Instance repairs are performed in batches. At any given time, no more than 5% of the total instances in the scale set are repaired. If a scale set has fewer than 20 instances, the repairs are done for one unhealthy instance at a time.
-5. The above process continues until all unhealthy instance in the scale set are repaired.
+You can also set up Azure Alert Rules to monitor *serviceState* changes and get notified if automatic repairs becomes suspended on your scale set. For details, see [Use Azure alert rules to monitor changes in automatic instance repairs service state](./alert-rules-automatic-repairs-service-state.md).
 
 ## Instance protection and automatic repairs
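
Reviewer note: the relocated repair-process list states two rules that are easy to misread: only a 200 response counts as healthy, and at most 5% of instances are repaired at once, with one at a time for scale sets under 20 instances. The Python sketch below models those two rules for a single static snapshot of probe results. The function names are hypothetical, rounding the 5% figure down is an assumption (the text does not say how fractions are handled), and the real platform re-probes between batches rather than planning them up front.

```python
def classify_health(status_code):
    """Only HTTP 200 is healthy; None stands in for an unreachable endpoint."""
    return "Healthy" if status_code == 200 else "Unhealthy"


def repair_batch_size(total_instances):
    """At most 5% of all instances per batch, but never fewer than one.

    Rounding down is an assumption. For fewer than 20 instances, 5% is
    below one, so the result is one instance at a time.
    """
    if total_instances <= 0:
        return 0
    return max(1, total_instances * 5 // 100)


def plan_repair_batches(probe_results):
    """Group unhealthy instances into repair batches.

    probe_results maps an instance ID to its last probe status code,
    or to None when the endpoint could not be reached.
    """
    unhealthy = [
        instance_id
        for instance_id, code in probe_results.items()
        if classify_health(code) == "Unhealthy"
    ]
    if not unhealthy:
        return []
    size = repair_batch_size(len(probe_results))
    return [unhealthy[i:i + size] for i in range(0, len(unhealthy), size)]


if __name__ == "__main__":
    # 40 instances: instance 0 returns 503, 1 is unreachable, 2 returns 404.
    results = {i: 200 for i in range(40)}
    results[0], results[1], results[2] = 503, None, 404
    print(plan_repair_batches(results))
```

With 40 instances the batch size is 2, so the three unhealthy instances are split into two batches.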
