|
1 | 1 | ---
|
2 | 2 | title: Best practices for autoscale
|
3 |
| -description: Autoscale patterns in Azure for Web Apps, virtual machine scale sets, and Cloud Services |
| 3 | +description: Autoscale patterns in Azure for Web Apps, Azure Virtual Machine Scale Sets, and Azure Cloud Services. |
4 | 4 | author: EdB-MSFT
|
5 | 5 | ms.author: edbaynash
|
6 | 6 | ms.topic: conceptual
|
7 | 7 | ms.date: 09/13/2022
|
8 | 8 | ms.subservice: autoscale
|
9 | 9 | ms.reviewer: akkumari
|
10 | 10 | ---
|
11 |
| -# Best practices for Autoscale |
12 |
| -Azure Monitor autoscale applies only to [Virtual Machine Scale Sets](https://azure.microsoft.com/services/virtual-machine-scale-sets/), [Cloud Services](https://azure.microsoft.com/services/cloud-services/), [App Service - Web Apps](https://azure.microsoft.com/services/app-service/web/), and [API Management services](../../api-management/api-management-key-concepts.md). |
| 11 | +# Best practices for autoscale |
| 12 | +Azure Monitor autoscale applies only to [Azure Virtual Machine Scale Sets](https://azure.microsoft.com/services/virtual-machine-scale-sets/), [Azure Cloud Services](https://azure.microsoft.com/services/cloud-services/), the [Web Apps feature of Azure App Service](https://azure.microsoft.com/services/app-service/web/), and [Azure API Management](../../api-management/api-management-key-concepts.md). |
13 | 13 |
|
14 | 14 | ## Autoscale concepts
|
15 |
| -* A resource can have only *one* autoscale setting |
16 |
| -* An autoscale setting can have one or more profiles and each profile can have one or more autoscale rules. |
| 15 | +* A resource can have only *one* autoscale setting. |
| 16 | +* An autoscale setting can have one or more profiles, and each profile can have one or more autoscale rules. |
17 | 17 | * An autoscale setting scales instances horizontally, which is *out* by increasing the instances and *in* by decreasing the number of instances.
|
18 |
| - An autoscale setting has a maximum, minimum, and default value of instances. |
| 18 | +* An autoscale setting has a maximum, minimum, and default value of instances. |
19 | 19 | * An autoscale job always reads the associated metric to scale by, checking if it has crossed the configured threshold for scale-out or scale-in. You can view a list of metrics that autoscale can scale by at [Azure Monitor autoscaling common metrics](autoscale-common-metrics.md).
|
20 |
| -* All thresholds are calculated at an instance level. For example, "scale out by one instance when average CPU > 80% when instance count is 2", means scale-out when the average CPU across all instances is greater than 80%. |
21 |
| -* All autoscale failures are logged to the Activity Log. You can then configure an [activity log alert](../alerts/activity-log-alerts.md) so that you can be notified via email, SMS, or webhooks whenever there's an autoscale failure. |
22 |
| -* Similarly, all successful scale actions are posted to the Activity Log. You can then configure an activity log alert so that you can be notified via email, SMS, or webhooks whenever there's a successful autoscale action. You can also configure email or webhook notifications to get notified for successful scale actions via the notifications tab on the autoscale setting. |
| 20 | +* All thresholds are calculated at an instance level. An example is "scale out by one instance when average CPU > 80% when instance count is 2." This rule triggers a scale-out when the average CPU across all instances is greater than 80%. |
| 21 | +* All autoscale failures are logged to the activity log. You can then configure an [activity log alert](../alerts/activity-log-alerts.md) so that you can be notified via email, SMS, or webhooks whenever there's an autoscale failure. |
| 22 | +* Similarly, all successful scale actions are posted to the activity log. You can then configure an activity log alert so that you can be notified via email, SMS, or webhooks whenever there's a successful autoscale action. You can also configure email or webhook notifications to get notified for successful scale actions via the notifications tab on the autoscale setting. |
23 | 23 |
|
24 | 24 | ## Autoscale best practices
|
25 | 25 | Use the following best practices as you use autoscale.
|
26 | 26 |
|
27 | 27 | ### Ensure the maximum and minimum values are different and have an adequate margin between them
|
28 |
| -If you have a setting that has minimum=2, maximum=2 and the current instance count is 2, no scale action can occur. Keep an adequate margin between the maximum and minimum instance counts, which are inclusive. Autoscale always scales between these limits. |
| 28 | +If you have a setting that has minimum=2, maximum=2, and the current instance count is 2, no scale action can occur. Keep an adequate margin between the maximum and minimum instance counts, which are inclusive. Autoscale always scales between these limits. |
29 | 29 |
|
30 |
| -### Manual scaling is reset by autoscale min and max |
31 |
| -If you manually update the instance count to a value above or below the maximum, the autoscale engine automatically scales back to the minimum (if below) or the maximum (if above). For example, you set the range between 3 and 6. If you have one running instance, the autoscale engine scales to three instances on its next run. Likewise, if you manually set the scale to eight instances, on the next run autoscale will scale it back to six instances on its next run. Manual scaling is temporary unless you reset the autoscale rules as well. |
| 30 | +### Manual scaling is reset by autoscale minimum and maximum |
| 31 | +If you manually update the instance count to a value below the minimum or above the maximum, the autoscale engine automatically scales back to the minimum (if below) or the maximum (if above). For example, you set the range between 3 and 6. If you have one running instance, the autoscale engine scales to three instances on its next run. Likewise, if you manually set the scale to eight instances, autoscale scales it back to six instances on its next run. Manual scaling is temporary unless you also reset the autoscale rules. |
32 | 32 |
|
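The reset behavior described above amounts to clamping the instance count to the inclusive range set by the minimum and maximum. The following Python sketch illustrates that arithmetic only. It isn't the autoscale engine's implementation, and the function name `clamp_instance_count` is hypothetical.

```python
def clamp_instance_count(current: int, minimum: int, maximum: int) -> int:
    """Return the instance count after the inclusive limits are enforced.

    Illustrative model only: the name and signature are assumptions,
    not part of any Azure SDK.
    """
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    # Raise the count to the minimum if it is below, cap it at the maximum if above.
    return max(minimum, min(current, maximum))


if __name__ == "__main__":
    # Range 3 to 6, as in the example in the text.
    for manual_count in (1, 8, 4):
        print(manual_count, "->", clamp_instance_count(manual_count, 3, 6))
```

With the range from the text (3 to 6), a manual count of 1 is raised to 3, a manual count of 8 is capped at 6, and a count already inside the range is left unchanged.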
33 | 33 | ### Always use a scale-out and scale-in rule combination that performs an increase and decrease
|
34 |
| -If you use only one part of the combination, autoscale will only take action in a single direction (scale out, or in) until it reaches the maximum, or minimum instance counts, as defined in the profile. This isn't optimal, ideally you want your resource to scale up at times of high usage to ensure availability. Similarly, at times of low usage you want your resource to scale down, so you can realize cost savings. |
| 34 | +If you use only one part of the combination, autoscale only takes action in a single direction (scale out or in) until it reaches the maximum or minimum instance count, as defined in the profile. This situation isn't optimal. Ideally, you want your resource to scale out at times of high usage to ensure availability. Similarly, at times of low usage, you want your resource to scale in so that you can realize cost savings. |
35 | 35 |
|
36 |
| -When you use a scale-in and scale-out rule, ideally use the same metric to control both. Otherwise, it’s possible that the scale-in and scale-out conditions could be met at the same time resulting in some level of flapping. For example, the following rule combination isn't* recommended because there's no scale-in rule for memory usage: |
| 36 | +When you use a scale-in and scale-out rule, ideally use the same metric to control both. Otherwise, it's possible that the scale-in and scale-out conditions could be met at the same time and result in some level of flapping. For example, we don't recommend the following rule combination because there's no scale-in rule for memory usage: |
37 | 37 |
|
38 |
| -* If CPU > 90%, scale-out by 1 |
39 |
| -* If Memory > 90%, scale-out by 1 |
40 |
| -* If CPU < 45%, scale-in by 1 |
| 38 | +* If CPU > 90%, scale out by 1 |
| 39 | +* If Memory > 90%, scale out by 1 |
| 40 | +* If CPU < 45%, scale in by 1 |
41 | 41 |
|
42 |
| -In this example, you can have a situation in which the memory usage is over 90% but the CPU usage is under 45%. This can lead to flapping for as long as both conditions are met. |
| 42 | +In this example, you can have a situation in which the memory usage is over 90% but the CPU usage is under 45%. This scenario can lead to flapping for as long as both conditions are met. |
43 | 43 |
|
44 | 44 | ### Choose the appropriate statistic for your diagnostics metric
|
45 |
| -For diagnostics metrics, you can choose among *Average*, *Minimum*, *Maximum* and *Total* as a metric to scale by. The most common statistic is *Average*. |
| 45 | +For diagnostics metrics, you can choose among *Average*, *Minimum*, *Maximum*, and *Total* as a metric to scale by. The most common statistic is *Average*. |
46 | 46 |
|
47 | 47 | ### Considerations for scaling threshold values for special metrics
|
48 |
| -For special metrics such as Storage or Service Bus Queue length metric, the threshold is the average number of messages available per current number of instances. Carefully choose the threshold value for this metric. |
| 48 | +For special metrics such as an Azure Storage or Azure Service Bus queue length metric, the threshold is the average number of messages available per current number of instances. Carefully choose the threshold value for this metric. |
49 | 49 |
|
50 |
| -Let's illustrate it with an example to ensure you understand the behavior better. |
| 50 | +Let's illustrate it with an example to ensure you understand the behavior better: |
51 | 51 |
|
52 |
| -* Increase instances by 1 count when Storage Queue message count >= 50 |
53 |
| -* Decrease instances by 1 count when Storage Queue message count <= 10 |
| 52 | +* Increase instances by 1 count when Storage queue message count >= 50 |
| 53 | +* Decrease instances by 1 count when Storage queue message count <= 10 |
54 | 54 |
|
55 | 55 | Consider the following sequence:
|
56 | 56 |
|
57 |
| -1. There are two storage queue instances. |
58 |
| -2. Messages keep coming and when you review the storage queue, the total count reads 50. You might assume that autoscale should start a scale-out action. However, note that it's still 50/2 = 25 messages per instance. So, scale-out doesn't occur. For the first scale-out to happen, the total message count in the storage queue should be 100. |
59 |
| -3. Next, assume that the total message count reaches 100. |
60 |
| -4. A third storage queue instance is added due to a scale-out action. The next scale-out action won't happen until the total message count in the queue reaches 150 because 150/3 = 50. |
61 |
| -5. Now the number of messages in the queue gets smaller. With three instances, the first scale-in action happens when the total messages in all queues add up to 30 because 30/3 = 10 messages per instance, which is the scale-in threshold. |
| 57 | +1. There are two Storage queue instances. |
| 58 | +1. Messages keep coming and when you review the Storage queue, the total count reads 50. You might assume that autoscale should start a scale-out action. However, notice that it's still 50/2 = 25 messages per instance. So, scale-out doesn't occur. For the first scale-out action to happen, the total message count in the Storage queue should be 100. |
| 59 | +1. Next, assume that the total message count reaches 100. |
| 60 | +1. A third Storage queue instance is added because of a scale-out action. The next scale-out action won't happen until the total message count in the queue reaches 150 because 150/3 = 50. |
| 61 | +1. Now the number of messages in the queue gets smaller. With three instances, the first scale-in action happens when the total messages in all queues add up to 30 because 30/3 = 10 messages per instance, which is the scale-in threshold. |
62 | 62 |
|
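The sequence above can be checked with a short Python sketch. It assumes only what the example states: the total message count is divided by the current number of instances, and the result is compared with the scale-out threshold (50) and the scale-in threshold (10). The function name `queue_scale_decision` and its return strings are hypothetical and aren't part of any Azure API.

```python
def queue_scale_decision(
    total_messages: int,
    instance_count: int,
    scale_out_threshold: float = 50,
    scale_in_threshold: float = 10,
) -> str:
    """Decide a scale action from the per-instance message count.

    Illustrative model of the example in the text, not the engine itself.
    """
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    per_instance = total_messages / instance_count
    if per_instance >= scale_out_threshold:
        return "scale-out"
    if per_instance <= scale_in_threshold:
        return "scale-in"
    return "none"


if __name__ == "__main__":
    # (total messages, instances) pairs taken from the sequence in the text.
    for total, instances in [(50, 2), (100, 2), (150, 3), (30, 3)]:
        print(total, instances, queue_scale_decision(total, instances))
```

The design point is that the comparison uses the per-instance average and not the raw queue length, so the total needed to trigger a scale-out grows as instances are added.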
63 | 63 | ### Considerations for scaling when multiple rules are configured in a profile
|
64 | 64 |
|
65 |
| -There are cases where you may have to set multiple rules in a profile. The following autoscale rules are used by the autoscale engine when multiple rules are set. |
| 65 | +There are cases where you might have to set multiple rules in a profile. The following autoscale rules are used by the autoscale engine when multiple rules are set: |
66 | 66 |
|
67 |
| -On *scale-out*, autoscale runs if any rule is met. |
68 |
| -On *scale-in*, autoscale require all rules to be met. |
| 67 | +- On *scale-out*, autoscale runs if any rule is met. |
| 68 | +- On *scale-in*, autoscale requires all rules to be met. |
69 | 69 |
|
70 |
| -To illustrate, assume that you have the following four autoscale rules: |
| 70 | +To illustrate, assume that you have four autoscale rules: |
71 | 71 |
|
72 |
| -* If CPU < 30%, scale-in by 1 |
73 |
| -* If Memory < 50%, scale-in by 1 |
74 |
| -* If CPU > 75%, scale-out by 1 |
75 |
| -* If Memory > 75%, scale-out by 1 |
| 72 | +* If CPU < 30%, scale in by 1 |
| 73 | +* If Memory < 50%, scale in by 1 |
| 74 | +* If CPU > 75%, scale out by 1 |
| 75 | +* If Memory > 75%, scale out by 1 |
76 | 76 |
|
77 |
| -Then the follow occurs: |
| 77 | +Then the following action occurs: |
78 | 78 |
|
79 | 79 | * If CPU is 76% and Memory is 50%, we scale out.
|
80 | 80 | * If CPU is 50% and Memory is 76%, we scale out.
|
81 | 81 |
|
82 |
| -On the other hand, if CPU is 25% and memory is 51% autoscale does **not** scale-in. In order to scale-in, CPU must be 29% and Memory 49%. |
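| 82 | +On the other hand, if CPU is 25% and Memory is 51%, autoscale *doesn't* scale in. To scale in, both scale-in conditions must be met: CPU must be below 30% and Memory must be below 50% (for example, CPU at 29% and Memory at 49%). |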
83 | 83 |
|
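The any/all behavior for the four example rules can be sketched in Python as follows. This is a simplified model that uses the thresholds from the list above. The function name `decide_scale_action` is hypothetical, and checking scale-out before scale-in is an assumption made for this sketch.

```python
def decide_scale_action(cpu: float, memory: float) -> str:
    """Apply the four example rules: any scale-out rule, or all scale-in rules.

    Illustrative only; evaluating scale-out first is an assumption of this sketch.
    """
    scale_out_rules = [cpu > 75, memory > 75]
    scale_in_rules = [cpu < 30, memory < 50]

    if any(scale_out_rules):
        # Scale-out runs if any one rule is met.
        return "scale-out"
    if all(scale_in_rules):
        # Scale-in requires every rule to be met.
        return "scale-in"
    return "none"


if __name__ == "__main__":
    # (CPU %, Memory %) pairs taken from the text.
    for cpu, memory in [(76, 50), (50, 76), (25, 51), (29, 49)]:
        print(cpu, memory, decide_scale_action(cpu, memory))
```

For the values in the text, the first two pairs lead to a scale-out, CPU at 25% with Memory at 51% leads to no action, and CPU at 29% with Memory at 49% leads to a scale-in.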
84 | 84 | ### Always select a safe default instance count
|
85 | 85 |
|
86 |
| -The default instance count is important because autoscale scales your service to that count when metrics aren't available. Therefore, select a default instance count that's safe for your workloads. |
| 86 | +The default instance count is important because autoscale scales your service to that count when metrics aren't available. As a result, select a default instance count that's safe for your workloads. |
87 | 87 |
|
88 | 88 | ### Configure autoscale notifications
|
89 | 89 |
|
90 |
| -Autoscale will post to the Activity Log if any of the following conditions occur: |
| 90 | +Autoscale posts to the activity log if any of the following conditions occur: |
91 | 91 |
|
92 | 92 | * Autoscale issues a scale operation.
|
93 | 93 | * Autoscale service successfully completes a scale action.
|
94 | 94 | * Autoscale service fails to take a scale action.
|
95 | 95 | * Metrics aren't available for autoscale service to make a scale decision.
|
96 | 96 | * Metrics are available (recovery) again to make a scale decision.
|
97 |
| -* Autoscale detects flapping and aborts the scale attempt. You'll see a log type of `Flapping` in this situation. If you see this, consider whether your thresholds are too narrow. |
98 |
| -* Autoscale detects flapping but is still able to successfully scale. You'll see a log type of `FlappingOccurred` in this situation. If you see this, the autoscale engine has attempted to scale (for example, from 4 instances to 2), but has determined that this would cause flapping. Instead, the autoscale engine has scaled to a different number of instances (for example, using 3 instances instead of 2), which no longer causes flapping, so it has scaled to this number of instances. |
| 97 | +* Autoscale detects flapping and aborts the scale attempt. You see a log type of `Flapping` in this situation. If you see this log type, consider whether your thresholds are too narrow. |
| 98 | +* Autoscale detects flapping but is still able to successfully scale. You see a log type of `FlappingOccurred` in this situation. If you see this log type, the autoscale engine has attempted to scale (for example, from four instances to two) but has determined that this change would cause flapping. Instead, the autoscale engine has scaled to a different number of instances (for example, using three instances instead of two), which no longer causes flapping, so it has scaled to this number of instances. |
99 | 99 |
|
100 |
| -You can also use an Activity Log alert to monitor the health of the autoscale engine. Here are examples to [create an Activity Log Alert to monitor all autoscale engine operations on your subscription](https://github.com/Azure/azure-quickstart-templates/tree/master/demos/monitor-autoscale-alert) or to [create an Activity Log Alert to monitor all failed autoscale scale in/scale out operations on your subscription](https://github.com/Azure/azure-quickstart-templates/tree/master/demos/monitor-autoscale-failed-alert). |
| 100 | +You can also use an activity log alert to monitor the health of the autoscale engine. One example shows how to [create an activity log alert to monitor all autoscale engine operations on your subscription](https://github.com/Azure/azure-quickstart-templates/tree/master/demos/monitor-autoscale-alert). Another example shows how to [create an activity log alert to monitor all failed autoscale scale-in/scale-out operations on your subscription](https://github.com/Azure/azure-quickstart-templates/tree/master/demos/monitor-autoscale-failed-alert). |
101 | 101 |
|
102 | 102 | In addition to using activity log alerts, you can also configure email or webhook notifications to get notified for scale actions via the notifications tab on the autoscale setting.
|
103 | 103 |
|
104 |
| -## Send data securely using TLS 1.2 |
| 104 | +## Send data securely by using TLS 1.2 |
105 | 105 |
|
106 |
| -To ensure the security of data in transit to Azure Monitor, we strongly encourage you to configure the agent to use at least Transport Layer Security (TLS) 1.2. Older versions of TLS/Secure Sockets Layer (SSL) have been found to be vulnerable and while they still currently work to allow backwards compatibility, they are **not recommended**, and the industry is quickly moving to abandon support for these older protocols. |
| 106 | +To ensure the security of data in transit to Azure Monitor, we strongly encourage you to configure the agent to use at least Transport Layer Security (TLS) 1.2. Older versions of TLS/Secure Sockets Layer (SSL) have been found to be vulnerable. Although they still currently work to allow backwards compatibility, we *don't* recommend them. The industry is quickly moving to abandon support for these older protocols. |
107 | 107 |
|
108 |
| -The [PCI Security Standards Council](https://www.pcisecuritystandards.org/) has set a deadline of [June 30th, 2018](https://www.pcisecuritystandards.org/pdfs/PCI_SSC_Migrating_from_SSL_and_Early_TLS_Resource_Guide.pdf) to disable older versions of TLS/SSL and upgrade to more secure protocols. Once Azure drops legacy support, if your agents can't communicate over at least TLS 1.2 you wouldn't be able to send data to Azure Monitor Logs. |
| 108 | +The [PCI Security Standards Council](https://www.pcisecuritystandards.org/) has set a deadline of [June 30, 2018](https://www.pcisecuritystandards.org/pdfs/PCI_SSC_Migrating_from_SSL_and_Early_TLS_Resource_Guide.pdf), to disable older versions of TLS/SSL and upgrade to more secure protocols. After Azure drops legacy support, if your agents can't communicate over at least TLS 1.2, you won't be able to send data to Azure Monitor Logs. |
109 | 109 |
|
110 |
| -We recommend you do NOT explicit set your agent to only use TLS 1.2 unless absolutely necessary. Allowing the agent to automatically detect, negotiate, and take advantage of future security standards is preferable. Otherwise you may miss the added security of the newer standards and possibly experience problems if TLS 1.2 is ever deprecated in favor of those newer standards. |
| 110 | +We recommend that you *don't* explicitly set your agent to only use TLS 1.2 unless necessary. Allowing the agent to automatically detect, negotiate, and take advantage of future security standards is preferable. Otherwise, you might miss the added security of the newer standards and possibly experience problems if TLS 1.2 is ever deprecated in favor of those newer standards. |
111 | 111 |
|
112 |
| - |
113 |
| -## Next Steps |
| 112 | +## Next steps |
114 | 113 | - [Autoscale flapping](./autoscale-flapping.md)
|
115 |
| -- [Create an Activity Log Alert to monitor all autoscale engine operations on your subscription.](https://github.com/Azure/azure-quickstart-templates/tree/master/demos/monitor-autoscale-alert) |
116 |
| -- [Create an Activity Log Alert to monitor all failed autoscale scale in/scale out operations on your subscription](https://github.com/Azure/azure-quickstart-templates/tree/master/demos/monitor-autoscale-failed-alert) |
| 114 | +- [Create an activity log alert to monitor all autoscale engine operations on your subscription](https://github.com/Azure/azure-quickstart-templates/tree/master/demos/monitor-autoscale-alert) |
| 115 | +- [Create an activity log alert to monitor all failed autoscale scale-in/scale-out operations on your subscription](https://github.com/Azure/azure-quickstart-templates/tree/master/demos/monitor-autoscale-failed-alert) |