---
title: Performance and scaling best practices for small to medium workloads in Azure Kubernetes Service (AKS)
titleSuffix: Azure Kubernetes Service
description: Learn the best practices for performance and scaling for small to medium workloads in Azure Kubernetes Service (AKS).
ms.topic: conceptual
ms.date: 11/01/2023
---

# Best practices for performance and scaling for small to medium workloads in Azure Kubernetes Service (AKS)

> [!NOTE]
> This article focuses on best practices for **small to medium workloads**. For best practices for **large workloads**, see [Performance and scaling best practices for large workloads in Azure Kubernetes Service (AKS)](./best-practices-performance-scale-large.md).

## Application autoscaling vs. infrastructure autoscaling

### Application autoscaling

Application autoscaling is useful when dealing with cost optimization or infrastructure limitations. A well-configured autoscaler maintains high availability for your application while also minimizing costs. You only pay for the resources required to maintain availability, regardless of the demand.

For example, if an existing node has space but not enough IPs in the subnet, it might be able to skip the creation of a new node and instead immediately start running the application on a new pod.

#### Horizontal Pod autoscaling

Implementing [horizontal pod autoscaling](LINK) is useful for applications with a steady and predictable resource demand. The Horizontal Pod Autoscaler (HPA) dynamically scales the number of pod replicas, which effectively distributes the load across multiple pods and nodes. This scaling mechanism is most beneficial for applications that can be decomposed into smaller, independent components capable of running in parallel.

The HPA provides resource utilization metrics by default. You can also integrate custom metrics or use tools like the [Kubernetes Event-Driven Autoscaler (KEDA) (Preview)](LINK). These extensions allow the HPA to make scaling decisions based on multiple perspectives and criteria, providing a more holistic view of your application's performance. This is especially helpful for applications with complex, varying scaling requirements.

> [!NOTE]
> If maintaining high availability for your application is a top priority, we recommend leaving a slightly higher buffer for the minimum pod count for your HPA to account for scaling time.
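
As a minimal sketch, the following manifest defines an HPA for a hypothetical `my-app` deployment that scales on average CPU utilization. The deployment name, replica bounds, and utilization target are placeholder values you'd tune for your own workload:

```yml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa                # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                  # hypothetical deployment to scale
  minReplicas: 3                  # a slightly higher minimum leaves a buffer for scaling time
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70    # scale out when average CPU utilization exceeds 70%
```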

#### Vertical Pod autoscaling

Implementing [vertical pod autoscaling](LINK) is useful for applications with fluctuating and unpredictable resource demands. The Vertical Pod Autoscaler (VPA) allows you to fine-tune resource requests, including CPU and memory, for individual pods, enabling precise control over resource allocation. This granularity minimizes resource waste and enhances the overall efficiency of cluster utilization. The VPA also streamlines application management by automating resource allocation, freeing up resources for critical tasks.

You shouldn't use the VPA in conjunction with the HPA on the same CPU or memory metrics. This combination can lead to conflicts, as both autoscalers attempt to respond to changes in demand using the same metrics. However, you can use the VPA for CPU or memory in conjunction with the HPA for custom metrics to prevent overlap and ensure that each autoscaler focuses on distinct aspects of workload scaling.

> [!NOTE]
> The VPA works based on historical data. We recommend waiting at least *24 hours* after deploying the VPA before applying any changes to give it time to collect recommendation data.
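
For illustration, a minimal VPA object for a hypothetical `my-app` deployment might look like the following, assuming the VPA components are enabled on your cluster. The target name and update mode are placeholders:

```yml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa                # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                  # hypothetical deployment whose requests the VPA tunes
  updatePolicy:
    updateMode: "Auto"            # VPA applies its recommendations automatically
```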

### Infrastructure autoscaling

#### Cluster autoscaling

Implementing cluster autoscaling is useful if your existing nodes lack sufficient capacity, as it helps with scaling up and provisioning new nodes.

In cases where your cluster handles substantial but infrequent workloads with a primary focus on performance, we recommend increasing the scan interval and the utilization threshold. This adjustment helps batch multiple scaling operations into a single call, optimizing scale time and the use of compute read/write quotas. It also helps mitigate the risk of swift scale-down operations on underutilized nodes, enhancing pod scheduling efficiency. For clusters with daemonset pods, we recommend setting `ignore-daemonset-utilization` to `true` to minimize unnecessary scale-down operations.

If you want a cost-optimized profile, we recommend reducing the node unneeded time, the utilization threshold, and the scale-down delay after add operations. You can also increase *Max-bulk-delete* to help delete nodes in bulk. This configuration helps reduce the number of nodes in the cluster, which reduces the cost of the cluster. However, it can also increase the time it takes to scale up the cluster.

When considering cluster autoscaling, the decision of when to remove a node involves a tradeoff between optimizing resource utilization and ensuring resource availability. Eliminating underutilized nodes enhances cluster utilization but might result in new workloads having to wait for resources to be provisioned before they can be deployed. It's important to find a balance between these two factors that aligns with your cluster and workload requirements and [configure the cluster autoscaler profile settings accordingly](LINK).

The Cluster Autoscaler profile settings apply universally to all autoscaler-enabled node pools in your cluster. This means that any scaling actions occurring in one autoscaler-enabled node pool might impact the autoscaling behavior in another node pool. It's important to apply consistent and synchronized profile settings across all relevant node pools to ensure that the autoscaler behaves as expected.

##### Overprovisioning

Overprovisioning is a strategy that helps mitigate the risk of application pressure by ensuring there's an excess of readily available resources. This approach is especially useful for applications that experience highly variable loads and cluster scaling patterns that show frequent scale ups and scale downs.

To determine the optimal amount of overprovisioning, you can use the following formula: $\frac{1 - \text{buffer}}{1 + \text{traffic}}$

For example, let's say you want to avoid hitting 100% CPU utilization in your cluster. You might opt for a 30% buffer to maintain a safety margin. If you anticipate an average traffic growth rate of 40%, you might consider overprovisioning by 50%, as calculated by the formula: $\frac{1 - 0.30}{1 + 0.40} = \frac{0.70}{1.40} = 0.50$, or 50%.

An effective overprovisioning method involves the use of *pause pods*. Pause pods are low-priority deployments that can be easily replaced by high-priority deployments. You create low-priority pods that serve the sole purpose of reserving buffer space. When a high-priority pod requires space, the pause pods are removed and rescheduled on another node or a new node to accommodate the high-priority pod.

The following YAML shows an example pause pod manifest:

```yml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1                          # negative priority so pause pods are preempted first
globalDefault: false
description: "Priority class used by overprovisioning."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      run: overprovisioning
  template:
    metadata:
      labels:
        run: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
      - name: reserve-resources
        image: your-custom-pause-image   # replace with your own pause container image
        resources:
          requests:
            cpu: 1
            memory: 4Gi
```

## Node scaling and efficiency

Node scaling allows you to dynamically adjust the number of nodes in your cluster based on workload demands. It's important to understand that adding more nodes to a cluster isn't always the best solution for improving performance. To ensure optimal performance, you should carefully monitor resource utilization and scheduling policies to ensure nodes are being used efficiently.

### Node images

> **Best practice guidance**:
>
> Use the latest node image version to ensure that you have the latest security patches and bug fixes.

Using the latest node image version provides the best performance experience. AKS ships performance improvements within the weekly image releases. Falling behind on updates might have a negative impact on performance, so it's important to avoid large gaps between versions.

#### Azure Linux

The [Azure Linux Container Host on AKS](LINK) uses a native AKS image and provides a single place for Linux development. Every package is built from source and validated, ensuring your services run on proven components.

Azure Linux is lightweight, including only the necessary set of packages to run container workloads. It provides a reduced attack surface and eliminates patching and maintenance of unnecessary packages. At its base layer, it has a Microsoft-hardened kernel tuned for Azure. This image is ideal for performance-sensitive workloads and for platform engineers or operators who manage fleets of AKS clusters.

#### Ubuntu 2204

The [Ubuntu 2204 image](LINK) is the default node image for AKS. It's a lightweight and efficient operating system optimized for running containerized workloads, which helps reduce resource usage and improve overall performance. The image includes the latest security patches and updates, which help protect your workloads from vulnerabilities.

The Ubuntu 2204 image is fully supported by Microsoft and the Ubuntu community and can help you achieve better performance and security for your containerized workloads.

### Virtual machines (VMs)

> **Best practice guidance**:
>
> When selecting a VM, ensure the size and performance of the OS disk and VM SKU don't have a large discrepancy. A discrepancy in size or performance can cause performance issues and resource contention.

Application performance is closely tied to the VM SKUs you use in your workloads. Larger and more powerful VMs generally provide better performance. For *mission critical or production workloads*, we recommend using VMs with at least an 8-core CPU. VMs with newer hardware generations, like v4 and v5, can also help improve performance. Keep in mind that create and scale latency might vary depending on the VM SKUs you use.

OS disks store the operating system and its associated files, while the VMs run the applications. When selecting a VM, ensure the size and performance of the OS disk and VM SKU don't have a large discrepancy. A discrepancy in size or performance can cause performance issues and resource contention. For example, if the OS disk is significantly smaller than the VMs, it can limit the amount of space available for application data and cause the system to run out of disk space. If the OS disk has lower performance than the VMs, it can become a bottleneck and limit the overall performance of the system. Make sure the size and performance are balanced to ensure optimal performance in Kubernetes.

### Node pools

For scaling performance and reliability, we recommend using a dedicated system node pool. The system node pool reserves resources for critical components like system OS daemons and kernel memory. We recommend running your application in a user node pool to increase the availability of allocatable resources for your workloads. This configuration also helps mitigate the risk of resource competition between the system and application.
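
As an illustrative sketch of this separation, the following snippet pins a hypothetical application deployment to a user node pool named `userpool` by selecting on the `agentpool` node label that AKS applies to its nodes. The deployment name, image, and pool name are placeholders:

```yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                      # hypothetical application deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      nodeSelector:
        agentpool: userpool         # schedule only onto nodes in the user node pool
      containers:
      - name: my-app
        image: my-registry/my-app:1.0.0   # placeholder image
        resources:
          requests:
            cpu: 250m
            memory: 256Mi
```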

### Create operations

Review the extensions and add-ons you have enabled during create provisioning. Extensions and add-ons can add latency to the overall duration of create operations. If you don't need an extension or add-on, we recommend removing it to improve create latency.

The Kubernetes data plane is responsible for managing network traffic between containers and services. Issues with the data plane can lead to slow response times, degraded performance, and application downtime. It's important to carefully monitor and optimize data plane configurations, such as network latency, resource allocation, container density, and network policies, to ensure your containerized applications run smoothly and efficiently.

### Storage types

AKS recommends and defaults to using ephemeral OS disks. Ephemeral OS disks are created on local VM storage and aren't saved to remote Azure storage like managed OS disks. They have faster reimaging and boot times, enabling faster cluster operations, and they provide lower read/write latency on the OS disk of AKS agent nodes. Ephemeral OS disks work well for stateless workloads, where applications are tolerant of individual VM failures but not of VM deployment time or individual VM reimaging instances. Only certain VM SKUs support ephemeral OS disks, so you need to ensure that your desired SKU generation and size are compatible. For more information, see [Ephemeral OS disks in Azure Kubernetes Service (AKS)](LINK).

The following table provides a breakdown of suggested use cases for OS disks supported in AKS:

| OS disk type | Characteristics | Suggested use cases |
| --- | --- | --- |
| Standard SSD OS disks | • Consistent performance.<br/> • Better availability and latency compared to Standard HDD disks. | • Web servers.<br/> • Low input/output operations per second (IOPS) application servers.<br/> • Lightly used enterprise applications.<br/> • Dev/test workloads. |
| Standard HDD disks | • Low cost.<br/> • Exhibits variability in performance and latency. | • Backup storage.<br/> • Mass storage with infrequent access. |

#### IOPS and throughput

Input/output operations per second (IOPS) refers to the number of read and write operations that a disk can perform in a second. Throughput refers to the amount of data that can be transferred in a given time period.

In Kubernetes, the IOPS and throughput of a disk can have a significant impact on the performance of the system. When a Kubernetes cluster runs multiple applications, the disk IOPS and throughput might cause a bottleneck, especially for applications requiring high disk performance. It's important to choose a disk that can handle the workload and provide sufficient IOPS and throughput.

Ephemeral OS disks can provide dynamic IOPS and throughput for your application, whereas managed disks have capped IOPS and throughput. For more information, see [LINK](LINK).

[Azure Premium SSD v2](LINK) is designed for IO-intensive enterprise workloads that require sub-millisecond disk latencies and high IOPS and throughput at a low cost. It's suited for a broad range of workloads, such as SQL Server, Oracle, MariaDB, SAP, Cassandra, MongoDB, big data/analytics, gaming, and more. This disk type is the highest-performing option currently available for persistent volumes.
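
As a rough sketch only, a custom storage class for Premium SSD v2 persistent volumes with the Azure Disks CSI driver might look like the following. The class name, IOPS, and throughput values are placeholders, and you should confirm the exact parameter names and supported values in the Azure Disks CSI driver documentation:

```yml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: premium2-disk-sc            # hypothetical class name
provisioner: disk.csi.azure.com
parameters:
  skuName: PremiumV2_LRS            # Premium SSD v2 SKU (assumed parameter value)
  cachingMode: None                 # Premium SSD v2 doesn't support host caching
  DiskIOPSReadWrite: "4000"         # provisioned IOPS (placeholder value)
  DiskMBpsReadWrite: "500"          # provisioned throughput in MB/s (placeholder value)
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```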

### Networking types

> **Best practice guidance**:
>
> We recommend using [Dynamic IP allocation](LINK) or [CNI Overlay](LINK) networking.

For more information, see [Dynamic IP allocation overview](LINK) and [CNI Overlay overview](LINK).

### Pod scheduling

The memory and CPU resources allocated to a VM have a direct impact on the performance of the pods running on it. When a pod is created, it's assigned a certain amount of memory and CPU resources, which are used to run the application. If the VM doesn't have enough memory or CPU resources available, it can cause the pods to slow down or even crash. If the VM has too much memory or CPU resources available, it can cause the pods to run inefficiently, wasting resources and increasing costs.

It's important to ensure that your VMs have the appropriate amount of memory and CPU resources for optimal performance of your Kubernetes pods. You can monitor resource usage and adjust the allocation as needed to help ensure the VM uses its resources efficiently and effectively. You can also set the maximum number of pods per node based on your capacity planning using `--max-pods`. For more information, see [Configure maximum number of pods per node](LINK).
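
To make pod resource allocation explicit, you can set requests and limits on your containers so the scheduler places pods on nodes with sufficient capacity. The following sketch uses hypothetical values that you'd size from your own monitoring data:

```yml
apiVersion: v1
kind: Pod
metadata:
  name: my-app-pod                  # hypothetical pod name
spec:
  containers:
  - name: my-app
    image: my-registry/my-app:1.0.0   # placeholder image
    resources:
      requests:
        cpu: 250m                   # guaranteed CPU; used for scheduling decisions
        memory: 256Mi               # guaranteed memory; used for scheduling decisions
      limits:
        cpu: 500m                   # hard cap on CPU usage
        memory: 512Mi               # container is OOM-killed if it exceeds this limit
```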