titleSuffix: Azure Kubernetes Service
description: Learn the cluster operator best practices for using advanced scheduler features such as taints and tolerations, node selectors and affinity, or inter-pod affinity and anti-affinity in Azure Kubernetes Service (AKS)
services: container-service
ms.topic: conceptual
ms.date: 11/11/2022
---

# Best practices for advanced scheduler features in Azure Kubernetes Service (AKS)
As you manage clusters in Azure Kubernetes Service (AKS), you often need to isolate teams and workloads. Advanced features provided by the Kubernetes scheduler let you control:

* Which pods can be scheduled on certain nodes.
* How multi-pod applications can be appropriately distributed across the cluster.

This best practices article focuses on advanced Kubernetes scheduling features for cluster operators. In this article, you learn how to:

* Provide dedicated nodes using taints and tolerations.
* Control pod scheduling using node selectors and affinity.
* Logically isolate workloads with inter-pod affinity and anti-affinity.
## Provide dedicated nodes using taints and tolerations
> **Best practice guidance:**
>
> Limit access for resource-intensive applications, such as ingress controllers, to specific nodes. Keep node resources available for workloads that require them, and don't allow scheduling of other workloads on the nodes.

When you create your AKS cluster, you can deploy nodes with GPU support or a large number of powerful CPUs. You can use these nodes for large data processing workloads such as machine learning (ML) or artificial intelligence (AI).

Because this node resource hardware is typically expensive to deploy, limit the workloads that can be scheduled on these nodes. Instead, dedicate some nodes in the cluster to run ingress services and prevent other workloads.

This support for different nodes is provided by using multiple node pools. An AKS cluster supports one or more node pools.

The Kubernetes scheduler uses taints and tolerations to restrict what workloads can run on nodes.
* Apply a **taint** to a node to indicate that only specific pods can be scheduled on it.
* Then apply a **toleration** to a pod, allowing it to *tolerate* the node's taint.

When you deploy a pod to an AKS cluster, Kubernetes only schedules pods on nodes whose taint aligns with the toleration. Taints and tolerations work together to ensure that pods aren't scheduled onto inappropriate nodes. One or more taints are applied to a node, marking the node so that it doesn't accept any pods that don't tolerate the taints.

For example, assume you added a node pool in your AKS cluster for nodes with GPU support. You define a name, such as *gpu*, then a value for scheduling. Setting this value to *NoSchedule* restricts the Kubernetes scheduler from scheduling pods that don't define a matching toleration on the node.

```azurecli-interactive
# Example values; adjust the resource group, cluster, pool name, and taint for your environment.
az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name taintnp \
    --node-count 1 \
    --node-taints sku=gpu:NoSchedule \
    --no-wait
```
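
A pod that should run on these tainted nodes then declares a matching toleration in its spec. The following is a minimal sketch, assuming the `sku=gpu:NoSchedule` taint from the previous command; the pod name and image are placeholders:

```yaml
kind: Pod
apiVersion: v1
metadata:
  name: gpu-app                  # placeholder name
spec:
  containers:
  - name: gpu-app
    image: mcr.microsoft.com/azuredocs/aks-helloworld:v1   # placeholder image
  tolerations:
  - key: "sku"                   # matches the taint key
    operator: "Equal"
    value: "gpu"                 # matches the taint value
    effect: "NoSchedule"         # matches the taint effect
```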

When you upgrade a node pool in AKS, taints and tolerations follow a set pattern as they're applied to new nodes:

#### Default clusters that use VM scale sets

You can [taint a node pool][taint-node-pool] from the AKS API to have newly scaled-out nodes receive API-specified node taints.

Let's assume:

1. You begin with a two-node cluster: *node1* and *node2*.
1. You upgrade the node pool.
1. Two additional nodes are created: *node3* and *node4*.
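
To confirm which taints the new nodes carry after the upgrade, you can list them per node. A quick sketch, assuming `kubectl` access to the cluster:

```console
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
```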

#### Clusters without VM scale set support

Again, let's assume:

1. You have a two-node cluster: *node1* and *node2*.
1. You upgrade the node pool.
1. An additional node is created: *node3*.
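
If a node ends up without the expected taint, you can also apply one manually with `kubectl`. A sketch that reuses the example taint and node name from this scenario:

```console
kubectl taint nodes node3 sku=gpu:NoSchedule
```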
## Control pod scheduling using node selectors and affinity

> **Best practice guidance**
>
> Control the scheduling of pods on nodes using node selectors, node affinity, or inter-pod affinity. These settings allow the Kubernetes scheduler to logically isolate workloads, such as by hardware in the node.

Taints and tolerations logically isolate resources with a hard cut-off. If the pod doesn't tolerate a node's taint, it isn't scheduled on the node.

```yaml
spec:
  containers:
  - name: app            # placeholder container name
    image: nginx:1.25    # placeholder image
    resources:
      limits:
        cpu: 4.0
        memory: 16Gi
  nodeSelector:
    hardware: highmem
```
When you use these scheduler options, work with your application developers and owners to allow them to correctly define their pod specifications.
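
For the `nodeSelector` shown above to find a match, nodes need the corresponding label. One way to apply it in AKS is when you create the node pool. This is a sketch; the resource group, cluster, and pool names are assumed placeholders:

```azurecli-interactive
az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name labelnp \
    --node-count 1 \
    --labels hardware=highmem \
    --no-wait
```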
### Node affinity

A node selector is a basic solution for assigning pods to a given node. *Node affinity* provides more flexibility, allowing you to define what happens if the pod can't be matched with a node. You can:

* *Require* that the Kubernetes scheduler matches a pod with a labeled host. Or,
* *Prefer* a match but allow the pod to be scheduled on a different host if no match is available.

```yaml
spec:
  # ... (container and resource settings elided in this excerpt)
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: hardware
            operator: In
            values:
            - highmem
```
The *IgnoredDuringExecution* part of the setting indicates that the pod shouldn't be evicted from the node if the node labels change. The Kubernetes scheduler only uses the updated node labels for new pods being scheduled, not pods already scheduled on the nodes.
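
For the *prefer* variant, the same rule moves under `preferredDuringSchedulingIgnoredDuringExecution` and gains a weight. A minimal sketch, assuming the same `hardware=highmem` label:

```yaml
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1          # higher weights are favored when multiple rules match
        preference:
          matchExpressions:
          - key: hardware
            operator: In
            values:
            - highmem
```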

For more information, see [Affinity and anti-affinity][k8s-affinity].

### Inter-pod affinity and anti-affinity
One final approach for the Kubernetes scheduler to logically isolate workloads is using inter-pod affinity or anti-affinity. These settings define that pods either *shouldn't* or *should* be scheduled on a node that has an existing matching pod. By default, the Kubernetes scheduler tries to schedule multiple pods in a replica set across nodes. You can define more specific rules around this behavior.

For example, you have a web application that also uses an Azure Cache for Redis.

* You use pod anti-affinity rules to request that the Kubernetes scheduler distributes replicas across nodes.
* You use affinity rules to ensure each web app component is scheduled on the same host as a corresponding cache.

The distribution of pods across nodes looks like the following example:

| **Node 1** | **Node 2** | **Node 3** |
|------------|------------|------------|
| webapp-1   | webapp-2   | webapp-3   |
| cache-1    | cache-2    | cache-3    |

Inter-pod affinity and anti-affinity provide a more complex deployment than node selectors or node affinity. With the deployment, you logically isolate resources and control how Kubernetes schedules pods on nodes.
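
As a sketch of what the web app side of such rules can look like: the `app` label values are illustrative assumptions, and `topologyKey: kubernetes.io/hostname` scopes both rules to individual nodes.

```yaml
spec:
  affinity:
    podAntiAffinity:               # spread webapp replicas across nodes
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - webapp
        topologyKey: kubernetes.io/hostname
    podAffinity:                   # co-locate each webapp pod with a cache pod
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - cache
        topologyKey: kubernetes.io/hostname
```

The cache deployment would typically carry a mirrored anti-affinity rule so the caches also spread across nodes.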

For a complete example of this web application with Azure Cache for Redis, see [Co-locate pods on the same node][k8s-pod-affinity].

This article focused on advanced Kubernetes scheduler features. For more information about cluster operations in AKS, see the following best practices:

* [Authentication and authorization][aks-best-practices-identity]