|[Deployment level best practices](#deployment-level-best-practices)| • [Pod Disruption Budgets (PDBs)](#pod-disruption-budgets-pdbs) <br/> • [Pod CPU and memory limits](#pod-cpu-and-memory-limits) <br/> • [Pre-stop hooks](#pre-stop-hooks) <br/> • [maxUnavailable](#maxunavailable) <br/> • [Pod anti-affinity](#pod-anti-affinity) <br/> • [Readiness, liveness, and startup probes](#readiness-liveness-and-startup-probes) <br/> • [Multi-replica applications](#multi-replica-applications)|
|[Cluster and node pool level best practices](#cluster-and-node-pool-level-best-practices)| • [Availability zones](#availability-zones) <br/> • [Cluster autoscaling](#cluster-autoscaling) <br/> • [Standard Load Balancer](#standard-load-balancer) <br/> • [System node pools](#system-node-pools) <br/> • [Accelerated Networking](#accelerated-networking) <br/> • [Image versions](#image-versions) <br/> • [Azure CNI for dynamic IP allocation](#azure-cni-for-dynamic-ip-allocation) <br/> • [v5 SKU VMs](#v5-sku-vms) <br/> • [Do *not* use B series VMs](#do-not-use-b-series-vms) <br/> • [Premium Disks](#premium-disks) <br/> • [Container Insights](#container-insights) <br/> • [Azure Policy](#azure-policy)|
## Deployment level best practices

### Pod Disruption Budgets (PDBs)

> **Best practice guidance**
>
> Use Pod Disruption Budgets (PDBs) to ensure that a minimum number of pods remain available during *voluntary disruptions*, such as upgrade operations or accidental pod deletions.
[Pod Disruption Budgets (PDBs)](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/#pod-disruption-budgets) allow you to define how deployments or replica sets respond during voluntary disruptions, such as upgrade operations or accidental pod deletions. Using PDBs, you can define a minimum or maximum unavailable resource count. PDBs only affect the Eviction API for voluntary disruptions.
For example, let's say you need to perform a cluster upgrade and already have a PDB defined. Before performing the cluster upgrade, the Kubernetes scheduler ensures that the minimum number of pods defined in the PDB are available. If the upgrade would cause the number of available pods to fall below the minimum defined in the PDBs, the scheduler schedules extra pods on other nodes before allowing the upgrade to proceed. If you don't set a PDB, the scheduler doesn't have any constraints on the number of pods that can be unavailable during the upgrade, which can lead to a lack of resources and potential cluster outages.
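To make the scheduler's constraint concrete, the following is a rough sketch (not the controller's actual code) of how a PDB's allowed-disruption count behaves when `minAvailable` is an absolute number:

```python
def disruptions_allowed(healthy_pods: int, min_available: int) -> int:
    """Roughly how a PDB's status.disruptionsAllowed is derived when
    minAvailable is an absolute count: voluntary evictions are denied
    once this reaches zero."""
    return max(0, healthy_pods - min_available)

# With 5 healthy pods and minAvailable: 3, two voluntary evictions are allowed.
print(disruptions_allowed(5, 3))  # 2
print(disruptions_allowed(3, 3))  # 0 -> further evictions are blocked
```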
In the following example PDB definition file, the `minAvailable` field sets the minimum number of pods that must remain available during voluntary disruptions. The value can be an absolute number (for example, *3*) or a percentage of the desired number of pods (for example, *10%*).
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: mypdb
spec:
  minAvailable: 3 # Minimum number of pods that must remain available during voluntary disruptions
  selector:
    matchLabels:
      app: myapp
```
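
The percentage form mentioned above looks like the following sketch, which caps unavailable pods with `maxUnavailable` instead (the PDB name is illustrative; the selector reuses the example's `myapp` label):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: mypdb-percent # Illustrative name
spec:
  maxUnavailable: "10%" # At most 10% of pods may be unavailable during voluntary disruptions
  selector:
    matchLabels:
      app: myapp
```

Note that a single PDB can specify either `minAvailable` or `maxUnavailable`, but not both.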

### Pre-stop hooks

> **Best practice guidance**
>
> When applicable, use pre-stop hooks to ensure graceful termination of a container.
A `PreStop` hook is called immediately before a container is terminated due to an API request or management event, such as preemption, resource contention, or a liveness/startup probe failure. A call to the `PreStop` hook fails if the container is already in a terminated or completed state, and the hook must complete before the TERM signal to stop the container is sent. The pod's termination grace period countdown begins before the `PreStop` hook is executed, so the container eventually terminates within the termination grace period.
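
As a sketch, a `PreStop` hook that pauses shutdown long enough for in-flight traffic to drain might look like the following in a container spec (the container name, image, and sleep duration are illustrative):

```yaml
containers:
- name: myapp # Illustrative container name
  image: myapp:1.0.0 # Illustrative image
  lifecycle:
    preStop:
      exec:
        # Give the endpoint controller time to remove the pod from
        # service endpoints before the TERM signal is sent.
        command: ["/bin/sh", "-c", "sleep 5"]
```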

### maxUnavailable

> **Best practice guidance**
>
> Define the maximum number of pods that can be unavailable during a rolling update using the `maxUnavailable` field in your deployment to ensure that a minimum number of pods remain available during the upgrade.
The `maxUnavailable` field specifies the maximum number of pods that can be unavailable during the update process. The value can be an absolute number (for example, *3*) or a percentage of the desired number of pods (for example, *10%*). `maxUnavailable` pertains to the Delete API, which is used during rolling updates.
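
As a sketch of how the two value forms resolve to a pod count (the helper below is illustrative, not a Kubernetes API; Kubernetes rounds a `maxUnavailable` percentage down):

```python
import math

def resolve_max_unavailable(desired_replicas: int, max_unavailable: str) -> int:
    """Resolve an absolute or percentage maxUnavailable value to a pod count.

    Percentages are rounded down, matching how Kubernetes computes
    maxUnavailable for a rolling update.
    """
    if max_unavailable.endswith("%"):
        percent = int(max_unavailable.rstrip("%"))
        return math.floor(desired_replicas * percent / 100)
    return int(max_unavailable)

print(resolve_max_unavailable(10, "3"))    # 3
print(resolve_max_unavailable(30, "10%"))  # 3
print(resolve_max_unavailable(10, "25%"))  # 2
```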
The following example deployment manifest uses the `maxUnavailable` field to set the maximum number of pods that can be unavailable during the update process:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1 # Maximum number of pods that can be unavailable during the update process
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:1.0.0 # Illustrative image name; pin a specific version tag
```

### Pod anti-affinity

The following snippet from a pod definition shows zone-based scheduling constraints using node affinity:

```yaml
        - key: topology.kubernetes.io/zone
          operator: In
          values:
          - 0 # Azure Availability Zone 0
          - 1 # Azure Availability Zone 1
          - 2 # Azure Availability Zone 2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
```

For more information, see [Best practices for multiple zones](https://kubernetes.io/docs/setup/best-practices/multiple-zones/) and [Overview of availability zones for AKS clusters](./availability-zones.md#overview-of-availability-zones-for-aks-clusters).

### Readiness, liveness, and startup probes
> **Best practice guidance**
>
> Configure readiness, liveness, and startup probes when applicable to improve resiliency under high load and to reduce unnecessary container restarts.
#### Readiness probes
In Kubernetes, the kubelet uses readiness probes to know when a container is ready to start accepting traffic. A pod is considered *ready* when all of its containers are ready. When a pod is *not ready*, it's removed from service load balancers. For more information, see [Readiness Probes in Kubernetes](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-readiness-probes).
The following example pod definition file shows a readiness probe configuration:
```yaml
readinessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
```

#### Liveness probes

In Kubernetes, the kubelet uses liveness probes to know when to restart a container. If a container fails its liveness probe, the container is restarted. For more information, see [Liveness Probes in Kubernetes](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/).
The following example pod definition file shows a liveness probe configuration:
```yaml
livenessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
```
Another kind of liveness probe uses an HTTP GET request. The following example pod definition file shows an HTTP GET request liveness probe configuration:
```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: registry.k8s.io/liveness
    args:
    - /server
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: Custom-Header
          value: Awesome
      initialDelaySeconds: 3
      periodSeconds: 3
```
For more information, see [Configure liveness probes](../container-instances/container-instances-liveness-probe.md) and [Define a liveness HTTP request](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-liveness-http-request).
#### Startup probes
In Kubernetes, the kubelet uses startup probes to know when a container application has started. When you configure a startup probe, readiness and liveness probes don't start until the startup probe succeeds, ensuring the readiness and liveness probes don't interfere with application startup. For more information, see [Startup Probes in Kubernetes](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-startup-probes).
The following example pod definition file shows a startup probe configuration:
```yaml
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
```

With these values, the application has up to `failureThreshold` × `periodSeconds` (300 seconds) to finish starting before the container is restarted.
### Multi-replica applications

### Cluster autoscaling

For more information, see [Use the cluster autoscaler on node pools](./cluster-autoscaler.md#use-the-cluster-autoscaler-on-node-pools).
#### At least three nodes per system node pool
> **Best practice guidance**
>
> Use at least three nodes per system node pool to help ensure reliability during node upgrades and failures.

### Image versions

> **Best practice guidance**
>
> Images shouldn't use the `latest` tag.
#### Container image tags
456
+
417
Using the `latest` tag for [container images](https://kubernetes.io/docs/concepts/containers/images/) can lead to unpredictable behavior and makes it difficult to track which version of the image is running in your cluster. You can minimize these risks by integrating and running scan and remediation tools in your containers at build and runtime. For more information, see [Best practices for container image management in AKS](./operator-best-practices-container-image-management.md).
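
For example, instead of `latest`, a container spec can pin an immutable version tag (the registry, image name, and tag below are illustrative):

```yaml
containers:
- name: myapp
  image: myregistry.azurecr.io/myapp:1.4.2 # Pinned version tag instead of :latest
```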
#### Node image upgrades
460
+
419
AKS provides multiple auto-upgrade channels for node OS image upgrades. You can use these channels to control the timing of upgrades. We recommend joining these auto-upgrade channels to ensure that your nodes are running the latest security patches and updates. For more information, see [Auto-upgrade node OS images in AKS](./auto-upgrade-node-os-image.md).
### Standard tier for production workloads
> **Best practice guidance**
>
> Use the Standard tier for production workloads for greater cluster reliability and resources, support for up to 5,000 nodes in a cluster, and Uptime SLA enabled by default. If you need long-term support (LTS), consider using the Premium tier.
The Standard tier for Azure Kubernetes Service (AKS) provides a financially backed 99.9% uptime [service-level agreement (SLA)](https://www.azure.cn/en-us/support/sla/kubernetes-service/) for your production workloads. The Standard tier also provides greater cluster reliability and resources, support for up to 5,000 nodes in a cluster, and Uptime SLA enabled by default. For more information, see [Pricing tiers for AKS cluster management](./free-standard-pricing-tiers.md).