---
title: Performance and scaling best practices for small to medium workloads in Azure Kubernetes Service (AKS)
titleSuffix: Azure Kubernetes Service
description: Learn the best practices for performance and scaling for small to medium workloads in Azure Kubernetes Service (AKS).
ms.topic: conceptual
ms.date: 11/01/2023
---

# Best practices for performance and scaling for small to medium workloads in Azure Kubernetes Service (AKS)

> [!NOTE]
> This article focuses on best practices for **small to medium workloads**. For best practices for **large workloads**, see [Performance and scaling best practices for large workloads in Azure Kubernetes Service (AKS)](./best-practices-performance-scale-large.md).

## Application autoscaling vs. infrastructure autoscaling

### Application autoscaling

Application autoscaling is useful when dealing with cost optimization or infrastructure limitations. A well-configured autoscaler maintains high availability for your application while also minimizing costs. You only pay for the resources required to maintain availability, regardless of the demand.

For example, if an existing node has space but not enough IPs in the subnet, it might be able to skip the creation of a new node and instead immediately start running the application on a new pod.

#### Horizontal Pod autoscaling

Implementing [horizontal pod autoscaling](LINK) is useful for applications with a steady and predictable resource demand. The Horizontal Pod Autoscaler (HPA) dynamically scales the number of pod replicas, which effectively distributes the load across multiple pods and nodes. This scaling mechanism is most beneficial for applications that can be decomposed into smaller, independent components capable of running in parallel.

The HPA provides resource utilization metrics by default. You can also integrate custom metrics or use tools like the [Kubernetes Event-Driven Autoscaler (KEDA) (Preview)](LINK). These extensions allow the HPA to make scaling decisions based on multiple perspectives and criteria, providing a more holistic view of your application's performance. This is especially helpful for applications with complex, varying scaling requirements.

> [!NOTE]
> If maintaining high availability for your application is a top priority, we recommend leaving a slightly higher buffer for the minimum pod count for your HPA to account for scaling time.
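
As a minimal sketch, the following manifest defines an HPA for a hypothetical `my-app` deployment that scales on average CPU utilization. The deployment name, replica bounds, and utilization target are placeholder values you'd tune for your own workload:

```yml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa                # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                  # hypothetical deployment to scale
  minReplicas: 3                  # a slightly higher minimum leaves a buffer for scaling time
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70    # scale out when average CPU utilization exceeds 70%
```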

#### Vertical Pod autoscaling

Implementing [vertical pod autoscaling](LINK) is useful for applications with fluctuating and unpredictable resource demands. The Vertical Pod Autoscaler (VPA) allows you to fine-tune resource requests, including CPU and memory, for individual pods, enabling precise control over resource allocation. This granularity minimizes resource waste and enhances the overall efficiency of cluster utilization. The VPA also streamlines application management by automating resource allocation, freeing up resources for critical tasks.

You shouldn't use the VPA in conjunction with the HPA on the same CPU or memory metrics. This combination can lead to conflicts, as both autoscalers attempt to respond to changes in demand using the same metrics. However, you can use the VPA for CPU or memory in conjunction with the HPA for custom metrics to prevent overlap and ensure that each autoscaler focuses on distinct aspects of workload scaling.

> [!NOTE]
> The VPA works based on historical data. We recommend waiting at least *24 hours* after deploying the VPA before applying any changes to give it time to collect recommendation data.
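
For illustration, a minimal VPA object for a hypothetical `my-app` deployment might look like the following, assuming the VPA components are enabled on your cluster. The target name and update mode are placeholders:

```yml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa                # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                  # hypothetical deployment whose requests the VPA tunes
  updatePolicy:
    updateMode: "Auto"            # VPA applies its recommendations automatically
```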

### Infrastructure autoscaling

#### Cluster autoscaling

Implementing cluster autoscaling is useful if your existing nodes lack sufficient capacity, as it helps with scaling up and provisioning new nodes.

In cases where your cluster handles substantial but infrequent workloads with a primary focus on performance, we recommend increasing the scan interval and the utilization threshold. This adjustment helps batch multiple scaling operations into a single call, optimizing scale time and the use of compute read/write quotas. It also helps mitigate the risk of swift scale-down operations on underutilized nodes, enhancing pod scheduling efficiency. For clusters with daemonset pods, we recommend setting `ignore-daemonset-utilization` to `true` to minimize unnecessary scale-down operations.

If you want a cost-optimized profile, we recommend reducing the node unneeded time, the utilization threshold, and the scale-down delay after add operations. You can also increase *Max-bulk-delete* to help delete nodes in bulk. This configuration helps reduce the number of nodes in the cluster, which reduces the cost of the cluster. However, it can also increase the time it takes to scale up the cluster.

When considering cluster autoscaling, the decision of when to remove a node involves a tradeoff between optimizing resource utilization and ensuring resource availability. Eliminating underutilized nodes enhances cluster utilization but might result in new workloads having to wait for resources to be provisioned before they can be deployed. It's important to find a balance between these two factors that aligns with your cluster and workload requirements and [configure the cluster autoscaler profile settings accordingly](LINK).

The Cluster Autoscaler profile settings apply universally to all autoscaler-enabled node pools in your cluster. This means that any scaling actions occurring in one autoscaler-enabled node pool might impact the autoscaling behavior in another node pool. It's important to apply consistent and synchronized profile settings across all relevant node pools to ensure that the autoscaler behaves as expected.

##### Overprovisioning

Overprovisioning is a strategy that helps mitigate the risk of application pressure by ensuring there's an excess of readily available resources. This approach is especially useful for applications that experience highly variable loads and cluster scaling patterns that show frequent scale ups and scale downs.

To determine the optimal amount of overprovisioning, you can use the following formula: $\frac{1 - \text{buffer}}{1 + \text{traffic}}$

For example, let's say you want to avoid hitting 100% CPU utilization in your cluster. You might opt for a 30% buffer to maintain a safety margin. If you anticipate an average traffic growth rate of 40%, you might consider overprovisioning by 50%, as calculated by the formula: $\frac{1 - 0.30}{1 + 0.40} = \frac{0.70}{1.40} = 0.50$, or 50%.

An effective overprovisioning method involves the use of *pause pods*. Pause pods are low-priority deployments that can be easily replaced by high-priority deployments. You create low-priority pods that serve the sole purpose of reserving buffer space. When a high-priority pod requires space, the pause pods are removed and rescheduled on another node or a new node to accommodate the high-priority pod.

The following YAML shows an example pause pod manifest:

```yml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1                          # negative priority so pause pods are preempted first
globalDefault: false
description: "Priority class used by overprovisioning."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      run: overprovisioning
  template:
    metadata:
      labels:
        run: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
      - name: reserve-resources
        image: your-custom-pause-image   # replace with your own pause container image
        resources:
          requests:
            cpu: 1
            memory: 4Gi
```

## Node scaling and efficiency

Node scaling allows you to dynamically adjust the number of nodes in your cluster based on workload demands. It's important to understand that adding more nodes to a cluster isn't always the best solution for improving performance. To ensure optimal performance, you should carefully monitor resource utilization and scheduling policies to ensure nodes are being used efficiently.

### Node images

> **Best practice guidance**:
>
> Use the latest node image version to ensure that you have the latest security patches and bug fixes.

Using the latest node image version provides the best performance experience. AKS ships performance improvements within the weekly image releases. Falling behind on updates might have a negative impact on performance, so it's important to avoid large gaps between versions.

#### Azure Linux

The [Azure Linux Container Host on AKS](LINK) uses a native AKS image and provides a single place for Linux development. Every package is built from source and validated, ensuring your services run on proven components.

Azure Linux is lightweight, including only the necessary set of packages to run container workloads. It provides a reduced attack surface and eliminates patching and maintenance of unnecessary packages. At its base layer, it has a Microsoft-hardened kernel tuned for Azure. This image is ideal for performance-sensitive workloads and for platform engineers or operators who manage fleets of AKS clusters.

#### Ubuntu 2204

The [Ubuntu 2204 image](LINK) is the default node image for AKS. It's a lightweight and efficient operating system optimized for running containerized workloads, which helps reduce resource usage and improve overall performance. The image includes the latest security patches and updates, which help protect your workloads from vulnerabilities.

The Ubuntu 2204 image is fully supported by Microsoft and the Ubuntu community and can help you achieve better performance and security for your containerized workloads.

### Virtual machines (VMs)

> **Best practice guidance**:
>
> When selecting a VM, ensure the size and performance of the OS disk and VM SKU don't have a large discrepancy. A discrepancy in size or performance can cause performance issues and resource contention.

Application performance is closely tied to the VM SKUs you use in your workloads. Larger and more powerful VMs generally provide better performance. For *mission critical or production workloads*, we recommend using VMs with at least an 8-core CPU. VMs with newer hardware generations, like v4 and v5, can also help improve performance. Keep in mind that create and scale latency might vary depending on the VM SKUs you use.

OS disks store the operating system and its associated files, while the VMs run the applications. When selecting a VM, ensure the size and performance of the OS disk and VM SKU don't have a large discrepancy. A discrepancy in size or performance can cause performance issues and resource contention. For example, if the OS disk is significantly smaller than the VMs, it can limit the amount of space available for application data and cause the system to run out of disk space. If the OS disk has lower performance than the VMs, it can become a bottleneck and limit the overall performance of the system. Make sure the size and performance are balanced to ensure optimal performance in Kubernetes.

### Node pools

For scaling performance and reliability, we recommend using a dedicated system node pool. The system node pool reserves resources for critical components like system OS daemons and kernel memory. We recommend running your application in a user node pool to increase the availability of allocatable resources for your workloads. This configuration also helps mitigate the risk of resource competition between the system and application.
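
As an illustrative sketch of this separation, the following snippet pins a hypothetical application deployment to a user node pool named `userpool` by selecting on the `agentpool` node label that AKS applies to its nodes. The deployment name, image, and pool name are placeholders:

```yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                      # hypothetical application deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      nodeSelector:
        agentpool: userpool         # schedule only onto nodes in the user node pool
      containers:
      - name: my-app
        image: my-registry/my-app:1.0.0   # placeholder image
        resources:
          requests:
            cpu: 250m
            memory: 256Mi
```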

### Create operations

Review the extensions and add-ons you have enabled during create provisioning. Extensions and add-ons can add latency to the overall duration of create operations. If you don't need an extension or add-on, we recommend removing it to improve create latency.

The Kubernetes data plane is responsible for managing network traffic between containers and services. Issues with the data plane can lead to slow response times, degraded performance, and application downtime. It's important to carefully monitor and optimize data plane configurations, such as network latency, resource allocation, container density, and network policies, to ensure your containerized applications run smoothly and efficiently.

### Storage types

AKS recommends and defaults to using ephemeral OS disks. Ephemeral OS disks are created on local VM storage and aren't saved to remote Azure storage like managed OS disks. They have faster reimaging and boot times, enabling faster cluster operations, and they provide lower read/write latency on the OS disk of AKS agent nodes. Ephemeral OS disks work well for stateless workloads, where applications are tolerant of individual VM failures but not of VM deployment time or individual VM reimaging instances. Only certain VM SKUs support ephemeral OS disks, so you need to ensure that your desired SKU generation and size are compatible. For more information, see [Ephemeral OS disks in Azure Kubernetes Service (AKS)](LINK).

The following table provides a breakdown of suggested use cases for OS disks supported in AKS:

| OS disk type | Characteristics | Suggested use cases |
| --- | --- | --- |
| Standard SSD OS disks | • Consistent performance.<br/> • Better availability and latency compared to Standard HDD disks. | • Web servers.<br/> • Low input/output operations per second (IOPS) application servers.<br/> • Lightly used enterprise applications.<br/> • Dev/test workloads. |
| Standard HDD disks | • Low cost.<br/> • Exhibits variability in performance and latency. | • Backup storage.<br/> • Mass storage with infrequent access. |

#### IOPS and throughput

Input/output operations per second (IOPS) refers to the number of read and write operations that a disk can perform in a second. Throughput refers to the amount of data that can be transferred in a given time period.

In Kubernetes, the IOPS and throughput of a disk can have a significant impact on the performance of the system. When a Kubernetes cluster runs multiple applications, the disk IOPS and throughput might cause a bottleneck, especially for applications requiring high disk performance. It's important to choose a disk that can handle the workload and provide sufficient IOPS and throughput.

Ephemeral OS disks can provide dynamic IOPS and throughput for your application, whereas managed disks have capped IOPS and throughput. For more information, see [LINK](LINK).

[Azure Premium SSD v2](LINK) is designed for IO-intensive enterprise workloads that require sub-millisecond disk latencies and high IOPS and throughput at a low cost. It's suited for a broad range of workloads, such as SQL Server, Oracle, MariaDB, SAP, Cassandra, MongoDB, big data/analytics, gaming, and more. This disk type is the highest-performing option currently available for persistent volumes.
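
As a rough sketch only, a custom storage class for Premium SSD v2 persistent volumes with the Azure Disks CSI driver might look like the following. The class name, IOPS, and throughput values are placeholders, and you should confirm the exact parameter names and supported values in the Azure Disks CSI driver documentation:

```yml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: premium2-disk-sc            # hypothetical class name
provisioner: disk.csi.azure.com
parameters:
  skuName: PremiumV2_LRS            # Premium SSD v2 SKU (assumed parameter value)
  cachingMode: None                 # Premium SSD v2 doesn't support host caching
  DiskIOPSReadWrite: "4000"         # provisioned IOPS (placeholder value)
  DiskMBpsReadWrite: "500"          # provisioned throughput in MB/s (placeholder value)
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```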

### Networking types

> **Best practice guidance**:
>
> We recommend using [Dynamic IP allocation](LINK) or [CNI Overlay](LINK) networking.

For more information, see [Dynamic IP allocation overview](LINK) and [CNI Overlay overview](LINK).

### Pod scheduling

The memory and CPU resources allocated to a VM have a direct impact on the performance of the pods running on it. When a pod is created, it's assigned a certain amount of memory and CPU resources, which are used to run the application. If the VM doesn't have enough memory or CPU resources available, it can cause the pods to slow down or even crash. If the VM has too much memory or CPU resources available, it can cause the pods to run inefficiently, wasting resources and increasing costs.

It's important to ensure that your VMs have the appropriate amount of memory and CPU resources for optimal performance of your Kubernetes pods. You can monitor resource usage and adjust the allocation as needed to help ensure the VM uses its resources efficiently and effectively. You can also set the maximum number of pods per node based on your capacity planning using `--max-pods`. For more information, see [Configure maximum number of pods per node](LINK).
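
To make pod resource allocation explicit, you can set requests and limits on your containers so the scheduler places pods on nodes with sufficient capacity. The following sketch uses hypothetical values that you'd size from your own monitoring data:

```yml
apiVersion: v1
kind: Pod
metadata:
  name: my-app-pod                  # hypothetical pod name
spec:
  containers:
  - name: my-app
    image: my-registry/my-app:1.0.0   # placeholder image
    resources:
      requests:
        cpu: 250m                   # guaranteed CPU; used for scheduling decisions
        memory: 256Mi               # guaranteed memory; used for scheduling decisions
      limits:
        cpu: 500m                   # hard cap on CPU usage
        memory: 512Mi               # container is OOM-killed if it exceeds this limit
```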