Merge pull request #221228 from schaffererin/gpu-aks-docs-updates

prmerger-automator[bot] · web-flow · commit e6d0c6ca6d75 · 2022-12-13T00:03:23.000Z
Adding small introductory notes to each article and adding links to other GPU doc(s) per GH issue request
diff --git a/articles/aks/gpu-cluster.md b/articles/aks/gpu-cluster.md
@@ -13,13 +13,13 @@ ms.date: 08/06/2021
 Graphical processing units (GPUs) are often used for compute-intensive workloads such as graphics and visualization workloads. AKS supports the creation of GPU-enabled node pools to run these compute-intensive workloads in Kubernetes. For more information on available GPU-enabled VMs, see [GPU optimized VM sizes in Azure][gpu-skus]. For AKS node pools, we recommend a minimum size of *Standard_NC6*. Note that the NVv4 series (based on AMD GPUs) are not yet supported with AKS.
 
 > [!NOTE]
-> GPU-enabled VMs contain specialized hardware that is subject to higher pricing and region availability. For more information, see the [pricing][azure-pricing] tool and [region availability][azure-availability].
+> GPU-enabled VMs contain specialized hardware subject to higher pricing and region availability. For more information, see the [pricing][azure-pricing] tool and [region availability][azure-availability].
 
 Currently, using GPU-enabled node pools is only available for Linux node pools.
 
 ## Before you begin
 
-This article assumes that you have an existing AKS cluster. If you need an AKS cluster, see the AKS quickstart [using the Azure CLI][aks-quickstart-cli], [using Azure PowerShell][aks-quickstart-powershell], or [using the Azure portal][aks-quickstart-portal].
+This article helps you provision nodes with schedulable GPUs on new and existing AKS clusters. This article assumes that you have an existing AKS cluster. If you need an AKS cluster, see the AKS quickstart [using the Azure CLI][aks-quickstart-cli], [using Azure PowerShell][aks-quickstart-powershell], or [using the Azure portal][aks-quickstart-portal].
 
 You also need the Azure CLI version 2.0.64 or later installed and configured. Run `az --version` to find the version. If you need to install or upgrade, see [Install Azure CLI][install-azure-cli].
 
@@ -406,6 +406,8 @@ To run Apache Spark jobs, see [Run Apache Spark jobs on AKS][aks-spark].
 
 For more information about running machine learning (ML) workloads on Kubernetes, see [Kubeflow Labs][kubeflow-labs].
 
+For more information on features of the Kubernetes scheduler, see [Best practices for advanced scheduler features in AKS][advanced-scheduler-aks].
+
 For information on using Azure Kubernetes Service with Azure Machine Learning, see the following articles:
 
 * [Configure a Kubernetes cluster for ML model training or deployment][azureml-aks].
@@ -438,3 +440,4 @@ For information on using Azure Kubernetes Service with Azure Machine Learning, s
 [azureml-deploy]: ../machine-learning/how-to-deploy-managed-online-endpoints.md
 [azureml-triton]: ../machine-learning/how-to-deploy-with-triton.md
 [aks-container-insights]: monitor-aks.md#container-insights
+[advanced-scheduler-aks]: /aks/operator-best-practices-advanced-scheduler.md
diff --git a/articles/aks/operator-best-practices-advanced-scheduler.md b/articles/aks/operator-best-practices-advanced-scheduler.md
@@ -18,17 +18,19 @@ As you manage clusters in Azure Kubernetes Service (AKS), you often need to isol
 This best practices article focuses on advanced Kubernetes scheduling features for cluster operators. In this article, you learn how to:
 
 > [!div class="checklist"]
+>
 > * Use taints and tolerations to limit what pods can be scheduled on nodes.
 > * Give preference to pods to run on certain nodes with node selectors or node affinity.
 > * Split apart or group together pods with inter-pod affinity or anti-affinity.
+> * Restrict scheduling of workloads that require GPUs only on nodes with schedulable GPUs.
 
 ## Provide dedicated nodes using taints and tolerations
 
 > **Best practice guidance:**
 >
 > Limit access for resource-intensive applications, such as ingress controllers, to specific nodes. Keep node resources available for workloads that require them, and don't allow scheduling of other workloads on the nodes.
 
-When you create your AKS cluster, you can deploy nodes with GPU support or a large number of powerful CPUs. You can use these nodes for large data processing workloads such as machine learning (ML) or artificial intelligence (AI).
+When you create your AKS cluster, you can deploy nodes with GPU support or a large number of powerful CPUs. For more information, see [Use GPUs on AKS][use-gpus-aks]. You can use these nodes for large data processing workloads such as machine learning (ML) or artificial intelligence (AI).
 
 Because this node resource hardware is typically expensive to deploy, limit the workloads that can be scheduled on these nodes. Instead, dedicate some nodes in the cluster to run ingress services and prevent other workloads.
 
@@ -245,3 +247,4 @@ This article focused on advanced Kubernetes scheduler features. For more informa
 [aks-best-practices-identity]: operator-best-practices-identity.md
 [use-multiple-node-pools]: use-multiple-node-pools.md
 [taint-node-pool]: use-multiple-node-pools.md#specify-a-taint-label-or-tag-for-a-node-pool
+[use-gpus-aks]: /aks/gpu-cluster.md