Commit e6d0c6c

Merge pull request #221228 from schaffererin/gpu-aks-docs-updates
Adding small introductory notes to each article and adding links to other GPU doc(s) per GH issue request
2 parents b237406 + 61b474e commit e6d0c6c

2 files changed: +9 -3 lines changed

articles/aks/gpu-cluster.md

Lines changed: 5 additions & 2 deletions
@@ -13,13 +13,13 @@ ms.date: 08/06/2021
 Graphical processing units (GPUs) are often used for compute-intensive workloads such as graphics and visualization workloads. AKS supports the creation of GPU-enabled node pools to run these compute-intensive workloads in Kubernetes. For more information on available GPU-enabled VMs, see [GPU optimized VM sizes in Azure][gpu-skus]. For AKS node pools, we recommend a minimum size of *Standard_NC6*. Note that the NVv4 series (based on AMD GPUs) are not yet supported with AKS.
 
 > [!NOTE]
-> GPU-enabled VMs contain specialized hardware that is subject to higher pricing and region availability. For more information, see the [pricing][azure-pricing] tool and [region availability][azure-availability].
+> GPU-enabled VMs contain specialized hardware subject to higher pricing and region availability. For more information, see the [pricing][azure-pricing] tool and [region availability][azure-availability].
 
 Currently, using GPU-enabled node pools is only available for Linux node pools.
 
 ## Before you begin
 
-This article assumes that you have an existing AKS cluster. If you need an AKS cluster, see the AKS quickstart [using the Azure CLI][aks-quickstart-cli], [using Azure PowerShell][aks-quickstart-powershell], or [using the Azure portal][aks-quickstart-portal].
+This article helps you provision nodes with schedulable GPUs on new and existing AKS clusters. This article assumes that you have an existing AKS cluster. If you need an AKS cluster, see the AKS quickstart [using the Azure CLI][aks-quickstart-cli], [using Azure PowerShell][aks-quickstart-powershell], or [using the Azure portal][aks-quickstart-portal].
 
 You also need the Azure CLI version 2.0.64 or later installed and configured. Run `az --version` to find the version. If you need to install or upgrade, see [Install Azure CLI][install-azure-cli].
 
@@ -406,6 +406,8 @@ To run Apache Spark jobs, see [Run Apache Spark jobs on AKS][aks-spark].
 
 For more information about running machine learning (ML) workloads on Kubernetes, see [Kubeflow Labs][kubeflow-labs].
 
+For more information on features of the Kubernetes scheduler, see [Best practices for advanced scheduler features in AKS][advanced-scheduler-aks].
+
 For information on using Azure Kubernetes Service with Azure Machine Learning, see the following articles:
 
 * [Configure a Kubernetes cluster for ML model training or deployment][azureml-aks].
@@ -438,3 +440,4 @@ For information on using Azure Kubernetes Service with Azure Machine Learning, s
 [azureml-deploy]: ../machine-learning/how-to-deploy-managed-online-endpoints.md
 [azureml-triton]: ../machine-learning/how-to-deploy-with-triton.md
 [aks-container-insights]: monitor-aks.md#container-insights
+[advanced-scheduler-aks]: /aks/operator-best-practices-advanced-scheduler.md

articles/aks/operator-best-practices-advanced-scheduler.md

Lines changed: 4 additions & 1 deletion
@@ -18,17 +18,19 @@ As you manage clusters in Azure Kubernetes Service (AKS), you often need to isol
 This best practices article focuses on advanced Kubernetes scheduling features for cluster operators. In this article, you learn how to:
 
 > [!div class="checklist"]
+>
 > * Use taints and tolerations to limit what pods can be scheduled on nodes.
 > * Give preference to pods to run on certain nodes with node selectors or node affinity.
 > * Split apart or group together pods with inter-pod affinity or anti-affinity.
+> * Restrict scheduling of workloads that require GPUs only on nodes with schedulable GPUs.
 
 ## Provide dedicated nodes using taints and tolerations
 
 > **Best practice guidance:**
 >
 > Limit access for resource-intensive applications, such as ingress controllers, to specific nodes. Keep node resources available for workloads that require them, and don't allow scheduling of other workloads on the nodes.
 
-When you create your AKS cluster, you can deploy nodes with GPU support or a large number of powerful CPUs. You can use these nodes for large data processing workloads such as machine learning (ML) or artificial intelligence (AI).
+When you create your AKS cluster, you can deploy nodes with GPU support or a large number of powerful CPUs. For more information, see [Use GPUs on AKS][use-gpus-aks]. You can use these nodes for large data processing workloads such as machine learning (ML) or artificial intelligence (AI).
 
 Because this node resource hardware is typically expensive to deploy, limit the workloads that can be scheduled on these nodes. Instead, dedicate some nodes in the cluster to run ingress services and prevent other workloads.
 
@@ -245,3 +247,4 @@ This article focused on advanced Kubernetes scheduler features. For more informa
 [aks-best-practices-identity]: operator-best-practices-identity.md
 [use-multiple-node-pools]: use-multiple-node-pools.md
 [taint-node-pool]: use-multiple-node-pools.md#specify-a-taint-label-or-tag-for-a-node-pool
+[use-gpus-aks]: /aks/gpu-cluster.md
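
For context on the checklist item this commit adds (restricting GPU workloads to nodes with schedulable GPUs), here is a minimal pod manifest sketch. It is not part of the commit. The pod name, container name, and image are hypothetical, and it assumes the GPU nodes carry a `sku=gpu:NoSchedule` taint, which the diff does not state. `nvidia.com/gpu` is the resource name advertised by the NVIDIA device plugin for Kubernetes.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload-example            # hypothetical pod name
spec:
  containers:
  - name: trainer                       # hypothetical container name
    image: example.azurecr.io/trainer:latest   # hypothetical image
    resources:
      limits:
        nvidia.com/gpu: 1               # only nodes advertising a schedulable GPU can satisfy this
  tolerations:
  - key: "sku"                          # assumed taint on the GPU node pool: sku=gpu:NoSchedule
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
```

The GPU resource limit restricts the pod to nodes that expose a schedulable GPU. The toleration only permits the pod onto tainted GPU nodes; it is the taint itself that keeps pods without the toleration off them.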
