articles/aks/gpu-cluster.md (+5 -2: 5 additions, 2 deletions)
@@ -13,13 +13,13 @@ ms.date: 08/06/2021
 Graphical processing units (GPUs) are often used for compute-intensive workloads such as graphics and visualization workloads. AKS supports the creation of GPU-enabled node pools to run these compute-intensive workloads in Kubernetes. For more information on available GPU-enabled VMs, see [GPU optimized VM sizes in Azure][gpu-skus]. For AKS node pools, we recommend a minimum size of *Standard_NC6*. Note that the NVv4 series (based on AMD GPUs) are not yet supported with AKS.
 
 > [!NOTE]
-> GPU-enabled VMs contain specialized hardware that is subject to higher pricing and region availability. For more information, see the [pricing][azure-pricing] tool and [region availability][azure-availability].
+> GPU-enabled VMs contain specialized hardware subject to higher pricing and region availability. For more information, see the [pricing][azure-pricing] tool and [region availability][azure-availability].
 
 Currently, using GPU-enabled node pools is only available for Linux node pools.
 
 ## Before you begin
 
-This article assumes that you have an existing AKS cluster. If you need an AKS cluster, see the AKS quickstart [using the Azure CLI][aks-quickstart-cli], [using Azure PowerShell][aks-quickstart-powershell], or [using the Azure portal][aks-quickstart-portal].
+This article helps you provision nodes with schedulable GPUs on new and existing AKS clusters. This article assumes that you have an existing AKS cluster. If you need an AKS cluster, see the AKS quickstart [using the Azure CLI][aks-quickstart-cli], [using Azure PowerShell][aks-quickstart-powershell], or [using the Azure portal][aks-quickstart-portal].
 
 You also need the Azure CLI version 2.0.64 or later installed and configured. Run `az --version` to find the version. If you need to install or upgrade, see [Install Azure CLI][install-azure-cli].
 
@@ -406,6 +406,8 @@ To run Apache Spark jobs, see [Run Apache Spark jobs on AKS][aks-spark].
 
 For more information about running machine learning (ML) workloads on Kubernetes, see [Kubeflow Labs][kubeflow-labs].
 
+For more information on features of the Kubernetes scheduler, see [Best practices for advanced scheduler features in AKS][advanced-scheduler-aks].
+
 For information on using Azure Kubernetes Service with Azure Machine Learning, see the following articles:
 
 *[Configure a Kubernetes cluster for ML model training or deployment][azureml-aks].
@@ -438,3 +440,4 @@ For information on using Azure Kubernetes Service with Azure Machine Learning, s
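The changes above link GPU node pools to the scheduler's taints and tolerations: a GPU workload has to both request GPUs and tolerate the GPU node pool's taint. The following Python is a minimal sketch of such a Pod manifest, built as a dictionary and printed as JSON. It assumes the GPU node pool was created with the taint `sku=gpu:NoSchedule` and that the NVIDIA device plugin advertises the `nvidia.com/gpu` resource; the function name `gpu_pod_manifest` and the image placeholder are hypothetical, so substitute values that match your cluster.

```python
import json

# Assumption: the GPU node pool carries the taint "sku=gpu:NoSchedule".
# Adjust these constants to match the taint on your own node pool.
GPU_TAINT_KEY = "sku"
GPU_TAINT_VALUE = "gpu"


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Pod manifest that requests GPUs and tolerates the GPU taint.

    Hypothetical helper for illustration only.
    """
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resource exposed by the NVIDIA device plugin.
                    # When only a limit is set, Kubernetes uses it as the request too.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
            # Permits scheduling onto nodes that carry the (assumed) GPU taint.
            "tolerations": [
                {
                    "key": GPU_TAINT_KEY,
                    "operator": "Equal",
                    "value": GPU_TAINT_VALUE,
                    "effect": "NoSchedule",
                }
            ],
        },
    }


if __name__ == "__main__":
    # "<your-gpu-image>" is a placeholder, not a real image reference.
    manifest = gpu_pod_manifest("gpu-test", "<your-gpu-image>")
    print(json.dumps(manifest, indent=2))
```

Because `kubectl apply -f` accepts JSON as well as YAML, the printed output can be saved to a file and applied directly. Setting only the GPU limit keeps the manifest short, since Kubernetes does not allow extended resources such as GPUs to be overcommitted.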
articles/aks/operator-best-practices-advanced-scheduler.md (+4 -1: 4 additions, 1 deletion)
@@ -18,17 +18,19 @@ As you manage clusters in Azure Kubernetes Service (AKS), you often need to isol
 This best practices article focuses on advanced Kubernetes scheduling features for cluster operators. In this article, you learn how to:
 
 > [!div class="checklist"]
+>
 > * Use taints and tolerations to limit what pods can be scheduled on nodes.
 > * Give preference to pods to run on certain nodes with node selectors or node affinity.
 > * Split apart or group together pods with inter-pod affinity or anti-affinity.
+> * Restrict scheduling of workloads that require GPUs only on nodes with schedulable GPUs.
 
 ## Provide dedicated nodes using taints and tolerations
 
 > **Best practice guidance:**
 >
 > Limit access for resource-intensive applications, such as ingress controllers, to specific nodes. Keep node resources available for workloads that require them, and don't allow scheduling of other workloads on the nodes.
 
-When you create your AKS cluster, you can deploy nodes with GPU support or a large number of powerful CPUs. You can use these nodes for large data processing workloads such as machine learning (ML) or artificial intelligence (AI).
+When you create your AKS cluster, you can deploy nodes with GPU support or a large number of powerful CPUs. For more information, see [Use GPUs on AKS][use-gpus-aks]. You can use these nodes for large data processing workloads such as machine learning (ML) or artificial intelligence (AI).
 
 Because this node resource hardware is typically expensive to deploy, limit the workloads that can be scheduled on these nodes. Instead, dedicate some nodes in the cluster to run ingress services and prevent other workloads.
 
@@ -245,3 +247,4 @@ This article focused on advanced Kubernetes scheduler features. For more informa
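The first checklist item in this diff depends on how Kubernetes matches a pod's tolerations against a node's taints. The following Python is a simplified model of those matching rules, written for illustration rather than taken from the scheduler's code. It treats only `NoSchedule` and `NoExecute` as hard constraints and ignores `PreferNoSchedule` scoring and `tolerationSeconds`. The `Taint` and `Toleration` classes and the `tolerates`/`can_schedule` functions are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Taint:
    key: str
    value: str = ""
    effect: str = "NoSchedule"  # NoSchedule | PreferNoSchedule | NoExecute


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key with operator Exists matches every key
    operator: str = "Equal"  # Equal (the default) or Exists
    value: str = ""
    effect: str = ""         # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified version of the Kubernetes toleration-matching rules."""
    # A non-empty effect on the toleration must equal the taint's effect.
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        # Exists ignores the value; an empty key matches any taint key.
        return tol.key == "" or tol.key == taint.key
    # Equal: both key and value must match.
    return tol.key == taint.key and tol.value == taint.value


# Effects that block scheduling outright in this model.
HARD_EFFECTS = {"NoSchedule", "NoExecute"}


def can_schedule(node_taints: Iterable[Taint], pod_tolerations: Iterable[Toleration]) -> bool:
    """True if every hard taint on the node is tolerated by at least one toleration."""
    tolerations = list(pod_tolerations)
    return all(
        any(tolerates(tol, taint) for tol in tolerations)
        for taint in node_taints
        if taint.effect in HARD_EFFECTS
    )


if __name__ == "__main__":
    # Assumed GPU node taint, for illustration.
    gpu_node = [Taint("sku", "gpu", "NoSchedule")]
    gpu_pod = [Toleration("sku", "Equal", "gpu", "NoSchedule")]
    print(can_schedule(gpu_node, gpu_pod))  # True
    print(can_schedule(gpu_node, []))       # False: untolerated NoSchedule taint
    print(can_schedule([], []))             # True: an untainted node accepts any pod
```

A toleration only permits a pod onto a tainted node; it doesn't attract the pod there. To keep GPU workloads on GPU nodes, combine the toleration with a node selector or node affinity, as the second checklist item describes.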