Commit 4811159

Sync release-hotfixes with main

2 parents f1ea6a2 + f5f963a

File tree: 7 files changed (+138, -78 lines)

AKS-Hybrid/deploy-ai-model.md: 37 additions & 40 deletions
@@ -4,7 +4,9 @@ description: Learn how to deploy an AI model on AKS Arc with the Kubernetes AI t
 author: sethmanheim
 ms.author: sethm
 ms.topic: how-to
-ms.date: 12/03/2024
+ms.date: 12/06/2024
+ms.reviewer: haojiehang
+ms.lastreviewed: 12/03/2024
 
 ---
 
@@ -14,13 +16,11 @@ ms.date: 12/03/2024
 
 This article describes how to deploy an AI model on AKS Arc with the Kubernetes AI toolchain operator (KAITO). The AI toolchain operator (KAITO) is an add-on for AKS Arc, and it simplifies the experience of running OSS AI models on your AKS Arc clusters. To enable this feature, follow this workflow:
 
-1. Create a node pool with GPU.
-1. Deploy KAITO operator.
-1. Deploy AI model.
+1. Deploy KAITO on an existing cluster.
+1. Add a GPU node pool.
+1. Deploy the AI model.
 1. Validate the model deployment.
 
-The following deployment instructions are also available in [the KAITO repo](https://github.com/kaito-project/kaito/blob/main/docs/How-to-use-kaito-in-aks-arc.md).
-
 > [!IMPORTANT]
 > These preview features are available on a self-service, opt-in basis. Previews are provided "as is" and "as available," and they're excluded from the service-level agreements and limited warranty. Azure Kubernetes Service, enabled by Azure Arc previews are partially covered by customer support on a best-effort basis.
 
@@ -31,42 +31,52 @@ Before you begin, make sure you have the following prerequisites:
 1. The following details from your infrastructure administrator:
 
    - An AKS Arc cluster that's up and running. For more information, see [Create Kubernetes clusters using Azure CLI](aks-create-clusters-cli.md).
+   - Make sure that the AKS Arc cluster runs on an Azure Local cluster with a supported GPU model. Before you create the node pool, you must also identify the correct GPU VM SKUs based on the model. For more information, see [use GPU for compute-intensive workloads](deploy-gpu-node-pool.md).
    - We recommend using a computer running Linux for this feature.
-   - Your local **kubectl** environment configured to point to your AKS Arc cluster.
-   - Run `az connectedk8s proxy` to connect to your AKS Arc cluster from your development machine.
+   - Use `az connectedk8s proxy` to connect to your AKS Arc cluster.
 
-1. Make sure your AKS Arc cluster is enabled with GPUs. You can ask your infrastructure administrator to set it up for you. You must also identify the right VM SKUs for your AKS Arc cluster before you create the node pool. For instructions, see [use GPU for compute-intensive workloads](deploy-gpu-node-pool.md).
 1. Make sure that **helm** and **kubectl** are installed on your local machine.
 
    - If you need to install or upgrade, see [Install Helm](https://helm.sh/docs/intro/install/).
-   - If you need to install **kubectl**, see [Install kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/).
+   - If you need to install **kubectl**, see [Install kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/).
+
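For the proxy prerequisite above, a typical session looks like the following sketch; the cluster and resource group names are placeholders, not values taken from this commit:

```bash
# Open a proxy channel to the Arc-connected cluster (names are placeholders):
az connectedk8s proxy -n samplecluster -g sample-rg

# Keep this running; kubectl commands in another shell now target the cluster.
```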
+## Deploy KAITO from GitHub
+
+You must have a running AKS Arc cluster with a default node pool. To deploy the KAITO operator, follow these steps:
+
+1. Clone the [KAITO repo](https://github.com/Azure/kaito.git) to your local machine.
+1. Install the KAITO operator using the following command:
+
+   ```bash
+   helm install workspace ./charts/kaito/workspace --namespace kaito-workspace --create-namespace
+   ```
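Before moving on, a quick check against the `kaito-workspace` namespace created by the Helm command above should show the controller pod as Running; the exact pod name varies per install:

```bash
# The workspace controller pod should reach Running status:
kubectl get pods -n kaito-workspace
```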

-## Create a GPU node pool
+## Add a GPU node pool
 
-To create a GPU node pool using the Azure portal or Azure CLI, follow these steps:
+Before you add a GPU node, make sure that Azure Local is enabled with a supported GPU model, and that the GPU drivers are installed on all the host nodes. To create a GPU node pool using the Azure portal or Azure CLI, follow these steps:
 
 ### [Azure portal](#tab/portal)
 
 To create a GPU node pool using the Azure portal, follow these steps:
 
 1. Sign in to the Azure portal and find your AKS Arc cluster.
-1. Under **Settings** and **Node pools**, select **Add**. During the preview, we only support Linux. Fill in the other required fields and create the node pool resource.
+1. Under **Settings** and **Node pools**, select **Add**. During the preview, we only support Linux nodes. Fill in the other required fields and create the node pool.
 
    :::image type="content" source="media/deploy-ai-model/nodepools-portal.png" alt-text="Screenshot of node pools portal page." lightbox="media/deploy-ai-model/nodepools-portal.png":::
 
 ### [Azure CLI](#tab/azurecli)
 
-To create a GPU node pool using the Azure CLI, run the following command. The GPU VM SKU used in the following example is for A16; for the full list of VM SKUs, see [Supported VM sizes](deploy-gpu-node-pool.md#supported-vm-sizes).
+To create a GPU node pool using Azure CLI, run the following command. The GPU VM SKU used in the following example is for the **A16** model; for the full list of VM SKUs, see [Supported VM sizes](deploy-gpu-node-pool.md#supported-vm-sizes).
 
 ```azurecli
-az aksarc nodepool add --name "samplenodepool" --cluster-name "samplecluster" --resource-group "sample-rg" --node-vm-size "samplenodepoolsize" --os-type "Linux"
+az aksarc nodepool add --name "samplenodepool" --cluster-name "samplecluster" --resource-group "sample-rg" --node-vm-size "Standard_NC16_A16" --os-type "Linux"
 ```
 
 ---
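As a follow-up check on the CLI path, listing the cluster's node pools should show the new pool; this reuses the sample names from the command above:

```bash
# Confirm that samplenodepool was created and shows the expected VM size:
az aksarc nodepool list --cluster-name "samplecluster" --resource-group "sample-rg" -o table
```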

-### Validate the GPU node pool
+### Validate the node pool deployment
 
-After the node pool creation command succeeds, you can confirm whether the GPU node is provisioned using `kubectl get nodes`. In the following example, the GPU node is **moc-l1i9uh0ksne**:
+After the node pool creation succeeds, you can confirm whether the GPU node is provisioned using `kubectl get nodes`. In the following example, the GPU node is **moc-l1i9uh0ksne**. The other node is from the default node pool that was created during cluster creation:
 
 ```bash
 kubectl get nodes
@@ -103,25 +113,14 @@ capacity:
   ephemeral-storage: 103110508Ki
 ```
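Beyond the storage capacity shown above, the GPU itself should appear in the node's capacity. A minimal sketch, assuming the conventional `nvidia.com/gpu` resource name exposed by NVIDIA's device plugin (not confirmed by this diff):

```bash
# Print the number of GPUs the node advertises; empty output means the GPU isn't schedulable yet:
kubectl get node moc-l1i9uh0ksne -o jsonpath='{.status.capacity.nvidia\.com/gpu}'
```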

-## Deploy KAITO operator from GitHub
-
-To deploy the KAITO operator, follow these steps:
-
-1. Clone the [KAITO repo](https://github.com/Azure/kaito.git) to your local machine.
-1. Install the KAITO operator using the following command:
-
-   ```bash
-   helm install workspace ./charts/kaito/workspace --namespace kaito-workspace --create-namespace
-   ```
-
 ## Deploy the AI model
 
 To deploy the AI model, follow these steps:
 
-1. Create a YAML file with the following template. KAITO supports popular OSS models such as Falcon, Phi3, Llama2, and Mistral. This list might increase over time.
+1. Create a YAML file using the following template. KAITO supports popular OSS models such as Falcon, Phi3, Llama2, and Mistral. This list might increase over time.
 
-   - The **PresetName** is used to specify which model to deploy, and you can find its value in the [supported model file](https://github.com/Azure/kaito/blob/main/presets/models/supported_models.yaml), in the GitHub repo. In the following example, `falcon-7b-instruct` is used for the model deployment.
-   - We recommend using `labelSelector` and `preferredNodes` to select the GPU nodes. In the following example, `app: llm-inference` is used for the GPU node `moc-le4aoguwyd9`. You can choose any node label you want; the next steps shows the node labeling command.
+   - The **PresetName** is used to specify which model to deploy, and you can find its value in the [supported model file](https://github.com/Azure/kaito/blob/main/presets/models/supported_models.yaml) in the GitHub repo. In the following example, `falcon-7b-instruct` is used for the model deployment.
+   - We recommend using `labelSelector` and `preferredNodes` to explicitly select the GPU node by name. In the following example, `apps: llm-inference` is used for the GPU node `moc-le4aoguwyd9`. You can choose any node label you want, as long as the labels match. The next step shows how to label the node.
 
    ```yaml
    apiVersion: kaito.sh/v1alpha1
@@ -131,7 +130,7 @@ To deploy the AI model, follow these steps:
    resource:
      labelSelector:
        matchLabels:
-         app: llm-inference
+         apps: llm-inference
      preferredNodes:
        - moc-le4aoguwyd9
    inference:
@@ -145,7 +144,7 @@ To deploy the AI model, follow these steps:
    kubectl label node moc-le4aoguwyd9 apps=llm-inference
    ```
 
-1. Apply the YAML file and wait until the workplace deployment is completed:
+1. Apply the YAML file and wait until the workspace deployment completes:
 
    ```bash
    kubectl apply -f sampleyamlfile.yaml
@@ -155,7 +154,7 @@ To deploy the AI model, follow these steps:
 
 To validate the model deployment, follow these steps:
 
-1. Validate the workspace using the `kubectl get workspace` command. Also make sure that both the `ResourceReady` and `InferenceReady` fields are set to **True** before testing with the sample prompt.
+1. Validate the workspace using the `kubectl get workspace` command. Also make sure that both the `ResourceReady` and `InferenceReady` fields are set to **True** before testing with the prompt:
 
    ```bash
    kubectl get workspace
@@ -165,10 +164,10 @@ To validate the model deployment, follow these steps:
 
    ```output
    NAME                  INSTANCE            RESOURCEREADY   INFERENCEREADY   JOBSTARTED   WORKSPACESUCCEEDED   AGE
-   workspace-falcon-7b   Standard_NC12s_v3   True            True                          True                 18h
+   workspace-falcon-7b   Standard_NC16_A16   True            True                          True                 18h
    ```
 
-1. In the previous example, the inference service **workspace-falcon-7b** is exposed internally and can be accessed with the cluster IP. You can test the model with the following sample prompt. For more information about features in the Kaito inference, see the [instructions in the KAITO repo](https://github.com/kaito-project/kaito/blob/main/docs/inference/README.md#inference-workload).
+1. After the resource and inference are ready, the **workspace-falcon-7b** inference service is exposed internally and can be accessed with a cluster IP. You can test the model with the following prompt. For more information about KAITO inference features, see the [instructions in the KAITO repo](https://github.com/kaito-project/kaito/blob/main/docs/inference/README.md#inference-workload).
 
    ```bash
    export CLUSTERIP=$(kubectl get svc workspace-falcon-7b -o jsonpath="{.spec.clusterIPs[0]}")
@@ -195,14 +194,12 @@ To validate the model deployment, follow these steps:
 
 ## Troubleshooting
 
-If the pod does not get deployed, or **ResourceReady** is empty or **false** when **kubectl** retrieves workspaces, it's usually because the preferred node isn't labeled correctly. Check the node label by running `kubectl get node <yourNodeName> --show-labels`.
-
-For example, in your YAML file, the following code specifies that the node must have the label `apps=falcon-7b`:
+If the pod is not deployed properly, or the **ResourceReady** field shows empty or **false**, it's usually because the preferred GPU node isn't labeled correctly. Check the node label with `kubectl get node <yourNodeName> --show-labels`. For example, in the YAML file, the following code specifies that the node must have the label `apps=llm-inference`:
 
 ```yaml
 labelSelector:
   matchLabels:
-    apps: falcon-7b
+    apps: llm-inference
 ```
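Both commands needed for the fix appear earlier in the article; putting them together, checking and then correcting the label looks like this:

```bash
# Inspect the node's labels; the selector above only matches apps=llm-inference:
kubectl get node moc-le4aoguwyd9 --show-labels

# Add the expected label if it's missing, then re-check the workspace status:
kubectl label node moc-le4aoguwyd9 apps=llm-inference
kubectl get workspace
```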
 
 ## Next steps
