AKS-Hybrid/deploy-ai-model.md (37 additions, 40 deletions)

---
author: sethmanheim
ms.author: sethm
ms.topic: how-to
ms.date: 12/06/2024
ms.reviewer: haojiehang
ms.lastreviewed: 12/03/2024
---

This article describes how to deploy an AI model on AKS Arc with the Kubernetes AI toolchain operator (KAITO). The AI toolchain operator (KAITO) is an add-on for AKS Arc that simplifies the experience of running OSS AI models on your AKS Arc clusters. To enable this feature, follow this workflow:

1. Deploy KAITO on an existing cluster.
1. Add a GPU node pool.
1. Deploy the AI model.
1. Validate the model deployment.

> [!IMPORTANT]
> These preview features are available on a self-service, opt-in basis. Previews are provided "as is" and "as available," and they're excluded from the service-level agreements and limited warranty. Azure Kubernetes Service, enabled by Azure Arc, previews are partially covered by customer support on a best-effort basis.

## Prerequisites

Before you begin, make sure you have the following prerequisites:

1. The following details from your infrastructure administrator:

    - An AKS Arc cluster that's up and running. For more information, see [Create Kubernetes clusters using Azure CLI](aks-create-clusters-cli.md).
    - Make sure that the AKS Arc cluster runs on an Azure Local cluster with a supported GPU model. Before you create the node pool, you must also identify the correct GPU VM SKUs based on the model. For more information, see [Use GPU for compute-intensive workloads](deploy-gpu-node-pool.md).
    - We recommend using a computer running Linux for this feature.
    - Use `az connectedk8s proxy` to connect to your AKS Arc cluster. An example invocation appears after this list.

1. Make sure that **helm** and **kubectl** are installed on your local machine.

    - If you need to install or upgrade Helm, see [Install Helm](https://helm.sh/docs/intro/install/).
    - If you need to install **kubectl**, see [Install kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/).
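
A minimal sketch of the proxy connection, using placeholder cluster and resource group names (replace them with your own values):

```bash
# Route local kubectl traffic to the Arc-connected AKS Arc cluster.
# "myAksArcCluster" and "myResourceGroup" are hypothetical names.
az connectedk8s proxy --name myAksArcCluster --resource-group myResourceGroup
```

Leave the proxy session running in a separate terminal while you work with `kubectl` and `helm` against the cluster.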

## Deploy KAITO from GitHub

You must have a running AKS Arc cluster with a default node pool. To deploy the KAITO operator, follow these steps:

1. Clone the [KAITO repo](https://github.com/Azure/kaito.git) to your local machine.
1. Install the KAITO operator using the following command:
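
    As a rough sketch, assuming the workspace chart ships in the cloned repo under `charts/kaito/workspace` (verify the chart path, release name, and namespace against the KAITO repo), the install might look like this:

    ```bash
    # Install the KAITO workspace controller from the cloned repo.
    # Chart path, release name, and namespace are assumptions; adjust as needed.
    cd kaito
    helm install kaito-workspace ./charts/kaito/workspace \
      --namespace kaito-workspace \
      --create-namespace
    ```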

## Add a GPU node pool

Before you add a GPU node, make sure that Azure Local is enabled with a supported GPU model and that the GPU drivers are installed on all the host nodes. To create a GPU node pool using the Azure portal or Azure CLI, follow these steps:

### [Azure portal](#tab/portal)

To create a GPU node pool using the Azure portal, follow these steps:

1. Sign in to the Azure portal and find your AKS Arc cluster.
1. Under **Settings** > **Node pools**, select **Add**. During the preview, we only support Linux nodes. Fill in the other required fields and create the node pool.

    :::image type="content" source="media/deploy-ai-model/nodepools-portal.png" alt-text="Screenshot of node pools portal page." lightbox="media/deploy-ai-model/nodepools-portal.png":::

### [Azure CLI](#tab/azurecli)

To create a GPU node pool using Azure CLI, run the following command. The GPU VM SKU used in the following example is for the **A16** model; for the full list of VM SKUs, see [Supported VM sizes](deploy-gpu-node-pool.md#supported-vm-sizes).
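
A minimal sketch using `az aksarc nodepool add`, with hypothetical cluster, resource group, and node pool names, and an assumed A16 VM size string (confirm the size against the supported VM sizes list):

```bash
# Add a Linux GPU node pool to an existing AKS Arc cluster.
# Names and the VM size below are placeholders; adjust for your environment.
az aksarc nodepool add \
  --name gpunodepool \
  --cluster-name myAksArcCluster \
  --resource-group myResourceGroup \
  --node-count 1 \
  --node-vm-size Standard_NC16_A16 \
  --os-type Linux
```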

After the node pool creation succeeds, you can confirm whether the GPU node is provisioned using `kubectl get nodes`. In the following example, the GPU node is **moc-l1i9uh0ksne**. The other node is from the default node pool that was created during cluster creation:

```bash
kubectl get nodes
```

```output
...
capacity:
  ephemeral-storage: 103110508Ki
  ...
```
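
To confirm that the GPU is schedulable on the new node, one option (assuming an NVIDIA GPU surfaced through the `nvidia.com/gpu` extended resource, which is the usual case for KAITO workloads) is to query the node's capacity directly:

```bash
# Print the number of NVIDIA GPUs the node reports in its capacity.
# The node name comes from the earlier example output.
kubectl get node moc-l1i9uh0ksne -o jsonpath='{.status.capacity.nvidia\.com/gpu}'
```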

## Deploy the AI model

To deploy the AI model, follow these steps:

1. Create a YAML file using the following template. KAITO supports popular OSS models such as Falcon, Phi3, Llama2, and Mistral. This list might increase over time.

    - The **PresetName** is used to specify which model to deploy; you can find its value in the [supported model file](https://github.com/Azure/kaito/blob/main/presets/models/supported_models.yaml) in the GitHub repo. In the following example, `falcon-7b-instruct` is used for the model deployment.
    - We recommend using `labelSelector` and `preferredNodes` to explicitly select the GPU node by name. In the following example, the label `apps: llm-inference` is used for the GPU node `moc-le4aoguwyd9`. You can choose any node label you want, as long as the labels match. The next step shows how to label the node.

    ```yaml
    apiVersion: kaito.sh/v1alpha1
    # ...
    resource:
      labelSelector:
        matchLabels:
          apps: llm-inference
      preferredNodes:
        - moc-le4aoguwyd9
    inference:
      # ...
    ```
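
1. Label the preferred GPU node so that it matches the `labelSelector` in the workspace spec. The exact command from the original step isn't shown in this excerpt; a minimal sketch, using the node name and label from the preceding example:

    ```bash
    # Add the label that the workspace labelSelector expects on the preferred node.
    kubectl label node moc-le4aoguwyd9 apps=llm-inference
    ```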

1. Apply the YAML file and wait until the workspace deployment completes:

    ```bash
    kubectl apply -f sampleyamlfile.yaml
    ```
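
    If you want to follow progress from the command line, one option (the workspace name here assumes the `workspace-falcon-7b` example used later in this article) is to watch the workspace resource until it reports ready:

    ```bash
    # Watch the workspace until ResourceReady and InferenceReady become True.
    kubectl get workspace workspace-falcon-7b -w
    ```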

## Validate the model deployment

To validate the model deployment, follow these steps:

1. Validate the workspace using the `kubectl get workspace` command. Also make sure that both the `ResourceReady` and `InferenceReady` fields are set to **True** before testing with the prompt:

    ```bash
    kubectl get workspace
    ```

    ```output
    NAME                  INSTANCE   RESOURCEREADY   INFERENCEREADY   JOBSTARTED   WORKSPACESUCCEEDED   AGE
    ...
    ```

1. After the resource and inference are ready, the **workspace-falcon-7b** inference service is exposed internally and can be accessed with a cluster IP. You can test the model with the following prompt. For more information about KAITO inference features, see the [instructions in the KAITO repo](https://github.com/kaito-project/kaito/blob/main/docs/inference/README.md#inference-workload).

    ```bash
    export CLUSTERIP=$(kubectl get svc workspace-falcon-7b -o jsonpath="{.spec.clusterIPs[0]}")
    ```
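
    The rest of the test command isn't shown in this excerpt. As a rough sketch, assuming the inference endpoint listens on port 80 at `/chat` and accepts a JSON body with a `prompt` field (verify the endpoint and payload against the KAITO inference instructions linked above), a test call might look like this:

    ```bash
    # Run a throwaway curl pod inside the cluster and post a prompt to the service.
    # $CLUSTERIP is expanded locally before the command is sent to the pod.
    kubectl run -it --rm --restart=Never curl-test --image=curlimages/curl -- \
      curl -X POST "http://$CLUSTERIP/chat" \
      -H "accept: application/json" \
      -H "Content-Type: application/json" \
      -d '{"prompt": "What is Kubernetes?"}'
    ```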

## Troubleshooting

If the pod is not deployed properly, or the **ResourceReady** field shows empty or **false**, it's usually because the preferred GPU node isn't labeled correctly. Check the node labels with `kubectl get node <yourNodeName> --show-labels`. For example, in the YAML file, the following code specifies that the node must have the label `apps=llm-inference`:
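
Based on the workspace spec shown earlier, the relevant fragment is the label selector:

```yaml
# The node named in preferredNodes must carry this label.
resource:
  labelSelector:
    matchLabels:
      apps: llm-inference
```

If the label is missing or different, relabel the node (for example, `kubectl label node <yourNodeName> apps=llm-inference --overwrite`), and the controller should then be able to schedule the workload on it.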