AKS-Arc/deploy-ai-model.md (35 additions, 9 deletions)
@@ -4,9 +4,9 @@ description: Learn how to deploy an AI model on AKS Arc with the Kubernetes AI t
author: sethmanheim
ms.author: sethm
ms.topic: how-to
-ms.date: 05/09/2025
+ms.date: 05/12/2025
ms.reviewer: haojiehang
-ms.lastreviewed: 05/09/2025
+ms.lastreviewed: 05/12/2025

---
@@ -33,7 +33,7 @@ Before you begin, make sure you have the following prerequisites:
- Install the **aksarc** extension, and make sure the version is at least 1.5.37. To get the list of installed CLI extensions, run `az extension list -o table`.
- If you use a PowerShell terminal, make sure the version is at least 7.4.

-For all hosted model preset images and default resource configurations, see the [KAITO GitHub repository](https://github.com/kaito-project/kaito/tree/main/presets). All the preset models come from Hugging Face, and we don't change the model behavior during redistribution. See the [Hugging Face content policy](https://huggingface.co/content-policy).
+For all hosted model preset images and default resource configurations, see the [KAITO GitHub repository](https://github.com/kaito-project/kaito/tree/main/presets). All the preset models come from Hugging Face, and we don't change the model behavior during redistribution. See the [Hugging Face content policy](https://huggingface.co/content-policy).

The AI toolchain operator extension currently supports KAITO version 0.4.5. Keep this in mind when choosing a model from the KAITO model repository.
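Before moving on, it can help to confirm that the installed **aksarc** extension meets the minimum version. A minimal check, assuming the Azure CLI is installed and signed in:

```bash
# Print the installed aksarc extension version; it should be 1.5.37 or later.
az extension show --name aksarc --query version --output tsv
```

If the reported version is older, `az extension update --name aksarc` brings it up to date.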
@@ -104,6 +104,22 @@ To deploy the AI model, follow these steps:
1. Create a YAML file from the following sample. In this example, we use the Phi 3.5 Mini model by specifying the preset name **phi-3.5-mini-instruct**. If you want to use another LLM, use its preset name from the KAITO repo. Also make sure that the LLM can be deployed on your VM SKU, based on the table in the "Model VM SKU Matrix" section.

   ```yaml
+  apiVersion: kaito.sh/v1beta1
+  kind: Workspace
+  metadata:
+    name: workspace-llm
+  resource:
+    instanceType: <GPU_VM_SKU> # Update this value with the GPU VM SKU
+    labelSelector:
+      matchLabels:
+        apps: llm-inference
+    preferredNodes:
+      - moc-l36c6vu97d5 # Update this value with the GPU VM name
+  inference:
+    preset:
+      name: phi-3.5-mini-instruct # Update the preset name as needed
+    config: "ds-inference-params"
+  ---
   apiVersion: v1
   kind: ConfigMap
   metadata:
@@ -116,7 +132,6 @@ To deploy the AI model, follow these steps:
       swap-space: 4
       gpu-memory-utilization: 0.9
       max-model-len: 4096
-      # For more options, see https://docs.vllm.ai/en/latest/serving/engine_args.html.
   ```

1. Apply the YAML and wait until the deployment completes. Make sure you have a good internet connection so that the model can be downloaded from the Hugging Face website within a few minutes. When the inference workspace is successfully provisioned, both **ResourceReady** and **InferenceReady** become **True**. See the "Troubleshooting" section if you encounter any failures in the workspace deployment.
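As a rough sketch of this step, assuming you saved the sample above as `workspace-llm.yaml` (the file name is only a placeholder), you can apply the manifest and then watch the workspace until both readiness conditions report **True**:

```bash
# Apply the Workspace and ConfigMap manifest created in the previous step.
kubectl apply -f workspace-llm.yaml

# Watch the KAITO workspace until ResourceReady and InferenceReady both become True.
kubectl get workspace workspace-llm -w
```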
@@ -138,14 +153,26 @@ After the resource and inference states become ready, the inference service is e
   ```bash
   export CLUSTERIP=$(kubectl get svc workspace-llm -o jsonpath="{.spec.clusterIPs[0]}")

-  kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -X POST http://$CLUSTERIP/chat -H "accept: application/json" -H "Content-Type: application/json" -d "{\"prompt\":\"hello how are you\"}"
+  kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -X POST http://$CLUSTERIP/v1/completions \
+    -H "Content-Type: application/json" \
+    -d '{
+      "model": "phi-3.5-mini-instruct",
+      "prompt": "What is kubernetes?",
+      "max_tokens": 20,
+      "temperature": 0
+    }'
   ```

   ```powershell
   $CLUSTERIP = $(kubectl get svc workspace-llm -o jsonpath="{.spec.clusterIPs[0]}")
-  $jsonContent = '{"prompt":"hello how are you"}'
-
-  kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -X POST http://$CLUSTERIP/chat -H "accept: application/json" -H "Content-Type: application/json" -d $jsonContent
+  $jsonContent = '{
+    "model": "phi-3.5-mini-instruct",
+    "prompt": "What is kubernetes?",
+    "max_tokens": 20,
+    "temperature": 0
+  }'
+
+  kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -X POST http://$CLUSTERIP/v1/completions -H "accept: application/json" -H "Content-Type: application/json" -d $jsonContent
   ```
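If you prefer to test from your workstation rather than from a temporary curl pod, a port-forward sketch such as the following may also work, assuming the workspace service listens on port 80 as the ClusterIP calls above imply:

```bash
# Forward a local port to the workspace inference service in the cluster.
kubectl port-forward svc/workspace-llm 8080:80 &

# Send the same completion request through the forwarded port.
curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "phi-3.5-mini-instruct", "prompt": "What is kubernetes?", "max_tokens": 20, "temperature": 0}'
```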
## Clean up resources
@@ -178,7 +205,6 @@ The following table shows the supported GPU models and their corresponding VM SK
1. If you want to deploy an LLM and see the error **OutOfMemoryError: CUDA out of memory**, raise an issue in the [KAITO repo](https://github.com/kaito-project/kaito/).
1. If you see the error **(ExtensionOperationFailed) The extension operation failed with the following error: Unable to get a response from the Agent in time** during extension installation, [see this TSG](/troubleshoot/azure/azure-kubernetes/extensions/cluster-extension-deployment-errors#error-unable-to-get-a-response-from-the-agent-in-time) and ensure the extension agent in the AKS Arc cluster can connect to Azure.
-1. If you see an error such as **Unexpected error: (ExtensionOperationFailed) The extension operation failed with the following error: Error: [ InnerError: [Helm installation failed : Resource already existing in your cluster : Recommendation Manually delete the resource(s) that currently exist in your cluster and try installation again.**, it's possible that you previously enabled the KAITO extension on the cluster. Make sure to delete the KAITO namespace and try again.
1. If you see an error during prompt testing such as **{"detail":[{"type":"json_invalid","loc":["body",1],"msg":"JSON decode error","input":{},"ctx":{"error":"Expecting property name enclosed in double quotes"}}]}**, it's possible that your PowerShell terminal version is 5.1. Make sure the terminal version is at least 7.4.