Commit 81f6e2c

Fix per comments
1 parent f6fe7f4 commit 81f6e2c

1 file changed: +35 -9 lines changed

AKS-Arc/deploy-ai-model.md

Lines changed: 35 additions & 9 deletions
@@ -4,9 +4,9 @@ description: Learn how to deploy an AI model on AKS Arc with the Kubernetes AI t
 author: sethmanheim
 ms.author: sethm
 ms.topic: how-to
-ms.date: 05/09/2025
+ms.date: 05/12/2025
 ms.reviewer: haojiehang
-ms.lastreviewed: 05/09/2025
+ms.lastreviewed: 05/12/2025
 
 ---
 
@@ -33,7 +33,7 @@ Before you begin, make sure you have the following prerequisites:
 - Install the **aksarc** extension, and make sure the version is at least 1.5.37. To get the list of installed CLI extensions, run `az extension list -o table`.
 - If you use a Powershell terminal, make sure the version is at least 7.4.
 
-For all hosted model preset images and default resource configuration, see the [KAITO GitHub repository](https://github.com/kaito-project/kaito/tree/main/presets). All the preset models are originally from HuggingFace, and we do not change the model behavior during the redistribution. See the [content policy from HuggingFace](https://huggingface.co/content-policy).
+For all hosted model preset images and default resource configuration, see the [KAITO GitHub repository](https://github.com/kaito-project/kaito/tree/main/presets). All the preset models are originally from HuggingFace, and we do not change the model behavior during the redistribution. See the [content policy from HuggingFace](https://huggingface.co/content-policy).
 
 The AI toolchain operator extension currently supports KAITO version 0.4.5. Make a note of this in considering your choice of model from the KAITO model repository.
 
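The prerequisite in the hunk above asks for **aksarc** version 1.5.37 or later. A minimal sketch of how that check could be scripted follows. The helper name `version_at_least` is hypothetical, and the comparison assumes a `sort` that supports `-V` (GNU coreutils or BusyBox); neither is part of the original article.

```shell
# Hypothetical helper: succeeds when the installed version ($1) is greater
# than or equal to the required minimum ($2).
# Assumes `sort -V` (version sort) is available on the system.
version_at_least() {
  installed="$1"
  minimum="$2"
  # Version-sort both values; the minimum must sort first (or be equal).
  lowest=$(printf '%s\n%s\n' "$minimum" "$installed" | sort -V | head -n 1)
  [ "$lowest" = "$minimum" ]
}

# Example with literal values, so no Azure CLI is needed to try it:
if version_at_least "1.5.40" "1.5.37"; then
  echo "aksarc extension version is new enough"
fi

# With the Azure CLI installed, the installed version could be read like this:
# installed=$(az extension show --name aksarc --query version -o tsv)
# version_at_least "$installed" "1.5.37" || echo "Upgrade the aksarc extension"
```

A plain string comparison would rank 1.5.9 above 1.5.37, which is why the sketch relies on version sort.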
@@ -104,6 +104,22 @@ To deploy the AI model, follow these steps:
 1. Create a YAML file with the following sample file. In this example, we use the Phi 3.5 Mini model by specifying the preset name as **phi-3.5-mini-instruct**. If you want to use other LLMs, use the preset name from the KAITO repo. You should also make sure that the LLM can deploy on the VM SKU based on the matrix table in the "Model VM SKU Matrix" section.
 
 ```yaml
+apiVersion: kaito.sh/v1beta1
+kind: Workspace
+metadata:
+  name: workspace-llm
+resource:
+  instanceType: <GPU_VM_SKU> # Update this value with GPU VM SKU
+  labelSelector:
+    matchLabels:
+      apps: llm-inference
+  preferredNodes:
+  - moc-l36c6vu97d5 # Update the value with GPU VM name
+inference:
+  preset:
+    name: phi-3.5-mini-instruct # Update preset name as needed
+  config: "ds-inference-params"
+---
 apiVersion: v1
 kind: ConfigMap
 metadata:
@@ -116,7 +132,6 @@ To deploy the AI model, follow these steps:
 swap-space: 4
 gpu-memory-utilization: 0.9
 max-model-len: 4096
-# For more options, see https://docs.vllm.ai/en/latest/serving/engine_args.html.
 ```
 
 1. Apply the YAML and wait until the deployment completes. Make sure that internet connectivity is good so that the model can be downloaded from the Hugging Face website within a few minutes. When the inference workspace is successfully provisioned, both **ResourceReady** and **InferenceReady** become **True**. See the "Troubleshooting" section if you encounter any failures in the workspace deployment.
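The step above says the workspace is provisioned once both **ResourceReady** and **InferenceReady** become **True**. A rough sketch of how that wait could be scripted follows. The function names are hypothetical, and the sketch assumes the two states are exposed as entries under `.status.conditions` of the `workspace-llm` resource, which this diff does not confirm.

```shell
# Hypothetical helper: succeeds only when both condition values are "True".
workspace_ready() {
  [ "$1" = "True" ] && [ "$2" = "True" ]
}

# Hypothetical polling loop. Requires kubectl and access to the cluster.
# Assumes the conditions appear under .status.conditions with these type names.
wait_for_workspace() {
  name="$1"
  attempts="${2:-40}"   # give up after this many polls
  i=0
  while [ "$i" -lt "$attempts" ]; do
    resource=$(kubectl get workspace "$name" \
      -o jsonpath='{.status.conditions[?(@.type=="ResourceReady")].status}')
    inference=$(kubectl get workspace "$name" \
      -o jsonpath='{.status.conditions[?(@.type=="InferenceReady")].status}')
    if workspace_ready "$resource" "$inference"; then
      echo "workspace $name is ready"
      return 0
    fi
    i=$((i + 1))
    sleep 30
  done
  echo "timed out waiting for workspace $name" >&2
  return 1
}

# Usage, after saving the manifest to a file of your choice:
# kubectl apply -f <your-manifest>.yaml
# wait_for_workspace workspace-llm
```

The readiness check is kept separate from the polling loop so it can be exercised without a cluster.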
@@ -138,14 +153,26 @@ After the resource and inference states become ready, the inference service is e
 ```bash
 export CLUSTERIP=$(kubectl get svc workspace-llm -o jsonpath="{.spec.clusterIPs[0]}")
 
-kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -X POST http://$CLUSTERIP/chat -H "accept: application/json" -H "Content-Type: application/json" -d "{\"prompt\":\"hello how are you\"}"
+kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -X POST http://$CLUSTERIP/v1/completions
+-H "Content-Type: application/json"
+-d '{
+"model": "phi-3.5-mini-instruct",
+"prompt": "What is kubernetes?",
+"max_tokens": 20,
+"temperature": 0
+}'
 ```
 
 ```powershell
 $CLUSTERIP = $(kubectl get svc workspace-llm -o jsonpath="{.spec.clusterIPs[0]}" )
-$jsonContent = '{"prompt":"hello how are you"}'
-
-kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -X POST http://$CLUSTERIP/chat -H "accept: application/json" -H "Content-Type: application/json" -d $jsonContent
+$jsonContent = '{
+"model": "phi-3.5-mini-instruct",
+"prompt": "What is kubernetes?",
+"max_tokens": 20,
+"temperature": 0
+}'
+
+kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -X POST http://$CLUSTERIP/v1/completions -H "accept: application/json" -H "Content-Type: application/json" -d $jsonContent
 ```
 
 ## Clean up resources
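The bash request added in the hunk above is split across several lines, and as rendered here those lines carry no trailing backslashes, so a shell would likely treat `-H` and `-d` as separate commands. One way to sidestep this is to build the JSON body first and pass it as a single argument. The sketch below does that. The helper name `build_completion_payload` is hypothetical, and it assumes the prompt contains no double quotes or backslashes, because it performs no JSON escaping.

```shell
# Hypothetical helper: prints a single-line JSON body for /v1/completions.
# Arguments: model name, prompt, max_tokens, temperature.
# Assumption: the prompt has no characters that need JSON escaping.
build_completion_payload() {
  printf '{"model":"%s","prompt":"%s","max_tokens":%s,"temperature":%s}' \
    "$1" "$2" "$3" "$4"
}

# Same values as the request in the diff.
payload=$(build_completion_payload "phi-3.5-mini-instruct" "What is kubernetes?" 20 0)
echo "$payload"

# To send it (requires kubectl and the CLUSTERIP variable from the article),
# with explicit line continuations:
# kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- \
#   curl -X POST "http://$CLUSTERIP/v1/completions" \
#   -H "Content-Type: application/json" \
#   -d "$payload"
```

Keeping the body on one line also avoids depending on how the terminal handles embedded newlines inside a quoted argument.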
@@ -178,7 +205,6 @@ The following table shows the supported GPU models and their corresponding VM SK
 
 1. If you want to deploy an LLM and see the error **OutOfMemoryError: CUDA out of memory**, please raise an issue in the [KAITO repo](https://github.com/kaito-project/kaito/).
 1. If you see the error **(ExtensionOperationFailed) The extension operation failed with the following error: Unable to get a response from the Agent in time** during extension installation, [see this TSG](/troubleshoot/azure/azure-kubernetes/extensions/cluster-extension-deployment-errors#error-unable-to-get-a-response-from-the-agent-in-time) and ensure the extension agent in the AKS Arc cluster can connect to Azure.
-1. If you see an error such as **Unexpected error: (ExtensionOperationFailed) The extension operation failed with the following error: Error: [ InnerError: [Helm installation failed : Resource already existing in your cluster : Recommendation Manually delete the resource(s) that currently exist in your cluster and try installation again.**, it's possible that you previously enabled the KAITO extension on the cluster. Make sure to delete the KAITO namespace and try again.
 1. If you see an error during prompt testing such as **{"detail":[{"type":"json_invalid","loc":["body",1],"msg":"JSON decode error","input":{},"ctx":{"error":"Expecting property name enclosed in double quotes"}}]}**, it's possible that your PowerShell terminal version is 5.1. Make sure the terminal version is at least 7.4.
 
 ## Next steps
