AKS-Arc/deploy-ai-model.md
@@ -4,9 +4,9 @@ description: Learn how to deploy an AI model on AKS Arc with the Kubernetes AI t
 author: sethmanheim
 ms.author: sethm
 ms.topic: how-to
-ms.date: 05/07/2025
+ms.date: 05/09/2025
 ms.reviewer: haojiehang
-ms.lastreviewed: 05/07/2025
+ms.lastreviewed: 05/09/2025
 
 
 ---
@@ -33,7 +33,9 @@ Before you begin, make sure you have the following prerequisites:
 - Install the **aksarc** extension, and make sure the version is at least 1.5.37. To get the list of installed CLI extensions, run `az extension list -o table`.
 - If you use a PowerShell terminal, make sure the version is at least 7.4.
 
-For all hosted model preset images and default resource configuration, see the [KAITO GitHub repository](https://github.com/kaito-project/kaito/tree/main/presets). The AI toolchain operator extension currently supports KAITO version 0.4.5. Keep this in mind when choosing a model from the KAITO model repository.
+For all hosted model preset images and default resource configurations, see the [KAITO GitHub repository](https://github.com/kaito-project/kaito/tree/main/presets). All preset models originally come from Hugging Face, and we do not change model behavior during redistribution. For more information, see the [Hugging Face content policy](https://huggingface.co/content-policy).
+
+The AI toolchain operator extension currently supports KAITO version 0.4.5. Keep this in mind when choosing a model from the KAITO model repository.
 
 
 ## Create a cluster with KAITO
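The prerequisite above asks for **aksarc** version 1.5.37 or later. As a hedged sketch (not part of the documented workflow), the dotted version string returned by `az extension show --name aksarc --query version -o tsv` can be compared against the minimum like this:

```python
# Hypothetical helper: compare a dotted CLI-extension version against a minimum,
# mirroring the "aksarc >= 1.5.37" prerequisite. The installed version string
# would come from: az extension show --name aksarc --query version -o tsv

def version_tuple(v: str) -> tuple[int, ...]:
    """Parse a dotted version like '1.5.37' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def meets_minimum(installed: str, minimum: str) -> bool:
    """True when the installed version is at least the required minimum."""
    return version_tuple(installed) >= version_tuple(minimum)

print(meets_minimum("1.5.37", "1.5.37"))  # True
print(meets_minimum("1.4.2", "1.5.37"))   # False
```

Tuple comparison handles multi-digit components correctly (for example, `1.5.37` vs. `1.12.0`), which a plain string comparison would not.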
@@ -102,20 +104,19 @@ To deploy the AI model, follow these steps:
 1. Create a YAML file from the following sample. In this example, we use the Phi 3.5 Mini model by specifying the preset name **phi-3.5-mini-instruct**. If you want to use another LLM, use its preset name from the KAITO repo. You should also make sure that the LLM can be deployed on the VM SKU, based on the matrix table in the "Model VM SKU matrix" section.
 
    ```yaml
-   apiVersion: kaito.sh/v1beta1
-   kind: Workspace
+   apiVersion: v1
+   kind: ConfigMap
    metadata:
-     name: workspace-llm # Update the workspace name as needed
-   resource:
-     instanceType: <GPU_VM_SKU> # Update this value with the GPU VM SKU
-     labelSelector:
-       matchLabels:
-         apps: llm-inference # Update the label as needed
-     preferredNodes:
-       - moc-l36c6vu97d5 # Update the value with the GPU VM name
-   inference:
-     preset:
-       name: phi-3.5-mini-instruct # Update the preset name as needed
+     name: ds-inference-params
+   data:
+     inference_config.yaml: |
+       max_probe_steps: 6 # Maximum number of steps to find the max available sequence length that fits in GPU memory.
+       vllm:
+         cpu-offload-gb: 0
+         swap-space: 4
+         gpu-memory-utilization: 0.9
+         max-model-len: 4096
+   # For more options, see https://docs.vllm.ai/en/latest/serving/engine_args.html.
    ```
 
 1. Apply the YAML and wait until the deployment completes. Make sure that internet connectivity is good so that the model can be downloaded from the Hugging Face website within a few minutes. When the inference workspace is successfully provisioned, both **ResourceReady** and **InferenceReady** become **True**. See the "Troubleshooting" section if you encounter any failures during the workspace deployment.
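The vllm settings in the sample ConfigMap above have natural bounds: `gpu-memory-utilization` is a fraction of GPU memory, and the length and size values cannot be negative. As a hypothetical illustration (not part of KAITO; field names are taken from the sample, with the vLLM engine-args docs as the authoritative reference), a small pre-apply sanity check might look like:

```python
# Hypothetical sanity checks for the vllm settings shown in the sample ConfigMap.
# The field names (gpu-memory-utilization, max-model-len, swap-space,
# cpu-offload-gb) come from the sample above; see the vLLM engine-args
# documentation for their authoritative definitions.

def validate_vllm_config(cfg: dict) -> list[str]:
    """Return a list of problems found in a vllm config fragment."""
    problems = []
    util = cfg.get("gpu-memory-utilization", 0.9)
    if not 0 < util <= 1:
        problems.append("gpu-memory-utilization must be in (0, 1]")
    if cfg.get("max-model-len", 1) <= 0:
        problems.append("max-model-len must be positive")
    if cfg.get("swap-space", 0) < 0:
        problems.append("swap-space (GiB) cannot be negative")
    if cfg.get("cpu-offload-gb", 0) < 0:
        problems.append("cpu-offload-gb cannot be negative")
    return problems

# The values from the sample ConfigMap pass these checks.
sample = {"cpu-offload-gb": 0, "swap-space": 4,
          "gpu-memory-utilization": 0.9, "max-model-len": 4096}
assert validate_vllm_config(sample) == []
```

Catching an out-of-range value such as `gpu-memory-utilization: 1.5` before applying the YAML avoids a failed rollout of the inference workload.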
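The final step above waits for both **ResourceReady** and **InferenceReady** conditions to become **True**. As a minimal sketch (assuming the usual Kubernetes status-conditions shape, as returned by `kubectl get workspace <name> -o json`), the readiness decision can be expressed as:

```python
# Minimal sketch: decide readiness from a workspace's status conditions,
# as described in the step above (ResourceReady and InferenceReady both True).
# The JSON shape assumed here mirrors standard Kubernetes status conditions.
import json

def workspace_ready(status_json: str) -> bool:
    """True when both ResourceReady and InferenceReady conditions are True."""
    conditions = json.loads(status_json).get("status", {}).get("conditions", [])
    states = {c["type"]: c["status"] for c in conditions}
    return states.get("ResourceReady") == "True" and states.get("InferenceReady") == "True"

sample = json.dumps({"status": {"conditions": [
    {"type": "ResourceReady", "status": "True"},
    {"type": "InferenceReady", "status": "True"},
]}})
print(workspace_ready(sample))  # True
```

If only **ResourceReady** is **True**, the model container is still starting or the preset image is still downloading, so a polling loop around a check like this would keep waiting.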