Commit 99c9f30

committed
Incorp some more feedback
1 parent 9388f27

File tree

1 file changed: +17 -16 lines


AKS-Arc/deploy-ai-model.md

Lines changed: 17 additions & 16 deletions
@@ -4,9 +4,9 @@ description: Learn how to deploy an AI model on AKS Arc with the Kubernetes AI t
 author: sethmanheim
 ms.author: sethm
 ms.topic: how-to
-ms.date: 05/07/2025
+ms.date: 05/09/2025
 ms.reviewer: haojiehang
-ms.lastreviewed: 05/07/2025
+ms.lastreviewed: 05/09/2025
 
 ---

@@ -33,7 +33,9 @@ Before you begin, make sure you have the following prerequisites:
 - Install the **aksarc** extension, and make sure the version is at least 1.5.37. To get the list of installed CLI extensions, run `az extension list -o table`.
 - If you use a PowerShell terminal, make sure the version is at least 7.4.
 
-For all hosted model preset images and default resource configurations, see the [KAITO GitHub repository](https://github.com/kaito-project/kaito/tree/main/presets). The AI toolchain operator extension currently supports KAITO version 0.4.5. Keep this version in mind when choosing a model from the KAITO model repository.
+For all hosted model preset images and default resource configurations, see the [KAITO GitHub repository](https://github.com/kaito-project/kaito/tree/main/presets). All preset models originate from Hugging Face, and their behavior is not changed during redistribution. See the [Hugging Face content policy](https://huggingface.co/content-policy).
+
+The AI toolchain operator extension currently supports KAITO version 0.4.5. Keep this version in mind when choosing a model from the KAITO model repository.
 
 ## Create a cluster with KAITO
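The minimum-version prerequisite above can be checked programmatically. A minimal sketch, assuming the installed version string has already been read from `az extension list -o table` (the `installed` value below is a hypothetical placeholder, not real output):

```python
# Minimal sketch: compare an aksarc extension version against the 1.5.37
# minimum required by this article. "installed" is a hypothetical placeholder;
# in practice, read it from `az extension list -o table`.
def parse_version(version: str) -> tuple[int, ...]:
    # "1.5.37" -> (1, 5, 37); tuple comparison then orders versions numerically.
    return tuple(int(part) for part in version.split("."))

MIN_AKSARC = "1.5.37"
installed = "1.5.40"  # placeholder value

if parse_version(installed) >= parse_version(MIN_AKSARC):
    print("aksarc extension version OK")
else:
    print(f"aksarc extension is older than {MIN_AKSARC}; update it before continuing")
```

Plain string comparison would get this wrong (`"1.5.9" > "1.5.37"` lexically), which is why the sketch compares integer tuples.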

@@ -102,20 +104,19 @@ To deploy the AI model, follow these steps:
 1. Create a YAML file from the following sample. In this example, we use the Phi 3.5 Mini model by specifying the preset name **phi-3.5-mini-instruct**. If you want to use another LLM, use its preset name from the KAITO repo. You should also make sure that the LLM can be deployed on the VM SKU, based on the matrix table in the "Model VM SKU Matrix" section.
 
     ```yaml
-    apiVersion: kaito.sh/v1beta1
-    kind: Workspace
+    apiVersion: v1
+    kind: ConfigMap
     metadata:
-      name: workspace-llm # Update the workspace name as needed
-    resource:
-      instanceType: <GPU_VM_SKU> # Update this value with the GPU VM SKU
-      labelSelector:
-        matchLabels:
-          apps: llm-inference # Update the label as needed
-      preferredNodes:
-        - moc-l36c6vu97d5 # Update the value with the GPU VM name
-    inference:
-      preset:
-        name: phi-3.5-mini-instruct # Update the preset name as needed
+      name: ds-inference-params
+    data:
+      inference_config.yaml: |
+        max_probe_steps: 6 # Maximum number of steps to find the maximum sequence length that fits in GPU memory.
+        vllm:
+          cpu-offload-gb: 0
+          swap-space: 4
+          gpu-memory-utilization: 0.9
+          max-model-len: 4096
+        # For more options, see https://docs.vllm.ai/en/latest/serving/engine_args.html.
     ```
 
 1. Apply the YAML and wait until the deployment completes. Make sure you have reliable internet connectivity so that the model can be downloaded from the Hugging Face website within a few minutes. When the inference workspace is successfully provisioned, both **ResourceReady** and **InferenceReady** become **True**. See the "Troubleshooting" section if you encounter any failures in the workspace deployment.
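The keys under `vllm` in the ConfigMap added by this diff mirror vLLM engine arguments (`--swap-space`, `--gpu-memory-utilization`, and so on). A minimal Python sketch of that correspondence; the `vllm_flags` helper is hypothetical and illustrates the config shape only, not KAITO's actual code path:

```python
# Hypothetical sketch: render the "vllm" section of the inference_config.yaml
# ConfigMap as vLLM engine CLI flags. Illustration only; KAITO applies this
# config internally.
config = {
    "max_probe_steps": 6,
    "vllm": {
        "cpu-offload-gb": 0,
        "swap-space": 4,
        "gpu-memory-utilization": 0.9,
        "max-model-len": 4096,
    },
}

def vllm_flags(cfg: dict) -> list[str]:
    # Each key/value pair under "vllm" maps to a --key=value engine argument.
    return [f"--{key}={value}" for key, value in cfg["vllm"].items()]

print(" ".join(vllm_flags(config)))
# -> --cpu-offload-gb=0 --swap-space=4 --gpu-memory-utilization=0.9 --max-model-len=4096
```

Note that `max_probe_steps` sits outside the `vllm` section: it controls KAITO's probing for the largest sequence length that fits in GPU memory, and is not itself a vLLM engine argument.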

0 commit comments
