AKS-Hybrid/deploy-ai-model.md (37 additions, 40 deletions)

---
author: sethmanheim
ms.author: sethm
ms.topic: how-to
ms.date: 12/06/2024
ms.reviewer: haojiehang
ms.lastreviewed: 12/03/2024
---

This article describes how to deploy an AI model on AKS Arc with the Kubernetes AI toolchain operator (KAITO). The AI toolchain operator (KAITO) is an add-on for AKS Arc that simplifies the experience of running OSS AI models on your AKS Arc clusters. To enable this feature, follow this workflow:

1. Deploy KAITO on an existing cluster.
1. Add a GPU node pool.
1. Deploy the AI model.
1. Validate the model deployment.

> [!IMPORTANT]
> These preview features are available on a self-service, opt-in basis. Previews are provided "as is" and "as available," and they're excluded from the service-level agreements and limited warranty. Azure Kubernetes Service, enabled by Azure Arc, previews are partially covered by customer support on a best-effort basis.

## Prerequisites

Before you begin, make sure you have the following prerequisites:

1. The following details from your infrastructure administrator:

    - An AKS Arc cluster that's up and running. For more information, see [Create Kubernetes clusters using Azure CLI](aks-create-clusters-cli.md).
    - Make sure that the AKS Arc cluster runs on an Azure Local cluster with a supported GPU model. Before you create the node pool, you must also identify the correct GPU VM SKUs based on the model. For more information, see [Use GPU for compute-intensive workloads](deploy-gpu-node-pool.md).
    - We recommend using a computer running Linux for this feature.
    - Use `az connectedk8s proxy` to connect to your AKS Arc cluster. An example invocation appears after this list.

1. Make sure that **helm** and **kubectl** are installed on your local machine.

    - If you need to install or upgrade Helm, see [Install Helm](https://helm.sh/docs/intro/install/).
    - If you need to install **kubectl**, see [Install kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/).
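
A minimal sketch of the proxy connection, using placeholder cluster and resource group names (replace them with your own values):

```bash
# Route local kubectl traffic to the Arc-connected AKS Arc cluster.
# "myAksArcCluster" and "myResourceGroup" are hypothetical names.
az connectedk8s proxy --name myAksArcCluster --resource-group myResourceGroup
```

Leave the proxy session running in a separate terminal while you work with `kubectl` and `helm` against the cluster.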

## Deploy KAITO from GitHub

You must have a running AKS Arc cluster with a default node pool. To deploy the KAITO operator, follow these steps:

1. Clone the [KAITO repo](https://github.com/Azure/kaito.git) to your local machine.
1. Install the KAITO operator using the following command:
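
    As a rough sketch, assuming the workspace chart ships in the cloned repo under `charts/kaito/workspace` (verify the chart path, release name, and namespace against the KAITO repo), the install might look like this:

    ```bash
    # Install the KAITO workspace controller from the cloned repo.
    # Chart path, release name, and namespace are assumptions; adjust as needed.
    cd kaito
    helm install kaito-workspace ./charts/kaito/workspace \
      --namespace kaito-workspace \
      --create-namespace
    ```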

## Add a GPU node pool

Before you add a GPU node, make sure that Azure Local is enabled with a supported GPU model and that the GPU drivers are installed on all the host nodes. To create a GPU node pool using the Azure portal or Azure CLI, follow these steps:

### [Azure portal](#tab/portal)

To create a GPU node pool using the Azure portal, follow these steps:

1. Sign in to the Azure portal and find your AKS Arc cluster.
1. Under **Settings** > **Node pools**, select **Add**. During the preview, we only support Linux nodes. Fill in the other required fields and create the node pool.

    :::image type="content" source="media/deploy-ai-model/nodepools-portal.png" alt-text="Screenshot of node pools portal page." lightbox="media/deploy-ai-model/nodepools-portal.png":::

### [Azure CLI](#tab/azurecli)

To create a GPU node pool using Azure CLI, run the following command. The GPU VM SKU used in the following example is for the **A16** model; for the full list of VM SKUs, see [Supported VM sizes](deploy-gpu-node-pool.md#supported-vm-sizes).
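
A minimal sketch using `az aksarc nodepool add`, with hypothetical cluster, resource group, and node pool names, and an assumed A16 VM size string (confirm the size against the supported VM sizes list):

```bash
# Add a Linux GPU node pool to an existing AKS Arc cluster.
# Names and the VM size below are placeholders; adjust for your environment.
az aksarc nodepool add \
  --name gpunodepool \
  --cluster-name myAksArcCluster \
  --resource-group myResourceGroup \
  --node-count 1 \
  --node-vm-size Standard_NC16_A16 \
  --os-type Linux
```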

After the node pool creation succeeds, you can confirm whether the GPU node is provisioned using `kubectl get nodes`. In the following example, the GPU node is **moc-l1i9uh0ksne**. The other node is from the default node pool that was created during cluster creation:

```bash
kubectl get nodes
```

```output
...
capacity:
  ephemeral-storage: 103110508Ki
  ...
```
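
To confirm that the GPU is schedulable on the new node, one option (assuming an NVIDIA GPU surfaced through the `nvidia.com/gpu` extended resource, which is the usual case for KAITO workloads) is to query the node's capacity directly:

```bash
# Print the number of NVIDIA GPUs the node reports in its capacity.
# The node name comes from the earlier example output.
kubectl get node moc-l1i9uh0ksne -o jsonpath='{.status.capacity.nvidia\.com/gpu}'
```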

## Deploy the AI model

To deploy the AI model, follow these steps:

1. Create a YAML file using the following template. KAITO supports popular OSS models such as Falcon, Phi3, Llama2, and Mistral. This list might increase over time.

    - The **PresetName** is used to specify which model to deploy; you can find its value in the [supported model file](https://github.com/Azure/kaito/blob/main/presets/models/supported_models.yaml) in the GitHub repo. In the following example, `falcon-7b-instruct` is used for the model deployment.
    - We recommend using `labelSelector` and `preferredNodes` to explicitly select the GPU node by name. In the following example, the label `apps: llm-inference` is used for the GPU node `moc-le4aoguwyd9`. You can choose any node label you want, as long as the labels match. The next step shows how to label the node.

    ```yaml
    apiVersion: kaito.sh/v1alpha1
    # ...
    resource:
      labelSelector:
        matchLabels:
          apps: llm-inference
      preferredNodes:
        - moc-le4aoguwyd9
    inference:
      # ...
    ```
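
1. Label the preferred GPU node so that it matches the `labelSelector` in the workspace spec. The exact command from the original step isn't shown in this excerpt; a minimal sketch, using the node name and label from the preceding example:

    ```bash
    # Add the label that the workspace labelSelector expects on the preferred node.
    kubectl label node moc-le4aoguwyd9 apps=llm-inference
    ```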

1. Apply the YAML file and wait until the workspace deployment completes:

    ```bash
    kubectl apply -f sampleyamlfile.yaml
    ```
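
    If you want to follow progress from the command line, one option (the workspace name here assumes the `workspace-falcon-7b` example used later in this article) is to watch the workspace resource until it reports ready:

    ```bash
    # Watch the workspace until ResourceReady and InferenceReady become True.
    kubectl get workspace workspace-falcon-7b -w
    ```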

## Validate the model deployment

To validate the model deployment, follow these steps:

1. Validate the workspace using the `kubectl get workspace` command. Also make sure that both the `ResourceReady` and `InferenceReady` fields are set to **True** before testing with the prompt:

    ```bash
    kubectl get workspace
    ```

    ```output
    NAME                  INSTANCE   RESOURCEREADY   INFERENCEREADY   JOBSTARTED   WORKSPACESUCCEEDED   AGE
    ...
    ```

1. After the resource and inference are ready, the **workspace-falcon-7b** inference service is exposed internally and can be accessed with a cluster IP. You can test the model with the following prompt. For more information about KAITO inference features, see the [instructions in the KAITO repo](https://github.com/kaito-project/kaito/blob/main/docs/inference/README.md#inference-workload).

    ```bash
    export CLUSTERIP=$(kubectl get svc workspace-falcon-7b -o jsonpath="{.spec.clusterIPs[0]}")
    ```
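
    The rest of the test command isn't shown in this excerpt. As a rough sketch, assuming the inference endpoint listens on port 80 at `/chat` and accepts a JSON body with a `prompt` field (verify the endpoint and payload against the KAITO inference instructions linked above), a test call might look like this:

    ```bash
    # Run a throwaway curl pod inside the cluster and post a prompt to the service.
    # $CLUSTERIP is expanded locally before the command is sent to the pod.
    kubectl run -it --rm --restart=Never curl-test --image=curlimages/curl -- \
      curl -X POST "http://$CLUSTERIP/chat" \
      -H "accept: application/json" \
      -H "Content-Type: application/json" \
      -d '{"prompt": "What is Kubernetes?"}'
    ```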

## Troubleshooting

If the pod is not deployed properly, or the **ResourceReady** field shows empty or **false**, it's usually because the preferred GPU node isn't labeled correctly. Check the node labels with `kubectl get node <yourNodeName> --show-labels`. For example, in the YAML file, the following code specifies that the node must have the label `apps=llm-inference`:
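
Based on the workspace spec shown earlier, the relevant fragment is the label selector:

```yaml
# The node named in preferredNodes must carry this label.
resource:
  labelSelector:
    matchLabels:
      apps: llm-inference
```

If the label is missing or different, relabel the node (for example, `kubectl label node <yourNodeName> apps=llm-inference --overwrite`), and the controller should then be able to schedule the workload on it.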