| 1 | +--- |
| 2 | +title: Deploy an AI model on Azure Kubernetes Service (AKS) with the AI toolchain operator (preview) |
| 3 | +description: Learn how to enable the AI toolchain operator add-on on Azure Kubernetes Service (AKS) to simplify OSS AI model management and deployment. |
| 4 | +ms.topic: article |
| 5 | +ms.custom: azure-kubernetes-service, devx-track-azurecli |
| 6 | +ms.date: 02/28/2024 |
| 7 | +author: schaffererin |
| 8 | +ms.author: schaffererin |
| 9 | + |
| 10 | +--- |
| 11 | + |
| 12 | +## Deploy an AI model on Azure Kubernetes Service (AKS) with the AI toolchain operator (preview) |
| 13 | + |
| 14 | +The AI toolchain operator (KAITO) is a managed add-on for AKS that simplifies running OSS AI models on your AKS clusters. The AI toolchain operator automatically provisions the necessary GPU nodes and sets up the associated inference server as an endpoint for your AI models. Using this add-on reduces your onboarding time and lets you focus on AI model usage and development rather than infrastructure setup. |
| 15 | + |
| 16 | +This article shows you how to enable the AI toolchain operator add-on and deploy an AI model on AKS. |
| 17 | + |
| 18 | +[!INCLUDE [preview features callout](~/reusable-content/ce-skilling/azure/includes/aks/includes/preview/preview-callout.md)] |
| 19 | + |
| 20 | +## Before you begin |
| 21 | + |
| 22 | +* This article assumes a basic understanding of Kubernetes concepts. For more information, see [Kubernetes core concepts for AKS](./concepts-clusters-workloads.md). |
| 23 | +* For ***all hosted model inference images*** and recommended infrastructure setup, see the [KAITO GitHub repository](https://github.com/Azure/kaito). |
| 24 | +* The AI toolchain operator add-on currently supports KAITO version **v0.1.0**. Keep this in mind when choosing a model from the KAITO model repository. |
| 25 | + |
| 26 | +## Prerequisites |
| 27 | + |
| 28 | +* If you don't have an Azure subscription, create a [free account](https://azure.microsoft.com/free/?WT.mc_id=A261C142F) before you begin. |
| 29 | + * If you have multiple Azure subscriptions, make sure you select the correct subscription in which the resources will be created and charged using the [az account set](https://learn.microsoft.com/en-us/cli/azure/account?view=azure-cli-latest#az-account-set) command. |
| 30 | + |
| 31 | + > [!NOTE] |
| 32 | + > The subscription you use must have GPU VM quota. |
| 33 | +
| 34 | +* Azure CLI version 2.47.0 or later installed and configured. Run `az --version` to find the version. If you need to install or upgrade, see [Install Azure CLI](/cli/azure/install-azure-cli). |
| 35 | +* The Kubernetes command-line client, kubectl, installed and configured. For more information, see [Install kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/). |
| 36 | +* [Install the Azure CLI AKS preview extension](#install-the-azure-cli-preview-extension). |
| 37 | +* [Register the AI toolchain operator add-on feature flag](#register-the-ai-toolchain-operator-add-on-feature-flag). |
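The Azure CLI version requirement in the list above can also be checked from a script. The following is a minimal sketch rather than part of the official setup: `version_ge` is a hypothetical helper name, and it assumes a `sort` implementation that supports the `-V` (version sort) option, as GNU coreutils does.

```bash
# version_ge CURRENT MINIMUM
# Succeeds when CURRENT is greater than or equal to MINIMUM.
# Assumption: `sort -V` (version sort) is available on this system.
version_ge() {
  # The lower of the two versions sorts first; if that is MINIMUM,
  # then CURRENT is at least MINIMUM.
  lowest="$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)"
  [ "$lowest" = "$2" ]
}
```

For example, `version_ge "$(az version --query '"azure-cli"' -o tsv)" 2.47.0 || echo "Upgrade the Azure CLI"` should compare the installed version against the minimum listed above.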
| 38 | + |
| 39 | +## Set up resource group |
| 40 | + |
| 41 | +Create an Azure resource group with a randomized name suffix using the [az group create](https://learn.microsoft.com/en-us/cli/azure/group?view=azure-cli-latest#az-group-create) command. The following commands also export the region and cluster name that the rest of this article uses. |
| 42 | + |
| 43 | +```bash |
| 44 | +export RANDOM_ID="$(openssl rand -hex 3)" |
| 45 | +export AZURE_RESOURCE_GROUP="myKaitoResourceGroup$RANDOM_ID" |
| 46 | +export REGION="centralus" |
| 47 | +export CLUSTER_NAME="myClusterName$RANDOM_ID" |
| 48 | + |
| 49 | +az group create \ |
| 50 | + --name $AZURE_RESOURCE_GROUP \ |
| 51 | + --location $REGION |
| 52 | +``` |
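Every later command in this article depends on the variables exported above, and they are lost if you open a new shell session. As a hedged sketch, a small guard like the following can confirm that they are still set before you continue. The helper name `require_vars` is hypothetical, and it only detects variables that are exported, because it relies on `printenv`.

```bash
# require_vars NAME [NAME...]
# Succeeds only if every named variable is exported and non-empty.
require_vars() {
  missing=0
  for name in "$@"; do
    # printenv prints nothing for variables that are unset or not exported.
    if [ -z "$(printenv "$name")" ]; then
      echo "Missing or empty variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A typical call would be `require_vars RANDOM_ID AZURE_RESOURCE_GROUP REGION CLUSTER_NAME || echo "Re-run the export commands"`.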
| 53 | + |
| 54 | +## Install the Azure CLI preview extension |
| 55 | + |
| 56 | +Install the Azure CLI preview extension using the [az extension add](https://learn.microsoft.com/en-us/cli/azure/extension?view=azure-cli-latest#az-extension-add) command. Then update the extension to make sure you have the latest version using the [az extension update](https://learn.microsoft.com/en-us/cli/azure/extension?view=azure-cli-latest#az-extension-update) command. |
| 57 | + |
| 58 | +```bash |
| 59 | +az extension add --name aks-preview |
| 60 | +az extension update --name aks-preview |
| 61 | +``` |
| 62 | + |
| 63 | +## Register the AI toolchain operator add-on feature flag |
| 64 | + |
| 65 | +Register the `AIToolchainOperatorPreview` feature flag using the [az feature register](https://learn.microsoft.com/en-us/cli/azure/feature?view=azure-cli-latest#az-feature-register) command. |
| 66 | +It takes a few minutes for the registration to complete. |
| 67 | + |
| 68 | +```bash |
| 69 | +az feature register --namespace "Microsoft.ContainerService" --name "AIToolchainOperatorPreview" |
| 70 | +``` |
| 71 | + |
| 72 | +## Verify the AI toolchain operator add-on registration |
| 73 | + |
| 74 | +Verify the registration using the [az feature show](https://learn.microsoft.com/en-us/cli/azure/feature?view=azure-cli-latest#az-feature-show) command. The following loop checks the state every 15 seconds and exits once it is `Registered`. |
| 75 | + |
| 76 | +```bash |
| 77 | +while true; do |
| 78 | + status=$(az feature show --namespace "Microsoft.ContainerService" --name "AIToolchainOperatorPreview" --query "properties.state" -o tsv) |
| 79 | + if [ "$status" == "Registered" ]; then |
| 80 | + break |
| 81 | + else |
| 82 | + sleep 15 |
| 83 | + fi |
| 84 | +done |
| 85 | +``` |
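The loop above runs until the state becomes `Registered`, so it never exits if registration fails. As an alternative sketch, the same polling idea can be bounded with a timeout. The function name `wait_for_value` and its argument order are assumptions made for illustration, not part of the Azure CLI.

```bash
# wait_for_value EXPECTED TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Runs COMMAND repeatedly until its output equals EXPECTED.
# Returns 1 if TIMEOUT_SECONDS elapses first.
wait_for_value() {
  local expected="$1"
  local timeout="$2"
  local interval="$3"
  shift 3
  local elapsed=0
  local value=""
  while true; do
    # A failing command is treated as "no value yet".
    value="$("$@" 2>/dev/null)" || value=""
    if [ "$value" = "$expected" ]; then
      return 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "Timed out after ${timeout}s; last value: '${value}'" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}
```

With this helper, the check could be written as `wait_for_value Registered 600 15 az feature show --namespace "Microsoft.ContainerService" --name "AIToolchainOperatorPreview" --query "properties.state" -o tsv`, where the 600-second limit is an arbitrary choice.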
| 86 | + |
| 87 | +## Create an AKS cluster with the AI toolchain operator add-on enabled |
| 88 | + |
| 89 | +Create an AKS cluster with the AI toolchain operator add-on enabled using the [az aks create](https://learn.microsoft.com/en-us/cli/azure/aks?view=azure-cli-latest#az-aks-create) command with the `--enable-ai-toolchain-operator` and `--enable-oidc-issuer` flags. |
| 90 | + |
| 91 | +```bash |
| 92 | +az aks create --location ${REGION} \ |
| 93 | + --resource-group ${AZURE_RESOURCE_GROUP} \ |
| 94 | + --name ${CLUSTER_NAME} \ |
| 95 | + --enable-oidc-issuer \ |
| 96 | + --node-os-upgrade-channel SecurityPatch \ |
| 97 | + --auto-upgrade-channel stable \ |
| 98 | + --enable-ai-toolchain-operator \ |
| 99 | + --generate-ssh-keys \ |
| 100 | + --k8s-support-plan KubernetesOfficial |
| 101 | +``` |
| 102 | + |
| 103 | +## Connect to your cluster |
| 104 | + |
| 105 | +Configure `kubectl` to connect to your cluster using the [az aks get-credentials](https://learn.microsoft.com/en-us/cli/azure/aks?view=azure-cli-latest#az-aks-get-credentials) command. |
| 106 | + |
| 107 | +```bash |
| 108 | +az aks get-credentials --resource-group ${AZURE_RESOURCE_GROUP} --name ${CLUSTER_NAME} |
| 109 | +``` |
| 110 | + |
| 111 | +## Establish a federated identity credential |
| 112 | + |
| 113 | +Create the federated identity credential between the managed identity, AKS OIDC issuer, and subject using the [az identity federated-credential create](https://learn.microsoft.com/en-us/cli/azure/identity/federated-credential?view=azure-cli-latest) command. |
| 114 | + |
| 115 | +```bash |
| 116 | +export MC_RESOURCE_GROUP=$(az aks show --resource-group ${AZURE_RESOURCE_GROUP} \ |
| 117 | + --name ${CLUSTER_NAME} \ |
| 118 | + --query nodeResourceGroup \ |
| 119 | + -o tsv) |
| 120 | +export KAITO_IDENTITY_NAME="ai-toolchain-operator-${CLUSTER_NAME}" |
| 121 | +export AKS_OIDC_ISSUER=$(az aks show --resource-group "${AZURE_RESOURCE_GROUP}" \ |
| 122 | + --name "${CLUSTER_NAME}" \ |
| 123 | + --query "oidcIssuerProfile.issuerUrl" \ |
| 124 | + -o tsv) |
| 125 | + |
| 126 | +az identity federated-credential create --name "kaito-federated-identity" \ |
| 127 | + --identity-name "${KAITO_IDENTITY_NAME}" \ |
| 128 | + -g "${MC_RESOURCE_GROUP}" \ |
| 129 | + --issuer "${AKS_OIDC_ISSUER}" \ |
| 130 | + --subject "system:serviceaccount:kube-system:kaito-gpu-provisioner" \ |
| 131 | + --audience api://AzureADTokenExchange |
| 132 | +``` |
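The `--subject` value identifies a Kubernetes service account and follows the pattern `system:serviceaccount:<namespace>:<service-account-name>`. The sketch below builds that string from its two parts, which can help avoid typos if you adapt the command. `federated_subject` is a hypothetical helper name.

```bash
# federated_subject NAMESPACE SERVICE_ACCOUNT
# Prints the subject string for a Kubernetes service account.
federated_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}
```

For the command above, `federated_subject kube-system kaito-gpu-provisioner` prints the same subject that is passed to `--subject`.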
| 133 | + |
| 134 | +## Verify that your deployment is running |
| 135 | + |
| 136 | +Restart the KAITO GPU provisioner deployment using the `kubectl rollout restart` command so that its pods pick up the new federated identity credential: |
| 137 | + |
| 138 | +```bash |
| 139 | +kubectl rollout restart deployment/kaito-gpu-provisioner -n kube-system |
| 140 | +``` |
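One possible way to confirm that the restarted deployment is running is to wait for the rollout to finish with `kubectl rollout status`. This is a sketch only: the wrapper name `check_gpu_provisioner` is hypothetical, and the 120-second default timeout is an arbitrary assumption.

```bash
# check_gpu_provisioner [TIMEOUT]
# Waits for the kaito-gpu-provisioner rollout to complete.
# TIMEOUT uses kubectl duration syntax and defaults to 120s (an assumed value).
check_gpu_provisioner() {
  timeout="${1:-120s}"
  kubectl rollout status deployment/kaito-gpu-provisioner \
    --namespace kube-system \
    --timeout="$timeout"
}
```

Calling `check_gpu_provisioner` returns a nonzero exit code if the rollout does not complete within the timeout.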
| 141 | + |
| 142 | +## Deploy a default hosted AI model |
| 143 | + |
| 144 | +Deploy the Falcon 7B-instruct model from the KAITO model repository using the `kubectl apply` command. |
| 145 | + |
| 146 | +```bash |
| 147 | +kubectl apply -f https://raw.githubusercontent.com/Azure/kaito/main/examples/inference/kaito_workspace_falcon_7b-instruct.yaml |
| 148 | +``` |
| 149 | + |
| 150 | +## Ask a question |
| 151 | + |
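| 152 | +* Watch the workspace until the deployment is complete: `kubectl get workspace workspace-falcon-7b-instruct -w`. |
| 153 | +* Store the cluster IP address of the inference service in an environment variable: `export SERVICE_IP=$(kubectl get svc workspace-falcon-7b-instruct -o jsonpath='{.spec.clusterIP}')`. |
| 154 | +* Ask the model a question by sending a POST request to the `/chat` endpoint from a temporary curl pod, replacing `YOUR QUESTION HERE` with your prompt: `kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -X POST http://$SERVICE_IP/chat -H "accept: application/json" -H "Content-Type: application/json" -d "{\"prompt\":\"YOUR QUESTION HERE\"}"` |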
| 155 | + |
| 156 | +```bash |
| 157 | +echo "See last step for details on how to ask questions to the model." |
| 158 | +``` |
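The request body is a JSON object with a `prompt` field, so a question that contains double quotes or backslashes has to be escaped before it is sent. The following sketch builds the payload with that escaping applied. `build_prompt_payload` is a hypothetical helper, and it does not handle newlines or other control characters.

```bash
# build_prompt_payload QUESTION
# Prints a JSON body of the form {"prompt":"..."} for the /chat request.
# Only backslashes and double quotes are escaped (a simplifying assumption).
build_prompt_payload() {
  escaped="$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')"
  printf '{"prompt":"%s"}\n' "$escaped"
}
```

The result can be passed to curl in place of the hand-escaped string, for example `-d "$(build_prompt_payload 'What is "AKS"?')"`.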
| 159 | + |
| 160 | +## Next steps |
| 161 | + |
| 162 | +For more inference model options, see the [KAITO GitHub repository](https://github.com/Azure/kaito). |