|
---
title: 'Tutorial: Deploy your first container app'
description: Deploy an NVIDIA NIM to Azure Container Apps.
services: container-apps
author: craigshoemaker
ms.service: azure-container-apps
ms.topic: tutorial
ms.date: 03/16/2025
ms.author: cachai
ms.custom: mode-api, devx-track-azurecli, devx-track-azurepowershell
ms.devlang: azurecli
---

# Tutorial: Deploy an NVIDIA Llama3 NIM to Azure Container Apps

NVIDIA Inference Microservices (NIMs) are optimized, containerized AI inference microservices designed to simplify and accelerate the deployment of AI models across various environments. By using Azure Container Apps with serverless GPUs, you can run these NIMs efficiently without managing the underlying infrastructure.

In this tutorial, you'll deploy an NVIDIA Llama3 NIM to Azure Container Apps using serverless GPUs.

## Prerequisites

- An Azure account with an active subscription.
  - If you don't have one, you [can create one for free](https://azure.microsoft.com/free/).
- Install the [Azure CLI](/cli/azure/install-azure-cli).
- An NVIDIA NGC API key. You can obtain one from the [NVIDIA NGC website](https://catalog.ngc.nvidia.com).

[!INCLUDE [container-apps-create-cli-steps.md](../../includes/container-apps-create-cli-steps.md)]

## Initial setup

1. Set up environment variables:

    ```bash
    RESOURCE_GROUP="my-resource-group"
    LOCATION="swedencentral"
    ACR_NAME="myacrname"
    CONTAINERAPPS_ENVIRONMENT="my-environment-name"
    CONTAINER_APP_NAME="llama3-nim"
    GPU_TYPE="Consumption-GPU-NC24-A100"
    ```

1. Create an Azure resource group:

    ```azurecli
    az group create --name $RESOURCE_GROUP --location $LOCATION
    ```

1. Create an Azure Container Registry (ACR):

    > [!NOTE]
    > This tutorial uses a Premium Azure Container Registry, which is recommended when using serverless GPUs because it improves cold start performance. If you don't want to use a Premium registry, set `--sku` to `Basic` in the following command.

    ```azurecli
    az acr create --resource-group $RESOURCE_GROUP --name $ACR_NAME --sku Premium --location $LOCATION
    ```
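
Serverless GPU workload profiles aren't available in every region. Before you continue, you can check that the GPU type you chose is supported in your location. This check assumes a recent Azure CLI version, which provides the `list-supported` command:

```azurecli
az containerapp env workload-profile list-supported --location $LOCATION
```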

## Pull the image from NGC and push to ACR

> [!NOTE]
> NVIDIA NIMs each have their own hardware requirements. [Make sure the NIM](link) you select is supported by the GPU types available in Azure Container Apps. The Llama3 NIM used in this tutorial can run on NVIDIA A100 GPUs.

1. Authenticate with both the NVIDIA and Azure container registries:

    ```bash
    docker login nvcr.io
    Username: $oauthtoken
    Password: <PASTE_API_KEY_HERE>
    ```
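
    Alternatively, you can log in non-interactively. This sketch assumes your key is exported as an `NGC_API_KEY` environment variable; `$oauthtoken` is the literal username NGC expects, so it's single-quoted to prevent shell expansion:

    ```bash
    echo $NGC_API_KEY | docker login nvcr.io --username '$oauthtoken' --password-stdin
    ```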

    ```bash
    az acr login --name $ACR_NAME
    ```

1. Pull the Llama3 NIM image and push it to your Azure Container Registry:

    Pull the image:

    ```bash
    docker pull nvcr.io/nim/meta/llama3-8b-instruct:1.0.0
    ```

    Tag the image:

    ```bash
    docker tag nvcr.io/nim/meta/llama3-8b-instruct:1.0.0 $ACR_NAME.azurecr.io/llama3-8b-instruct:1.0.0
    ```

    Push the image:

    ```bash
    docker push $ACR_NAME.azurecr.io/llama3-8b-instruct:1.0.0
    ```
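
    To confirm that the push succeeded, list the tags in your registry:

    ```azurecli
    az acr repository show-tags --name $ACR_NAME --repository llama3-8b-instruct --output table
    ```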

## Enable artifact streaming (recommended)

Many NIM images are large, and your container app might take a long time to start if you don't enable artifact streaming. To enable artifact streaming on the image you pushed, follow these steps:

Create a streaming artifact for the image:

```azurecli
az acr artifact-streaming create --name $ACR_NAME --image llama3-8b-instruct:1.0.0
```

Enable streaming artifact generation for new tags pushed to the repository:

```azurecli
az acr artifact-streaming update --name $ACR_NAME --repository llama3-8b-instruct --enable-streaming true
```

Check the status of the conversion:

```azurecli
az acr artifact-streaming operation show --name $ACR_NAME --image llama3-8b-instruct:1.0.0
```

> [!NOTE]
> This operation might take a few minutes.

## Create your container app with the NGC API key

1. Create the Azure Container Apps environment with workload profiles enabled:

    ```azurecli
    az containerapp env create \
        --name $CONTAINERAPPS_ENVIRONMENT \
        --resource-group $RESOURCE_GROUP \
        --location $LOCATION \
        --enable-workload-profiles
    ```

1. Add the serverless GPU workload profile to the environment:

    ```azurecli
    az containerapp env workload-profile add \
        --resource-group $RESOURCE_GROUP \
        --name $CONTAINERAPPS_ENVIRONMENT \
        --workload-profile-type $GPU_TYPE \
        --workload-profile-name <WORKLOAD_PROFILE_NAME>
    ```

1. Create the container app. This command stores your NGC API key as a secret and passes it to the container as the `NGC_API_KEY` environment variable. The NIM serves its API on port 8000, so ingress targets that port:

    ```azurecli
    az containerapp create \
        --name $CONTAINER_APP_NAME \
        --resource-group $RESOURCE_GROUP \
        --environment $CONTAINERAPPS_ENVIRONMENT \
        --image $ACR_NAME.azurecr.io/llama3-8b-instruct:1.0.0 \
        --workload-profile-name <WORKLOAD_PROFILE_NAME> \
        --cpu 24 \
        --memory 220Gi \
        --target-port 8000 \
        --ingress external \
        --secrets ngc-api-key=<PASTE_NGC_API_KEY_HERE> \
        --env-vars NGC_API_KEY=secretref:ngc-api-key \
        --registry-server $ACR_NAME.azurecr.io \
        --registry-username <ACR_USERNAME> \
        --registry-password <ACR_PASSWORD>
    ```
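
Once the app is created, retrieve its fully qualified domain name (FQDN), which you use as `<YOUR_CONTAINER_APP_URL>` in the next section:

```azurecli
az containerapp show \
    --name $CONTAINER_APP_NAME \
    --resource-group $RESOURCE_GROUP \
    --query properties.configuration.ingress.fqdn \
    --output tsv
```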
|
## Test your NIM

Once deployed, test the NIM by sending a request:

```bash
curl -X POST \
  'https://<YOUR_CONTAINER_APP_URL>/v1/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "meta/llama3-8b-instruct",
    "prompt": "Once upon a time",
    "max_tokens": 64
  }'
```
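
The NIM exposes an OpenAI-compatible completions API, so the response follows that schema. The following abbreviated example is illustrative only; your generated text and token counts will differ:

```json
{
  "id": "cmpl-...",
  "object": "text_completion",
  "model": "meta/llama3-8b-instruct",
  "choices": [
    {
      "index": 0,
      "text": ", there was a small village nestled in the mountains",
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 64,
    "total_tokens": 69
  }
}
```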

## (Optional) Improve performance with volume mounts

For even faster cold start times, many NIMs provide a volume mount path for a cache directory. You can use this cache directory to store the model weights and other files the NIM needs to run. To set up a volume mount for the Llama3 NIM, see this article.
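
As a minimal sketch, such a cache is typically backed by an Azure Files share that you register with the environment and then mount into the container. The storage name, account details, and cache path are placeholders here; check your NIM's documentation for its actual cache directory:

```azurecli
az containerapp env storage set \
    --name $CONTAINERAPPS_ENVIRONMENT \
    --resource-group $RESOURCE_GROUP \
    --storage-name nim-cache \
    --azure-file-account-name <STORAGE_ACCOUNT_NAME> \
    --azure-file-account-key <STORAGE_ACCOUNT_KEY> \
    --azure-file-share-name <FILE_SHARE_NAME> \
    --access-mode ReadWrite
```

After the share is registered, reference it as a volume in the app's template and mount it at the NIM's cache path, for example by applying a YAML definition with `az containerapp update --yaml`.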

## Clean up resources

If you're not going to continue to use this application, run the following command to delete the resource group along with all the resources created in this tutorial.

> [!CAUTION]
> The following command deletes the specified resource group and all resources contained within it. If resources outside the scope of this tutorial exist in the specified resource group, they will also be deleted.

# [Bash](#tab/bash)

```azurecli
az group delete --name $RESOURCE_GROUP
```

# [PowerShell](#tab/powershell)

```azurepowershell
Remove-AzResourceGroup -Name $ResourceGroupName -Force
```

---

> [!TIP]
> Having issues? Let us know on GitHub by opening an issue in the [Azure Container Apps repo](https://github.com/microsoft/azure-container-apps).

## Next steps

> [!div class="nextstepaction"]
> [Communication between microservices](communicate-between-microservices.md)