# Using serverless GPUs in Azure Container Apps (preview)
Azure Container Apps provides access to GPUs on-demand without you having to manage the underlying infrastructure. As a serverless feature, you only pay for GPUs in use. When enabled, the number of GPUs used for your app rises and falls to meet the load demands of your application. Serverless GPUs enable you to seamlessly run your workloads with automatic scaling, optimized cold start, per-second billing with scale down to zero when not in use, and reduced operational overhead.
Serverless GPUs are only supported for Consumption workload profiles. The feature isn't supported for Consumption-only environments.
> [!NOTE]
> Access to GPUs is only available after you request GPU quotas. You can submit your GPU quota request via a [customer support case](/azure/azure-portal/supportability/how-to-create-azure-support-request).
## Improve GPU cold start
You can improve cold start on your GPU-enabled containers by enabling artifact streaming on your Azure Container Registry. For more information, see [enable artifact streaming](/azure/container-registry/container-registry-artifact-streaming?pivots=development-environment-azure-cli).
> [!NOTE]
> To use artifact streaming, your container images must be hosted in a premium Azure Container Registry.
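
For example, a minimal Azure CLI sketch of enabling artifact streaming (the registry and repository names are placeholders for your own values):

```azurecli
az acr artifact-streaming update \
  --name <REGISTRY_NAME> \
  --repository <REPOSITORY_NAME> \
  --enable-streaming True
```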
# Tutorial: Deploy an NVIDIA LLAMA3 NIM to Azure Container Apps
NVIDIA Inference Microservices (NIMs) are optimized, containerized AI inference microservices designed to simplify and accelerate the deployment of AI models. When you use Azure Container Apps with serverless GPUs, you can run these NIMs efficiently without having to manage the underlying infrastructure.
In this tutorial, you learn to deploy a Llama3 NVIDIA NIM to Azure Container Apps using serverless GPUs.
This tutorial uses a premium instance of Azure Container Registry to improve cold start performance when working with serverless GPUs. If you don't want to use a premium Azure Container Registry, make sure to modify the `az acr create` command in this tutorial to set `--sku` to `basic`.
## Prerequisites
| Resource | Description |
|---|---|
| Azure account | An Azure account with an active subscription.<br><br>If you don't have one, you [can create one for free](https://azure.microsoft.com/free/). |
| Azure CLI | Install the [Azure CLI](/cli/azure/install-azure-cli). |
| NVIDIA NGC API key | You can get an API key from the [NVIDIA GPU Cloud (NGC) website](https://catalog.ngc.nvidia.com). |
> [!NOTE]
> This tutorial uses a premium Azure Container Registry to improve cold start performance when working with serverless GPUs. If you don't want to use a premium Azure Container Registry, modify the following command and set `--sku` to `basic`.

```azurecli
az acr create \
  --resource-group $RESOURCE_GROUP \
  --name $ACR_NAME \
  --location $LOCATION \
  --sku premium
```

## Pull, tag, and push your image
Next, pull the image from NVIDIA GPU Cloud and push it to Azure Container Registry.
> [!NOTE]
> NVIDIA NIMs each have their own hardware requirements. Make sure the GPU type you select supports the [NIM](link) of your choice. The Llama3 NIM used in this tutorial can run on NVIDIA A100 GPUs.
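
1. Authenticate to the NVIDIA container registry.

    ```bash
    docker login nvcr.io
    ```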
After you run this command, the sign-in process prompts you to enter a username. Enter **$oauthtoken** for your username value.
Then you're prompted for a password. Enter your NVIDIA NGC API key here. Once authenticated to the NVIDIA registry, you can authenticate to the Azure registry.
1. Authenticate to Azure Container Registry.

    ```azurecli
    az acr login --name $ACR_NAME
    ```

1. Pull the Llama3 NIM image.

    ```bash
    docker pull nvcr.io/nim/meta/$CONTAINER_AND_TAG
    ```

1. Tag the image.

    ```bash
    docker tag nvcr.io/nim/meta/$CONTAINER_AND_TAG $ACR_NAME.azurecr.io/$CONTAINER_AND_TAG
    ```
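
1. Push the image to your Azure Container Registry.

    ```bash
    docker push $ACR_NAME.azurecr.io/$CONTAINER_AND_TAG
    ```
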
## Enable artifact streaming (recommended but optional)
Many of the NIM images are large, and your container app can take a long time to start if you don't enable artifact streaming. Use the following steps to enable artifact streaming.
> [!NOTE]
> The following commands can take a few minutes to complete.
1. Enable artifact streaming on your container registry.
    ```azurecli
    az acr artifact-streaming update \
      --name $ACR_NAME \
      --repository llama31_8b_ins \
      --enable-streaming True
    ```
1. Enable artifact streaming on the container image.
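
    A minimal sketch of this step, assuming the `$ACR_NAME` and `$CONTAINER_AND_TAG` values defined earlier in this tutorial:

    ```azurecli
    az acr artifact-streaming create \
      --name $ACR_NAME \
      --image $CONTAINER_AND_TAG
    ```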
1. Add the GPU workload profile to your environment.
    ```azurecli
    az containerapp env workload-profile add \
      --resource-group $RESOURCE_GROUP \
      --name $CONTAINERAPPS_ENVIRONMENT \
      --workload-profile-type $GPU_TYPE \
      --workload-profile-name LLAMA_PROFILE
    ```
1. Create the container app.
    ```azurecli
    az containerapp create \
      --name $CONTAINER_APP_NAME \
      --resource-group $RESOURCE_GROUP \
      --environment $CONTAINERAPPS_ENVIRONMENT \
      --image $ACR_NAME.azurecr.io/$CONTAINER_AND_TAG \
      --cpu 24 \
      --memory 220 \
      --gpu "NVIDIAA100" \
      --secrets ngc-api-key=<PASTE_NGC_API_KEY_HERE> \
      --env-vars NGC_API_KEY=secretref:ngc-api-key \
      --registry-server $ACR_NAME.azurecr.io \
      --registry-username <ACR_USERNAME> \
      --registry-password <ACR_PASSWORD> \
      --query properties.configuration.ingress.fqdn
    ```
This command returns the URL of your container app. Set this value aside in a text editor for use in a following command.
## Verify the application works
You can verify a successful deployment by sending a `POST` request to your application.
Before you run this command, make sure you replace the `<YOUR_CONTAINER_APP_URL>` URL with your container app URL returned from the previous command.
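
The following `curl` command is a sketch of such a request. It assumes the NIM exposes its OpenAI-compatible `/v1/chat/completions` endpoint and that the model name matches the Llama 3.1 8B Instruct NIM used in this tutorial; adjust both for your deployment.

```bash
curl -X POST "https://<YOUR_CONTAINER_APP_URL>/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta/llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": "Write a short haiku about GPUs."}],
        "max_tokens": 64
      }'
```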
## (Optional) Improve performance with volume mounts
For even faster cold start times, many of the NIMs provide a volume mount path to mount a cache directory. You can use this cache directory to store the model weights and other files that the NIM needs to run. To set up a volume mount for the Llama3 NIM, see this article.
## Clean up resources
If you're not going to continue to use this application, run the following command to delete the resource group along with all the resources created in this tutorial.
> [!CAUTION]
> The following command deletes the specified resource group and all resources contained within it. This command also deletes any resources outside the scope of this tutorial that exist in this resource group.
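
Assuming the `$RESOURCE_GROUP` variable defined earlier in this tutorial:

```azurecli
az group delete --name $RESOURCE_GROUP
```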