Commit f694186

Merge pull request #2 from craigshoemaker/aca/cary/gpu-tutorial
[Container Apps] Update: NIM GPU tutorial
2 parents 4456a00 + 5d63c54 commit f694186

File tree

3 files changed: +145 −118 lines changed


articles/container-apps/TOC.yml

Lines changed: 2 additions & 0 deletions
```diff
@@ -280,6 +280,8 @@
     items:
     - name: Generate images with serverless GPUs
       href: gpu-image-generation.md
+    - name: Deploy an NVIDIA LLAMA3 NIM
+      href: tutorial-gpu-with-serverless-gpu.md
     - name: Microservices
       items:
       - name: Developing with Dapr
```

articles/container-apps/gpu-serverless-overview.md

Lines changed: 3 additions & 3 deletions
```diff
@@ -7,15 +7,15 @@ ms.service: azure-container-apps
 ms.custom:
   - ignite-2024
 ms.topic: how-to
-ms.date: 11/06/2024
+ms.date: 03/17/2025
 ms.author: cshoe
 ---

 # Using serverless GPUs in Azure Container Apps (preview)

 Azure Container Apps provides access to GPUs on-demand without you having to manage the underlying infrastructure. As a serverless feature, you only pay for GPUs in use. When enabled, the number of GPUs used for your app rises and falls to meet the load demands of your application. Serverless GPUs enable you to seamlessly run your workloads with automatic scaling, optimized cold start, per-second billing with scale down to zero when not in use, and reduced operational overhead.

-Serverless GPUs are only supported for Consumption workload profiles. The feature is not supported for Consumption-only environments.
+Serverless GPUs are only supported for Consumption workload profiles. The feature isn't supported for Consumption-only environments.

 > [!NOTE]
 > Access to GPUs is only available after you request GPU quotas. You can submit your GPU quota request via a [customer support case](/azure/azure-portal/supportability/how-to-create-azure-support-request).
@@ -93,7 +93,7 @@ Serverless GPUs are run on consumption GPU workload profiles. You manage a consu

 ## Improve GPU cold start

-You can improve cold start on your GPU-enabled containers by enabling artifact streaming on your Azure Container Registry. For more details, see [enable artifact streaming](./https://learn.microsoft.com/en-us/azure/container-registry/container-registry-artifact-streaming?pivots=development-environment-azure-cli#pushimport-the-image-and-generate-the-streaming-artifact----azure-cli).
+You can improve cold start on your GPU-enabled containers by enabling artifact streaming on your Azure Container Registry. For more information, see [enable artifact streaming](/azure/container-registry/container-registry-artifact-streaming?pivots=development-environment-azure-cli).

 > [!NOTE]
 > To use artifact streaming, your container images must be hosted in a premium Azure Container Registry.
```
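The cold-start guidance above can be sketched as a small script. This is a sketch only: the image name is an illustrative placeholder, and the `az acr artifact-streaming` commands (taken from the tutorial in this commit) need a signed-in Azure CLI session and a premium registry, so they're guarded behind a check that `ACR_NAME` is set.

```shell
#!/bin/sh
# Illustrative image reference; replace with the image you push to your registry.
IMAGE="llama3-8b-instruct:1.0.0"

# Split "repository:tag" with POSIX parameter expansion.
REPOSITORY="${IMAGE%%:*}"
TAG="${IMAGE##*:}"

# The CLI calls below mirror the tutorial in this commit. They only run
# when ACR_NAME is set, since they require an Azure login and a premium
# Azure Container Registry.
if [ -n "${ACR_NAME:-}" ]; then
  az acr artifact-streaming update \
    --name "$ACR_NAME" \
    --repository "$REPOSITORY" \
    --enable-streaming True
  az acr artifact-streaming create \
    --name "$ACR_NAME" \
    --image "$REPOSITORY:$TAG"
fi

echo "repository=$REPOSITORY tag=$TAG"
```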

articles/container-apps/tutorial-gpu-with-serverless-gpu.md

Lines changed: 140 additions & 115 deletions
````diff
@@ -7,149 +7,183 @@ ms.service: azure-container-apps
 ms.topic: tutorial
 ms.date: 03/16/2025
 ms.author: cachai
-ms.custom: mode-api, devx-track-azurecli, devx-track-azurepowershell
 ms.devlang: azurecli
 ---

 # Tutorial: Deploy an NVIDIA LLAMA3 NIM to Azure Container Apps

-NVIDIA Inference Microservices (NIMs) are optimized, containerized AI inference microservices designed to simplify and accelerate the deployment of AI models across various environments. By leveraging Azure Container Apps with serverless GPUs, you can run these NIMs efficiently without managing the underlying infrastructure.
+NVIDIA Inference Microservices (NIMs) are optimized, containerized AI inference microservices designed to simplify and accelerate the deployment of AI models. When you use Azure Container Apps with serverless GPUs, you can run these NIMs efficiently without having to manage the underlying infrastructure.

-In this tutorial, you'll deploy a Llama3 NVIDIA NIM to Azure Container Apps using serverless GPUs.
+In this tutorial, you learn to deploy a Llama3 NVIDIA NIM to Azure Container Apps using serverless GPUs.
+
+This tutorial uses a premium instance of Azure Container Registry to improve cold start performance when working with serverless GPUs. If you don't want to use a premium Azure Container Registry, make sure to modify the `az acr create` command in this tutorial to set `--sku` to `basic`.

 ## Prerequisites

-- An Azure account with an active subscription.
-  - If you don't have one, you [can create one for free](https://azure.microsoft.com/free/).
-- Install the [Azure CLI](/cli/azure/install-azure-cli).
-- Have a NVIDIA NGC API Key. Obtain an API key from the [NVIDIA NGC website](https://catalog.ngc.nvidia.com).
+| Resource | Description |
+|---|---|
+| Azure account | An Azure account with an active subscription.<br><br>If you don't have one, you [can create one for free](https://azure.microsoft.com/free/). |
+| Azure CLI | Install the [Azure CLI](/cli/azure/install-azure-cli). |
+| NVIDIA NGC API key | You can get an API key from the [NVIDIA GPU Cloud (NGC) website](https://catalog.ngc.nvidia.com). |

 [!INCLUDE [container-apps-create-cli-steps.md](../../includes/container-apps-create-cli-steps.md)]

-[!INCLUDE [container-apps-set-environment-variables.md](../../includes/container-apps-set-environment-variables.md)]
+1. Set up environment variables by naming the resource group and setting the location.

-[!INCLUDE [container-apps-create-resource-group.md](../../includes/container-apps-create-resource-group.md)]
+    ```bash
+    RESOURCE_GROUP="my-resource-group"
+    LOCATION="swedencentral"
+    ```

-[!INCLUDE [container-apps-create-environment.md](../../includes/container-apps-create-environment.md)]
+    Next, generate a unique container registry name.

-## Initial setup
+    ```bash
+    SUFFIX=$(head /dev/urandom | tr -dc 'A-Za-z0-9' | head -c 6)
+    ACR_NAME="mygpututorialacr${SUFFIX}"
+    ```

-1. Set up environment variables
+    Finally, set variables to name the environment and identify the workload profile type, container app name, and container image.

-    ```bash
-    RESOURCE_GROUP="my-resource-group"
-    LOCATION="swedencentral"
-    ACR_NAME="myacrname"
-    CONTAINERAPPS_ENVIRONMENT="my-environment-name"
-    CONTAINER_APP_NAME="llama3-nim"
-    GPU_TYPE="Consumption-GPU-NC24-A100"
-    ```
+    ```bash
+    CONTAINERAPPS_ENVIRONMENT="my-environment-name"
+    GPU_TYPE="Consumption-GPU-NC24-A100"
+    CONTAINER_APP_NAME="llama3-nim"
+    CONTAINER_AND_TAG="llama3-8b-instruct:1.0.0"
+    ```

-1. Create an Azure resource group
+[!INCLUDE [container-apps-create-resource-group.md](../../includes/container-apps-create-resource-group.md)]

-    ```azurecli
-    az group create --name $RESOURCE_GROUP --location $LOCATION
-    ```
+[!INCLUDE [container-apps-create-environment.md](../../includes/container-apps-create-environment.md)]

 1. Create an Azure Container Registry (ACR).

+    > [!NOTE]
+    > This tutorial uses a premium Azure Container Registry to improve cold start performance when working with serverless GPUs. If you don't want to use a premium Azure Container Registry, modify the following command and set `--sku` to `basic`.
+
+    ```azurecli
+    az acr create \
+      --resource-group $RESOURCE_GROUP \
+      --name $ACR_NAME \
+      --location $LOCATION \
+      --sku premium
+    ```
+
+## Pull, tag, and push your image
+
+Next, pull the image from NVIDIA GPU Cloud and push it to Azure Container Registry.
+
 > [!NOTE]
-> This tutorial uses a premium Azure Contianer Registry as it is recommended when using serverless GPUs for improved cold start performance. If you do not wish to use a premium Azure Container Registry, modify the below command, so --sku is set to Basic.
+> NVIDIA NIMs each have their own hardware requirements. Make sure the GPU type you select supports the [NIM](link) of your choice. The Llama3 NIM used in this tutorial can run on NVIDIA A100 GPUs.

-    ```azurecli
-    az acr create --resource-group $RESOURCE_GROUP --name $ACR_NAME --sku Premium --location $LOCATION
-    ```
+1. Authenticate to the NVIDIA container registry.
+
+    ```bash
+    docker login nvcr.io
+    ```
+
+    After you run this command, the sign-in process prompts you to enter a username. Enter **$oauthtoken** for your username value.
+
+    Then you're prompted for a password. Enter your NVIDIA NGC API key here. Once authenticated to the NVIDIA registry, you can authenticate to the Azure registry.
+
+1. Authenticate to Azure Container Registry.
+
+    ```bash
+    az acr login --name $ACR_NAME
+    ```
+
+1. Pull the Llama3 NIM image.
+
+    ```bash
+    docker pull nvcr.io/nim/meta/$CONTAINER_AND_TAG
+    ```
+
+1. Tag the image.
+
+    ```bash
+    docker tag nvcr.io/nim/meta/$CONTAINER_AND_TAG $ACR_NAME.azurecr.io/$CONTAINER_AND_TAG
+    ```
+
+1. Push the image to Azure Container Registry.
+
+    ```bash
+    docker push $ACR_NAME.azurecr.io/$CONTAINER_AND_TAG
+    ```
+
+## Enable artifact streaming (recommended but optional)

-## Pull the image from NGC and push to ACR
+Many of the NIM images are large, and your container app can take a long time to start if you don't enable artifact streaming. Use the following steps to enable artifact streaming.

 > [!NOTE]
-> NVIDIA NICs each have their own hardware requirements. [Make sure the NIM](link) you select is supported by the GPU types available in Azure Container Apps. The Llama3 NIM used in this tutorial can run on NVIDIA A100 GPUs.
+> The following commands can take a few minutes to complete.

-1. Authenticate with both the NVIDIA and azure container registries
+1. Enable artifact streaming on your container registry.

-    ```bash
-    docker login nvcr.io
-    Username: $oauthtoken
-    Password: <PASTE_API_KEY_HERE>
-    ```
+    ```azurecli
+    az acr artifact-streaming update \
+      --name $ACR_NAME \
+      --repository llama3-8b-instruct \
+      --enable-streaming True
+    ```

-    ```bash
-    az acr login --name $ACR_NAME
-    ```
+1. Enable artifact streaming on the container image.

-1. Pull the Llama3 NIM image and push it to your Azure Container Registry
+    ```azurecli
+    az acr artifact-streaming create \
+      --name $ACR_NAME \
+      --image llama3-8b-instruct:1.0.0
+    ```

-    Pull the image
-    ```azurecli
-    docker pull nvcr.io/nim/meta/llama3-8b-instruct:1.0.0
-    ```
+## Create your container app

-    Tag the image
-    ```azurecli
-    docker tag nvcr.io/nim/meta/llama3-8b-instruct:1.0.0 $ACR_NAME.azurecr.io/llama3-8b-instruct:1.0.0
-    ```
+Next, you create a container app with the NVIDIA GPU Cloud API key.

-    Push the image
-    ```azurecli
-    docker push $ACR_NAME.azurecr.io/llama3-8b-instruct:1.0.0
-    ```
+1. Create the Container Apps environment with workload profiles enabled.

-## (Recommended: Optional) Enable artifact streaming.
+    ```azurecli
+    az containerapp env create \
+      --name $CONTAINERAPPS_ENVIRONMENT \
+      --resource-group $RESOURCE_GROUP \
+      --location $LOCATION \
+      --enable-workload-profiles
+    ```

-Many of the NIM images are large, and your container app may take a long time to start if you don't enable artifact streaming. To enable artifact streaming, follow these steps:
+1. Add the GPU workload profile to your environment.

-    ```azurecli
-    az acr artifact-streaming create --image jupyter/all-spark-notebook:latest
-    ```
+    ```azurecli
+    az containerapp env workload-profile add \
+      --resource-group $RESOURCE_GROUP \
+      --name $CONTAINERAPPS_ENVIRONMENT \
+      --workload-profile-type $GPU_TYPE \
+      --workload-profile-name LLAMA_PROFILE
+    ```

-    ```azurecli
-    az acr artifact-streaming update --repository jupyter/all-spark-notebook --enable-streaming true
-    ```
+1. Create the container app.

-    ```azurecli
-    az acr artifact-streaming operation show --image jupyter/all-spark-notebook:newtag
-    ```
+    ```azurecli
+    az containerapp create \
+      --name $CONTAINER_APP_NAME \
+      --resource-group $RESOURCE_GROUP \
+      --environment $CONTAINERAPPS_ENVIRONMENT \
+      --image $ACR_NAME.azurecr.io/$CONTAINER_AND_TAG \
+      --cpu 24 \
+      --memory 220 \
+      --gpu "NVIDIAA100" \
+      --secrets ngc-api-key=<PASTE_NGC_API_KEY_HERE> \
+      --env-vars NGC_API_KEY=secretref:ngc-api-key \
+      --registry-server $ACR_NAME.azurecr.io \
+      --registry-username <ACR_USERNAME> \
+      --registry-password <ACR_PASSWORD> \
+      --query properties.configuration.ingress.fqdn
+    ```

-    Note: Tis may take a few minutes.
+    This command returns the URL of your container app. Set this value aside in a text editor for use in a following command.

-## Create your container app with the NGC API Key
+## Verify the application works

-    ```azurecli
-    az containerapp env create \
-      --name $CONTAINERAPPS_ENVIRONMENT \
-      --resource-group $RESOURCE_GROUP \
-      --location $LOCATION \
-      --workload-profiles enabled
-    ```
+You can verify a successful deployment by sending a `POST` request to your application.

-    az containerapp env workload-profile add \
-      --resource-group $RESOURCE_GROUP \
-      --name $CONTAINERAPPS_ENVIRONMENT \
-      --workload-profile-type $GPU_TYPE \
-      --workload-profile-name <WORKLOAD_PROFILE_NAME> \
-
-    az containerapp secret set \
-      --name $CONTAINER_APP_NAME \
-      --resource-group $RESOURCE_GROUP \
-      --secrets ngc-api-key=<PASTE_NGC_API_KEY_HERE>
-
-    ```azurecli ///need add workload profile and verify
-    az containerapp create \
-      --name $CONTAINER_APP_NAME \
-      --resource-group $RESOURCE_GROUP \
-      --environment $CONTAINERAPPS_ENVIRONMENT \
-      --image $ACR_NAME.azurecr.io/llama3-8b-instruct:1.0.0 \
-      --cpu 24 \
-      --memory 220 \
-      --gpu "NvidiaA100" \
-      --secrets ngc-api-key=<PASTE_NGC_API_KEY_HERE> \
-      --env-vars NGC_API_KEY=secretref:ngc-api-key \
-      --registry-server $ACR_NAME.azurecr.io \
-      --registry-username <ACR_USERNAME> \
-      --registry-password <ACR_PASSWORD>
-
-## Test your NIM
-Once deployed, test the NIM by sending a request:
+Before you run this command, make sure you replace the `<YOUR_CONTAINER_APP_URL>` URL with your container app URL returned from the previous command.

 ```bash
 curl -X POST \
   'http://<YOUR_CONTAINER_APP_URL>/v1/completions' \
````
````diff
@@ -163,33 +197,24 @@
 ```

 ## (Optional) Improving performance with volume mounts
-For even faster cold start times, many of the NIMs provide a volume mount path to mount a cache directory. This cache directory can be used to store the model weights and other files that the NIM needs to run. To set up a volume mount for the Llama3 NIM, see this article.
+
+For even faster cold start times, many of the NIMs provide a volume mount path to mount a cache directory. You can use this cache directory to store the model weights and other files that the NIM needs to run. To set up a volume mount for the Llama3 NIM, see this article.

 ## Clean up resources

 If you're not going to continue to use this application, run the following command to delete the resource group along with all the resources created in this tutorial.

 >[!CAUTION]
-> The following command deletes the specified resource group and all resources contained within it. If resources outside the scope of this tutorial exist in the specified resource group, they will also be deleted.
-
-# [Bash](#tab/bash)
+> The following command deletes the specified resource group and all resources contained within it. This command also deletes any resources outside the scope of this tutorial that exist in this resource group.

 ```azurecli
 az group delete --name $RESOURCE_GROUP
 ```

-# [PowerShell](#tab/powershell)
-
-```azurepowershell
-Remove-AzResourceGroup -Name $ResourceGroupName -Force
-```
-
----
-
 > [!TIP]
 > Having issues? Let us know on GitHub by opening an issue in the [Azure Container Apps repo](https://github.com/microsoft/azure-container-apps).

-## Next steps
+## Related content

-> [!div class="nextstepaction"]
-> [Communication between microservices](communicate-between-microservices.md)
+- [Serverless GPUs overview](./gpu-serverless-overview.md)
+- [Tutorial: Generate image with GPUs](./gpu-image-generation.md)
````
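The verification `curl` call in the tutorial is shown with its request body elided at the hunk boundary, so here's a hedged sketch of what a completions request could look like. The `model`, `prompt`, and `max_tokens` values are assumptions (NIMs generally expose an OpenAI-style completions API), not taken from this commit, and `APP_URL` stands in for the container app URL returned by `az containerapp create`.

```shell
#!/bin/sh
# Hypothetical request body; the exact payload is not shown in this diff.
PAYLOAD=$(cat <<'EOF'
{
  "model": "meta/llama3-8b-instruct",
  "prompt": "Once upon a time",
  "max_tokens": 64
}
EOF
)

# Only send the request when APP_URL is set to your container app's URL.
if [ -n "${APP_URL:-}" ]; then
  curl -X POST "http://${APP_URL}/v1/completions" \
    -H 'Content-Type: application/json' \
    -d "$PAYLOAD"
fi

# Show the payload so the sketch does something useful even without a URL.
echo "$PAYLOAD"
```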
