Commit 7246603

Merge pull request #296406 from cachai2/nimtutorial
init
2 parents 07775b7 + 2038520

File tree

3 files changed: +229 −4 lines changed

articles/container-apps/TOC.yml

Lines changed: 2 additions & 0 deletions

@@ -280,6 +280,8 @@
       items:
         - name: Generate images with serverless GPUs
           href: gpu-image-generation.md
+        - name: Deploy an NVIDIA Llama3 NIM
+          href: serverless-gpu-nim.md
         - name: Microservices
           items:
             - name: Developing with Dapr

articles/container-apps/gpu-serverless-overview.md

Lines changed: 4 additions & 4 deletions

@@ -15,7 +15,7 @@ ms.author: cshoe

 Azure Container Apps provides access to GPUs on-demand without you having to manage the underlying infrastructure. As a serverless feature, you only pay for GPUs in use. When enabled, the number of GPUs used for your app rises and falls to meet the load demands of your application. Serverless GPUs enable you to seamlessly run your workloads with automatic scaling, optimized cold start, per-second billing with scale down to zero when not in use, and reduced operational overhead.

-Serverless GPUs are only supported for Consumption workload profiles. The feature is not supported for Consumption-only environments.
+Serverless GPUs are only supported for Consumption workload profiles. The feature isn't supported for Consumption-only environments.

 > [!NOTE]
 > Access to GPUs is only available after you request GPU quotas. You can submit your GPU quota request via a [customer support case](/azure/azure-portal/supportability/how-to-create-azure-support-request).

@@ -85,18 +85,18 @@ In the *Container* tab of the create process, set the following settings:

 1. Under the *Container resource allocation* section, check the **GPU** checkbox.

-1. For the *GPU Type**, select either the NVIDIA A100 or NVIDIA T4 option.
+1. For the **GPU Type**, select either the NVIDIA A100 or NVIDIA T4 option.

 ## Manage serverless GPU workload profile

 Serverless GPUs are run on consumption GPU workload profiles. You manage a consumption GPU workload profile in the same manner as any other workload profile. You can manage your workload profile using the [CLI](workload-profiles-manage-cli.md) or the [Azure portal](workload-profiles-manage-portal.md).

 ## Improve GPU cold start

-You can improve cold start on your GPU-enabled containers by enabling artifact streaming on your Azure Container Registry.
+You can improve cold start on your GPU-enabled containers by enabling artifact streaming on your Azure Container Registry. For more information, see [enable artifact streaming](/azure/container-registry/container-registry-artifact-streaming?pivots=development-environment-azure-cli).

 > [!NOTE]
-> To use artifact streaming, your container images must be hosted in Azure Container Registry.
+> To use artifact streaming, your container images must be hosted in a premium Azure Container Registry.

 Use the following steps to enable image streaming:

articles/container-apps/serverless-gpu-nim.md

Lines changed: 223 additions & 0 deletions
@@ -0,0 +1,223 @@

---
title: 'Tutorial: Deploy an NVIDIA Llama3 NIM to Azure Container Apps'
description: Deploy an NVIDIA NIM to Azure Container Apps.
services: container-apps
author: craigshoemaker
ms.service: azure-container-apps
ms.topic: tutorial
ms.date: 03/16/2025
ms.author: cachai
ms.devlang: azurecli
---

# Tutorial: Deploy an NVIDIA Llama3 NIM to Azure Container Apps

NVIDIA Inference Microservices (NIMs) are optimized, containerized AI inference microservices that simplify and accelerate how you build AI applications. These models are pre-packaged, scalable, and performance-tuned for direct deployment as secure endpoints on Azure Container Apps. When you use Azure Container Apps with serverless GPUs, you can run these NIMs efficiently without having to manage the underlying infrastructure.

In this tutorial, you learn how to deploy a Llama3 NVIDIA NIM to Azure Container Apps using serverless GPUs.

This tutorial uses a premium instance of Azure Container Registry to improve cold start performance when working with serverless GPUs. If you don't want to use a premium Azure Container Registry, make sure to modify the `az acr create` command in this tutorial to set `--sku` to `basic`.
## Prerequisites

| Resource | Description |
|---|---|
| Azure account | An Azure account with an active subscription.<br><br>If you don't have one, you [can create one for free](https://azure.microsoft.com/free/). |
| Azure CLI | Install the [Azure CLI](/cli/azure/install-azure-cli). |
| NVIDIA NGC API key | You can get an API key from the [NVIDIA GPU Cloud (NGC) website](https://catalog.ngc.nvidia.com). |

[!INCLUDE [container-apps-create-cli-steps.md](../../includes/container-apps-create-cli-steps.md)]

1. Set up environment variables by naming the resource group and setting the location.

    ```bash
    RESOURCE_GROUP="my-resource-group"
    LOCATION="swedencentral"
    ```

    Next, generate a unique container registry name.

    ```bash
    SUFFIX=$(head /dev/urandom | tr -dc 'a-z0-9' | head -c 6)
    ACR_NAME="mygpututorialacr${SUFFIX}"
    ```

    Finally, set variables to name the container apps environment and to identify the workload profile type, container app name, and container image.

    ```bash
    CONTAINERAPPS_ENVIRONMENT="my-environment-name"
    GPU_TYPE="Consumption-GPU-NC24-A100"
    CONTAINER_APP_NAME="llama3-nim"
    CONTAINER_AND_TAG="llama-3.1-8b-instruct:latest"
    ```

[!INCLUDE [container-apps-create-resource-group.md](../../includes/container-apps-create-resource-group.md)]

1. Create an Azure Container Registry (ACR).

    > [!NOTE]
    > This tutorial uses a premium Azure Container Registry to improve cold start performance when working with serverless GPUs. If you don't want to use a premium Azure Container Registry, modify the following command and set `--sku` to `basic`.

    ```azurecli
    az acr create \
        --resource-group $RESOURCE_GROUP \
        --name $ACR_NAME \
        --location $LOCATION \
        --sku premium
    ```

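If you want to confirm the registry exists and see the login server that your image tags use later in this tutorial, a quick check with the variables defined above might look like this:

```azurecli
# Print the registry's login server (for example, mygpututorialacr1a2b3c.azurecr.io).
az acr show \
    --name $ACR_NAME \
    --query loginServer \
    --output tsv
```
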
## Pull, tag, and push your image

Next, pull the image from NVIDIA GPU Cloud and push it to Azure Container Registry.

> [!NOTE]
> NVIDIA NIMs each have their own hardware requirements. Make sure the GPU type you select supports the [NIM](https://build.nvidia.com/models?filters=nimType%3Anim_type_run_anywhere&q=llama) of your choice. The Llama3 NIM used in this tutorial can run on NVIDIA A100 GPUs.

1. Authenticate to the NVIDIA container registry.

    ```bash
    docker login nvcr.io
    ```

    After you run this command, the sign-in process prompts you to enter a username. Enter **$oauthtoken** for your username value.

    Then you're prompted for a password. Enter your NVIDIA NGC API key here. (For a scripted, non-interactive variant of these steps, see the sketch after this list.) Once authenticated to the NVIDIA registry, you can authenticate to the Azure registry.

1. Authenticate to Azure Container Registry.

    ```bash
    az acr login --name $ACR_NAME
    ```

1. Pull the Llama3 NIM image.

    ```bash
    docker pull nvcr.io/nim/meta/$CONTAINER_AND_TAG
    ```

1. Tag the image.

    ```bash
    docker tag nvcr.io/nim/meta/$CONTAINER_AND_TAG $ACR_NAME.azurecr.io/$CONTAINER_AND_TAG
    ```

1. Push the image to Azure Container Registry.

    ```bash
    docker push $ACR_NAME.azurecr.io/$CONTAINER_AND_TAG
    ```
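
If you prefer a non-interactive sign-in, for example from a script or CI pipeline, the following sketch performs the same steps without prompts. It assumes your NGC API key is exported in an `NGC_API_KEY` shell variable, a name used here only for illustration:

```bash
# Non-interactive NGC login: the username is the literal string $oauthtoken
# (single quotes prevent shell expansion) and the password is the NGC API key,
# supplied on stdin.
echo "$NGC_API_KEY" | docker login nvcr.io \
    --username '$oauthtoken' \
    --password-stdin

# Pull, tag, and push in one pass using the variables from this tutorial.
docker pull nvcr.io/nim/meta/$CONTAINER_AND_TAG
docker tag nvcr.io/nim/meta/$CONTAINER_AND_TAG $ACR_NAME.azurecr.io/$CONTAINER_AND_TAG
docker push $ACR_NAME.azurecr.io/$CONTAINER_AND_TAG
```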

## Enable artifact streaming (recommended but optional)

When your container app runs, it pulls the container image from your container registry. When you have larger images, as is common for AI workloads, this image pull can take some time. Enabling artifact streaming reduces the time your container app needs to start. Use the following steps to enable artifact streaming.

> [!NOTE]
> The following commands can take a few minutes to complete.

1. Enable artifact streaming on your container registry.

    ```azurecli
    az acr artifact-streaming update \
        --name $ACR_NAME \
        --repository llama-3.1-8b-instruct \
        --enable-streaming True
    ```

1. Enable artifact streaming on the container image.

    ```azurecli
    az acr artifact-streaming create \
        --name $ACR_NAME \
        --image $CONTAINER_AND_TAG
    ```
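
Before moving on, you can optionally confirm that streaming is enabled for the repository. This sketch assumes the `az acr artifact-streaming show` subcommand available in recent Azure CLI versions:

```azurecli
# Show the artifact streaming status for the repository.
az acr artifact-streaming show \
    --name $ACR_NAME \
    --repository llama-3.1-8b-instruct
```
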
## Create your container app

Next, you create a container app with the NVIDIA GPU Cloud API key.

1. Create the container apps environment.

    ```azurecli
    az containerapp env create \
        --name $CONTAINERAPPS_ENVIRONMENT \
        --resource-group $RESOURCE_GROUP \
        --location $LOCATION \
        --enable-workload-profiles
    ```

1. Add the GPU workload profile to your environment.

    ```azurecli
    az containerapp env workload-profile add \
        --resource-group $RESOURCE_GROUP \
        --name $CONTAINERAPPS_ENVIRONMENT \
        --workload-profile-type $GPU_TYPE \
        --workload-profile-name LLAMA_PROFILE
    ```

1. Create the container app.

    ```azurecli
    az containerapp create \
        --name $CONTAINER_APP_NAME \
        --resource-group $RESOURCE_GROUP \
        --environment $CONTAINERAPPS_ENVIRONMENT \
        --image $ACR_NAME.azurecr.io/$CONTAINER_AND_TAG \
        --cpu 24 \
        --memory 220 \
        --target-port 8000 \
        --ingress external \
        --secrets ngc-api-key=<PASTE_NGC_API_KEY_HERE> \
        --env-vars NGC_API_KEY=secretref:ngc-api-key \
        --registry-server $ACR_NAME.azurecr.io \
        --workload-profile-name LLAMA_PROFILE \
        --query properties.configuration.ingress.fqdn
    ```

    This command returns the URL of your container app. Set this value aside in a text editor for use in a following command, or capture it in a variable as shown in the sketch after these steps.
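
If you prefer the variable approach, a minimal sketch (using `APP_URL` as an illustrative variable name) queries the FQDN from the deployed app:

```azurecli
# Query the app's fully qualified domain name (FQDN) and store it for later use.
APP_URL=$(az containerapp show \
    --name $CONTAINER_APP_NAME \
    --resource-group $RESOURCE_GROUP \
    --query properties.configuration.ingress.fqdn \
    --output tsv)

echo "https://$APP_URL"
```
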
## Verify the application works

You can verify a successful deployment by sending a `POST` request to your application.

Before you run this command, make sure you replace the `<YOUR_CONTAINER_APP_URL>` URL with your container app URL returned from the previous command.

```bash
curl -X POST \
  'https://<YOUR_CONTAINER_APP_URL>/v1/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "meta/llama-3.1-8b-instruct",
    "prompt": "Once upon a time...",
    "max_tokens": 64
  }'
```
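
The NIM serves an OpenAI-compatible completions API, so the response is JSON with the generated text nested inside a `choices` array. As a convenience, a sketch like the following (assuming `jq` is installed and `APP_URL` holds your app's FQDN from the earlier sketch) prints only the generated text:

```bash
# Send the same request and extract just the completion text with jq.
curl -s -X POST \
  "https://$APP_URL/v1/completions" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "meta/llama-3.1-8b-instruct",
    "prompt": "Once upon a time...",
    "max_tokens": 64
  }' | jq -r '.choices[0].text'
```
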
## Improving performance with volume mounts (optional)

Even with artifact streaming enabled on Azure Container Registry, your container app still pulls the image from the container registry at startup. This action results in a cold start, even with the optimized artifact streaming.

For even faster cold start times, many of the NIMs provide a volume mount path to store your image in a cache directory. You can use this cache directory to store the model weights and other files that the NIM needs to run.

To set up a volume mount for the Llama3 NIM, you need to set a volume mount on `/opt/nim/.cache` as specified in the [NVIDIA Llama-3.1-8b documentation](https://build.nvidia.com/meta/llama-3_1-8b-instruct/deploy). To do so, follow the steps in the [volume mounts tutorial](./storage-mounts-azure-files.md) and set the volume mount path to `/opt/nim/.cache`. A sketch of the key steps follows.
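
As a rough sketch of those steps, under these assumptions: an Azure Files share already exists; the storage account, key, share name placeholders, and the `nim-cache` storage and volume names are hypothetical values for illustration; and the YAML excerpt mirrors the container app definition schema used by `az containerapp update --yaml`:

```azurecli
# Register an existing Azure Files share with the Container Apps environment.
az containerapp env storage set \
    --name $CONTAINERAPPS_ENVIRONMENT \
    --resource-group $RESOURCE_GROUP \
    --storage-name nim-cache-storage \
    --azure-file-account-name <STORAGE_ACCOUNT_NAME> \
    --azure-file-account-key <STORAGE_ACCOUNT_KEY> \
    --azure-file-share-name <FILE_SHARE_NAME> \
    --access-mode ReadWrite
```

Then mount the share into the container at the NIM cache path in your app definition:

```yaml
# Excerpt of a container app YAML definition: declare the volume on the
# template and mount it into the container at /opt/nim/.cache.
template:
  volumes:
    - name: nim-cache
      storageName: nim-cache-storage
      storageType: AzureFile
  containers:
    - name: llama3-nim
      image: <ACR_NAME>.azurecr.io/llama-3.1-8b-instruct:latest
      volumeMounts:
        - volumeName: nim-cache
          mountPath: /opt/nim/.cache
```
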
## Clean up resources

If you're not going to continue to use this application, run the following command to delete the resource group along with all the resources created in this tutorial.

> [!CAUTION]
> The following command deletes the specified resource group and all resources contained within it. This command also deletes any resources outside the scope of this tutorial that exist in this resource group.

```azurecli
az group delete --name $RESOURCE_GROUP
```

> [!TIP]
> Having issues? Let us know on GitHub by opening an issue in the [Azure Container Apps repo](https://github.com/microsoft/azure-container-apps).

## Related content

- [Serverless GPUs overview](./gpu-serverless-overview.md)
- [Tutorial: Generate images with GPUs](./gpu-image-generation.md)
