articles/container-apps/tutorial-gpu-with-serverless-gpu.md (6 additions, 6 deletions)
@@ -1,5 +1,5 @@
 ---
-title: 'Tutorial: Deploy your first container app'
+title: 'Tutorial: Deploy an NVIDIA Llama3 NIM to Azure Container Apps'
 description: Deploy a NVIDIA NIM to Azure Container Apps.
 services: container-apps
 author: craigshoemaker
@@ -10,7 +10,7 @@ ms.author: cachai
 ms.devlang: azurecli
 ---

-# Tutorial: Deploy an NVIDIA LLAMA3 NIM to Azure Container Apps
+# Tutorial: Deploy an NVIDIA Llama3 NIM to Azure Container Apps

 NVIDIA Inference Microservices (NIMs) are optimized, containerized AI inference microservices designed to simplify and accelerate the deployment of AI models. When you use Azure Container Apps with serverless GPUs, you can run these NIMs efficiently without having to manage the underlying infrastructure.
@@ -48,7 +48,7 @@ This tutorial uses a premium instance of Azure Container Registry to improve col
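The hunk header above references the section that uses a Premium-tier Azure Container Registry, which artifact streaming requires; its body isn't shown in this view. For reference, a minimal sketch of creating a Premium registry, assuming `$ACR_NAME` and `$RESOURCE_GROUP` are set earlier in the tutorial:

```azurecli
# Sketch only: creates a Premium-tier registry so artifact streaming can be enabled.
# $ACR_NAME and $RESOURCE_GROUP are assumed to be defined earlier in the tutorial.
az acr create \
  --name $ACR_NAME \
  --resource-group $RESOURCE_GROUP \
  --sku Premium
```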
@@ -120,7 +120,7 @@ Many of the NIM images are large, and your container app can take a long time to
 ```azurecli
 az acr artifact-streaming update \
   --name $ACR_NAME \
-  --repository llama31_8b_ins \
+  --repository llama-3.1-8b-instruct \
   --enable-streaming True
 ```

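After updating the repository name as shown above, you can confirm that streaming is enabled for the repository. A quick check, assuming the `az acr artifact-streaming show` subcommand is available in your Azure CLI version:

```azurecli
# Sketch only: verifies artifact streaming is enabled for the renamed repository.
az acr artifact-streaming show \
  --name $ACR_NAME \
  --repository llama-3.1-8b-instruct
```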
@@ -129,7 +129,7 @@ Many of the NIM images are large, and your container app can take a long time to
 ```azurecli
 az acr artifact-streaming create \
   --name $ACR_NAME \
-  --image llama31_8b_ins:latest
+  --image $CONTAINER_AND_TAG
 ```

 ## Create your container app
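The change above replaces the hard-coded image reference with the `$CONTAINER_AND_TAG` variable, which the tutorial defines elsewhere. For illustration only, it might be set like this (the exact tag is an assumption):

```azurecli
# Hypothetical value for illustration; use the image and tag defined earlier in the tutorial.
CONTAINER_AND_TAG="llama-3.1-8b-instruct:latest"
```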
@@ -197,7 +197,7 @@ curl -X POST \

 ## (Optional) Improving performance with volume mounts

-For even faster cold start times, many of the NIMs provide a volume mount path to mount a cache directory. You can use this cache directory to store the model weights and other files that the NIM needs to run. To set up a volume mount for the Llama3 NIM, see this article.
+Even with artifact streaming enabled on Azure Container Registry, your NIM still pulls the image from the registry at startup, which adds to cold start time. For even faster cold starts, many NIMs provide a volume mount path for a cache directory. You can use this cache directory to store the model weights and other files that the NIM needs to run. To set up a volume mount for the Llama3 NIM, mount a volume at `/opt/nim/.cache` as specified in the [NVIDIA Llama-3.1-8b documentation](https://build.nvidia.com/meta/llama-3_1-8b-instruct/deploy). To do so, follow the steps in the [volume mounts tutorial](./storage-mounts-azure-files.md) and set the volume mount path to `/opt/nim/.cache`.
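The linked tutorial covers Azure Files volume mounts in detail. As a rough sketch of the registration step, you add the file share to the Container Apps environment with the CLI and then reference it from the container app with the mount path `/opt/nim/.cache`; the environment and storage names below are placeholders.

```azurecli
# Sketch only: registers an Azure Files share with the Container Apps environment.
# Storage account, key, and share names are placeholders; the mount path /opt/nim/.cache
# is then set on the container app's volume mount per the linked tutorial.
az containerapp env storage set \
  --name $ENVIRONMENT_NAME \
  --resource-group $RESOURCE_GROUP \
  --storage-name nim-cache \
  --azure-file-account-name $STORAGE_ACCOUNT_NAME \
  --azure-file-account-key $STORAGE_ACCOUNT_KEY \
  --azure-file-share-name $FILE_SHARE_NAME \
  --access-mode ReadWrite
```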