Commit 00bf857

update to llama-3.1-8b
1 parent c13bf85 commit 00bf857

File tree

1 file changed (+6 −6 lines)


articles/container-apps/tutorial-gpu-with-serverless-gpu.md

Lines changed: 6 additions & 6 deletions
````diff
@@ -1,5 +1,5 @@
 ---
-title: 'Tutorial: Deploy your first container app'
+title: 'Tutorial: Deploy an NVIDIA Llama3 NIM to Azure Container Apps'
 description: Deploy a NVIDIA NIM to Azure Container Apps.
 services: container-apps
 author: craigshoemaker
````
````diff
@@ -10,7 +10,7 @@ ms.author: cachai
 ms.devlang: azurecli
 ---
 
-# Tutorial: Deploy an NVIDIA LLAMA3 NIM to Azure Container Apps
+# Tutorial: Deploy an NVIDIA Llama3 NIM to Azure Container Apps
 
 NVIDIA Inference Microservices (NIMs) are optimized, containerized AI inference microservices designed to simplify and accelerate the deployment of AI models. When you use Azure Container Apps with serverless GPUs, you can run these NIMs efficiently without having to manage the underlying infrastructure.
 
````

````diff
@@ -48,7 +48,7 @@ This tutorial uses a premium instance of Azure Container Registry to improve col
 CONTAINERAPPS_ENVIRONMENT="my-environment-name"
 GPU_TYPE="Consumption-GPU-NC24-A100"
 CONTAINER_APP_NAME="llama3-nim"
-CONTAINER_AND_TAG="llama3-8b-instruct:1.0.0"
+CONTAINER_AND_TAG="llama-3.1-8b-instruct:latest"
 ```
 
 [!INCLUDE [container-apps-create-resource-group.md](../../includes/container-apps-create-resource-group.md)]
````
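Before artifact streaming can be enabled, the image named by `CONTAINER_AND_TAG` has to exist in your registry. A minimal sketch of staging it there, assuming the NIM is published under `nvcr.io/nim/meta` on NVIDIA NGC (verify the exact path in the NGC catalog) and that you have an NGC API key:

```shell
# Hypothetical staging step: copy the NIM image from NVIDIA NGC into ACR.
# The nvcr.io/nim/meta path is an assumption; check the NGC catalog entry.
# Requires ACR_NAME and CONTAINER_AND_TAG from the variables defined above.
SOURCE_IMAGE="nvcr.io/nim/meta/$CONTAINER_AND_TAG"
TARGET_IMAGE="$ACR_NAME.azurecr.io/$CONTAINER_AND_TAG"

docker login nvcr.io --username '$oauthtoken'   # password: your NGC API key
docker pull "$SOURCE_IMAGE"

az acr login --name "$ACR_NAME"
docker tag "$SOURCE_IMAGE" "$TARGET_IMAGE"
docker push "$TARGET_IMAGE"
```

An alternative that avoids the local pull is `az acr import`, which copies the image server-side directly into the registry.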
````diff
@@ -120,7 +120,7 @@ Many of the NIM images are large, and your container app can take a long time to
 ```azurecli
 az acr artifact-streaming update \
   --name $ACR_NAME \
-  --repository llama31_8b_ins \
+  --repository llama-3.1-8b-instruct \
   --enable-streaming True
 ```
 
````
````diff
@@ -129,7 +129,7 @@ Many of the NIM images are large, and your container app can take a long time to
 ```azurecli
 az acr artifact-streaming create \
   --name $ACR_NAME \
-  --image llama31_8b_ins:latest
+  --image $CONTAINER_AND_TAG
 ```
 
 ## Create your container app
````
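Once the container app is running, a NIM serves an OpenAI-compatible HTTP API, which is what the tutorial's `curl -X POST` test exercises. A hedged sketch of querying the chat completions endpoint, where `APP_FQDN` and `RESOURCE_GROUP` are illustrative variable names, not ones the tutorial defines:

```shell
# Query the NIM's OpenAI-compatible chat completions endpoint.
# APP_FQDN is a hypothetical variable; look up the real ingress FQDN with:
#   az containerapp show --name $CONTAINER_APP_NAME \
#     --resource-group $RESOURCE_GROUP \
#     --query properties.configuration.ingress.fqdn -o tsv
curl -X POST "https://$APP_FQDN/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta/llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
        "max_tokens": 64
      }'
```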
````diff
@@ -197,7 +197,7 @@ curl -X POST \
 
 ## (Optional) Improving performance with volume mounts
 
-For even faster cold start times, many of the NIMs provide a volume mount path to mount a cache directory. You can use this cache directory to store the model weights and other files that the NIM needs to run. To set up a volume mount for the Llama3 NIM, see this article.
+Even with artifact streaming enabled on Azure Container Registry, your NIM still pulls its image from the registry at startup, which adds to cold start time. For even faster cold starts, many NIMs provide a volume mount path for a cache directory, which you can use to store the model weights and other files the NIM needs to run. To set up a volume mount for the Llama3 NIM, mount a volume at `/opt/nim/.cache` as specified in the [NVIDIA Llama-3.1-8b documentation](https://build.nvidia.com/meta/llama-3_1-8b-instruct/deploy), following the steps in the [volume mounts tutorial](./storage-mounts-azure-files.md) with the mount path set to `/opt/nim/.cache`.
 
 ## Clean up resources
 
````
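The cache mount described in the changed paragraph can be sketched as a Container Apps YAML update, assuming an Azure Files share has already been attached to the environment under the storage name `nim-cache-storage` (a hypothetical name) per the volume mounts tutorial:

```shell
# Hypothetical sketch: mount an Azure Files share at the NIM cache path.
# The storage name, volume name, and <ACR_NAME> placeholder are assumptions;
# the share must already be linked to the Container Apps environment.
cat > nim-cache.yaml <<'EOF'
properties:
  template:
    containers:
      - name: llama3-nim
        image: <ACR_NAME>.azurecr.io/llama-3.1-8b-instruct:latest
        volumeMounts:
          - volumeName: nim-cache
            mountPath: /opt/nim/.cache
    volumes:
      - name: nim-cache
        storageType: AzureFile
        storageName: nim-cache-storage
EOF

az containerapp update \
  --name $CONTAINER_APP_NAME \
  --resource-group $RESOURCE_GROUP \
  --yaml nim-cache.yaml
```

With the model weights cached on the share, subsequent replicas can skip re-downloading them, which is where the cold start savings come from.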

0 commit comments
