
Commit f3232f5

Merge pull request #272550 from ssalgadodev/patch-88
Update how-to-deploy-models-llama.md
2 parents 7964e56 + 0c3bd96

File tree

1 file changed: +41 -41 lines changed

articles/machine-learning/how-to-deploy-models-llama.md

Lines changed: 41 additions & 41 deletions
@@ -1,7 +1,7 @@
 ---
-title: How to deploy Llama family of large language models with Azure Machine Learning studio
+title: How to deploy Meta Llama models with Azure Machine Learning studio
 titleSuffix: Azure Machine Learning
-description: Learn how to deploy Llama family of large language models with Azure Machine Learning studio.
+description: Learn how to deploy Meta Llama models with Azure Machine Learning studio.
 manager: scottpolly
 ms.service: machine-learning
 ms.subservice: inferencing
@@ -17,37 +17,37 @@ ms.custom: [references_regions]
 ---
 
 
-# How to deploy Llama family of large language models with Azure Machine Learning studio
+# How to deploy Meta Llama models with Azure Machine Learning studio
 
-In this article, you learn about the Llama family of large language models (LLMs). You also learn how to use Azure Machine Learning studio to deploy models from this set either as a service with pay-as you go billing or with hosted infrastructure in real-time endpoints.
+In this article, you learn about the Meta Llama family of large language models (LLMs). You also learn how to use Azure Machine Learning studio to deploy models from this set, either as a service with pay-as-you-go billing or with hosted infrastructure in real-time endpoints.
 
 > [!IMPORTANT]
-> Read more about the Llama 3 on Azure AI Model Catalog announcement from [Microsoft](https://aka.ms/Llama3Announcement) and from [Meta](https://aka.ms/meta-llama3-announcement-blog).
+> Read more about the announcement of Meta Llama 3 models, available now on the Azure AI Model Catalog, from the [Microsoft Tech Community Blog](https://aka.ms/Llama3Announcement) and the [Meta Announcement Blog](https://aka.ms/meta-llama3-announcement-blog).
 
-The Llama family of LLMs is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. The model family also includes fine-tuned versions optimized for dialogue use cases with reinforcement learning from human feedback (RLHF), called Llama-3-chat. See the following GitHub samples to explore integrations with [LangChain](https://aka.ms/meta-llama3-langchain-sample), [LiteLLM](https://aka.ms/meta-llama3-litellm-sample), [OpenAI](https://aka.ms/meta-llama3-openai-sample) and the [Azure API](https://aka.ms/meta-llama3-azure-api-sample).
+Meta Llama 3 models and tools are a collection of pretrained and fine-tuned generative text models ranging in scale from 8 billion to 70 billion parameters. The Meta Llama model family also includes fine-tuned versions optimized for dialogue use cases with reinforcement learning from human feedback (RLHF), called Meta-Llama-3-8B-Instruct and Meta-Llama-3-70B-Instruct. See the following GitHub samples to explore integrations with [LangChain](https://aka.ms/meta-llama3-langchain-sample), [LiteLLM](https://aka.ms/meta-llama3-litellm-sample), [OpenAI](https://aka.ms/meta-llama3-openai-sample), and the [Azure API](https://aka.ms/meta-llama3-azure-api-sample).
 
 [!INCLUDE [machine-learning-preview-generic-disclaimer](includes/machine-learning-preview-generic-disclaimer.md)]
 
-## Deploy Llama models with pay-as-you-go
+## Deploy Meta Llama models with pay-as-you-go
 
 Certain models in the model catalog can be deployed as a service with pay-as-you-go, providing a way to consume them as an API without hosting them on your subscription, while keeping the enterprise security and compliance organizations need. This deployment option doesn't require quota from your subscription.
 
-Llama models deployed as a service with pay-as-you-go are offered by Meta AI through Microsoft Azure Marketplace, and they might add more terms of use and pricing.
+Meta Llama models deployed as a service with pay-as-you-go are offered by Meta AI through Microsoft Azure Marketplace, and the Marketplace offering might add more terms of use and pricing.
 
 ### Azure Marketplace model offerings
 
-The following models are available in Azure Marketplace for Llama when deployed as a service with pay-as-you-go:
+The following Meta Llama models are available in Azure Marketplace when deployed as a service with pay-as-you-go:
 
-# [Llama 3](#tab/llama-three)
+# [Meta Llama 3](#tab/llama-three)
 
 * [Meta Llama-3-8B (preview)](https://aka.ms/aistudio/landing/meta-llama-3-8b-base)
-* [Meta Llama-3 8B-Chat (preview)](https://aka.ms/aistudio/landing/meta-llama-3-8b-base)
+* [Meta Llama-3 8B-Instruct (preview)](https://aka.ms/aistudio/landing/meta-llama-3-8b-chat)
 * [Meta Llama-3-70B (preview)](https://aka.ms/aistudio/landing/meta-llama-3-70b-base)
-* [Meta Llama-3 70B-Chat (preview)](https://aka.ms/aistudio/landing/meta-llama-3-70b-chat)
+* [Meta Llama-3 70B-Instruct (preview)](https://aka.ms/aistudio/landing/meta-llama-3-70b-chat)
 
-If you need to deploy a different model, [deploy it to real-time endpoints](#deploy-llama-models-to-real-time-endpoints) instead.
+If you need to deploy a different model, [deploy it to real-time endpoints](#deploy-meta-llama-models-to-real-time-endpoints) instead.
 
-# [Llama 2](#tab/llama-two)
+# [Meta Llama 2](#tab/llama-two)
 
 * Meta Llama-2-7B (preview)
 * Meta Llama 2 7B-Chat (preview)
@@ -56,7 +56,7 @@ If you need to deploy a different model, [deploy it to real-time endpoints](#dep
 * Meta Llama-2-70B (preview)
 * Meta Llama 2 70B-Chat (preview)
 
-If you need to deploy a different model, [deploy it to real-time endpoints](#deploy-llama-models-to-real-time-endpoints) instead.
+If you need to deploy a different model, [deploy it to real-time endpoints](#deploy-meta-llama-models-to-real-time-endpoints) instead.
 
 ---
 
@@ -91,7 +91,7 @@ If you need to deploy a different model, [deploy it to real-time endpoints](#dep
 
 To create a deployment:
 
-# [Llama 3](#tab/llama-three)
+# [Meta Llama 3](#tab/llama-three)
 
 1. Go to [Azure Machine Learning studio](https://ml.azure.com/home).
 1. Select the workspace in which you want to deploy your models. To use the pay-as-you-go model deployment offering, your workspace must belong to the **East US 2** region.
@@ -102,10 +102,10 @@ To create a deployment:
 1. On the model's overview page, select **Deploy** and then **Pay-as-you-go**.
 
 1. On the deployment wizard, select the link to **Azure Marketplace Terms** to learn more about the terms of use. You can also select the **Marketplace offer details** tab to learn about pricing for the selected model.
-1. If this is your first time deploying the model in the workspace, you have to subscribe your workspace for the particular offering (for example, Llama-3-70b) from Azure Marketplace. This step requires that your account has the Azure subscription permissions and resource group permissions listed in the prerequisites. Each workspace has its own subscription to the particular Azure Marketplace offering, which allows you to control and monitor spending. Select **Subscribe and Deploy**.
+1. If this is your first time deploying the model in the workspace, you have to subscribe your workspace to the particular offering (for example, Meta-Llama-3-70B) from Azure Marketplace. This step requires that your account has the Azure subscription permissions and resource group permissions listed in the prerequisites. Each workspace has its own subscription to the particular Azure Marketplace offering, which allows you to control and monitor spending. Select **Subscribe and Deploy**.
 
 > [!NOTE]
-> Subscribing a workspace to a particular Azure Marketplace offering (in this case, Llama-3-70b) requires that your account has **Contributor** or **Owner** access at the subscription level where the project is created. Alternatively, your user account can be assigned a custom role that has the Azure subscription permissions and resource group permissions listed in the [prerequisites](#prerequisites).
+> Subscribing a workspace to a particular Azure Marketplace offering (in this case, Meta-Llama-3-70B) requires that your account has **Contributor** or **Owner** access at the subscription level where the project is created. Alternatively, your user account can be assigned a custom role that has the Azure subscription permissions and resource group permissions listed in the [prerequisites](#prerequisites).
 
 1. Once you sign up the workspace for the particular Azure Marketplace offering, subsequent deployments of the _same_ offering in the _same_ workspace don't require subscribing again. Therefore, you don't need to have the subscription-level permissions for subsequent deployments. If this scenario applies to you, select **Continue to deploy**.
 
@@ -117,7 +117,7 @@ To create a deployment:
 1. You can also take note of the **Target** URL and the **Secret Key** to call the deployment and generate completions.
 1. You can always find the endpoint's details, URL, and access keys by navigating to **Workspace** > **Endpoints** > **Serverless endpoints**.
 
-# [Llama 2](#tab/llama-two)
+# [Meta Llama 2](#tab/llama-two)
 
 1. Go to [Azure Machine Learning studio](https://ml.azure.com/home).
 1. Select the workspace in which you want to deploy your models. To use the pay-as-you-go model deployment offering, your workspace must belong to the **East US 2** or **West US 3** region.
@@ -153,25 +153,25 @@ To create a deployment:
 
 ---
 
-To learn about billing for Llama models deployed with pay-as-you-go, see [Cost and quota considerations for Llama 3 models deployed as a service](#cost-and-quota-considerations-for-llama-models-deployed-as-a-service).
+To learn about billing for Meta Llama models deployed with pay-as-you-go, see [Cost and quota considerations for Meta Llama models deployed as a service](#cost-and-quota-considerations-for-meta-llama-models-deployed-as-a-service).
 
-### Consume Llama models as a service
+### Consume Meta Llama models as a service
 
 Models deployed as a service can be consumed using either the chat or the completions API, depending on the type of model you deployed.
 
-# [Llama 3](#tab/llama-three)
+# [Meta Llama 3](#tab/llama-three)
 
 1. In the **workspace**, select **Endpoints** > **Serverless endpoints**.
 1. Find and select the deployment you created.
 1. Copy the **Target** URL and the **Key** token values.
 1. Make an API request based on the type of model you deployed.
 
-    - For completions models, such as `Llama-3-8b`, use the [`<target_url>/v1/completions`](#completions-api) API.
-    - For chat models, such as `Llama-3-8b-chat`, use the [`<target_url>/v1/chat/completions`](#chat-api) API.
+    - For completions models, such as `Llama-3-8B`, use the [`<target_url>/v1/completions`](#completions-api) API.
+    - For chat models, such as `Llama-3-8B-Instruct`, use the [`<target_url>/v1/chat/completions`](#chat-api) API.
 
-For more information on using the APIs, see the [reference](#reference-for-llama-models-deployed-as-a-service) section.
+For more information on using the APIs, see the [reference](#reference-for-meta-llama-models-deployed-as-a-service) section.
 
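The steps above end with an authenticated API call against the serverless endpoint. The following is a minimal standard-library sketch of that call; the endpoint URL and key are placeholders, and the payload fields (`messages`, `max_tokens`, `temperature`) follow the common chat-completions shape rather than this service's authoritative schema, so verify them against the reference section.

```python
import json
import urllib.request

def build_chat_request(target_url, api_key, messages):
    """Build a POST request for <target_url>/v1/chat/completions."""
    payload = {
        "messages": messages,
        "max_tokens": 128,   # assumed parameter names; verify in the reference
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"{target_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Key copied from the Serverless endpoints page; the exact auth
            # header scheme is shown on the endpoint's Consume tab.
            "Authorization": f"Bearer {api_key}",
        },
    )

# Placeholder Target URL and Key from Workspace > Endpoints > Serverless endpoints.
req = build_chat_request(
    "https://example-endpoint.eastus2.inference.ai.azure.com",
    "<your-endpoint-key>",
    [{"role": "user", "content": "What is Azure Machine Learning?"}],
)
# Sending the request is a single call:
# response = json.load(urllib.request.urlopen(req))
```

For a completions model, the same pattern applies with the `<target_url>/v1/completions` path and a `prompt` field in place of `messages`.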
-# [Llama 2](#tab/llama-two)
+# [Meta Llama 2](#tab/llama-two)
 
 1. In the **workspace**, select **Endpoints** > **Serverless endpoints**.
 1. Find and select the deployment you created.
@@ -181,11 +181,11 @@ Models deployed as a service can be consumed using either the chat or the comple
 - For completions models, such as `Llama-2-7b`, use the [`<target_url>/v1/completions`](#completions-api) API.
 - For chat models, such as `Llama-2-7b-chat`, use the [`<target_url>/v1/chat/completions`](#chat-api) API.
 
-For more information on using the APIs, see the [reference](#reference-for-llama-models-deployed-as-a-service) section.
+For more information on using the APIs, see the [reference](#reference-for-meta-llama-models-deployed-as-a-service) section.
 
 ---
 
-### Reference for Llama models deployed as a service
+### Reference for Meta Llama models deployed as a service
 
 #### Completions API
 
@@ -434,15 +434,15 @@
 }
 ```
 
-## Deploy Llama models to real-time endpoints
+## Deploy Meta Llama models to real-time endpoints
 
-Apart from deploying with the pay-as-you-go managed service, you can also deploy Llama 3 models to real-time endpoints in Azure Machine Learning studio. When deployed to real-time endpoints, you can select all the details about the infrastructure running the model, including the virtual machines to use and the number of instances to handle the load you're expecting. Models deployed to real-time endpoints consume quota from your subscription. All the models in the Llama family can be deployed to real-time endpoints.
+Apart from deploying with the pay-as-you-go managed service, you can also deploy Meta Llama models to real-time endpoints in Azure Machine Learning studio. When deployed to real-time endpoints, you can select all the details about the infrastructure running the model, including the virtual machines to use and the number of instances to handle the load you're expecting. Models deployed to real-time endpoints consume quota from your subscription. All the models in the Meta Llama family can be deployed to real-time endpoints.
 
 ### Create a new deployment
 
-# [Llama 3](#tab/llama-three)
+# [Meta Llama 3](#tab/llama-three)
 
-Follow these steps to deploy a model such as `Llama-3-7b-chat` to a real-time endpoint in [Azure Machine Learning studio](https://ml.azure.com).
+Follow these steps to deploy a model such as `Llama-3-8B-Instruct` to a real-time endpoint in [Azure Machine Learning studio](https://ml.azure.com).
 
 1. Select the workspace in which you want to deploy the model.
 1. Choose the model that you want to deploy from the studio's [model catalog](https://ml.azure.com/model/catalog).
@@ -454,7 +454,7 @@ Follow these steps to deploy a model such as `Llama-3-7b-chat` to a real-time en
 1. On the **Deploy with Azure AI Content Safety (preview)** page, select **Skip Azure AI Content Safety** so that you can continue to deploy the model using the UI.
 
 > [!TIP]
-> In general, we recommend that you select **Enable Azure AI Content Safety (Recommended)** for deployment of the Llama model. This deployment option is currently only supported using the Python SDK and it happens in a notebook.
+> In general, we recommend that you select **Enable Azure AI Content Safety (Recommended)** for deployment of the Meta Llama model. This deployment option is currently supported only through the Python SDK, and it happens in a notebook.
 
 1. Select **Proceed**.
 
@@ -471,7 +471,7 @@ Follow these steps to deploy a model such as `Llama-3-7b-chat` to a real-time en
 
 For more information on how to deploy models to real-time endpoints using the studio, see [Deploying foundation models to endpoints for inferencing](how-to-use-foundation-models.md#deploying-foundation-models-to-endpoints-for-inferencing).
 
-# [Llama 2](#tab/llama-two)
+# [Meta Llama 2](#tab/llama-two)
 
 Follow these steps to deploy a model such as `Llama-2-7b-chat` to a real-time endpoint in [Azure Machine Learning studio](https://ml.azure.com).
 
@@ -487,7 +487,7 @@ Follow these steps to deploy a model such as `Llama-2-7b-chat` to a real-time en
 1. On the **Deploy with Azure AI Content Safety (preview)** page, select **Skip Azure AI Content Safety** so that you can continue to deploy the model using the UI.
 
 > [!TIP]
-> In general, we recommend that you select **Enable Azure AI Content Safety (Recommended)** for deployment of the Llama model. This deployment option is currently only supported using the Python SDK and it happens in a notebook.
+> In general, we recommend that you select **Enable Azure AI Content Safety (Recommended)** for deployment of the Meta Llama model. This deployment option is currently supported only through the Python SDK, and it happens in a notebook.
 
 1. Select **Proceed**.
 
@@ -506,15 +506,15 @@ For more information on how to deploy models to real-time endpoints, using the s
 
 ---
 
-### Consume Llama models deployed to real-time endpoints
+### Consume Meta Llama models deployed to real-time endpoints
 
-For reference about how to invoke Llama 3 models deployed to real-time endpoints, see the model's card in Azure Machine Learning studio [model catalog](concept-model-catalog.md). Each model's card has an overview page that includes a description of the model, samples for code-based inferencing, fine-tuning, and model evaluation.
+For reference about how to invoke Meta Llama 3 models deployed to real-time endpoints, see the model's card in the Azure Machine Learning studio [model catalog](concept-model-catalog.md). Each model's card has an overview page that includes a description of the model, samples for code-based inferencing, fine-tuning, and model evaluation.
 
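The inferencing samples on the model's card come down to an authenticated POST against the endpoint's scoring URI. As a rough companion sketch, with the caveat that the scoring URI, key, and payload shape below are all hypothetical and the authoritative request schema is on the model's card:

```python
import json
import urllib.request

# Hypothetical values: take the real scoring URI and key from the
# endpoint's Consume tab, and the payload schema from the model's card.
scoring_uri = "https://my-llama-endpoint.eastus2.inference.ml.azure.com/score"
api_key = "<your-endpoint-key>"

# Illustrative payload shape only; real-time endpoints define their own schema.
payload = {"input_data": {"input_string": ["What is Azure Machine Learning?"]}}

req = urllib.request.Request(
    scoring_uri,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    },
)
# Sending the request:
# response = json.load(urllib.request.urlopen(req))
```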
 ## Cost and quotas
 
-### Cost and quota considerations for Llama models deployed as a service
+### Cost and quota considerations for Meta Llama models deployed as a service
 
-Llama models deployed as a service are offered by Meta through Azure Marketplace and integrated with Azure Machine Learning studio for use. You can find Azure Marketplace pricing when deploying or fine-tuning models.
+Meta Llama models deployed as a service are offered by Meta through Azure Marketplace and integrated with Azure Machine Learning studio for use. You can find Azure Marketplace pricing when deploying or fine-tuning models.
 
 Each time a workspace subscribes to a given model offering from Azure Marketplace, a new resource is created to track the costs associated with its consumption. The same resource is used to track costs associated with inference and fine-tuning; however, multiple meters are available to track each scenario independently.
 
@@ -524,9 +524,9 @@ For more information on how to track costs, see [Monitor costs for models offere
 
 Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
 
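Because the 200,000 tokens-per-minute and 1,000 requests-per-minute limits above are enforced per deployment, a client that batches work may want to pace itself rather than rely on server-side throttling errors. A minimal client-side sketch follows; the limit values come from this article, while the pacing strategy itself is an illustration, not part of the service.

```python
import time

class MinuteRateLimiter:
    """Pace calls against per-minute request and token limits."""

    def __init__(self, max_requests=1_000, max_tokens=200_000):
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.window_start = time.monotonic()
        self.requests = 0
        self.tokens = 0

    def _reset(self):
        self.window_start = time.monotonic()
        self.requests = 0
        self.tokens = 0

    def acquire(self, estimated_tokens):
        """Block until one more request with this token estimate fits."""
        elapsed = time.monotonic() - self.window_start
        if elapsed >= 60:
            # A fresh one-minute window: clear both counters.
            self._reset()
        elif (self.requests + 1 > self.max_requests
              or self.tokens + estimated_tokens > self.max_tokens):
            # Wait out the remainder of the current window, then reset.
            time.sleep(60 - elapsed)
            self._reset()
        self.requests += 1
        self.tokens += estimated_tokens

limiter = MinuteRateLimiter()
limiter.acquire(estimated_tokens=512)  # call once before each API request
```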
-### Cost and quota considerations for Llama models deployed as real-time endpoints
+### Cost and quota considerations for Meta Llama models deployed as real-time endpoints
 
-For deployment and inferencing of Llama models with real-time endpoints, you consume virtual machine (VM) core quota that is assigned to your subscription on a per-region basis. When you sign up for Azure Machine Learning studio, you receive a default VM quota for several VM families available in the region. You can continue to create deployments until you reach your quota limit. Once you reach this limit, you can request a quota increase.
+For deployment and inferencing of Meta Llama models with real-time endpoints, you consume virtual machine (VM) core quota that is assigned to your subscription on a per-region basis. When you sign up for Azure Machine Learning studio, you receive a default VM quota for several VM families available in the region. You can continue to create deployments until you reach your quota limit. Once you reach this limit, you can request a quota increase.
 
 ## Content filtering
 