Commit 672baf0

Merge pull request #457 from MicrosoftDocs/release-llama-winter-2024
release-llama-winter-2024 -> main -- 9/25 - 04:15 PM PST
2 parents d204cd0 + 4c54f5f commit 672baf0

File tree

1 file changed: +48 −36 lines changed

articles/ai-studio/how-to/deploy-models-llama.md

Lines changed: 48 additions & 36 deletions
@@ -1,7 +1,7 @@
 ---
-title: How to use Meta Llama chat models with Azure AI Studio
+title: How to use the Meta Llama family of models with Azure AI Studio
 titleSuffix: Azure AI Studio
-description: Learn how to use Meta Llama chat models with Azure AI Studio.
+description: Learn how to use the Meta Llama family of models with Azure AI Studio.
 ms.service: azure-ai-studio
 manager: scottpolly
 ms.topic: how-to
@@ -14,24 +14,36 @@ ms.custom: references_regions, generated
 zone_pivot_groups: azure-ai-model-catalog-samples-chat
 ---

-# How to use Meta Llama chat models
+# How to use the Meta Llama family of models

 [!INCLUDE [Feature preview](~/reusable-content/ce-skilling/azure/includes/ai-studio/includes/feature-preview.md)]

-In this article, you learn about Meta Llama chat models and how to use them.
-Meta Llama 2 and 3 models and tools are a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. The model family also includes fine-tuned versions optimized for dialogue use cases with reinforcement learning from human feedback (RLHF).
-
+In this article, you learn about the Meta Llama family of models and how to use them. Meta Llama models and tools are a collection of pretrained and fine-tuned generative AI text and image reasoning models, ranging in scale from SLMs (1B and 3B Base and Instruct models) for on-device and edge inferencing, to mid-size LLMs (7B, 8B, and 70B Base and Instruct models), and high-performance models like Meta Llama 3.1 405B Instruct for synthetic data generation and distillation use cases.

+> [!TIP]
+> See our announcements of Meta's Llama 3.2 family of models, available now in the Azure AI Model Catalog, through [Meta's blog](https://aka.ms/llama-3.2-meta-announcement) and the [Microsoft Tech Community blog](https://aka.ms/llama-3.2-microsoft-announcement).

 ::: zone pivot="programming-language-python"

-## Meta Llama chat models
+## Meta Llama family of models

-The Meta Llama chat models include the following models:
+The Meta Llama family of models includes the following models:

-# [Meta Llama-3.1](#tab/meta-llama-3-1)
+# [Llama-3.2](#tab/python-llama-3-2)

-The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed chat models on common industry benchmarks.
+The Llama 3.2 collection of SLMs and image reasoning models is now available. Coming soon, Llama 3.2 11B Vision Instruct and Llama 3.2 90B Vision Instruct will be available as serverless API endpoints via Models-as-a-Service. Starting today, the following models are available for deployment via managed compute:
+* Llama 3.2 1B
+* Llama 3.2 3B
+* Llama 3.2 1B Instruct
+* Llama 3.2 3B Instruct
+* Llama Guard 3 1B
+* Llama Guard 3 11B Vision
+* Llama 3.2 11B Vision Instruct
+* Llama 3.2 90B Vision Instruct
+
+# [Meta Llama-3.1](#tab/python-meta-llama-3-1)
+
+The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B, and 405B sizes (text in/text out). The Llama 3.1 instruction tuned, text-only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed models on common industry benchmarks.


 The following models are available:
@@ -41,9 +53,9 @@ The following models are available:
 * [Meta-Llama-3.1-8B-Instruct](https://ai.azure.com/explore/models/Meta-Llama-3.1-8B-Instruct/version/1/registry/azureml-meta)


-# [Meta Llama-3](#tab/meta-llama-3)
+# [Meta Llama-3](#tab/python-meta-llama-3)

-Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B, 70B, and 405B sizes. The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks. Further, in developing these models, we took great care to optimize helpfulness and safety.
+Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B and 70B sizes. The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open-source models on common industry benchmarks. Further, in developing these models, we took great care to optimize helpfulness and safety.


 The following models are available:
@@ -52,7 +64,7 @@ The following models are available:
 * [Meta-Llama-3-8B-Instruct](https://ai.azure.com/explore/models/Meta-Llama-3-8B-Instruct/version/6/registry/azureml-meta)


-# [Meta Llama-2](#tab/meta-llama-2)
+# [Meta Llama-2](#tab/python-meta-llama-2)

 Meta has developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama-2-Chat, are optimized for dialogue use cases. Llama-2-Chat models outperform open-source chat models on most benchmarks we tested, and in our human evaluations for helpfulness and safety, are on par with some popular closed-source models like ChatGPT and PaLM. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama-2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.

@@ -68,13 +80,13 @@ The following models are available:

 ## Prerequisites

-To use Meta Llama chat models with Azure AI Studio, you need the following prerequisites:
+To use Meta Llama models with Azure AI Studio, you need the following prerequisites:

 ### A model deployment

 **Deployment to serverless APIs**

-Meta Llama chat models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+Meta Llama models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.

 Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).

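Once a serverless API endpoint exists, it can be called over plain HTTPS. The following stdlib-only Python sketch builds (but does not send) such a request; the endpoint URL, the `api-key` header name, and the `/chat/completions` route are illustrative assumptions modeled on common serverless chat endpoints, not details taken from this diff.

```python
# Hedged sketch: assemble an authenticated chat request for a serverless API
# endpoint. Endpoint URL, header name, and route are illustrative assumptions.
import json
import urllib.request


def build_chat_request(endpoint: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Return a ready-to-send POST request for a single chat completion."""
    body = json.dumps({"messages": [{"role": "user", "content": prompt}]}).encode("utf-8")
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/chat/completions",
        data=body,
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


req = build_chat_request("https://example-endpoint.inference.ai.azure.com", "<your-key>", "Hello")
```

Sending the request with `urllib.request.urlopen(req)` would return the JSON completion; in practice, the Azure AI inference client libraries that this article covers handle the transport for you.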
@@ -83,7 +95,7 @@ Deployment to a serverless API endpoint doesn't require quota from your subscrip

 **Deployment to a self-hosted managed compute**

-Meta Llama chat models can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.
+Meta Llama models can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.

 For deployment to a self-hosted managed compute, you must have enough quota in your subscription. If you don't have enough quota available, you can use our temporary quota access by selecting the option **I want to use shared quota and I acknowledge that this endpoint will be deleted in 168 hours.**

@@ -111,7 +123,7 @@ Read more about the [Azure AI inference package and reference](https://aka.ms/az
 In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.

 > [!TIP]
-> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Meta Llama chat models.
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Meta Llama Instruct models, whether text-only or image reasoning models.

 ### Create a client to consume the model

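The chat completions contract that this section goes on to use is essentially a growing list of role-tagged messages. A minimal stdlib sketch of that conversation state follows; the helper names are illustrative, not part of any SDK.

```python
# Hedged sketch of the role-tagged message list that chat completions APIs
# consume; helper names are illustrative, not part of any SDK.
def make_conversation(system_prompt: str) -> list[dict]:
    """Start a conversation with a system message."""
    return [{"role": "system", "content": system_prompt}]


def add_turn(history: list[dict], user_text: str, assistant_text: str) -> list[dict]:
    """Record one user/assistant exchange so the model sees prior context."""
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": assistant_text})
    return history


history = make_conversation("You are a helpful assistant.")
add_turn(history, "How many languages are in the world?", "Estimates vary around 7,000.")
```

Each subsequent request sends the full `history` plus the newest user message, which is why multi-turn chat works without the service storing any state.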

@@ -296,7 +308,7 @@ response = client.complete(
 )
 ```

-The following extra parameters can be passed to Meta Llama chat models:
+The following extra parameters can be passed to Meta Llama models:

 | Name | Description | Type |
 | -------------- | --------------------- | --------------- |
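Extra, model-specific parameters are typically forwarded by merging them into the request body and asking the service to pass unrecognized keys through to the model. A hedged stdlib sketch follows; the `extra-parameters: pass-through` header value and the `logprobs` key are assumptions for illustration, not guaranteed details of the service.

```python
# Hedged sketch: merge model-specific extras into a chat request body and set
# the header that asks the service to forward unrecognized keys to the model.
import json


def with_extras(body: dict, extras: dict) -> tuple[dict, str]:
    """Return request headers and a JSON body that includes the extra keys."""
    headers = {
        "Content-Type": "application/json",
        "extra-parameters": "pass-through",  # assumed header for forwarding extras
    }
    return headers, json.dumps({**body, **extras})


headers, payload = with_extras(
    {"messages": [{"role": "user", "content": "Hello"}]},
    {"logprobs": True},
)
```

The SDK clients expose the same idea through a dedicated argument rather than raw headers, so prefer those when available.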
@@ -350,9 +362,9 @@ except HttpResponseError as ex:

 ::: zone pivot="programming-language-javascript"

-## Meta Llama chat models
+## Meta Llama models

-The Meta Llama chat models include the following models:
+The Meta Llama models include the following models:

 # [Meta Llama-3.1](#tab/meta-llama-3-1)

@@ -393,13 +405,13 @@ The following models are available:

 ## Prerequisites

-To use Meta Llama chat models with Azure AI Studio, you need the following prerequisites:
+To use Meta Llama models with Azure AI Studio, you need the following prerequisites:

 ### A model deployment

 **Deployment to serverless APIs**

-Meta Llama chat models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+Meta Llama models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.

 Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).

@@ -408,7 +420,7 @@ Deployment to a serverless API endpoint doesn't require quota from your subscrip

 **Deployment to a self-hosted managed compute**

-Meta Llama chat models can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.
+Meta Llama models can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.

 For deployment to a self-hosted managed compute, you must have enough quota in your subscription. If you don't have enough quota available, you can use our temporary quota access by selecting the option **I want to use shared quota and I acknowledge that this endpoint will be deleted in 168 hours.**

@@ -434,7 +446,7 @@ npm install @azure-rest/ai-inference
 In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.

 > [!TIP]
-> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Meta Llama chat models.
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Meta Llama models.

 ### Create a client to consume the model


@@ -638,7 +650,7 @@ var response = await client.path("/chat/completions").post({
 });
 ```

-The following extra parameters can be passed to Meta Llama chat models:
+The following extra parameters can be passed to Meta Llama models:

 | Name | Description | Type |
 | -------------- | --------------------- | --------------- |
@@ -698,13 +710,13 @@ catch (error) {

 ::: zone pivot="programming-language-csharp"

-## Meta Llama chat models
+## Meta Llama models

-The Meta Llama chat models include the following models:
+The Meta Llama models include the following models:

 # [Meta Llama-3.1](#tab/meta-llama-3-1)

-The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed chat models on common industry benchmarks.
+The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B, and 405B sizes (text in/text out). The Llama 3.1 instruction tuned, text-only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed models on common industry benchmarks.


 The following models are available:
@@ -716,7 +728,7 @@ The following models are available:

 # [Meta Llama-3](#tab/meta-llama-3)

-Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B, 70B, and 405B sizes. The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks. Further, in developing these models, we took great care to optimize helpfulness and safety.
+Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B and 70B sizes. The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open-source models on common industry benchmarks. Further, in developing these models, we took great care to optimize helpfulness and safety.


 The following models are available:
@@ -741,13 +753,13 @@ The following models are available:

 ## Prerequisites

-To use Meta Llama chat models with Azure AI Studio, you need the following prerequisites:
+To use Meta Llama models with Azure AI Studio, you need the following prerequisites:

 ### A model deployment

 **Deployment to serverless APIs**

-Meta Llama chat models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+Meta Llama models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.

 Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
@@ -756,7 +768,7 @@ Deployment to a serverless API endpoint doesn't require quota from your subscrip

 **Deployment to a self-hosted managed compute**

-Meta Llama chat models can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.
+Meta Llama models can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.

 For deployment to a self-hosted managed compute, you must have enough quota in your subscription. If you don't have enough quota available, you can use our temporary quota access by selecting the option **I want to use shared quota and I acknowledge that this endpoint will be deleted in 168 hours.**
@@ -998,7 +1010,7 @@ response = client.Complete(requestOptions, extraParams: ExtraParameters.PassThro
 Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
 ```

-The following extra parameters can be passed to Meta Llama chat models:
+The following extra parameters can be passed to Meta Llama models:

 | Name | Description | Type |
 | -------------- | --------------------- | --------------- |
@@ -1101,7 +1113,7 @@ The following models are available:

 ## Prerequisites

-To use Meta Llama chat models with Azure AI Studio, you need the following prerequisites:
+To use Meta Llama models with Azure AI Studio, you need the following prerequisites:

 ### A model deployment


@@ -1116,7 +1128,7 @@ Deployment to a serverless API endpoint doesn't require quota from your subscrip

 **Deployment to a self-hosted managed compute**

-Meta Llama chat models can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.
+Meta Llama models can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.

 For deployment to a self-hosted managed compute, you must have enough quota in your subscription. If you don't have enough quota available, you can use our temporary quota access by selecting the option **I want to use shared quota and I acknowledge that this endpoint will be deleted in 168 hours.**
@@ -1476,4 +1488,4 @@ It is a good practice to start with a low number of instances and scale up as ne
 * [Deploy models as serverless APIs](deploy-models-serverless.md)
 * [Consume serverless API endpoints from a different Azure AI Studio project or hub](deploy-models-serverless-connect.md)
 * [Region availability for models in serverless API endpoints](deploy-models-serverless-availability.md)
-* [Plan and manage costs (marketplace)](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace)
+* [Plan and manage costs (marketplace)](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace)
