In this article, you learn about the Meta Llama family of models and how to use them. Meta Llama models and tools are a collection of pretrained and fine-tuned generative AI text and image-reasoning models, ranging in scale from SLMs (1B and 3B Base and Instruct models) for on-device and edge inferencing, to mid-size LLMs (7B, 8B, and 70B Base and Instruct models), and up to high-performance models like Meta Llama 3.1 405B Instruct for synthetic data generation and distillation use cases.
> [!TIP]
> See our announcements of Meta's Llama 3.2 family models available now on Azure AI Model Catalog through [Meta's blog](https://aka.ms/llama-3.2-meta-announcement) and [Microsoft Tech Community Blog](https://aka.ms/llama-3.2-microsoft-announcement).
::: zone pivot="programming-language-python"
## Meta Llama family of models
The Meta Llama family of models includes the following models:
# [Llama-3.2](#tab/python-llama-3-2)
The Llama 3.2 collection of SLMs and image-reasoning models is now available. Coming soon, Llama 3.2 11B Vision Instruct and Llama 3.2 90B Vision Instruct will be available as a serverless API endpoint via Models-as-a-Service. Starting today, the following models are available for deployment via managed compute:
* Llama 3.2 1B
* Llama 3.2 3B
* Llama 3.2 1B Instruct
* Llama 3.2 3B Instruct
* Llama Guard 3 1B
* Llama Guard 3 11B Vision
* Llama 3.2 11B Vision Instruct
* Llama 3.2 90B Vision Instruct
# [Meta Llama-3.1](#tab/python-meta-llama-3-1)
The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed models on common industry benchmarks.
The following models are available:
Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B and 70B sizes. The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open-source models on common industry benchmarks. Further, in developing these models, we took great care to optimize helpfulness and safety.
The following models are available:
Meta has developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama-2-Chat, are optimized for dialogue use cases. Llama-2-Chat models outperform open-source chat models on most benchmarks we tested, and in our human evaluations for helpfulness and safety, are on par with some popular closed-source models like ChatGPT and PaLM. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama-2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.
## Prerequisites
To use Meta Llama models with Azure AI Studio, you need the following prerequisites:
### A model deployment
**Deployment to serverless APIs**
Meta Llama models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
**Deployment to a self-hosted managed compute**
Meta Llama models can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.
For deployment to a self-hosted managed compute, you must have enough quota in your subscription. If you don't have enough quota available, you can use our temporary quota access by selecting the option **I want to use shared quota and I acknowledge that this endpoint will be deleted in 168 hours.**
In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
> [!TIP]
> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Meta Llama Instruct models - text-only or image reasoning models.
### Create a client to consume the model
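As a minimal sketch of what a client does under the hood, the request below is assembled with only the Python standard library. The endpoint URL and API key are hypothetical placeholders, and the `/chat/completions` route follows the Azure AI model inference API; in practice, the `azure-ai-inference` package wraps these details for you.

```python
import json
import urllib.request


def build_chat_request(endpoint: str, api_key: str, messages: list) -> urllib.request.Request:
    """Assemble a chat-completions request for the Azure AI model inference route.

    The endpoint and key are deployment-specific; the values used below
    are illustrative placeholders, not real credentials.
    """
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/chat/completions",
        data=body,
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


request = build_chat_request(
    "https://my-llama-deployment.example.com",  # hypothetical endpoint
    "<your-api-key>",
    [{"role": "user", "content": "How many languages are in the world?"}],
)
# urllib.request.urlopen(request) would send it; omitted here.
```

Sending the request and parsing the JSON response body then yields the same `choices`/`message` structure that the SDK surfaces as Python objects.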
The following extra parameters can be passed to Meta Llama models:
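As a sketch of the mechanics (not the documented client API), extra parameters travel in the JSON payload alongside the standard fields, and the `extra-parameters: pass-through` header asks the service to forward fields it doesn't recognize to the model. The helper name and the sample values below are illustrative assumptions.

```python
import json
import urllib.request


def build_request_with_extras(endpoint: str, api_key: str,
                              messages: list, extras: dict) -> urllib.request.Request:
    """Merge model-specific extra parameters into the standard chat payload."""
    payload = {"messages": messages, **extras}
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "api-key": api_key,
            # Ask the service to pass unrecognized parameters through to the model.
            "extra-parameters": "pass-through",
        },
        method="POST",
    )


req = build_request_with_extras(
    "https://my-llama-deployment.example.com",  # hypothetical endpoint
    "<your-api-key>",
    [{"role": "user", "content": "Hello"}],
    {"logprobs": True},  # example of a model-specific extra parameter
)
```

The `azure-ai-inference` client exposes the same mechanism through its `model_extras` keyword, so you rarely need to set the header by hand.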
::: zone pivot="programming-language-javascript"
## Meta Llama models
The Meta Llama models include the following models:
# [Meta Llama-3.1](#tab/meta-llama-3-1)
## Prerequisites
To use Meta Llama models with Azure AI Studio, you need the following prerequisites:
### A model deployment
**Deployment to serverless APIs**
Meta Llama models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
**Deployment to a self-hosted managed compute**
Meta Llama models can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.
For deployment to a self-hosted managed compute, you must have enough quota in your subscription. If you don't have enough quota available, you can use our temporary quota access by selecting the option **I want to use shared quota and I acknowledge that this endpoint will be deleted in 168 hours.**
In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
> [!TIP]
> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Meta Llama models.
### Create a client to consume the model
The following extra parameters can be passed to Meta Llama models:
The Meta Llama models include the following models:
# [Meta Llama-3.1](#tab/meta-llama-3-1)
The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed models on common industry benchmarks.
The following models are available:
# [Meta Llama-3](#tab/meta-llama-3)
Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B and 70B sizes. The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open-source models on common industry benchmarks. Further, in developing these models, we took great care to optimize helpfulness and safety.
The following models are available:
## Prerequisites
To use Meta Llama models with Azure AI Studio, you need the following prerequisites:
### A model deployment
**Deployment to serverless APIs**
Meta Llama models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
**Deployment to a self-hosted managed compute**
Meta Llama models can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.
For deployment to a self-hosted managed compute, you must have enough quota in your subscription. If you don't have enough quota available, you can use our temporary quota access by selecting the option **I want to use shared quota and I acknowledge that this endpoint will be deleted in 168 hours.**
## Prerequisites
To use Meta Llama models with Azure AI Studio, you need the following prerequisites:
### A model deployment
**Deployment to a self-hosted managed compute**
Meta Llama models can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.
For deployment to a self-hosted managed compute, you must have enough quota in your subscription. If you don't have enough quota available, you can use our temporary quota access by selecting the option **I want to use shared quota and I acknowledge that this endpoint will be deleted in 168 hours.**
* [Deploy models as serverless APIs](deploy-models-serverless.md)
* [Consume serverless API endpoints from a different Azure AI Studio project or hub](deploy-models-serverless-connect.md)
* [Region availability for models in serverless API endpoints](deploy-models-serverless-availability.md)
* [Plan and manage costs (marketplace)](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace)