---
title: How to deploy AI21's Jamba family models with Azure Machine Learning studio
titleSuffix: Azure Machine Learning studio
description: How to deploy AI21's Jamba family models with Azure Machine Learning studio
manager: scottpolly
ms.service: azure-machine-learning
ms.topic: how-to
ms.date: 09/06/2024
ms.author: ssalgado
ms.reviewer: tgokal
author: ssalgadodev
ms.custom: references_regions
---
# How to deploy AI21's Jamba family models with Azure Machine Learning studio
In this article, you learn how to use Azure Machine Learning studio to deploy AI21's Jamba family models as a serverless API with pay-as-you-go billing.

The Jamba family models are AI21's production-grade, Mamba-based large language models (LLMs) that leverage AI21's hybrid Mamba-Transformer architecture. They're instruction-tuned versions of AI21's hybrid structured state space model (SSM) transformer Jamba model, built for reliable commercial use with respect to quality and performance.

> [!TIP]
> See our announcements of AI21's Jamba family models, available now on the Azure AI Model Catalog, through [AI21's blog](https://aka.ms/ai21-jamba-1.5-large-announcement) and the [Microsoft Tech Community Blog](https://aka.ms/ai21-jamba-1.5-large-microsoft-annnouncement).
## Deploy the Jamba family models as a serverless API
Certain models in the model catalog can be deployed as a serverless API with pay-as-you-go billing, providing a way to consume them as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need. This deployment option doesn't require quota from your subscription.
The [AI21 Jamba 1.5 Large model](https://aka.ms/aistudio/landing/ai21-labs-jamba-1.5-large) deployed as a serverless API with pay-as-you-go billing is [offered by AI21 through Microsoft Azure Marketplace](https://aka.ms/azure-marketplace-offer-ai21-jamba-1.5-large). AI21 can change or update the terms of use and pricing of this model.

To get started with Jamba 1.5 Large deployed as a serverless API, explore our integrations with [LangChain](https://aka.ms/ai21-jamba-1.5-large-langchain-sample), [LiteLLM](https://aka.ms/ai21-jamba-1.5-large-litellm-sample), [OpenAI](https://aka.ms/ai21-jamba-1.5-large-openai-sample), and the [Azure API](https://aka.ms/ai21-jamba-1.5-large-azure-api-sample).
# [AI21 Jamba 1.5 Mini](#tab/ai21-jamba-1-5)

The [AI21 Jamba 1.5 Mini model](https://aka.ms/aistudio/landing/ai21-labs-jamba-1.5-mini) deployed as a serverless API with pay-as-you-go billing is [offered by AI21 through Microsoft Azure Marketplace](https://aka.ms/azure-marketplace-offer-ai21-jamba-1.5-mini). AI21 can change or update the terms of use and pricing of this model.

To get started with Jamba 1.5 Mini deployed as a serverless API, explore our integrations with [LangChain](https://aka.ms/ai21-jamba-1.5-mini-langchain-sample), [LiteLLM](https://aka.ms/ai21-jamba-1.5-mini-litellm-sample), [OpenAI](https://aka.ms/ai21-jamba-1.5-mini-openai-sample), and the [Azure API](https://aka.ms/ai21-jamba-1.5-mini-azure-api-sample).

---
### Prerequisites
- An Azure subscription with a valid payment method. Free or trial Azure subscriptions won't work. If you don't have an Azure subscription, create a [paid Azure account](https://azure.microsoft.com/pricing/purchase-options/pay-as-you-go) to begin.
- An Azure Machine Learning workspace and a compute instance. If you don't have these, use the steps in the [Quickstart: Create workspace resources](quickstart-create-resources.md) article to create them. The serverless API model deployment offering for the Jamba family of models is only available with workspaces created in these regions:
  * East US
  * East US 2
### Create a new deployment
These steps demonstrate the deployment of the `AI21 Jamba 1.5 Large` or `AI21 Jamba 1.5 Mini` models. To create a deployment:
1. Go to [Azure Machine Learning studio](https://ml.azure.com/home).
1. Select the workspace in which you want to deploy your models. To use the serverless API model deployment offering, your workspace must belong to one of the supported regions listed in the prerequisites.
1. Search for and select an AI21 model, such as `AI21 Jamba 1.5 Large`, `AI21 Jamba 1.5 Mini`, or `AI21 Jamba Instruct`, from the [model catalog](https://ml.azure.com/model/catalog).
   Alternatively, you can initiate deployment by going to your workspace and selecting **Endpoints** > **Serverless endpoints** > **Create**.
1. You can always find the endpoint's details, URL, and access keys by navigating to **Workspace** > **Endpoints** > **Serverless endpoints**.
To learn about billing for the AI21 Jamba family models deployed as a serverless API with pay-as-you-go, token-based billing, see [Cost and quota considerations for Jamba family models deployed as a serverless API](#cost-and-quota-considerations-for-jamba-family-models-deployed-as-a-serverless-api).
### Consume Jamba family models as a serverless API
You can consume Jamba family models as follows:
1. In the **workspace**, select **Endpoints** > **Serverless endpoints**.
1. Find and select the deployment you created.
1. Copy the **Target** URL and the **Key** token values.
1. Make an API request using either the [Azure AI Model Inference API](reference-model-inference-api.md) on the route `/chat/completions` or [AI21's Azure Client](https://docs.ai21.com/reference/jamba-instruct-api) on `/v1/chat/completions`.

For more information on using the APIs, see the [reference](#reference-for-jamba-family-models-deployed-as-a-serverless-api) section.
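As a sketch of the last step, the snippet below assembles a chat-completions request with Python's standard library. The endpoint URL, key, and response shape shown here are placeholders and assumptions, not values from this article; replace them with the **Target** URL and **Key** you copied from your deployment.

```python
import json
import urllib.request

# Placeholder values -- replace with the Target URL and Key copied from
# Workspace > Endpoints > Serverless endpoints for your deployment.
ENDPOINT = "https://<your-endpoint>.<region>.inference.ml.azure.com/chat/completions"
API_KEY = "<your-key>"


def build_request(messages, max_tokens=256, temperature=1.0):
    """Assemble an HTTP POST request for the /chat/completions route."""
    body = json.dumps({
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    }
    return urllib.request.Request(ENDPOINT, data=body, headers=headers)


req = build_request([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize what a serverless API is in one sentence."},
])
# Uncomment to send the request against a live deployment (the response-field
# access below assumes an OpenAI-style chat-completions payload):
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

The request is only constructed here, not sent, so you can inspect the headers and body before pointing it at a live endpoint.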
## Reference for Jamba family models deployed as a serverless API
Jamba family models accept both of these APIs:
- The [Azure AI Model Inference API](reference-model-inference-api.md) on the route `/chat/completions` for multi-turn chat or single-turn question-answering. This API is supported because Jamba family models are fine-tuned for chat completion.
- [AI21's Azure Client](https://docs.ai21.com/reference/jamba-instruct-api). For more information about the REST endpoint being called, visit [AI21's REST documentation](https://docs.ai21.com/reference/jamba-instruct-api).
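The two APIs above target chat-completions routes that differ only in their path prefix. As an illustration (the host below is a placeholder, not a real endpoint), a small helper can compose either URL:

```python
def completions_url(host: str, api: str) -> str:
    """Return the chat-completions URL for the chosen API style."""
    prefixes = {
        "azure": "/chat/completions",     # Azure AI Model Inference API route
        "ai21": "/v1/chat/completions",   # route used by AI21's Azure Client
    }
    return host.rstrip("/") + prefixes[api]


# Placeholder host: substitute your deployment's Target URL base.
print(completions_url("https://<your-endpoint>.inference.ml.azure.com", "azure"))
```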
### Azure AI model inference API
Payload is a JSON-formatted string containing the following parameters:

|`model`|`string`| Y |`jamba-instruct` or `AI21 Jamba 1.5 Large` or `AI21 Jamba 1.5 Mini`|
|`messages`|`list[object]`| Y | A list of objects, one per message, from oldest to newest. The oldest message can be role `system`. All later messages must alternate between user and assistant roles. See the message object definition below. |
|`max_tokens`|`integer`| N <br>`4096`| 0 – 4096 | The maximum number of tokens to allow for each generated response message. Typically, the best way to limit output length is by providing a length limit in the system prompt (for example, "limit your answers to three sentences"). |
|`temperature`|`float`| N <br>`1`| 0.0 – 2.0 | How much variation to provide in each answer. Setting this value to 0 guarantees the same response to the same question every time. Setting a higher value encourages more variation. Modifies the distribution from which tokens are sampled. We recommend altering this or `top_p`, but not both. |
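To make the constraints above concrete, the sketch below (an illustrative helper, not part of any official SDK) builds a payload and checks the documented ranges and the role-alternation rule before sending:

```python
def validate_payload(payload):
    """Check a chat-completions payload against the documented constraints."""
    if not 0 <= payload.get("max_tokens", 4096) <= 4096:
        raise ValueError("max_tokens must be between 0 and 4096")
    if not 0.0 <= payload.get("temperature", 1.0) <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    messages = payload["messages"]
    # The oldest message may have role `system`; all later messages must
    # alternate between user and assistant roles.
    rest = messages[1:] if messages and messages[0]["role"] == "system" else messages
    for i, msg in enumerate(rest):
        expected = "user" if i % 2 == 0 else "assistant"
        if msg["role"] != expected:
            raise ValueError(f"message {i} must have role '{expected}'")
    return payload


payload = validate_payload({
    "model": "AI21 Jamba 1.5 Mini",
    "messages": [
        {"role": "system", "content": "Answer in one sentence."},
        {"role": "user", "content": "What is a state space model?"},
    ],
    "max_tokens": 128,
    "temperature": 0.4,
})
```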
## Cost and quotas
### Cost and quota considerations for Jamba family models deployed as a serverless API
Jamba family models deployed as a serverless API are offered by AI21 through Azure Marketplace and integrated with Azure Machine Learning studio for use. You can find Azure Marketplace pricing when deploying or fine-tuning models.
Each time a workspace subscribes to a given model offering from Azure Marketplace, a new resource is created to track the costs associated with its consumption. The same resource is used to track costs associated with inference and fine-tuning; however, multiple meters are available to track each scenario independently.