articles/ai-studio/how-to/deploy-models-jamba.md
22 additions & 12 deletions
@@ -5,10 +5,10 @@ description: How to deploy AI21's Jamba-Instruct model with Azure AI Studio
 manager: scottpolly
 ms.service: machine-learning
 ms.topic: how-to
-ms.date: 05/02/2024
+ms.date: 06/19/2024
 ms.author: ssalgado
 ms.reviewer: tgokal
-author: tgokal
+reviewer: tgokal
 ms.custom: references_regions
 ---
@@ -97,14 +97,24 @@ For more information on using the APIs, see the [reference](#reference-for-jamba
 
 ## Reference for Jamba Instruct deployed as a serverless API
 
-Since Jamba Instruct is fine-tuned for chat completion, we support the route `/chat/completions` as part of the [Azure AI Model Inference API](../reference/reference-model-inference-api.md) for multi-turn chat or single-turn question-answering. [AI21's Azure Client](https://docs.ai21.com/reference/jamba-instruct-api) can also be used. For more information about the REST endpoint being called, visit [AI21's REST documentation](https://docs.ai21.com/reference/jamba-instruct-api).
+Jamba Instruct models accept both of these APIs:
+
+- The [Azure AI Model Inference API](../reference/reference-model-inference-api.md) on the route `/chat/completions` for multi-turn chat or single-turn question-answering. This API is supported because Jamba Instruct is fine-tuned for chat completion.
+- [AI21's Azure Client](https://docs.ai21.com/reference/jamba-instruct-api). For more information about the REST endpoint being called, visit [AI21's REST documentation](https://docs.ai21.com/reference/jamba-instruct-api).
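
As a rough illustration of the first option, the following sketch calls the `/chat/completions` route with Python's standard library. The endpoint URL, key, and payload values are placeholders you would replace with your own deployment's details, and the Bearer-token header and payload fields follow the Azure AI Model Inference API's chat-completions schema; treat this as a sketch under those assumptions, not a definitive client.

```python
import json
import urllib.request

# Placeholder values: substitute your serverless deployment's endpoint and key.
ENDPOINT = "https://your-deployment.inference.ai.azure.com"
API_KEY = "your-api-key"

def build_chat_request(messages, max_tokens=256, temperature=0.7):
    """Build the URL, headers, and encoded JSON body for a /chat/completions call."""
    url = f"{ENDPOINT}/chat/completions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    }
    body = {
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return url, headers, json.dumps(body).encode("utf-8")

def chat(messages):
    """POST the chat request to the deployed endpoint and return the parsed JSON."""
    url, headers, data = build_chat_request(messages)
    req = urllib.request.Request(url, data=data, headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Single-turn example (requires a live endpoint, so it's left commented out):
# result = chat([{"role": "user", "content": "What is Jamba Instruct?"}])
# print(result["choices"][0]["message"]["content"])
```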
 
 ### Azure AI model inference API
 
 The [Azure AI model inference API](../reference/reference-model-inference-api.md) schema can be found in the [reference for Chat Completions](../reference/reference-model-inference-chat-completions.md) article and an [OpenAPI specification can be obtained from the endpoint itself](../reference/reference-model-inference-api.md?tabs=rest#getting-started).
 
-Single- and multi-turn chat have the same request and response format, except that question answering (single-turn) involves only a single user message in the request, while multi-turn chat requires that you send the entire chat message history in each request. In a multi-turn chat, the message thread includes all messages from the user and the model, ordered oldest to newest, alternating between `user` and `assistant` role messages, optionally starting with a system
-message to provide context. For example, the message stack for the fourth call in a chat request that includes an initial system message would look like this in pseudocode:
+Single-turn and multi-turn chat have the same request and response format, except that question answering (single-turn) involves only a single user message in the request, while multi-turn chat requires that you send the entire chat message history in each request.
+
+In a multi-turn chat, the message thread has the following attributes:
+
+- Includes all messages from the user and the model, ordered from oldest to newest.
+- Messages alternate between `user` and `assistant` role messages.
+- Optionally, the message thread starts with a system message to provide context.
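
The thread-building rules above can be sketched as a small helper; the message contents here are hypothetical, invented purely to show how the stack looks by the fourth call.

```python
def start_thread(system_message=None):
    """Optionally seed the thread with a system message for context."""
    return [{"role": "system", "content": system_message}] if system_message else []

def add_turn(thread, user_message, assistant_reply=None):
    """Append a user message and, once received, the model's reply."""
    thread.append({"role": "user", "content": user_message})
    if assistant_reply is not None:
        thread.append({"role": "assistant", "content": assistant_reply})
    return thread

# Build the stack as it would look on the fourth call (hypothetical contents):
thread = start_thread("You are a helpful assistant.")
add_turn(thread, "first user message", "first model reply")
add_turn(thread, "second user message", "second model reply")
add_turn(thread, "third user message", "third model reply")
add_turn(thread, "fourth user message")  # the fourth call sends this full history

print([m["role"] for m in thread])
# -> ['system', 'user', 'assistant', 'user', 'assistant', 'user', 'assistant', 'user']
```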
+
+The following pseudocode is an example of the message stack for the fourth call in a chat request that includes an initial system message.
 
 ```json
 [
@@ -199,15 +209,15 @@ __Chat example (fourth request containing third user response)__
 
 The response depends slightly on whether the result is streamed or not.
 
-**In a non-streamed result**, all responses are delivered together in a single response, which also includes a `usage` property.
+In a _non-streamed result_, all responses are delivered together in a single response, which also includes a `usage` property.
 
-**In a streamed result:**
+In a _streamed result_:
 
-* Each response includes a single token in the `choices` field
-* The `choices` object structure is different
-* Only the last response includes a `usage` object
-* The entire response is wrapped in a `data` object
-* The final response object is `data: [DONE]`
+* Each response includes a single token in the `choices` field.
+* The `choices` object structure is different.
+* Only the last response includes a `usage` object.
+* The entire response is wrapped in a `data` object.
+* The final response object is `data: [DONE]`.
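
A consumer of the streamed form can be sketched as a loop over `data:`-prefixed lines that stops at the `data: [DONE]` sentinel. The per-chunk `choices` structure is not spelled out here, so the `delta`/`content` field names below are an assumption (the common OpenAI-style streaming shape), used only for illustration.

```python
import json

def collect_stream(lines):
    """Accumulate content tokens from a streamed chat-completions response.

    Each line is expected to look like 'data: {...}', ending with the
    'data: [DONE]' sentinel. The delta/content shape inside 'choices'
    is an assumed OpenAI-style structure, not confirmed by this article.
    """
    tokens = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank or keep-alive lines
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break  # the final response object ends the stream
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            tokens.append(delta["content"])
    return "".join(tokens)

# Simulated stream with hypothetical payloads; only the last chunk carries usage.
fake_stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}], "usage": {"total_tokens": 2}}',
    'data: [DONE]',
]
print(collect_stream(fake_stream))  # -> Hello
```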
 
 The response payload is a dictionary with the following fields.