Skip to content

Commit d06d9a7

Browse files
committed
edits to reference section
1 parent d7435ef commit d06d9a7

File tree

1 file changed

+22
-12
lines changed

1 file changed

+22
-12
lines changed

articles/ai-studio/how-to/deploy-models-jamba.md

Lines changed: 22 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -5,10 +5,10 @@ description: How to deploy AI21's Jamba-Instruct model with Azure AI Studio
55
manager: scottpolly
66
ms.service: machine-learning
77
ms.topic: how-to
8-
ms.date: 05/02/2024
8+
ms.date: 06/19/2024
99
ms.author: ssalgado
1010
ms.reviewer: tgokal
11-
author: tgokal
11+
reviewer: tgokal
1212
ms.custom: references_regions
1313
---
1414

@@ -97,14 +97,24 @@ For more information on using the APIs, see the [reference](#reference-for-jamba
9797

9898
## Reference for Jamba Instruct deployed a serverless API
9999

100-
Since Jamba Instruct is fine-tuned for chat completion, we support the route `/chat/completions` as part of the [Azure AI Model Inference API](../reference/reference-model-inference-api.md) for multi-turn chat or single-turn question-answering. [AI21's Azure Client](https://docs.ai21.com/reference/jamba-instruct-api) can also be used. For more information about the REST endpoint being called, visit [AI21's REST documentation](https://docs.ai21.com/reference/jamba-instruct-api).
100+
Jamba Instruct models accept both of these APIs:
101+
102+
- The [Azure AI Model Inference API](../reference/reference-model-inference-api.md) on the route `/chat/completions` for multi-turn chat or single-turn question-answering. This API is supported because Jamba Instruct is fine-tuned for chat completion.
103+
- [AI21's Azure Client](https://docs.ai21.com/reference/jamba-instruct-api). For more information about the REST endpoint being called, visit [AI21's REST documentation](https://docs.ai21.com/reference/jamba-instruct-api).
101104

102105
### Azure AI model inference API
103106

104107
The [Azure AI model inference API](../reference/reference-model-inference-api.md) schema can be found in the [reference for Chat Completions](../reference/reference-model-inference-chat-completions.md) article and an [OpenAPI specification can be obtained from the endpoint itself](../reference/reference-model-inference-api.md?tabs=rest#getting-started).
105108

106-
Single- and multi-turn chat have the same request and response format, except that question answering (single-turn) involves only a single user message in the request, while multi-turn chat requires that you send the entire chat message history in each request. In a multi-turn chat, the message thread includes all messages from the user and the model, ordered oldest to newest, alternating between `user` and `assistant` role messages, optionally starting with a system
107-
message to provide context. For example, the message stack for the fourth call in a chat request that includes an initial system message would look like this in pseudocode:
109+
Single-turn and multi-turn chat have the same request and response format, except that question answering (single-turn) involves only a single user message in the request, while multi-turn chat requires that you send the entire chat message history in each request.
110+
111+
In a multi-turn chat, the message thread has the following attributes:
112+
113+
- Includes all messages from the user and the model, ordered from oldest to newest.
114+
- Messages alternate between `user` and `assistant` role messages
115+
- Optionally, the message thread starts with a system message to provide context.
116+
117+
The following pseudocode is an example of the message stack for the fourth call in a chat request that includes an initial system message.
108118

109119
```json
110120
[
@@ -199,15 +209,15 @@ __Chat example (fourth request containing third user response)__
199209

200210
The response depends slightly on whether the result is streamed or not.
201211

202-
**In a non-streamed result**, all responses are delivered together in a single response, which also includes a `usage` property.
212+
In a _non-streamed result_, all responses are delivered together in a single response, which also includes a `usage` property.
203213

204-
**In a streamed result:**
214+
In a _streamed result_,
205215

206-
* Each response includes a single token in the `choices` field
207-
* The `choices` object structure is different
208-
* Only the last response includes a `usage` object
209-
* The entire response is wrapped in a `data` object
210-
* The final response object is `data: [DONE]`
216+
* Each response includes a single token in the `choices` field.
217+
* The `choices` object structure is different.
218+
* Only the last response includes a `usage` object.
219+
* The entire response is wrapped in a `data` object.
220+
* The final response object is `data: [DONE]`.
211221

212222
The response payload is a dictionary with the following fields.
213223

0 commit comments

Comments
 (0)