
Commit 7d65eb0

Update deploy-jamba-models.md
1 parent 0ef9615 commit 7d65eb0

File tree

1 file changed: +3 −1 lines changed

articles/ai-studio/how-to/deploy-jamba-models.md

Lines changed: 3 additions & 1 deletion
@@ -107,7 +107,9 @@ For more information on using the APIs, see the [reference](#reference-for-jamba
 
 Since Jamba Instruct is fine-tuned for chat completion, we support the route `/chat/completions` as part of the [Azure AI Model Inference API](../reference/reference-model-inference-api.md) for multi-turn chat or single-turn question-answering. AI21's [Jamba Instruct model](https://docs.ai21.com/reference/jamba-instruct-api) can also be used. For more information about the REST endpoint being called, visit [AI21's REST documentation](https://docs.ai21.com/reference/jamba-instruct-api).
 
-The [Azure AI Model Inference API](reference-model-inference-api.md) schema can be found in the [reference for Chat Completions](../reference/reference-model-inference-chat-completions.md) article and an [OpenAPI specification can be obtained from the endpoint itself](../reference/reference-model-inference-api.md?tabs=rest#getting-started).
+### Azure AI Model Inference API
+
+The [Azure AI Model Inference API](../reference/reference-model-inference-api.md) schema can be found in the [reference for Chat Completions](../reference/reference-model-inference-chat-completions.md) article and an [OpenAPI specification can be obtained from the endpoint itself](../reference/reference-model-inference-api.md?tabs=rest#getting-started).
 
 Single- and multi-turn chat have the same request and response format, except that question answering (single-turn) involves only a single user message in the request, while multi-turn chat requires that you send the entire chat message history in each request. In a multi-turn chat, the message thread includes all messages from the user and the model, ordered oldest to newest, alternating between `user` and `assistant` role messages, optionally starting with a system
 message to provide context. For example, the message stack for the fourth call in a chat request that includes an initial system message would look like this in pseudocode:
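The context paragraph at the end of the hunk describes the multi-turn message stack only in prose, and the pseudocode it refers to sits outside this hunk. As a minimal sketch (not part of the committed file), the fourth call of a chat that opens with a system message could be sent to the `/chat/completions` route like this with Python's `requests`; the endpoint URL, API key, and `Bearer` authorization header are placeholders/assumptions to replace with your own deployment's values:

```python
import requests

# Placeholder values -- substitute the endpoint URL and key from your deployment.
ENDPOINT = "https://<your-serverless-endpoint>"
API_KEY = "<your-api-key>"

# Message stack for the fourth user turn: the optional system message first,
# then every prior user/assistant exchange ordered oldest to newest, ending
# with the new user message.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "First question"},
    {"role": "assistant", "content": "First answer"},
    {"role": "user", "content": "Second question"},
    {"role": "assistant", "content": "Second answer"},
    {"role": "user", "content": "Third question"},
    {"role": "assistant", "content": "Third answer"},
    {"role": "user", "content": "Fourth question"},
]

response = requests.post(
    f"{ENDPOINT}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json={"messages": messages, "max_tokens": 256},
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

A single-turn question-answering request uses the same shape, with only one `user` message (plus the optional system message) in `messages`.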
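The added line also states that an OpenAPI specification can be obtained from the endpoint itself; the exact route is documented in the linked reference article. A hedged sketch, assuming a hypothetical `/swagger.json` path (verify the real path in the reference before relying on it):

```python
import requests

# Hypothetical path -- check the Azure AI Model Inference API reference for the
# actual location of the OpenAPI document served by your endpoint.
spec = requests.get(
    "https://<your-serverless-endpoint>/swagger.json",
    headers={"Authorization": "Bearer <your-api-key>"},
).json()
print(spec["info"]["title"], spec["info"]["version"])
```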
