articles/machine-learning/reference-model-inference-chat-completions.md (+19, -9)
@@ -20,7 +20,7 @@ ms.custom:
Creates a model response for the given chat conversation.

```http
-POST /chat/completions?api-version=2024-05-01-preview
+POST /chat/completions?api-version=2024-04-01-preview
```
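The request above can be sketched in a few lines of Python; the endpoint host and the shape of the helper are illustrative assumptions, not part of this reference:

```python
# Minimal sketch of assembling a chat-completions request URL and body.
# The host name is a hypothetical placeholder; substitute your own
# deployment's endpoint and credentials.
import json

ENDPOINT = "https://my-endpoint.inference.ai.azure.com"  # hypothetical host
API_VERSION = "2024-04-01-preview"

def build_chat_request(messages, **params):
    """Assemble the URL and JSON body for POST /chat/completions."""
    url = f"{ENDPOINT}/chat/completions?api-version={API_VERSION}"
    body = {"messages": messages, **params}
    return url, json.dumps(body)

url, body = build_chat_request(
    [{"role": "user", "content": "Say hello"}],
    temperature=0.7,
    max_tokens=128,
)
```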
## URI Parameters
@@ -36,14 +36,13 @@ POST /chat/completions?api-version=2024-05-01-preview
| messages | True |[ChatCompletionRequestMessage](#chatcompletionrequestmessage)| A list of messages comprising the conversation so far. Returns a 422 error if at least some of the messages can't be understood by the model. |
| frequency\_penalty || number | Helps prevent word repetitions by reducing the chance of a word being selected if it has already been used. The higher the frequency penalty, the less likely the model is to repeat the same words in its output. Returns a 422 error if the value or parameter isn't supported by the model. |
| max\_tokens || integer | The maximum number of tokens that can be generated in the chat completion.<br><br>The total length of input tokens and generated tokens is limited by the model's context length. Passing null causes the model to use its max context length. |
-| model || string | Kept for compatibility reasons. This parameter is ignored. |
| presence\_penalty || number | Helps prevent the same topics from being repeated by penalizing a word if it exists in the completion already, even just once. Returns a 422 error if the value or parameter isn't supported by the model. |
| seed || integer | If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same `seed` and parameters should return the same result. Determinism is not guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend. |
| stop ||| Sequences where the API will stop generating further tokens. |
| stream || boolean | If set, partial message deltas will be sent. Tokens will be sent as data-only [server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#Event_stream_format) as they become available, with the stream terminated by a `data: [DONE]` message. |
| temperature || number | Non-negative number. Returns a 422 error if the value is unsupported by the model. |
-| tool\_choice || ChatCompletionToolChoiceOption | Controls which (if any) function is called by the model. `none` means the model will not call a function and instead generates a message. `auto` means the model can pick between generating a message or calling a function. Specifying a particular function via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that function.<br><br>`none` is the default when no functions are present. `auto` is the default if functions are present. Returns a 422 error if the tool is not supported by the model. |
+| tool\_choice ||[ChatCompletionToolChoiceOption](#chatcompletiontoolchoiceoption)| Controls which (if any) function is called by the model. `none` means the model will not call a function and instead generates a message. `auto` means the model can pick between generating a message or calling a function. Specifying a particular function via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that function.<br><br>`none` is the default when no functions are present. `auto` is the default if functions are present. Returns a 422 error if the tool is not supported by the model. |
| tools ||[ChatCompletionTool](#chatcompletiontool)\[\]| A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. Returns a 422 error if the tool is not supported by the model. |
| top\_p || number | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top\_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.<br><br>We generally recommend altering this or `temperature` but not both. |
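When `stream` is set, tokens arrive as data-only server-sent events terminated by `data: [DONE]`, as described in the table above. A minimal sketch of draining such a stream follows; the sample event lines are fabricated for illustration:

```python
# Sketch: accumulate message deltas from data-only server-sent events
# until the terminal `data: [DONE]` sentinel. The sample stream below is
# fabricated; a real client would read these lines from the HTTP response.
import json

def collect_stream(lines):
    """Join the `delta.content` fragments carried by `data:` lines."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive/comment lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end of stream
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content", "")
        text.append(delta)
    return "".join(text)

sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
```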
|[ChatCompletionToolChoiceOption](#chatcompletiontoolchoiceoption)| Controls which (if any) function is called by the model. `none` means the model will not call a function and instead generates a message. `auto` means the model can pick between generating a message or calling a function. Specifying a particular function via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that function.<br><br>`none` is the default when no functions are present. `auto` is the default if functions are present. Returns a 422 error if the tool is not supported by the model. |
|[ChatCompletionFinishReason](#chatcompletionfinishreason)| The reason the model stopped generating tokens. This will be `stop` if the model hit a natural stop point or a provided stop sequence, `length` if the maximum number of tokens specified in the request was reached, `content_filter` if content was omitted due to a flag from our content filters, `tool_calls` if the model called a tool. |
|[ChatCompletionResponseMessage](#chatcompletionresponsemessage)| A chat completion message generated by the model. |
|[ChatCompletionTool](#chatcompletiontool)||
|[ChatMessageRole](#chatmessagerole)| The role of the author of this message. |
-|[Choices](#choices)| A list of chat completion choices. Can be more than one if `n` is greater than 1. |
+|[Choices](#choices)| A list of chat completion choices. |
|[CompletionUsage](#completionusage)| Usage statistics for the completion request. |
|[ContentFilterError](#contentfiltererror)| The API call fails when the prompt triggers a content filter as configured. Modify the prompt and try again. |
@@ -194,7 +194,7 @@ The reason the model stopped generating tokens. This will be `stop` if the model
| Name | Type | Description |
| --- | --- | --- |
| function |[Function](#function)| The function that the model called. |
-|id| string | The ID of the tool call. |
+|ID| string | The ID of the tool call. |
| type |[ToolType](#tooltype)| The type of the tool. Currently, only `function` is supported. |

### ChatCompletionObject
@@ -287,14 +287,13 @@ The API call fails when the prompt triggers a content filter as configured. Modi
| frequency\_penalty | number | 0 | Helps prevent word repetitions by reducing the chance of a word being selected if it has already been used. The higher the frequency penalty, the less likely the model is to repeat the same words in its output. Returns a 422 error if the value or parameter isn't supported by the model. |
| max\_tokens | integer || The maximum number of tokens that can be generated in the chat completion.<br><br>The total length of input tokens and generated tokens is limited by the model's context length. Passing null causes the model to use its max context length. |
| messages | ChatCompletionRequestMessage\[\]|| A list of messages comprising the conversation so far. Returns a 422 error if at least some of the messages can't be understood by the model. |
-| model | string || Kept for compatibility reasons. This parameter is ignored. |
| presence\_penalty | number | 0 | Helps prevent the same topics from being repeated by penalizing a word if it exists in the completion already, even just once. Returns a 422 error if the value or parameter isn't supported by the model. |
| response\_format |[ChatCompletionResponseFormat](#chatcompletionresponseformat)| text ||
| seed | integer || If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same `seed` and parameters should return the same result. Determinism is not guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend. |
| stop ||| Sequences where the API will stop generating further tokens. |
| stream | boolean | False | If set, partial message deltas will be sent. Tokens will be sent as data-only [server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#Event_stream_format) as they become available, with the stream terminated by a `data: [DONE]` message. |
| temperature | number | 1 | Non-negative number. Returns a 422 error if the value is unsupported by the model. |
-| tool\_choice | ChatCompletionToolChoiceOption || Controls which (if any) function is called by the model. `none` means the model will not call a function and instead generates a message. `auto` means the model can pick between generating a message or calling a function. Specifying a particular function via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that function.<br><br>`none` is the default when no functions are present. `auto` is the default if functions are present. Returns a 422 error if the tool is not supported by the model. |
+| tool\_choice |[ChatCompletionToolChoiceOption](#chatcompletiontoolchoiceoption)|| Controls which (if any) function is called by the model. `none` means the model will not call a function and instead generates a message. `auto` means the model can pick between generating a message or calling a function. Specifying a particular function via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that function.<br><br>`none` is the default when no functions are present. `auto` is the default if functions are present. Returns a 422 error if the tool is not supported by the model. |
| tools |[ChatCompletionTool](#chatcompletiontool)\[\]|| A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. Returns a 422 error if the tool is not supported by the model. |
| top\_p | number | 1 | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top\_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.<br><br>We generally recommend altering this or `temperature` but not both. |
@@ -322,6 +321,17 @@ The API call fails when the prompt triggers a content filter as configured. Modi
| image | string ||
| image_url | string ||

+### ChatCompletionToolChoiceOption
+
+Controls which (if any) tool is called by the model.
+
+| Name | Type | Description |
+| --- | --- | --- |
+| none | string | The model will not call any tool and instead generates a message. |
+| auto | string | The model can pick between generating a message or calling one or more tools. |
+| required | string | The model must call one or more tools. |
+|| string | Specifying a particular tool via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that tool. |
+
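The accepted `tool_choice` shapes listed in the new section can be illustrated with a small validator; the payload shapes come from the table, while the helper function itself is a hypothetical sketch:

```python
# Sketch: check that a tool_choice value matches one of the documented
# shapes: the strings "none", "auto", "required", or an object naming a
# specific function. The helper name is an assumption for illustration.
def is_valid_tool_choice(choice):
    """Return True when `choice` matches a documented tool_choice shape."""
    if choice in ("none", "auto", "required"):
        return True
    return (
        isinstance(choice, dict)
        and choice.get("type") == "function"
        and isinstance(choice.get("function", {}).get("name"), str)
    )
```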
### ImageDetail
Specifies the detail level of the image.
@@ -342,7 +352,7 @@ Represents a chat completion response returned by model, based on the provided i
| --- | --- | --- |
| choices |[Choices](#choices)\[\]| A list of chat completion choices. Can be more than one if `n` is greater than 1. |
| created | integer | The Unix timestamp (in seconds) of when the chat completion was created. |
-|id| string | A unique identifier for the chat completion. |
+|ID| string | A unique identifier for the chat completion. |
| model | string | The model used for the chat completion. |
| object |[ChatCompletionObject](#chatcompletionobject)| The object type, which is always `chat.completion`. |
| system\_fingerprint | string | This fingerprint represents the backend configuration that the model runs with.<br><br>Can be used in conjunction with the `seed` request parameter to understand when backend changes have been made that might impact determinism. |
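A response carrying the fields above can be unpacked as in this sketch; the sample JSON is fabricated to match the table, not captured from a live service:

```python
# Sketch: unpack the documented chat.completion response fields from a
# fabricated sample payload.
import json

sample_response = json.loads("""{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "example-model",
  "system_fingerprint": "fp_abc",
  "choices": [{
    "index": 0,
    "finish_reason": "stop",
    "message": {"role": "assistant", "content": "Hello!"}
  }],
  "usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7}
}""")

# The generated text and stop reason live under the first choice.
answer = sample_response["choices"][0]["message"]["content"]
reason = sample_response["choices"][0]["finish_reason"]
```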
articles/machine-learning/reference-model-inference-completions.md (+8, -10)
@@ -20,7 +20,7 @@ ms.custom:
Creates a completion for the provided prompt and parameters.

```http
-POST /completions?api-version=2024-05-01-preview
+POST /completions?api-version=2024-04-01-preview
```
| Name | In | Required | Type | Description |

@@ -36,7 +36,6 @@ POST /completions?api-version=2024-05-01-preview
| prompt | True || The prompts to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays. Note that `<\|endoftext\|>` is the document separator that the model sees during training, so if a prompt is not specified the model generates as if from the beginning of a new document. |
| frequency\_penalty || number | Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. |
| max\_tokens || integer | The maximum number of tokens that can be generated in the completion. The token count of your prompt plus `max_tokens` cannot exceed the model's context length. |
-| model || string | Kept for compatibility reasons. This parameter is ignored. |
| presence\_penalty || number | Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. |
| seed || integer | If specified, the model makes a best effort to sample deterministically, such that repeated requests with the same `seed` and parameters should return the same result.<br><br>Determinism is not guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend. |
| stop ||| Sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence. |
@@ -50,11 +49,11 @@ POST /completions?api-version=2024-05-01-preview
| Name | Type | Description |
| --- | --- | --- |
| 200 OK |[CreateCompletionResponse](#createcompletionresponse)| OK |
-| 401 Unauthorized || Access token is missing or invalid |
-| 404 Not Found || Modality not supported by the model. Check the documentation of the model to see which routes are available. |
-| 429 Too Many Requests || You have hit your assigned rate limit and your request needs to be paced. |
-| Other Status Codes |[ContentFilterError](#contentfiltererror)| Bad request<br><br>Headers<br><br>x-ms-error-code: string |
+| 401 Unauthorized |[UnauthorizedError](#unauthorizederror)| Access token is missing or invalid<br><br>Headers<br><br>x-ms-error-code: string |
+| 404 Not Found |[NotFoundError](#notfounderror)| Modality not supported by the model. Check the documentation of the model to see which routes are available.<br><br>Headers<br><br>x-ms-error-code: string |
+| 429 Too Many Requests |[TooManyRequestsError](#toomanyrequestserror)| You have hit your assigned rate limit and your request needs to be paced.<br><br>Headers<br><br>x-ms-error-code: string |
+| Other Status Codes |[ContentFilterError](#contentfiltererror)| Bad request<br><br>Headers<br><br>x-ms-error-code: string |
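The status codes above suggest a coarse client-side dispatch; this sketch encodes the table's guidance (only 429 warrants pacing and retrying), with the helper name being an assumption:

```python
# Sketch: map the documented status codes to a coarse client action.
# Only 429 is retryable (rate limiting); 401 and 404 are caller errors.
def classify_status(code):
    """Return a coarse action for a response status, per the table above."""
    if code == 200:
        return "ok"
    if code == 401:
        return "fix-credentials"     # access token missing or invalid
    if code == 404:
        return "check-modality"      # route not supported by the model
    if code == 429:
        return "retry-with-backoff"  # pace your requests
    return "inspect-x-ms-error-code" # e.g. content filter / bad request
```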
## Security
@@ -85,7 +84,7 @@ Azure Active Directory OAuth2 authentication
#### Sample Request

```http
-POST /completions?api-version=2024-05-01-preview
+POST /completions?api-version=2024-04-01-preview

{
    "prompt": "This is a very good text",
@@ -198,7 +197,6 @@ The API call fails when the prompt triggers a content filter as configured. Modi
| --- | --- | --- | --- |
| frequency\_penalty | number | 0 | Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. |
| max\_tokens | integer | 256 | The maximum number of tokens that can be generated in the completion. The token count of your prompt plus `max_tokens` cannot exceed the model's context length. |
-| model | string || Kept for compatibility reasons. This parameter is ignored. |
| presence\_penalty | number | 0 | Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. |
| prompt ||`<\|endoftext\|>`| The prompts to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays. Note that `<\|endoftext\|>` is the document separator that the model sees during training, so if a prompt is not specified the model generates as if from the beginning of a new document. |
| seed | integer || If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same `seed` and parameters should return the same result.<br><br>Determinism is not guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend. |
@@ -217,7 +215,7 @@ Represents a completion response from the API. Note: both the streamed and nonst
| --- | --- | --- |
| choices |[Choices](#choices)\[\]| The list of completion choices the model generated for the input prompt. |
| created | integer | The Unix timestamp (in seconds) of when the completion was created. |
-|id| string | A unique identifier for the completion. |
+|ID| string | A unique identifier for the completion. |
| model | string | The model used for completion. |
| object |[TextCompletionObject](#textcompletionobject)| The object type, which is always "text\_completion" |
| system\_fingerprint | string | This fingerprint represents the backend configuration that the model runs with.<br><br>Can be used with the `seed` request parameter to understand when backend changes have been made that might impact determinism. |
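As the `seed` and `system_fingerprint` rows note, seeded requests are only comparable while the backend configuration is unchanged; a sketch of that check follows, with field names taken from the tables and the response data fabricated:

```python
# Sketch: two seeded responses are only expected to match while their
# system_fingerprint values agree. The responses below are fabricated.
def same_backend(resp_a, resp_b):
    """True when both responses report the same backend fingerprint."""
    return resp_a["system_fingerprint"] == resp_b["system_fingerprint"]

r1 = {"id": "cmpl-1", "system_fingerprint": "fp_abc"}
r2 = {"id": "cmpl-2", "system_fingerprint": "fp_xyz"}
```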