
Commit e4ef9ae

Merge pull request #276398 from santiagxf/santiagxf/release-build-2024-azureml
Release update //build Azure ML
2 parents: 272ac6f + d1d7022

6 files changed: +61 −56 lines

articles/machine-learning/reference-model-inference-api.md

Lines changed: 3 additions & 3 deletions
@@ -75,7 +75,7 @@ The following example shows a request passing the parameter `safe_prompt` suppor
 __Request__

 ```HTTP/1.1
-POST /chat/completions?api-version=2024-05-01-preview
+POST /chat/completions?api-version=2024-04-01-preview
 Authorization: Bearer <bearer-token>
 Content-Type: application/json
 extra-parameters: allow
@@ -112,7 +112,7 @@ The following example shows the response for a chat completion request indicatin
 __Request__

 ```HTTP/1.1
-POST /chat/completions?api-version=2024-05-01-preview
+POST /chat/completions?api-version=2024-04-01-preview
 Authorization: Bearer <bearer-token>
 Content-Type: application/json
 ```
@@ -198,7 +198,7 @@ wget -d --header="Authorization: Bearer <TOKEN>" <ENDPOINT_URI>/swagger.json
 Use the **Endpoint URI** and the **Key** to submit requests. The following example sends a request to a Cohere embedding model:

 ```HTTP/1.1
-POST /embeddings?api-version=2024-05-01-preview
+POST /embeddings?api-version=2024-04-01-preview
 Authorization: Bearer <bearer-token>
 Content-Type: application/json
 ```
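The three hunks above change only the `api-version` query parameter on each route. A minimal sketch of building the updated request URLs, assuming a hypothetical endpoint host (the host below is a placeholder, not taken from this commit):

```python
from urllib.parse import urlencode

# Post-change api-version value from this diff.
API_VERSION = "2024-04-01-preview"

def route_url(endpoint: str, route: str) -> str:
    """Build a full URL for an inference route with the pinned api-version."""
    return f"{endpoint}{route}?{urlencode({'api-version': API_VERSION})}"

# Hypothetical endpoint host for illustration only.
chat_url = route_url("https://example-endpoint.example.com", "/chat/completions")
embed_url = route_url("https://example-endpoint.example.com", "/embeddings")
```

Pinning the version in one constant keeps the chat-completions and embeddings routes in sync when the preview version rolls forward again.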

articles/machine-learning/reference-model-inference-chat-completions.md

Lines changed: 19 additions & 9 deletions
@@ -20,7 +20,7 @@ ms.custom:
 Creates a model response for the given chat conversation.

 ```http
-POST /chat/completions?api-version=2024-05-01-preview
+POST /chat/completions?api-version=2024-04-01-preview
 ```

 ## URI Parameters
@@ -36,14 +36,13 @@ POST /chat/completions?api-version=2024-05-01-preview
 | messages | True | [ChatCompletionRequestMessage](#chatcompletionrequestmessage) | A list of messages comprising the conversation so far. Returns a 422 error if at least some of the messages can't be understood by the model. |
 | frequency\_penalty | | number | Helps prevent word repetitions by reducing the chance of a word being selected if it has already been used. The higher the frequency penalty, the less likely the model is to repeat the same words in its output. Return a 422 error if value or parameter is not supported by model. |
 | max\_tokens | | integer | The maximum number of tokens that can be generated in the chat completion.<br><br>The total length of input tokens and generated tokens is limited by the model's context length. Passing null causes the model to use its max context length. |
-| model | | string | Kept for compatibility reasons. This parameter is ignored. |
 | presence\_penalty | | number | Helps prevent the same topics from being repeated by penalizing a word if it exists in the completion already, even just once. Return a 422 error if value or parameter is not supported by model. |
 | response\_format | | [ChatCompletionResponseFormat](#chatcompletionresponseformat) | |
 | seed | | integer | If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same `seed` and parameters should return the same result. Determinism is not guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend. |
 | stop | | | Sequences where the API will stop generating further tokens. |
 | stream | | boolean | If set, partial message deltas will be sent. Tokens will be sent as data-only [server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#Event_stream_format) as they become available, with the stream terminated by a `data: [DONE]` message. |
 | temperature | | number | Non-negative number. Return 422 if value is unsupported by model. |
-| tool\_choice | | ChatCompletionToolChoiceOption | Controls which (if any) function is called by the model. `none` means the model will not call a function and instead generates a message. `auto` means the model can pick between generating a message or calling a function. Specifying a particular function via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that function.<br><br>`none` is the default when no functions are present. `auto` is the default if functions are present. Returns a 422 error if the tool is not supported by the model. |
+| tool\_choice | | [ChatCompletionToolChoiceOption](#chatcompletiontoolchoiceoption) | Controls which (if any) function is called by the model. `none` means the model will not call a function and instead generates a message. `auto` means the model can pick between generating a message or calling a function. Specifying a particular function via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that function.<br><br>`none` is the default when no functions are present. `auto` is the default if functions are present. Returns a 422 error if the tool is not supported by the model. |
 | tools | | [ChatCompletionTool](#chatcompletiontool)\[\] | A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. Returns a 422 error if the tool is not supported by the model. |
 | top\_p | | number | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top\_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.<br><br>We generally recommend altering this or `temperature` but not both. |
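The `stream` row in the table above describes data-only server-sent events terminated by `data: [DONE]`. A minimal sketch of consuming such a stream, assuming an already-decoded iterable of event lines (the sample input is illustrative, not captured from a real response):

```python
def iter_deltas(raw_lines):
    """Yield the JSON payload of each SSE data event, stopping at [DONE]."""
    for line in raw_lines:
        # SSE data events are prefixed with "data: "; other lines are ignored.
        if not line.startswith("data: "):
            continue
        chunk = line[len("data: "):]
        if chunk == "[DONE]":
            return  # documented end-of-stream sentinel
        yield chunk

# Illustrative event lines; a real stream carries chat-completion delta objects.
events = ['data: {"id": "1"}', "", "data: [DONE]", "data: ignored"]
deltas = list(iter_deltas(events))
```

Everything after the `[DONE]` sentinel is ignored, matching the documented stream termination.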

@@ -84,7 +83,7 @@ Token URL: https://login.microsoftonline.com/common/oauth2/v2.0/token


 ```http
-POST /chat/completions?api-version=2024-05-01-preview
+POST /chat/completions?api-version=2024-04-01-preview

 {
 "messages": [
@@ -153,14 +152,15 @@ Status code: 200
 | [ChatCompletionRequestMessage](#chatcompletionrequestmessage) | |
 | [ChatCompletionMessageContentPart](#chatcompletionmessagecontentpart) | |
 | [ChatCompletionMessageContentPartType](#chatcompletionmessagecontentparttype) | |
+| [ChatCompletionToolChoiceOption](#chatcompletiontoolchoiceoption) | Controls which (if any) function is called by the model. `none` means the model will not call a function and instead generates a message. `auto` means the model can pick between generating a message or calling a function. Specifying a particular function via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that function.<br><br>`none` is the default when no functions are present. `auto` is the default if functions are present. Returns a 422 error if the tool is not supported by the model. |
 | [ChatCompletionFinishReason](#chatcompletionfinishreason) | The reason the model stopped generating tokens. This will be `stop` if the model hit a natural stop point or a provided stop sequence, `length` if the maximum number of tokens specified in the request was reached, `content_filter` if content was omitted due to a flag from our content filters, `tool_calls` if the model called a tool. |
 | [ChatCompletionMessageToolCall](#chatcompletionmessagetoolcall) | |
 | [ChatCompletionObject](#chatcompletionobject) | The object type, which is always `chat.completion`. |
 | [ChatCompletionResponseFormat](#chatcompletionresponseformat) | |
 | [ChatCompletionResponseMessage](#chatcompletionresponsemessage) | A chat completion message generated by the model. |
 | [ChatCompletionTool](#chatcompletiontool) | |
 | [ChatMessageRole](#chatmessagerole) | The role of the author of this message. |
-| [Choices](#choices) | A list of chat completion choices. Can be more than one if `n` is greater than 1. |
+| [Choices](#choices) | A list of chat completion choices. |
 | [CompletionUsage](#completionusage) | Usage statistics for the completion request. |
 | [ContentFilterError](#contentfiltererror) | The API call fails when the prompt triggers a content filter as configured. Modify the prompt and try again. |
 | [CreateChatCompletionRequest](#createchatcompletionrequest) | |
@@ -194,7 +194,7 @@ The reason the model stopped generating tokens. This will be `stop` if the model
 | Name | Type | Description |
 | --- | --- | --- |
 | function | [Function](#function) | The function that the model called. |
-| id | string | The ID of the tool call. |
+| ID | string | The ID of the tool call. |
 | type | [ToolType](#tooltype) | The type of the tool. Currently, only `function` is supported. |

 ### ChatCompletionObject
@@ -287,14 +287,13 @@ The API call fails when the prompt triggers a content filter as configured. Modi
 | frequency\_penalty | number | 0 | Helps prevent word repetitions by reducing the chance of a word being selected if it has already been used. The higher the frequency penalty, the less likely the model is to repeat the same words in its output. Return a 422 error if value or parameter is not supported by model. |
 | max\_tokens | integer | | The maximum number of tokens that can be generated in the chat completion.<br><br>The total length of input tokens and generated tokens is limited by the model's context length. Passing null causes the model to use its max context length. |
 | messages | ChatCompletionRequestMessage\[\] | | A list of messages comprising the conversation so far. Returns a 422 error if at least some of the messages can't be understood by the model. |
-| model | string | | Kept for compatibility reasons. This parameter is ignored. |
 | presence\_penalty | number | 0 | Helps prevent the same topics from being repeated by penalizing a word if it exists in the completion already, even just once. Return a 422 error if value or parameter is not supported by model. |
 | response\_format | [ChatCompletionResponseFormat](#chatcompletionresponseformat) | text | |
 | seed | integer | | If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same `seed` and parameters should return the same result. Determinism is not guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend. |
 | stop | | | Sequences where the API will stop generating further tokens. |
 | stream | boolean | False | If set, partial message deltas will be sent. Tokens will be sent as data-only [server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#Event_stream_format) as they become available, with the stream terminated by a `data: [DONE]` message. |
 | temperature | number | 1 | Non-negative number. Return 422 if value is unsupported by model. |
-| tool\_choice | ChatCompletionToolChoiceOption | | Controls which (if any) function is called by the model. `none` means the model will not call a function and instead generates a message. `auto` means the model can pick between generating a message or calling a function. Specifying a particular function via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that function.<br><br>`none` is the default when no functions are present. `auto` is the default if functions are present. Returns a 422 error if the tool is not supported by the model. |
+| tool\_choice | [ChatCompletionToolChoiceOption](#chatcompletiontoolchoiceoption) | | Controls which (if any) function is called by the model. `none` means the model will not call a function and instead generates a message. `auto` means the model can pick between generating a message or calling a function. Specifying a particular function via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that function.<br><br>`none` is the default when no functions are present. `auto` is the default if functions are present. Returns a 422 error if the tool is not supported by the model. |
 | tools | [ChatCompletionTool](#chatcompletiontool)\[\] | | A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. Returns a 422 error if the tool is not supported by the model. |
 | top\_p | number | 1 | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top\_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.<br><br>We generally recommend altering this or `temperature` but not both. |

@@ -322,6 +321,17 @@ The API call fails when the prompt triggers a content filter as configured. Modi
 | image | string | |
 | image_url | string | |

+### ChatCompletionToolChoiceOption
+
+Controls which (if any) tool is called by the model.
+
+| Name | Type | Description |
+| --- | --- | --- |
+| none | string | The model will not call any tool and instead generates a message. |
+| auto | string | The model can pick between generating a message or calling one or more tools. |
+| required | string | The model must call one or more tools. |
+| | string | Specifying a particular tool via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that tool. |
+
 ### ImageDetail

 Specifies the detail level of the image.
@@ -342,7 +352,7 @@ Represents a chat completion response returned by model, based on the provided i
 | --- | --- | --- |
 | choices | [Choices](#choices)\[\] | A list of chat completion choices. Can be more than one if `n` is greater than 1. |
 | created | integer | The Unix timestamp (in seconds) of when the chat completion was created. |
-| id | string | A unique identifier for the chat completion. |
+| ID | string | A unique identifier for the chat completion. |
 | model | string | The model used for the chat completion. |
 | object | [ChatCompletionObject](#chatcompletionobject) | The object type, which is always `chat.completion`. |
 | system\_fingerprint | string | This fingerprint represents the backend configuration that the model runs with.<br><br>Can be used in conjunction with the `seed` request parameter to understand when backend changes have been made that might impact determinism. |
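The diff above links `tool_choice` to the new `ChatCompletionToolChoiceOption` section it documents. A sketch of a request body exercising the forced-function form of that option; `get_weather` and the user message are invented for illustration, not taken from the article:

```python
import json

payload = {
    "messages": [{"role": "user", "content": "What's the weather in Seattle?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical function name
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    # Forces the model to call get_weather instead of answering in text.
    # The other documented options are the strings "none", "auto", and "required".
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
}
body = json.dumps(payload)
```

Per the table, a model that does not support tools would return a 422 for this request.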

articles/machine-learning/reference-model-inference-completions.md

Lines changed: 8 additions & 10 deletions
@@ -20,7 +20,7 @@ ms.custom:
 Creates a completion for the provided prompt and parameters.

 ```http
-POST /completions?api-version=2024-05-01-preview
+POST /completions?api-version=2024-04-01-preview
 ```

 | Name | In | Required | Type | Description |
@@ -36,7 +36,6 @@ POST /completions?api-version=2024-05-01-preview
 | prompt | True | | The prompts to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays. Note that `<\|endoftext\|>` is the document separator that the model sees during training, so if a prompt is not specified the model generates as if from the beginning of a new document. |
 | frequency\_penalty | | number | Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. |
 | max\_tokens | | integer | The maximum number of tokens that can be generated in the completion. The token count of your prompt plus `max_tokens` cannot exceed the model's context length. |
-| model | | string | Kept for compatibility reasons. This parameter is ignored. |
 | presence\_penalty | | number | Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. |
 | seed | | integer | If specified, the model makes a best effort to sample deterministically, such that repeated requests with the same `seed` and parameters should return the same result.<br><br>Determinism is not guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend. |
 | stop | | | Sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence. |
@@ -50,11 +49,11 @@ POST /completions?api-version=2024-05-01-preview
 | Name | Type | Description |
 | --- | --- | --- |
 | 200 OK | [CreateCompletionResponse](#createcompletionresponse) | OK |
-| 401 Unauthorized | | Access token is missing or invalid |
-| 404 Not Found | | Modality not supported by the model. Check the documentation of the model to see which routes are available. |
-| 422 Unprocessable Entity | [UnprocessableContentError](#unprocessablecontenterror) | The request contains unprocessable content<br><br>Headers<br><br>x-ms-error-code: string |
-| 429 Too Many Requests | | You have hit your assigned rate limit and your request need to be paced. |
-| Other Status Codes | [ContentFilterError](#contentfiltererror) | Bad request<br><br>Headers<br><br>x-ms-error-code: string |
+| 401 Unauthorized | [UnauthorizedError](#unauthorizederror) | Access token is missing or invalid<br><br>Headers<br><br>x-ms-error-code: string |
+| 404 Not Found | [NotFoundError](#notfounderror) | Modality not supported by the model. Check the documentation of the model to see which routes are available.<br><br>Headers<br><br>x-ms-error-code: string |
+| 422 Unprocessable Entity | [UnprocessableContentError](#unprocessablecontenterror) | The request contains unprocessable content<br><br>Headers<br><br>x-ms-error-code: string |
+| 429 Too Many Requests | [TooManyRequestsError](#toomanyrequestserror) | You have hit your assigned rate limit and your request need to be paced.<br><br>Headers<br><br>x-ms-error-code: string |
+| Other Status Codes | [ContentFilterError](#contentfiltererror) | Bad request<br><br>Headers<br><br>x-ms-error-code: string |


 ## Security
@@ -85,7 +84,7 @@ Azure Active Directory OAuth2 authentication
 #### Sample Request

 ```http
-POST /completions?api-version=2024-05-01-preview
+POST /completions?api-version=2024-04-01-preview

 {
 "prompt": "This is a very good text",
@@ -198,7 +197,6 @@ The API call fails when the prompt triggers a content filter as configured. Modi
 | --- | --- | --- | --- |
 | frequency\_penalty | number | 0 | Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. |
 | max\_tokens | integer | 256 | The maximum number of tokens that can be generated in the completion. The token count of your prompt plus `max_tokens` cannot exceed the model's context length. |
-| model | string | | Kept for compatibility reasons. This parameter is ignored. |
 | presence\_penalty | number | 0 | Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. |
 | prompt | | `<\|endoftext\|>` | The prompts to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays. Note that `<\|endoftext\|>` is the document separator that the model sees during training, so if a prompt is not specified the model generates as if from the beginning of a new document. |
 | seed | integer | | If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same `seed` and parameters should return the same result.<br><br>Determinism is not guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend. |
@@ -217,7 +215,7 @@ Represents a completion response from the API. Note: both the streamed and nonst
 | --- | --- | --- |
 | choices | [Choices](#choices)\[\] | The list of completion choices the model generated for the input prompt. |
 | created | integer | The Unix timestamp (in seconds) of when the completion was created. |
-| id | string | A unique identifier for the completion. |
+| ID | string | A unique identifier for the completion. |
 | model | string | The model used for completion. |
 | object | [TextCompletionObject](#textcompletionobject) | The object type, which is always "text\_completion" |
 | system\_fingerprint | string | This fingerprint represents the backend configuration that the model runs with.<br><br>Can be used with the `seed` request parameter to understand when backend changes have been made that might impact determinism. |
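Pulling together the sample request and the defaults table above, a sketch of a `/completions` request body (the `seed` value is illustrative; the prompt string comes from the sample request):

```python
import json

payload = {
    "prompt": "This is a very good text",
    # Documented defaults: max_tokens 256, temperature 1, top_p 1.
    "max_tokens": 256,
    "temperature": 1,
    "top_p": 1,
    # Pin sampling for best-effort reproducibility; per the docs, compare
    # system_fingerprint across responses to detect backend changes that
    # may affect determinism.
    "seed": 42,
}
body = json.dumps(payload)
```

This body would be POSTed to `/completions?api-version=2024-04-01-preview` with the `Authorization` and `Content-Type` headers shown earlier.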
