articles/machine-learning/reference-model-inference-chat-completions.md (+19, -9)
@@ -20,7 +20,7 @@ ms.custom:
Creates a model response for the given chat conversation.

```http
-POST /chat/completions?api-version=2024-05-01-preview
+POST /chat/completions?api-version=2024-04-01-preview
```
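The request above can be sketched in a few lines of Python; the endpoint host and the shape of the helper are illustrative assumptions, not part of this reference:

```python
# Minimal sketch of assembling a chat-completions request URL and body.
# The host name is a hypothetical placeholder; substitute your own
# deployment's endpoint and credentials.
import json

ENDPOINT = "https://my-endpoint.inference.ai.azure.com"  # hypothetical host
API_VERSION = "2024-04-01-preview"

def build_chat_request(messages, **params):
    """Assemble the URL and JSON body for POST /chat/completions."""
    url = f"{ENDPOINT}/chat/completions?api-version={API_VERSION}"
    body = {"messages": messages, **params}
    return url, json.dumps(body)

url, body = build_chat_request(
    [{"role": "user", "content": "Say hello"}],
    temperature=0.7,
    max_tokens=128,
)
```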
## URI Parameters
@@ -36,14 +36,13 @@ POST /chat/completions?api-version=2024-05-01-preview
| messages | True |[ChatCompletionRequestMessage](#chatcompletionrequestmessage)| A list of messages comprising the conversation so far. Returns a 422 error if at least some of the messages can't be understood by the model. |
| frequency\_penalty || number | Helps prevent word repetitions by reducing the chance of a word being selected if it has already been used. The higher the frequency penalty, the less likely the model is to repeat the same words in its output. Returns a 422 error if the value or parameter isn't supported by the model. |
| max\_tokens || integer | The maximum number of tokens that can be generated in the chat completion.<br><br>The total length of input tokens and generated tokens is limited by the model's context length. Passing null causes the model to use its max context length. |
-| model || string | Kept for compatibility reasons. This parameter is ignored. |
| presence\_penalty || number | Helps prevent the same topics from being repeated by penalizing a word if it exists in the completion already, even just once. Returns a 422 error if the value or parameter isn't supported by the model. |
| seed || integer | If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same `seed` and parameters should return the same result. Determinism is not guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend. |
| stop ||| Sequences where the API will stop generating further tokens. |
| stream || boolean | If set, partial message deltas will be sent. Tokens will be sent as data-only [server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#Event_stream_format) as they become available, with the stream terminated by a `data: [DONE]` message. |
| temperature || number | Non-negative number. Returns a 422 error if the value is unsupported by the model. |
-| tool\_choice || ChatCompletionToolChoiceOption | Controls which (if any) function is called by the model. `none` means the model will not call a function and instead generates a message. `auto` means the model can pick between generating a message or calling a function. Specifying a particular function via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that function.<br><br>`none` is the default when no functions are present. `auto` is the default if functions are present. Returns a 422 error if the tool is not supported by the model. |
+| tool\_choice ||[ChatCompletionToolChoiceOption](#chatcompletiontoolchoiceoption)| Controls which (if any) function is called by the model. `none` means the model will not call a function and instead generates a message. `auto` means the model can pick between generating a message or calling a function. Specifying a particular function via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that function.<br><br>`none` is the default when no functions are present. `auto` is the default if functions are present. Returns a 422 error if the tool is not supported by the model. |
| tools ||[ChatCompletionTool](#chatcompletiontool)\[\]| A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. Returns a 422 error if the tool is not supported by the model. |
| top\_p || number | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top\_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.<br><br>We generally recommend altering this or `temperature` but not both. |
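When `stream` is set, tokens arrive as data-only server-sent events terminated by `data: [DONE]`, as described in the table above. A minimal sketch of draining such a stream follows; the sample event lines are fabricated for illustration:

```python
# Sketch: accumulate message deltas from data-only server-sent events
# until the terminal `data: [DONE]` sentinel. The sample stream below is
# fabricated; a real client would read these lines from the HTTP response.
import json

def collect_stream(lines):
    """Join the `delta.content` fragments carried by `data:` lines."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive/comment lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end of stream
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content", "")
        text.append(delta)
    return "".join(text)

sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
```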
|[ChatCompletionToolChoiceOption](#chatcompletiontoolchoiceoption)| Controls which (if any) function is called by the model. `none` means the model will not call a function and instead generates a message. `auto` means the model can pick between generating a message or calling a function. Specifying a particular function via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that function.<br><br>`none` is the default when no functions are present. `auto` is the default if functions are present. Returns a 422 error if the tool is not supported by the model. |
|[ChatCompletionFinishReason](#chatcompletionfinishreason)| The reason the model stopped generating tokens. This will be `stop` if the model hit a natural stop point or a provided stop sequence, `length` if the maximum number of tokens specified in the request was reached, `content_filter` if content was omitted due to a flag from our content filters, `tool_calls` if the model called a tool. |
|[ChatCompletionResponseMessage](#chatcompletionresponsemessage)| A chat completion message generated by the model. |
|[ChatCompletionTool](#chatcompletiontool)||
|[ChatMessageRole](#chatmessagerole)| The role of the author of this message. |
-|[Choices](#choices)| A list of chat completion choices. Can be more than one if `n` is greater than 1. |
+|[Choices](#choices)| A list of chat completion choices. |
|[CompletionUsage](#completionusage)| Usage statistics for the completion request. |
|[ContentFilterError](#contentfiltererror)| The API call fails when the prompt triggers a content filter as configured. Modify the prompt and try again. |
@@ -194,7 +194,7 @@ The reason the model stopped generating tokens. This will be `stop` if the model
| Name | Type | Description |
| --- | --- | --- |
| function |[Function](#function)| The function that the model called. |
-|id| string | The ID of the tool call. |
+|ID| string | The ID of the tool call. |
| type |[ToolType](#tooltype)| The type of the tool. Currently, only `function` is supported. |

### ChatCompletionObject
@@ -287,14 +287,13 @@ The API call fails when the prompt triggers a content filter as configured. Modi
| frequency\_penalty | number | 0 | Helps prevent word repetitions by reducing the chance of a word being selected if it has already been used. The higher the frequency penalty, the less likely the model is to repeat the same words in its output. Returns a 422 error if the value or parameter isn't supported by the model. |
| max\_tokens | integer || The maximum number of tokens that can be generated in the chat completion.<br><br>The total length of input tokens and generated tokens is limited by the model's context length. Passing null causes the model to use its max context length. |
| messages | ChatCompletionRequestMessage\[\]|| A list of messages comprising the conversation so far. Returns a 422 error if at least some of the messages can't be understood by the model. |
-| model | string || Kept for compatibility reasons. This parameter is ignored. |
| presence\_penalty | number | 0 | Helps prevent the same topics from being repeated by penalizing a word if it exists in the completion already, even just once. Returns a 422 error if the value or parameter isn't supported by the model. |
| response\_format |[ChatCompletionResponseFormat](#chatcompletionresponseformat)| text ||
| seed | integer || If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same `seed` and parameters should return the same result. Determinism is not guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend. |
| stop ||| Sequences where the API will stop generating further tokens. |
| stream | boolean | False | If set, partial message deltas will be sent. Tokens will be sent as data-only [server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#Event_stream_format) as they become available, with the stream terminated by a `data: [DONE]` message. |
| temperature | number | 1 | Non-negative number. Returns a 422 error if the value is unsupported by the model. |
-| tool\_choice | ChatCompletionToolChoiceOption || Controls which (if any) function is called by the model. `none` means the model will not call a function and instead generates a message. `auto` means the model can pick between generating a message or calling a function. Specifying a particular function via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that function.<br><br>`none` is the default when no functions are present. `auto` is the default if functions are present. Returns a 422 error if the tool is not supported by the model. |
+| tool\_choice |[ChatCompletionToolChoiceOption](#chatcompletiontoolchoiceoption)|| Controls which (if any) function is called by the model. `none` means the model will not call a function and instead generates a message. `auto` means the model can pick between generating a message or calling a function. Specifying a particular function via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that function.<br><br>`none` is the default when no functions are present. `auto` is the default if functions are present. Returns a 422 error if the tool is not supported by the model. |
| tools |[ChatCompletionTool](#chatcompletiontool)\[\]|| A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. Returns a 422 error if the tool is not supported by the model. |
| top\_p | number | 1 | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top\_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.<br><br>We generally recommend altering this or `temperature` but not both. |
@@ -322,6 +321,17 @@ The API call fails when the prompt triggers a content filter as configured. Modi
| image | string ||
| image_url | string ||

+### ChatCompletionToolChoiceOption
+
+Controls which (if any) tool is called by the model.
+
+| Name | Type | Description |
+| --- | --- | --- |
+| none | string | The model will not call any tool and instead generates a message. |
+| auto | string | The model can pick between generating a message or calling one or more tools. |
+| required | string | The model must call one or more tools. |
+|| string | Specifying a particular tool via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that tool. |
+
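The accepted `tool_choice` shapes listed in the new section can be illustrated with a small validator; the payload shapes come from the table, while the helper function itself is a hypothetical sketch:

```python
# Sketch: check that a tool_choice value matches one of the documented
# shapes: the strings "none", "auto", "required", or an object naming a
# specific function. The helper name is an assumption for illustration.
def is_valid_tool_choice(choice):
    """Return True when `choice` matches a documented tool_choice shape."""
    if choice in ("none", "auto", "required"):
        return True
    return (
        isinstance(choice, dict)
        and choice.get("type") == "function"
        and isinstance(choice.get("function", {}).get("name"), str)
    )
```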
### ImageDetail
Specifies the detail level of the image.
@@ -342,7 +352,7 @@ Represents a chat completion response returned by model, based on the provided i
| --- | --- | --- |
| choices |[Choices](#choices)\[\]| A list of chat completion choices. Can be more than one if `n` is greater than 1. |
| created | integer | The Unix timestamp (in seconds) of when the chat completion was created. |
-|id| string | A unique identifier for the chat completion. |
+|ID| string | A unique identifier for the chat completion. |
| model | string | The model used for the chat completion. |
| object |[ChatCompletionObject](#chatcompletionobject)| The object type, which is always `chat.completion`. |
| system\_fingerprint | string | This fingerprint represents the backend configuration that the model runs with.<br><br>Can be used in conjunction with the `seed` request parameter to understand when backend changes have been made that might impact determinism. |
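A response carrying the fields above can be unpacked as in this sketch; the sample JSON is fabricated to match the table, not captured from a live service:

```python
# Sketch: unpack the documented chat.completion response fields from a
# fabricated sample payload.
import json

sample_response = json.loads("""{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "example-model",
  "system_fingerprint": "fp_abc",
  "choices": [{
    "index": 0,
    "finish_reason": "stop",
    "message": {"role": "assistant", "content": "Hello!"}
  }],
  "usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7}
}""")

# The generated text and stop reason live under the first choice.
answer = sample_response["choices"][0]["message"]["content"]
reason = sample_response["choices"][0]["finish_reason"]
```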
articles/machine-learning/reference-model-inference-completions.md (+8, -10)
@@ -20,7 +20,7 @@ ms.custom:
Creates a completion for the provided prompt and parameters.

```http
-POST /completions?api-version=2024-05-01-preview
+POST /completions?api-version=2024-04-01-preview
```
| Name | In | Required | Type | Description |

@@ -36,7 +36,6 @@ POST /completions?api-version=2024-05-01-preview
| prompt | True || The prompts to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays. Note that `<\|endoftext\|>` is the document separator that the model sees during training, so if a prompt is not specified the model generates as if from the beginning of a new document. |
| frequency\_penalty || number | Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. |
| max\_tokens || integer | The maximum number of tokens that can be generated in the completion. The token count of your prompt plus `max_tokens` cannot exceed the model's context length. |
-| model || string | Kept for compatibility reasons. This parameter is ignored. |
| presence\_penalty || number | Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. |
| seed || integer | If specified, the model makes a best effort to sample deterministically, such that repeated requests with the same `seed` and parameters should return the same result.<br><br>Determinism is not guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend. |
| stop ||| Sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence. |
@@ -50,11 +49,11 @@ POST /completions?api-version=2024-05-01-preview
| Name | Type | Description |
| --- | --- | --- |
| 200 OK |[CreateCompletionResponse](#createcompletionresponse)| OK |
-| 401 Unauthorized || Access token is missing or invalid |
-| 404 Not Found || Modality not supported by the model. Check the documentation of the model to see which routes are available. |
-| 429 Too Many Requests || You have hit your assigned rate limit and your request needs to be paced. |
-| Other Status Codes |[ContentFilterError](#contentfiltererror)| Bad request<br><br>Headers<br><br>x-ms-error-code: string |
+| 401 Unauthorized |[UnauthorizedError](#unauthorizederror)| Access token is missing or invalid<br><br>Headers<br><br>x-ms-error-code: string |
+| 404 Not Found |[NotFoundError](#notfounderror)| Modality not supported by the model. Check the documentation of the model to see which routes are available.<br><br>Headers<br><br>x-ms-error-code: string |
+| 429 Too Many Requests |[TooManyRequestsError](#toomanyrequestserror)| You have hit your assigned rate limit and your request needs to be paced.<br><br>Headers<br><br>x-ms-error-code: string |
+| Other Status Codes |[ContentFilterError](#contentfiltererror)| Bad request<br><br>Headers<br><br>x-ms-error-code: string |
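The status codes above suggest a coarse client-side dispatch; this sketch encodes the table's guidance (only 429 warrants pacing and retrying), with the helper name being an assumption:

```python
# Sketch: map the documented status codes to a coarse client action.
# Only 429 is retryable (rate limiting); 401 and 404 are caller errors.
def classify_status(code):
    """Return a coarse action for a response status, per the table above."""
    if code == 200:
        return "ok"
    if code == 401:
        return "fix-credentials"     # access token missing or invalid
    if code == 404:
        return "check-modality"      # route not supported by the model
    if code == 429:
        return "retry-with-backoff"  # pace your requests
    return "inspect-x-ms-error-code" # e.g. content filter / bad request
```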
## Security
@@ -85,7 +84,7 @@ Azure Active Directory OAuth2 authentication
#### Sample Request

```http
-POST /completions?api-version=2024-05-01-preview
+POST /completions?api-version=2024-04-01-preview

{
    "prompt": "This is a very good text",
@@ -198,7 +197,6 @@ The API call fails when the prompt triggers a content filter as configured. Modi
| --- | --- | --- | --- |
| frequency\_penalty | number | 0 | Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. |
| max\_tokens | integer | 256 | The maximum number of tokens that can be generated in the completion. The token count of your prompt plus `max_tokens` cannot exceed the model's context length. |
-| model | string || Kept for compatibility reasons. This parameter is ignored. |
| presence\_penalty | number | 0 | Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. |
| prompt ||`<\|endoftext\|>`| The prompts to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays. Note that `<\|endoftext\|>` is the document separator that the model sees during training, so if a prompt is not specified the model generates as if from the beginning of a new document. |
| seed | integer || If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same `seed` and parameters should return the same result.<br><br>Determinism is not guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend. |
@@ -217,7 +215,7 @@ Represents a completion response from the API. Note: both the streamed and nonst
| --- | --- | --- |
| choices |[Choices](#choices)\[\]| The list of completion choices the model generated for the input prompt. |
| created | integer | The Unix timestamp (in seconds) of when the completion was created. |
-|id| string | A unique identifier for the completion. |
+|ID| string | A unique identifier for the completion. |
| model | string | The model used for completion. |
| object |[TextCompletionObject](#textcompletionobject)| The object type, which is always "text\_completion" |
| system\_fingerprint | string | This fingerprint represents the backend configuration that the model runs with.<br><br>Can be used with the `seed` request parameter to understand when backend changes have been made that might impact determinism. |
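As the `seed` and `system_fingerprint` rows note, seeded requests are only comparable while the backend configuration is unchanged; a sketch of that check follows, with field names taken from the tables and the response data fabricated:

```python
# Sketch: two seeded responses are only expected to match while their
# system_fingerprint values agree. The responses below are fabricated.
def same_backend(resp_a, resp_b):
    """True when both responses report the same backend fingerprint."""
    return resp_a["system_fingerprint"] == resp_b["system_fingerprint"]

r1 = {"id": "cmpl-1", "system_fingerprint": "fp_abc"}
r2 = {"id": "cmpl-2", "system_fingerprint": "fp_xyz"}
```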