articles/ai-studio/reference/reference-model-inference-chat-completions.md (+17 −7 lines)
@@ -21,7 +21,7 @@ ms.custom:
 Creates a model response for the given chat conversation.
 
 ```http
-POST /chat/completions?api-version=2024-05-01-preview
+POST /chat/completions?api-version=2024-04-01-preview
 ```
 
 ## URI Parameters
@@ -37,14 +37,13 @@ POST /chat/completions?api-version=2024-05-01-preview
 | messages | True |[ChatCompletionRequestMessage](#chatcompletionrequestmessage)| A list of messages comprising the conversation so far. Returns a 422 error if at least some of the messages can't be understood by the model. |
 | frequency\_penalty || number | Helps prevent word repetitions by reducing the chance of a word being selected if it has already been used. The higher the frequency penalty, the less likely the model is to repeat the same words in its output. Returns a 422 error if the value or parameter is not supported by the model. |
 | max\_tokens || integer | The maximum number of tokens that can be generated in the chat completion.<br><br>The total length of input tokens and generated tokens is limited by the model's context length. Passing null causes the model to use its max context length. |
-| model || string | Kept for compatibility reasons. This parameter is ignored. |
 | presence\_penalty || number | Helps prevent the same topics from being repeated by penalizing a word if it exists in the completion already, even just once. Returns a 422 error if the value or parameter is not supported by the model. |
 | seed || integer | If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same `seed` and parameters should return the same result. Determinism is not guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend. |
 | stop ||| Sequences where the API will stop generating further tokens. |
 | stream || boolean | If set, partial message deltas will be sent. Tokens will be sent as data-only [server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#Event_stream_format) as they become available, with the stream terminated by a `data: [DONE]` message. |
 | temperature || number | Non-negative number. Returns a 422 error if the value is unsupported by the model. |
-| tool\_choice || ChatCompletionToolChoiceOption | Controls which (if any) function is called by the model. `none` means the model will not call a function and instead generates a message. `auto` means the model can pick between generating a message or calling a function. Specifying a particular function via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that function.<br><br>`none` is the default when no functions are present. `auto` is the default if functions are present. Returns a 422 error if the tool is not supported by the model. |
+| tool\_choice ||[ChatCompletionToolChoiceOption](#chatcompletiontoolchoiceoption)| Controls which (if any) function is called by the model. `none` means the model will not call a function and instead generates a message. `auto` means the model can pick between generating a message or calling a function. Specifying a particular function via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that function.<br><br>`none` is the default when no functions are present. `auto` is the default if functions are present. Returns a 422 error if the tool is not supported by the model. |
 | tools ||[ChatCompletionTool](#chatcompletiontool)\[\]| A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. Returns a 422 error if the tool is not supported by the model. |
 | top\_p || number | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top\_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.<br><br>We generally recommend altering this or `temperature` but not both. |
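For illustration, a request body with the parameters from the table above can be assembled in a short client sketch. The endpoint base URL, key, and `Bearer` auth scheme below are placeholders for illustration only, not taken from this reference:

```python
import json

API_VERSION = "2024-04-01-preview"  # api-version used in this reference

def build_chat_request(base_url, api_key, messages, temperature=1.0, seed=None):
    """Assemble the URL, headers, and JSON body for POST /chat/completions."""
    url = f"{base_url}/chat/completions?api-version={API_VERSION}"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",  # auth scheme is an assumption
    }
    body = {"messages": messages, "temperature": temperature}
    if seed is not None:
        # best-effort determinism; monitor system_fingerprint in responses
        body["seed"] = seed
    return url, headers, json.dumps(body)

url, headers, payload = build_chat_request(
    "https://example-endpoint.example.com",  # placeholder endpoint
    "PLACEHOLDER_KEY",
    [{"role": "user", "content": "Hello"}],
    seed=42,
)
```

The returned triple can then be passed to any HTTP client; unsupported parameter values may produce the 422 errors noted in the table.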
 |[ChatCompletionToolChoiceOption](#chatcompletiontoolchoiceoption)| Controls which (if any) function is called by the model. `none` means the model will not call a function and instead generates a message. `auto` means the model can pick between generating a message or calling a function. Specifying a particular function via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that function.<br><br>`none` is the default when no functions are present. `auto` is the default if functions are present. Returns a 422 error if the tool is not supported by the model. |
 |[ChatCompletionFinishReason](#chatcompletionfinishreason)| The reason the model stopped generating tokens. This will be `stop` if the model hit a natural stop point or a provided stop sequence, `length` if the maximum number of tokens specified in the request was reached, `content_filter` if content was omitted due to a flag from our content filters, `tool_calls` if the model called a tool. |
 |[ChatCompletionResponseMessage](#chatcompletionresponsemessage)| A chat completion message generated by the model. |
 |[ChatCompletionTool](#chatcompletiontool)||
 |[ChatMessageRole](#chatmessagerole)| The role of the author of this message. |
-|[Choices](#choices)| A list of chat completion choices. Can be more than one if `n` is greater than 1. |
+|[Choices](#choices)| A list of chat completion choices. |
 |[CompletionUsage](#completionusage)| Usage statistics for the completion request. |
 |[ContentFilterError](#contentfiltererror)| The API call fails when the prompt triggers a content filter as configured. Modify the prompt and try again. |
@@ -288,14 +288,13 @@ The API call fails when the prompt triggers a content filter as configured. Modify the prompt and try again.
 | frequency\_penalty | number | 0 | Helps prevent word repetitions by reducing the chance of a word being selected if it has already been used. The higher the frequency penalty, the less likely the model is to repeat the same words in its output. Returns a 422 error if the value or parameter is not supported by the model. |
 | max\_tokens | integer || The maximum number of tokens that can be generated in the chat completion.<br><br>The total length of input tokens and generated tokens is limited by the model's context length. Passing null causes the model to use its max context length. |
 | messages | ChatCompletionRequestMessage\[\]|| A list of messages comprising the conversation so far. Returns a 422 error if at least some of the messages can't be understood by the model. |
-| model | string || Kept for compatibility reasons. This parameter is ignored. |
 | presence\_penalty | number | 0 | Helps prevent the same topics from being repeated by penalizing a word if it exists in the completion already, even just once. Returns a 422 error if the value or parameter is not supported by the model. |
 | response\_format |[ChatCompletionResponseFormat](#chatcompletionresponseformat)| text ||
 | seed | integer || If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same `seed` and parameters should return the same result. Determinism is not guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend. |
 | stop ||| Sequences where the API will stop generating further tokens. |
 | stream | boolean | False | If set, partial message deltas will be sent. Tokens will be sent as data-only [server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#Event_stream_format) as they become available, with the stream terminated by a `data: [DONE]` message. |
 | temperature | number | 1 | Non-negative number. Returns a 422 error if the value is unsupported by the model. |
-| tool\_choice | ChatCompletionToolChoiceOption || Controls which (if any) function is called by the model. `none` means the model will not call a function and instead generates a message. `auto` means the model can pick between generating a message or calling a function. Specifying a particular function via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that function.<br><br>`none` is the default when no functions are present. `auto` is the default if functions are present. Returns a 422 error if the tool is not supported by the model. |
+| tool\_choice |[ChatCompletionToolChoiceOption](#chatcompletiontoolchoiceoption)|| Controls which (if any) function is called by the model. `none` means the model will not call a function and instead generates a message. `auto` means the model can pick between generating a message or calling a function. Specifying a particular function via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that function.<br><br>`none` is the default when no functions are present. `auto` is the default if functions are present. Returns a 422 error if the tool is not supported by the model. |
 | tools |[ChatCompletionTool](#chatcompletiontool)\[\]|| A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. Returns a 422 error if the tool is not supported by the model. |
 | top\_p | number | 1 | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top\_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.<br><br>We generally recommend altering this or `temperature` but not both. |
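When `stream` is set, the tables above say tokens arrive as data-only server-sent events terminated by a `data: [DONE]` message. A minimal sketch of consuming such a stream follows; the `choices[0].delta.content` layout is an assumption modeled on common chat-completions payloads, not specified by this reference:

```python
import json

def iter_stream_content(lines):
    """Yield content fragments from data-only SSE lines, stopping at data: [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines between events
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # stream terminator described in the tables above
        event = json.loads(data)
        # Assumed delta layout; adjust to the actual streamed payload.
        delta = event.get("choices", [{}])[0].get("delta", {})
        if "content" in delta:
            yield delta["content"]

sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
text = "".join(iter_stream_content(sample))  # → "Hello"
```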
@@ -323,6 +322,17 @@ The API call fails when the prompt triggers a content filter as configured. Modify the prompt and try again.
 | image | string ||
 | image_url | string ||
 
+### ChatCompletionToolChoiceOption
+
+Controls which (if any) tool is called by the model.
+
+| Name | Type | Description |
+| --- | --- | --- |
+| none | string | The model will not call any tool and instead generates a message. |
+| auto | string | The model can pick between generating a message or calling one or more tools. |
+| required | string | The model must call one or more tools. |
+|| string | Specifying a particular tool via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that tool. |
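The accepted `tool_choice` values in the added table can be checked client-side before sending a request. A hedged sketch (the helper name is hypothetical, not part of the API):

```python
def is_valid_tool_choice(choice):
    """Check a tool_choice value against the options in the table above."""
    if isinstance(choice, str):
        # the three string forms: none, auto, required
        return choice in ("none", "auto", "required")
    if isinstance(choice, dict):
        # the object form that forces a particular tool
        return (choice.get("type") == "function"
                and "name" in choice.get("function", {}))
    return False
```

For example, `is_valid_tool_choice({"type": "function", "function": {"name": "my_function"}})` accepts the forcing form, while a bare function name string is rejected.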
articles/ai-studio/reference/reference-model-inference-completions.md (+7 −9 lines)
@@ -21,7 +21,7 @@ ms.custom:
 Creates a completion for the provided prompt and parameters.
 
 ```http
-POST /completions?api-version=2024-05-01-preview
+POST /completions?api-version=2024-04-01-preview
 ```
 
 | Name | In | Required | Type | Description |
@@ -37,7 +37,6 @@ POST /completions?api-version=2024-05-01-preview
 | prompt | True || The prompts to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays. Note that `<\|endoftext\|>` is the document separator that the model sees during training, so if a prompt is not specified the model generates as if from the beginning of a new document. |
 | frequency\_penalty || number | Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. |
 | max\_tokens || integer | The maximum number of tokens that can be generated in the completion. The token count of your prompt plus `max_tokens` cannot exceed the model's context length. |
-| model || string | Kept for compatibility reasons. This parameter is ignored. |
 | presence\_penalty || number | Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. |
 | seed || integer | If specified, the model makes a best effort to sample deterministically, such that repeated requests with the same `seed` and parameters should return the same result.<br><br>Determinism is not guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend. |
 | stop ||| Sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence. |
@@ -51,11 +50,11 @@ POST /completions?api-version=2024-05-01-preview
 | Name | Type | Description |
 | --- | --- | --- |
 | 200 OK |[CreateCompletionResponse](#createcompletionresponse)| OK |
-| 401 Unauthorized || Access token is missing or invalid |
-| 404 Not Found || Modality not supported by the model. Check the documentation of the model to see which routes are available. |
-| 429 Too Many Requests || You have hit your assigned rate limit and your request need to be paced. |
-| Other Status Codes |[ContentFilterError](#contentfiltererror)| Bad request<br><br>Headers<br><br>x-ms-error-code: string |
+| 401 Unauthorized |[UnauthorizedError](#unauthorizederror)| Access token is missing or invalid<br><br>Headers<br><br>x-ms-error-code: string |
+| 404 Not Found |[NotFoundError](#notfounderror)| Modality not supported by the model. Check the documentation of the model to see which routes are available.<br><br>Headers<br><br>x-ms-error-code: string |
+| 429 Too Many Requests |[TooManyRequestsError](#toomanyrequestserror)| You have hit your assigned rate limit and your requests need to be paced.<br><br>Headers<br><br>x-ms-error-code: string |
+| Other Status Codes |[ContentFilterError](#contentfiltererror)| Bad request<br><br>Headers<br><br>x-ms-error-code: string |
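A client can branch on the response codes in this table. The mapping below is a sketch; the suggested actions are interpretations of the descriptions above, and the fallback relies on the `x-ms-error-code` header the error rows document:

```python
def next_action(status):
    """Suggest a client action for the response codes in the table above.

    The action names are illustrative, not prescribed by this reference.
    """
    if status == 200:
        return "ok"
    if status == 401:
        return "fix-credentials"      # access token missing or invalid
    if status == 404:
        return "check-modality"       # route not supported by this model
    if status == 429:
        return "pace-and-retry"       # assigned rate limit was hit
    return "inspect-x-ms-error-code"  # e.g. content-filter bad request
```

For example, a wrapper loop might sleep and resend on `"pace-and-retry"`, but surface `"check-modality"` immediately since retrying cannot succeed.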
 
 
 ## Security
@@ -86,7 +85,7 @@ Azure Active Directory OAuth2 authentication
 #### Sample Request
 
 ```http
-POST /completions?api-version=2024-05-01-preview
+POST /completions?api-version=2024-04-01-preview
 
 {
 "prompt": "This is a very good text",
@@ -199,7 +198,6 @@ The API call fails when the prompt triggers a content filter as configured. Modify the prompt and try again.
 | --- | --- | --- | --- |
 | frequency\_penalty | number | 0 | Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. |
 | max\_tokens | integer | 256 | The maximum number of tokens that can be generated in the completion. The token count of your prompt plus `max_tokens` cannot exceed the model's context length. |
-| model | string || Kept for compatibility reasons. This parameter is ignored. |
 | presence\_penalty | number | 0 | Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. |
 | prompt ||`<\|endoftext\|>`| The prompts to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays. Note that `<\|endoftext\|>` is the document separator that the model sees during training, so if a prompt is not specified the model generates as if from the beginning of a new document. |
 | seed | integer || If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same `seed` and parameters should return the same result.<br><br>Determinism is not guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend. |