
Commit f2164c6

Merge pull request #276431 from MicrosoftDocs/main
5/27/2024 AM Publish
2 parents 74706ea + 818e26b commit f2164c6

26 files changed (+400 −315 lines)

articles/ai-studio/includes/region-availabilitity-serverless-api.md

Lines changed: 4 additions & 5 deletions
````diff
@@ -60,12 +60,11 @@ Availability of serverless API endpoints for select models are listed in the fol
 | South Central US | **✓** |
 | West US | **✓** |
 | West US 3 | **✓** |
-| France Central | unavailable |
 | Sweden Central | **✓** |
 
 #### Phi 3 models
 
-| Region | Phi 3 |
-|----------------|:-----------:|
-| East US 2 | **✓** |
-| Sweden Central | **✓** |
+| Region | Phi-3-mini | Phi-3-medium |
+|----------------|:----------------:|:----------------:|
+| East US 2 | **✓** | **✓** |
+| Sweden Central | **✓** | **✓** |
````

articles/ai-studio/reference/reference-model-inference-api.md

Lines changed: 3 additions & 3 deletions
````diff
@@ -77,7 +77,7 @@ The following example shows a request passing the parameter `safe_prompt` suppor
 __Request__
 
 ```HTTP/1.1
-POST /chat/completions?api-version=2024-05-01-preview
+POST /chat/completions?api-version=2024-04-01-preview
 Authorization: Bearer <bearer-token>
 Content-Type: application/json
 extra-parameters: allow
@@ -114,7 +114,7 @@ The following example shows the response for a chat completion request indicatin
 __Request__
 
 ```HTTP/1.1
-POST /chat/completions?api-version=2024-05-01-preview
+POST /chat/completions?api-version=2024-04-01-preview
 Authorization: Bearer <bearer-token>
 Content-Type: application/json
 ```
@@ -199,7 +199,7 @@ wget -d --header="Authorization: Bearer <TOKEN>" <ENDPOINT_URI>/swagger.json
 Use the **Endpoint URI** and the **Key** to submit requests. The following example sends a request to a Cohere embedding model:
 
 ```HTTP/1.1
-POST /embeddings?api-version=2024-05-01-preview
+POST /embeddings?api-version=2024-04-01-preview
 Authorization: Bearer <bearer-token>
 Content-Type: application/json
 ```
````
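The hunks above roll the `api-version` query parameter back to `2024-04-01-preview` on every route in this file. A minimal sketch of how one such request could be composed, assuming a hypothetical endpoint URI, a placeholder token, and an `input` field for the embeddings body (none of these values come from the diff itself):

```python
# Sketch: composing the embeddings request with the api-version this diff sets.
# Endpoint URI, token, and the "input" body field are illustrative placeholders.
import json

API_VERSION = "2024-04-01-preview"

def build_embeddings_request(endpoint_uri: str, token: str, texts: list) -> dict:
    """Return the method, URL, headers, and JSON body for an embeddings call."""
    return {
        "method": "POST",
        "url": f"{endpoint_uri}/embeddings?api-version={API_VERSION}",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"input": texts}),
    }

req = build_embeddings_request("https://example-endpoint", "<bearer-token>", ["hello"])
print(req["url"])
```

Keeping the version string in one constant makes a doc-wide rollback like this commit a one-line change on the client side as well.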

articles/ai-studio/reference/reference-model-inference-chat-completions.md

Lines changed: 17 additions & 7 deletions
````diff
@@ -21,7 +21,7 @@ ms.custom:
 Creates a model response for the given chat conversation.
 
 ```http
-POST /chat/completions?api-version=2024-05-01-preview
+POST /chat/completions?api-version=2024-04-01-preview
 ```
 
 ## URI Parameters
@@ -37,14 +37,13 @@ POST /chat/completions?api-version=2024-05-01-preview
 | messages | True | [ChatCompletionRequestMessage](#chatcompletionrequestmessage) | A list of messages comprising the conversation so far. Returns a 422 error if at least some of the messages can't be understood by the model. |
 | frequency\_penalty | | number | Helps prevent word repetitions by reducing the chance of a word being selected if it has already been used. The higher the frequency penalty, the less likely the model is to repeat the same words in its output. Return a 422 error if value or parameter is not supported by model. |
 | max\_tokens | | integer | The maximum number of tokens that can be generated in the chat completion.<br><br>The total length of input tokens and generated tokens is limited by the model's context length. Passing null causes the model to use its max context length. |
-| model | | string | Kept for compatibility reasons. This parameter is ignored. |
 | presence\_penalty | | number | Helps prevent the same topics from being repeated by penalizing a word if it exists in the completion already, even just once. Return a 422 error if value or parameter is not supported by model. |
 | response\_format | | [ChatCompletionResponseFormat](#chatcompletionresponseformat) | |
 | seed | | integer | If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same `seed` and parameters should return the same result. Determinism is not guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend. |
 | stop | | | Sequences where the API will stop generating further tokens. |
 | stream | | boolean | If set, partial message deltas will be sent. Tokens will be sent as data-only [server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#Event_stream_format) as they become available, with the stream terminated by a `data: [DONE]` message. |
 | temperature | | number | Non-negative number. Return 422 if value is unsupported by model. |
-| tool\_choice | | ChatCompletionToolChoiceOption | Controls which (if any) function is called by the model. `none` means the model will not call a function and instead generates a message. `auto` means the model can pick between generating a message or calling a function. Specifying a particular function via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that function.<br><br>`none` is the default when no functions are present. `auto` is the default if functions are present. Returns a 422 error if the tool is not supported by the model. |
+| tool\_choice | | [ChatCompletionToolChoiceOption](#chatcompletiontoolchoiceoption) | Controls which (if any) function is called by the model. `none` means the model will not call a function and instead generates a message. `auto` means the model can pick between generating a message or calling a function. Specifying a particular function via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that function.<br><br>`none` is the default when no functions are present. `auto` is the default if functions are present. Returns a 422 error if the tool is not supported by the model. |
 | tools | | [ChatCompletionTool](#chatcompletiontool)\[\] | A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. Returns a 422 error if the tool is not supported by the model. |
 | top\_p | | number | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top\_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.<br><br>We generally recommend altering this or `temperature` but not both. |
 
@@ -85,7 +84,7 @@ Token URL: https://login.microsoftonline.com/common/oauth2/v2.0/token
 
 
 ```http
-POST /chat/completions?api-version=2024-05-01-preview
+POST /chat/completions?api-version=2024-04-01-preview
 
 {
 "messages": [
@@ -154,14 +153,15 @@ Status code: 200
 | [ChatCompletionRequestMessage](#chatcompletionrequestmessage) | |
 | [ChatCompletionMessageContentPart](#chatcompletionmessagecontentpart) | |
 | [ChatCompletionMessageContentPartType](#chatcompletionmessagecontentparttype) | |
+| [ChatCompletionToolChoiceOption](#chatcompletiontoolchoiceoption) | Controls which (if any) function is called by the model. `none` means the model will not call a function and instead generates a message. `auto` means the model can pick between generating a message or calling a function. Specifying a particular function via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that function.<br><br>`none` is the default when no functions are present. `auto` is the default if functions are present. Returns a 422 error if the tool is not supported by the model. |
 | [ChatCompletionFinishReason](#chatcompletionfinishreason) | The reason the model stopped generating tokens. This will be `stop` if the model hit a natural stop point or a provided stop sequence, `length` if the maximum number of tokens specified in the request was reached, `content_filter` if content was omitted due to a flag from our content filters, `tool_calls` if the model called a tool. |
 | [ChatCompletionMessageToolCall](#chatcompletionmessagetoolcall) | |
 | [ChatCompletionObject](#chatcompletionobject) | The object type, which is always `chat.completion`. |
 | [ChatCompletionResponseFormat](#chatcompletionresponseformat) | |
 | [ChatCompletionResponseMessage](#chatcompletionresponsemessage) | A chat completion message generated by the model. |
 | [ChatCompletionTool](#chatcompletiontool) | |
 | [ChatMessageRole](#chatmessagerole) | The role of the author of this message. |
-| [Choices](#choices) | A list of chat completion choices. Can be more than one if `n` is greater than 1. |
+| [Choices](#choices) | A list of chat completion choices. |
 | [CompletionUsage](#completionusage) | Usage statistics for the completion request. |
 | [ContentFilterError](#contentfiltererror) | The API call fails when the prompt triggers a content filter as configured. Modify the prompt and try again. |
 | [CreateChatCompletionRequest](#createchatcompletionrequest) | |
@@ -288,14 +288,13 @@ The API call fails when the prompt triggers a content filter as configured. Modi
 | frequency\_penalty | number | 0 | Helps prevent word repetitions by reducing the chance of a word being selected if it has already been used. The higher the frequency penalty, the less likely the model is to repeat the same words in its output. Return a 422 error if value or parameter is not supported by model. |
 | max\_tokens | integer | | The maximum number of tokens that can be generated in the chat completion.<br><br>The total length of input tokens and generated tokens is limited by the model's context length. Passing null causes the model to use its max context length. |
 | messages | ChatCompletionRequestMessage\[\] | | A list of messages comprising the conversation so far. Returns a 422 error if at least some of the messages can't be understood by the model. |
-| model | string | | Kept for compatibility reasons. This parameter is ignored. |
 | presence\_penalty | number | 0 | Helps prevent the same topics from being repeated by penalizing a word if it exists in the completion already, even just once. Return a 422 error if value or parameter is not supported by model. |
 | response\_format | [ChatCompletionResponseFormat](#chatcompletionresponseformat) | text | |
 | seed | integer | | If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same `seed` and parameters should return the same result. Determinism is not guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend. |
 | stop | | | Sequences where the API will stop generating further tokens. |
 | stream | boolean | False | If set, partial message deltas will be sent. Tokens will be sent as data-only [server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#Event_stream_format) as they become available, with the stream terminated by a `data: [DONE]` message. |
 | temperature | number | 1 | Non-negative number. Return 422 if value is unsupported by model. |
-| tool\_choice | ChatCompletionToolChoiceOption | | Controls which (if any) function is called by the model. `none` means the model will not call a function and instead generates a message. `auto` means the model can pick between generating a message or calling a function. Specifying a particular function via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that function.<br><br>`none` is the default when no functions are present. `auto` is the default if functions are present. Returns a 422 error if the tool is not supported by the model. |
+| tool\_choice | [ChatCompletionToolChoiceOption](#chatcompletiontoolchoiceoption) | | Controls which (if any) function is called by the model. `none` means the model will not call a function and instead generates a message. `auto` means the model can pick between generating a message or calling a function. Specifying a particular function via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that function.<br><br>`none` is the default when no functions are present. `auto` is the default if functions are present. Returns a 422 error if the tool is not supported by the model. |
 | tools | [ChatCompletionTool](#chatcompletiontool)\[\] | | A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. Returns a 422 error if the tool is not supported by the model. |
 | top\_p | number | 1 | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top\_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.<br><br>We generally recommend altering this or `temperature` but not both. |
 
@@ -323,6 +322,17 @@ The API call fails when the prompt triggers a content filter as configured. Modi
 | image | string | |
 | image_url | string | |
 
+### ChatCompletionToolChoiceOption
+
+Controls which (if any) tool is called by the model.
+
+| Name | Type | Description |
+| --- | --- | --- |
+| none | string | The model will not call any tool and instead generates a message. |
+| auto | string | The model can pick between generating a message or calling one or more tools. |
+| required | string | The model must call one or more tools. |
+| | string | Specifying a particular tool via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that tool. |
+
 ### ImageDetail
 
 Specifies the detail level of the image.
````
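The `ChatCompletionToolChoiceOption` values this diff documents can be sketched as request-body fragments. The function name `my_function` mirrors the example in the table; the message content and the parameter schema are illustrative placeholders, not part of the reference:

```python
# Sketch: the tool_choice values from the new ChatCompletionToolChoiceOption
# table ("none", "auto", "required", or a forced-function object).
import json

def chat_body(tool_choice) -> str:
    """Build a minimal chat-completions body; tool_choice may be the string
    "none", "auto", or "required", or an object forcing a specific function."""
    body = {
        "messages": [{"role": "user", "content": "What is the weather?"}],
        "tools": [{
            "type": "function",
            "function": {"name": "my_function", "parameters": {"type": "object"}},
        }],
        "tool_choice": tool_choice,
    }
    return json.dumps(body)

# Force the model to call my_function, per the table's object form.
forced = json.loads(chat_body({"type": "function", "function": {"name": "my_function"}}))
print(forced["tool_choice"]["function"]["name"])
```

Per the parameter table, omitting `tool_choice` defaults to `auto` when tools are present, so the object form is only needed to pin a specific function.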

articles/ai-studio/reference/reference-model-inference-completions.md

Lines changed: 7 additions & 9 deletions
````diff
@@ -21,7 +21,7 @@ ms.custom:
 Creates a completion for the provided prompt and parameters.
 
 ```http
-POST /completions?api-version=2024-05-01-preview
+POST /completions?api-version=2024-04-01-preview
 ```
 
 | Name | In | Required | Type | Description |
@@ -37,7 +37,6 @@ POST /completions?api-version=2024-05-01-preview
 | prompt | True | | The prompts to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays. Note that `<\|endoftext\|>` is the document separator that the model sees during training, so if a prompt is not specified the model generates as if from the beginning of a new document. |
 | frequency\_penalty | | number | Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. |
 | max\_tokens | | integer | The maximum number of tokens that can be generated in the completion. The token count of your prompt plus `max_tokens` cannot exceed the model's context length. |
-| model | | string | Kept for compatibility reasons. This parameter is ignored. |
 | presence\_penalty | | number | Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. |
 | seed | | integer | If specified, the model makes a best effort to sample deterministically, such that repeated requests with the same `seed` and parameters should return the same result.<br><br>Determinism is not guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend. |
 | stop | | | Sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence. |
@@ -51,11 +50,11 @@ POST /completions?api-version=2024-05-01-preview
 | Name | Type | Description |
 | --- | --- | --- |
 | 200 OK | [CreateCompletionResponse](#createcompletionresponse) | OK |
-| 401 Unauthorized | | Access token is missing or invalid |
-| 404 Not Found | | Modality not supported by the model. Check the documentation of the model to see which routes are available. |
-| 422 Unprocessable Entity | [UnprocessableContentError](#unprocessablecontenterror) | The request contains unprocessable content<br><br>Headers<br><br>x-ms-error-code: string |
-| 429 Too Many Requests | | You have hit your assigned rate limit and your request need to be paced. |
-| Other Status Codes | [ContentFilterError](#contentfiltererror) | Bad request<br><br>Headers<br><br>x-ms-error-code: string |
+| 401 Unauthorized | [UnauthorizedError](#unauthorizederror) | Access token is missing or invalid<br><br>Headers<br><br>x-ms-error-code: string |
+| 404 Not Found | [NotFoundError](#notfounderror) | Modality not supported by the model. Check the documentation of the model to see which routes are available.<br><br>Headers<br><br>x-ms-error-code: string |
+| 422 Unprocessable Entity | [UnprocessableContentError](#unprocessablecontenterror) | The request contains unprocessable content<br><br>Headers<br><br>x-ms-error-code: string |
+| 429 Too Many Requests | [TooManyRequestsError](#toomanyrequestserror) | You have hit your assigned rate limit and your request need to be paced.<br><br>Headers<br><br>x-ms-error-code: string |
+| Other Status Codes | [ContentFilterError](#contentfiltererror) | Bad request<br><br>Headers<br><br>x-ms-error-code: string |
 
 
 ## Security
@@ -86,7 +85,7 @@ Azure Active Directory OAuth2 authentication
 #### Sample Request
 
 ```http
-POST /completions?api-version=2024-05-01-preview
+POST /completions?api-version=2024-04-01-preview
 
 {
 "prompt": "This is a very good text",
@@ -199,7 +198,6 @@ The API call fails when the prompt triggers a content filter as configured. Modi
 | --- | --- | --- | --- |
 | frequency\_penalty | number | 0 | Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. |
 | max\_tokens | integer | 256 | The maximum number of tokens that can be generated in the completion. The token count of your prompt plus `max_tokens` cannot exceed the model's context length. |
-| model | string | | Kept for compatibility reasons. This parameter is ignored. |
 | presence\_penalty | number | 0 | Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. |
 | prompt | | `<\|endoftext\|>` | The prompts to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays. Note that `<\|endoftext\|>` is the document separator that the model sees during training, so if a prompt is not specified the model generates as if from the beginning of a new document. |
 | seed | integer | | If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same `seed` and parameters should return the same result.<br><br>Determinism is not guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend. |
````
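The defaults retained in the CreateCompletionRequest table above (`frequency_penalty` 0, `max_tokens` 256, `presence_penalty` 0) can be sketched as a merge of caller overrides over documented defaults; the helper name is illustrative and not part of the API:

```python
# Sketch: documented defaults from the CreateCompletionRequest table,
# overridden per request. The helper is a hypothetical client convenience.
DEFAULTS = {"frequency_penalty": 0, "max_tokens": 256, "presence_penalty": 0}

def completions_body(prompt: str, **overrides) -> dict:
    """Merge caller overrides over the documented defaults."""
    return {**DEFAULTS, "prompt": prompt, **overrides}

body = completions_body("This is a very good text", max_tokens=64)
print(body["max_tokens"], body["frequency_penalty"])
```

Note that the diff also drops the `model` parameter from this table, so a client sketch like this no longer needs to send it.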
