articles/ai-services/openai/includes/api-versions/latest-inference.md
Lines changed: 18 additions & 8 deletions
@@ -8,7 +8,7 @@ ms.topic: include
ms.date: 11/01/2024
---

-## Completions - Create
+## Completions

```HTTP
POST https://{endpoint}/openai/deployments/{deployment-id}/completions?api-version=2024-10-21
@@ -42,7 +42,7 @@ Creates a completion for the provided prompt, parameters, and chosen model.
| frequency_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.<br> | No | 0 |
| logit_bias | object | Modify the likelihood of specified tokens appearing in the completion.<br><br>Accepts a JSON object that maps tokens (specified by their token ID in the GPT tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.<br><br>As an example, you can pass `{"50256": -100}` to prevent the <|endoftext|> token from being generated.<br> | No | None |
| logprobs | integer | Include the log probabilities on the `logprobs` most likely output tokens, as well the chosen tokens. For example, if `logprobs` is 5, the API will return a list of the five most likely tokens. The API will always return the `logprob` of the sampled token, so there may be up to `logprobs+1` elements in the response.<br><br>The maximum value for `logprobs` is 5.<br> | No | None |
-| max_tokens | integer | The maximum number of [tokens](https://platform.openai.com/tokenizer) that can be generated in the completion.<br><br>The token count of your prompt plus `max_tokens` can't exceed the model's context length. | No | 16 |
+| max_tokens | integer | The maximum number of tokens that can be generated in the completion.<br><br>The token count of your prompt plus `max_tokens` can't exceed the model's context length. | No | 16 |
| n | integer | How many completions to generate for each prompt.<br><br>**Note:** Because this parameter generates many completions, it can quickly consume your token quota. Use carefully and ensure that you have reasonable settings for `max_tokens` and `stop`.<br> | No | 1 |
| presence_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.<br> | No | 0 |
| seed | integer | If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same `seed` and parameters should return the same result.<br><br>Determinism isn't guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend.<br> | No ||
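For context, a completions request that exercises several of the parameters in this hunk might look like the following minimal sketch. The deployment name `my-completions-deployment` and the `AZURE_OPENAI_ENDPOINT` / `AZURE_OPENAI_API_KEY` environment variables are illustrative assumptions, not values from the spec.

```python
import os
import requests

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]   # for example, https://my-resource.openai.azure.com
api_key = os.environ["AZURE_OPENAI_API_KEY"]
deployment = "my-completions-deployment"          # assumed deployment name

body = {
    "prompt": "Write a one-line tagline for a coffee shop.",
    "max_tokens": 16,               # prompt tokens + max_tokens must fit within the model's context length
    "n": 2,                         # generate two candidate completions
    "seed": 42,                     # best-effort determinism; compare system_fingerprint across calls
    "logit_bias": {"50256": -100},  # effectively ban the <|endoftext|> token
}

response = requests.post(
    f"{endpoint}/openai/deployments/{deployment}/completions",
    params={"api-version": "2024-10-21"},
    headers={"api-key": api_key, "Content-Type": "application/json"},
    json=body,
)
response.raise_for_status()
for choice in response.json()["choices"]:
    print(choice["text"])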
@@ -116,7 +116,7 @@ Status Code: 200
}
```

-## Embeddings - Create
+## Embeddings

```HTTP
POST https://{endpoint}/openai/deployments/{deployment-id}/embeddings?api-version=2024-10-21
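A minimal call against the embeddings endpoint above might look like the sketch below, again assuming the same environment variables and an assumed deployment name `my-embeddings-deployment`.

```python
import os
import requests

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
api_key = os.environ["AZURE_OPENAI_API_KEY"]
deployment = "my-embeddings-deployment"   # assumed deployment name

response = requests.post(
    f"{endpoint}/openai/deployments/{deployment}/embeddings",
    params={"api-version": "2024-10-21"},
    headers={"api-key": api_key, "Content-Type": "application/json"},
    json={"input": "The quick brown fox jumped over the lazy dog."},
)
response.raise_for_status()
embedding = response.json()["data"][0]["embedding"]
print(len(embedding))  # vector length depends on the embeddings model behind the deployment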
@@ -262,7 +262,7 @@ Status Code: 200
}
```

-## Chat completions - Create
+## Chat completions

```HTTP
POST https://{endpoint}/openai/deployments/{deployment-id}/chat/completions?api-version=2024-10-21
@@ -294,7 +294,7 @@ Creates a completion for the chat message
| top_p | number | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.<br><br>We generally recommend altering this or `temperature` but not both.<br> | No | 1 |
| stream | boolean | If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only [server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#Event_stream_format) as they become available, with the stream terminated by a `data: [DONE]` message. | No | False |
| stop | string or array | Up to four sequences where the API will stop generating further tokens.<br> | No ||
-| max_tokens | integer | The maximum number of [tokens](/tokenizer) that can be generated in the chat completion.<br><br>The total length of input tokens and generated tokens is limited by the model's context length.| No ||
+| max_tokens | integer | The maximum number of tokens that can be generated in the chat completion.<br><br>The total length of input tokens and generated tokens is limited by the model's context length.| No ||
| max_completion_tokens | integer | An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens. | No ||
| presence_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.<br> | No | 0 |
| frequency_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.<br> | No | 0 |
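A chat completions request using the parameters in this hunk might look like the following sketch. The deployment name `my-chat-deployment`, the message contents, and the environment variable names are assumptions for illustration.

```python
import os
import requests

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
api_key = os.environ["AZURE_OPENAI_API_KEY"]
deployment = "my-chat-deployment"   # assumed deployment name

body = {
    "messages": [
        {"role": "system", "content": "You are a terse assistant."},
        {"role": "user", "content": "Name three uses for embeddings."},
    ],
    "max_tokens": 100,    # cap on generated tokens; input plus output must fit the context length
    "stop": ["\n\n"],     # up to four stop sequences
    "stream": False,      # set True to receive data-only server-sent events instead
    "temperature": 0.2,
}

response = requests.post(
    f"{endpoint}/openai/deployments/{deployment}/chat/completions",
    params={"api-version": "2024-10-21"},
    headers={"api-key": api_key, "Content-Type": "application/json"},
    json=body,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])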
@@ -805,7 +805,7 @@ Status Code: 200
}
```

-## Image generations - Create
+## Image generation

```HTTP
POST https://{endpoint}/openai/deployments/{deployment-id}/images/generations?api-version=2024-10-21
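An image generation request against the endpoint above could look like this sketch; the deployment name `my-dalle-deployment` and the prompt are assumptions, and the response fields shown (`url`, `revised_prompt`) follow the image result schema described later in this file.

```python
import os
import requests

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
api_key = os.environ["AZURE_OPENAI_API_KEY"]
deployment = "my-dalle-deployment"   # assumed deployment name

response = requests.post(
    f"{endpoint}/openai/deployments/{deployment}/images/generations",
    params={"api-version": "2024-10-21"},
    headers={"api-key": api_key, "Content-Type": "application/json"},
    json={
        "prompt": "A watercolor painting of a lighthouse at dawn",
        "n": 1,
        "size": "1024x1024",
    },
)
response.raise_for_status()
result = response.json()["data"][0]
print(result.get("url"))              # location of the generated image
print(result.get("revised_prompt"))   # present if the service revised the prompt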
@@ -1235,7 +1235,7 @@ Represents a completion response from the API. Note: both the streamed and nonst
| top_p | number | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.<br><br>We generally recommend altering this or `temperature` but not both.<br> | No | 1 |
| stream | boolean | If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only [server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#Event_stream_format) as they become available, with the stream terminated by a `data: [DONE]` message. | No | False |
| stop | string or array | Up to four sequences where the API will stop generating further tokens.<br> | No ||
-| max_tokens | integer | The maximum number of [tokens](/tokenizer) that can be generated in the chat completion.<br><br>The total length of input tokens and generated tokens is limited by the model's context length.| No ||
+| max_tokens | integer | The maximum number of tokens that can be generated in the chat completion.<br><br>The total length of input tokens and generated tokens is limited by the model's context length.| No ||
| max_completion_tokens | integer | An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens. | No ||
| presence_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.<br> | No | 0 |
| frequency_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.<br> | No | 0 |
@@ -2033,7 +2033,7 @@ No properties defined for this component.
| description | string | A description of what the function does, used by the model to choose when and how to call the function. | No ||
| name | string | The name of the function to be called. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64. | Yes ||
| parameters |[FunctionParameters](#functionparameters)| The parameters the functions accepts, described as a JSON Schema object. See the guide](/azure/ai-services/openai/how-to/function-calling) for examples, and the [JSON Schema reference](https://json-schema.org/understanding-json-schema/) for documentation about the format. <br><br>Omitting `parameters` defines a function with an empty parameter list. | No ||
-| strict | boolean | Whether to enable strict schema adherence when generating the function call. If set to true, the model will follow the exact schema defined in the `parameters` field. Only a subset of JSON Schema is supported when `strict` is `true`. Learn more about Structured Outputs in the [function calling guide](docs/guides/function-calling). | No | False |
+| strict | boolean | Whether to enable strict schema adherence when generating the function call. If set to true, the model will follow the exact schema defined in the `parameters` field. Only a subset of JSON Schema is supported when `strict` is `true`. | No | False |
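As an illustration of the `strict` flag described in this table, a function definition with strict schema adherence might look like the sketch below. The `get_weather` function, its parameters, and the surrounding request body are hypothetical; in strict mode, typically every property must appear in `required` and `additionalProperties` must be `false`.

```python
# Hypothetical tool definition using strict schema adherence.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city", "unit"],       # all properties listed as required
            "additionalProperties": False,      # no extra keys allowed in the arguments
        },
    },
}

# The definition is passed in the "tools" array of a chat completions request body:
body = {
    "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
    "tools": [weather_tool],
}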


### ResponseFormatText
@@ -2310,4 +2310,14 @@ The image url or encoded image if successful, and an error otherwise.
| revised_prompt | string | The prompt that was used to generate the image, if there was any revision to the prompt. | No ||
| prompt_filter_results |[dalleFilterResults](#dallefilterresults)| Information about the content filtering category (hate, sexual, violence, self_harm), if it has been detected, as well as the severity level (very_low, low, medium, high-scale that determines the intensity and risk level of harmful content) and if it has been filtered or not. Information about jailbreak content and profanity, if it has been detected, and if it has been filtered or not. And information about customer blocklist, if it has been filtered and its id. | No ||

+### Completions extensions
+
+Completions extensions aren't part of the latest GA version of the Azure OpenAI data plane inference spec.
+
+### Chatmessage
+
+The Chat message object isn't part of the latest GA version of the Azure OpenAI data plane inference spec.
+
+### Text to speech
+
+Is not currently part of the latest Azure OpenAI GA version of the Azure OpenAI data plane inference spec. Refer to the latest [preview](../../reference-preview.md) version for this capability.