
Commit 763f5f2

update
1 parent 3f77f34 commit 763f5f2


articles/ai-services/openai/includes/api-versions/latest-inference.md

Lines changed: 18 additions & 8 deletions
@@ -8,7 +8,7 @@ ms.topic: include
 ms.date: 11/01/2024
 ---
 
-## Completions - Create
+## Completions
 
 ```HTTP
 POST https://{endpoint}/openai/deployments/{deployment-id}/completions?api-version=2024-10-21
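
For orientation, a minimal sketch of calling the completions route above with `requests`. The endpoint, deployment name, and environment variables are placeholders rather than values from this spec; the `api-key` header carries the Azure OpenAI key.

```python
import os

import requests

# Placeholder configuration; substitute your own resource and deployment.
endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]      # e.g. https://my-resource.openai.azure.com
deployment = os.environ["AZURE_OPENAI_DEPLOYMENT"]  # a completions-capable model deployment

response = requests.post(
    f"{endpoint}/openai/deployments/{deployment}/completions",
    params={"api-version": "2024-10-21"},
    headers={"api-key": os.environ["AZURE_OPENAI_API_KEY"]},
    json={"prompt": "Say this is a test.", "max_tokens": 16},
)
response.raise_for_status()
print(response.json()["choices"][0]["text"])
```
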
@@ -42,7 +42,7 @@ Creates a completion for the provided prompt, parameters, and chosen model.
 | frequency_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.<br> | No | 0 |
 | logit_bias | object | Modify the likelihood of specified tokens appearing in the completion.<br><br>Accepts a JSON object that maps tokens (specified by their token ID in the GPT tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.<br><br>As an example, you can pass `{"50256": -100}` to prevent the <&#124;endoftext&#124;> token from being generated.<br> | No | None |
 | logprobs | integer | Include the log probabilities on the `logprobs` most likely output tokens, as well as the chosen tokens. For example, if `logprobs` is 5, the API will return a list of the five most likely tokens. The API will always return the `logprob` of the sampled token, so there may be up to `logprobs+1` elements in the response.<br><br>The maximum value for `logprobs` is 5.<br> | No | None |
-| max_tokens | integer | The maximum number of [tokens](https://platform.openai.com/tokenizer) that can be generated in the completion.<br><br>The token count of your prompt plus `max_tokens` can't exceed the model's context length. | No | 16 |
+| max_tokens | integer | The maximum number of tokens that can be generated in the completion.<br><br>The token count of your prompt plus `max_tokens` can't exceed the model's context length. | No | 16 |
 | n | integer | How many completions to generate for each prompt.<br><br>**Note:** Because this parameter generates many completions, it can quickly consume your token quota. Use carefully and ensure that you have reasonable settings for `max_tokens` and `stop`.<br> | No | 1 |
 | presence_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.<br> | No | 0 |
 | seed | integer | If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same `seed` and parameters should return the same result.<br><br>Determinism isn't guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend.<br> | No | |
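
The `logit_bias`, `logprobs`, and `seed` rows above are easiest to read as a request body. A sketch with illustrative values only; token ID `50256` is the table's own <&#124;endoftext&#124;> example.

```python
# Illustrative request body for the parameters above; values are examples, not recommendations.
body = {
    "prompt": "List three prime numbers:",
    "max_tokens": 32,               # prompt tokens + max_tokens must fit within the context length
    "n": 2,                         # two completions per prompt; consumes quota proportionally
    "logprobs": 5,                  # return the five most likely tokens per position (the maximum)
    "logit_bias": {"50256": -100},  # ban <|endoftext|>, per the table's example
    "seed": 42,                     # best-effort determinism; compare system_fingerprint across calls
    "frequency_penalty": 0.5,       # discourage verbatim repetition
}
```
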
@@ -116,7 +116,7 @@ Status Code: 200
 }
 ```
 
-## Embeddings - Create
+## Embeddings
 
 ```HTTP
 POST https://{endpoint}/openai/deployments/{deployment-id}/embeddings?api-version=2024-10-21
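
A minimal sketch of the embeddings call, again with placeholder resource and deployment names.

```python
import os

import requests

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]            # placeholder resource
deployment = os.environ["AZURE_OPENAI_EMBED_DEPLOYMENT"]  # an embeddings model deployment

response = requests.post(
    f"{endpoint}/openai/deployments/{deployment}/embeddings",
    params={"api-version": "2024-10-21"},
    headers={"api-key": os.environ["AZURE_OPENAI_API_KEY"]},
    json={"input": "The food was delicious and the waiter was friendly."},
)
response.raise_for_status()
vector = response.json()["data"][0]["embedding"]  # a list of floats
print(len(vector))
```
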
@@ -262,7 +262,7 @@ Status Code: 200
 }
 ```
 
-## Chat completions - Create
+## Chat completions
 
 ```HTTP
 POST https://{endpoint}/openai/deployments/{deployment-id}/chat/completions?api-version=2024-10-21
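
A minimal sketch of a non-streaming chat completions call, with the same placeholder configuration as above.

```python
import os

import requests

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]      # placeholder resource
deployment = os.environ["AZURE_OPENAI_DEPLOYMENT"]  # a chat model deployment

response = requests.post(
    f"{endpoint}/openai/deployments/{deployment}/chat/completions",
    params={"api-version": "2024-10-21"},
    headers={"api-key": os.environ["AZURE_OPENAI_API_KEY"]},
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Does Azure OpenAI support customer managed keys?"},
        ],
        "temperature": 0.7,
    },
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])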
@@ -294,7 +294,7 @@ Creates a completion for the chat message
 | top_p | number | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.<br><br>We generally recommend altering this or `temperature` but not both.<br> | No | 1 |
 | stream | boolean | If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only [server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#Event_stream_format) as they become available, with the stream terminated by a `data: [DONE]` message. | No | False |
 | stop | string or array | Up to four sequences where the API will stop generating further tokens.<br> | No | |
-| max_tokens | integer | The maximum number of [tokens](/tokenizer) that can be generated in the chat completion.<br><br>The total length of input tokens and generated tokens is limited by the model's context length. | No | |
+| max_tokens | integer | The maximum number of tokens that can be generated in the chat completion.<br><br>The total length of input tokens and generated tokens is limited by the model's context length. | No | |
 | max_completion_tokens | integer | An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens. | No | |
 | presence_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.<br> | No | 0 |
 | frequency_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.<br> | No | 0 |
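
The `stream` row above implies a different client loop: read data-only server-sent events until the `data: [DONE]` terminator. A sketch, assuming the same placeholder configuration as the earlier examples.

```python
import json
import os

import requests

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
deployment = os.environ["AZURE_OPENAI_DEPLOYMENT"]

with requests.post(
    f"{endpoint}/openai/deployments/{deployment}/chat/completions",
    params={"api-version": "2024-10-21"},
    headers={"api-key": os.environ["AZURE_OPENAI_API_KEY"]},
    json={"messages": [{"role": "user", "content": "Count to five."}], "stream": True},
    stream=True,  # read the response body incrementally instead of buffering it
) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if not line.startswith(b"data: "):
            continue  # skip blank keep-alive lines between events
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":  # the terminator noted in the stream row above
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            print(choice.get("delta", {}).get("content") or "", end="", flush=True)
    print()
```
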
@@ -805,7 +805,7 @@ Status Code: 200
 }
 ```
 
-## Image generations - Create
+## Image generation
 
 ```HTTP
 POST https://{endpoint}/openai/deployments/{deployment-id}/images/generations?api-version=2024-10-21
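
A minimal sketch of the image generation call; the deployment name and `size` value are placeholders.

```python
import os

import requests

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]            # placeholder resource
deployment = os.environ["AZURE_OPENAI_IMAGE_DEPLOYMENT"]  # e.g. a dall-e-3 deployment

response = requests.post(
    f"{endpoint}/openai/deployments/{deployment}/images/generations",
    params={"api-version": "2024-10-21"},
    headers={"api-key": os.environ["AZURE_OPENAI_API_KEY"]},
    json={"prompt": "A watercolor lighthouse at dawn", "n": 1, "size": "1024x1024"},
)
response.raise_for_status()
data = response.json()["data"][0]
print(data["url"])                     # URL of the generated image
print(data.get("revised_prompt", ""))  # present if the service rewrote the prompt
```
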
@@ -1235,7 +1235,7 @@ Represents a completion response from the API. Note: both the streamed and nonst
 | top_p | number | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.<br><br>We generally recommend altering this or `temperature` but not both.<br> | No | 1 |
 | stream | boolean | If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only [server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#Event_stream_format) as they become available, with the stream terminated by a `data: [DONE]` message. | No | False |
 | stop | string or array | Up to four sequences where the API will stop generating further tokens.<br> | No | |
-| max_tokens | integer | The maximum number of [tokens](/tokenizer) that can be generated in the chat completion.<br><br>The total length of input tokens and generated tokens is limited by the model's context length. | No | |
+| max_tokens | integer | The maximum number of tokens that can be generated in the chat completion.<br><br>The total length of input tokens and generated tokens is limited by the model's context length. | No | |
 | max_completion_tokens | integer | An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens. | No | |
 | presence_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.<br> | No | 0 |
 | frequency_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.<br> | No | 0 |
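
A sketch of a chat request body exercising `top_p`, `stop`, and the newer `max_completion_tokens` bound from the rows above; values are illustrative.

```python
# Illustrative chat request body for the parameters above.
body = {
    "messages": [{"role": "user", "content": "Write a haiku about autumn."}],
    "top_p": 0.9,                  # nucleus sampling; prefer tuning this or temperature, not both
    "stop": ["\n\n", "END"],       # up to four sequences that end generation
    "max_completion_tokens": 256,  # caps visible output plus any reasoning tokens
}
```
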
@@ -2033,7 +2033,7 @@ No properties defined for this component.
 | description | string | A description of what the function does, used by the model to choose when and how to call the function. | No | |
 | name | string | The name of the function to be called. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64. | Yes | |
 | parameters | [FunctionParameters](#functionparameters) | The parameters the function accepts, described as a JSON Schema object. See [the guide](/azure/ai-services/openai/how-to/function-calling) for examples, and the [JSON Schema reference](https://json-schema.org/understanding-json-schema/) for documentation about the format. <br><br>Omitting `parameters` defines a function with an empty parameter list. | No | |
-| strict | boolean | Whether to enable strict schema adherence when generating the function call. If set to true, the model will follow the exact schema defined in the `parameters` field. Only a subset of JSON Schema is supported when `strict` is `true`. Learn more about Structured Outputs in the [function calling guide](docs/guides/function-calling). | No | False |
+| strict | boolean | Whether to enable strict schema adherence when generating the function call. If set to true, the model will follow the exact schema defined in the `parameters` field. Only a subset of JSON Schema is supported when `strict` is `true`. | No | False |
 
 
 ### ResponseFormatText
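
A sketch of a function definition with `strict` enabled, matching the table above; the function name and schema are hypothetical.

```python
# Hypothetical function definition with strict schema adherence enabled.
get_weather = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "strict": True,  # the model must emit arguments that match the schema exactly
    "parameters": {  # JSON Schema; only a subset is allowed when strict is true
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, for example Seattle"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city", "unit"],
        "additionalProperties": False,
    },
}
```
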
@@ -2310,4 +2310,14 @@ The image url or encoded image if successful, and an error otherwise.
 | revised_prompt | string | The prompt that was used to generate the image, if there was any revision to the prompt. | No | |
 | prompt_filter_results | [dalleFilterResults](#dallefilterresults) | Information about the content filtering category (hate, sexual, violence, self_harm), if it has been detected, as well as the severity level (very_low, low, medium, high-scale that determines the intensity and risk level of harmful content) and if it has been filtered or not. Information about jailbreak content and profanity, if it has been detected, and if it has been filtered or not. And information about customer blocklist, if it has been filtered and its id. | No | |
 
+### Completions extensions
 
+Completions extensions aren't part of the latest GA version of the Azure OpenAI data plane inference spec.
+
+### Chat message
+
+The Chat message object isn't part of the latest GA version of the Azure OpenAI data plane inference spec.
+
+### Text to speech
+
+Text to speech isn't currently part of the latest GA version of the Azure OpenAI data plane inference spec. Refer to the latest [preview](../../reference-preview.md) version for this capability.
