You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: articles/ai-services/openai/includes/api-surface.md
+3-3Lines changed: 3 additions & 3 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -5,7 +5,7 @@ description: Information on the division of control plane and data plane API sur
5
5
manager: nitinme
6
6
ms.service: azure-ai-openai
7
7
ms.topic: include
8
-
ms.date: 01/08/2024
8
+
ms.date: 01/29/2025
9
9
---
10
10
11
11
@@ -22,8 +22,8 @@ Each API surface/specification encapsulates a different set of Azure OpenAI capa
22
22
| API | Latest preview release | Latest GA release | Specifications | Description |
23
23
|:---|:----|:----|:----|:---|
24
24
| **Control plane** | [`2024-06-01-preview`](/rest/api/aiservices/accountmanagement/operation-groups?view=rest-aiservices-accountmanagement-2024-06-01-preview&preserve-view=true) | [`2024-10-01`](/rest/api/aiservices/accountmanagement/deployments/create-or-update?view=rest-aiservices-accountmanagement-2024-10-01&tabs=HTTP&preserve-view=true) | [Spec files](https://github.com/Azure/azure-rest-api-specs/tree/main/specification/cognitiveservices/resource-manager/Microsoft.CognitiveServices) | Azure OpenAI shares a common control plane with all other Azure AI Services. The control plane API is used for things like [creating Azure OpenAI resources](/rest/api/aiservices/accountmanagement/accounts/create?view=rest-aiservices-accountmanagement-2023-05-01&tabs=HTTP&preserve-view=true), [model deployment](/rest/api/aiservices/accountmanagement/deployments/create-or-update?view=rest-aiservices-accountmanagement-2023-05-01&tabs=HTTP&preserve-view=true), and other higher level resource management tasks. The control plane also governs what is possible to do with capabilities like Azure Resource Manager, Bicep, Terraform, and Azure CLI.|
25
-
| **Data plane - authoring** | `2024-10-01-preview` | `2024-10-21` | [Spec files](https://github.com/Azure/azure-rest-api-specs/tree/main/specification/cognitiveservices/data-plane/AzureOpenAI/authoring) | The data plane authoring API controls [fine-tuning](/rest/api/azureopenai/fine-tuning?view=rest-azureopenai-2024-08-01-preview&preserve-view=true), [file-upload](/rest/api/azureopenai/files/upload?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true), [ingestion jobs](/rest/api/azureopenai/ingestion-jobs/create?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true), [batch](/rest/api/azureopenai/batch?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true) and certain [model level queries](/rest/api/azureopenai/models/get?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true)
26
-
|**Data plane - inference**|[`2024-12-01-preview`](/azure/ai-services/openai/reference-preview#data-plane-inference)|[`2024-10-21`](/azure/ai-services/openai/reference#data-plane-inference)|[Spec files](https://github.com/Azure/azure-rest-api-specs/tree/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference)| The data plane inference API provides the inference capabilities/endpoints for features like completions, chat completions, embeddings, speech/whisper, on your data, Dall-e, assistants, etc. |
25
+
| **Data plane - authoring** | `2025-01-01-preview` | `2024-10-21` | [Spec files](https://github.com/Azure/azure-rest-api-specs/tree/main/specification/cognitiveservices/data-plane/AzureOpenAI/authoring) | The data plane authoring API controls [fine-tuning](/rest/api/azureopenai/fine-tuning?view=rest-azureopenai-2024-08-01-preview&preserve-view=true), [file-upload](/rest/api/azureopenai/files/upload?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true), [ingestion jobs](/rest/api/azureopenai/ingestion-jobs/create?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true), [batch](/rest/api/azureopenai/batch?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true) and certain [model level queries](/rest/api/azureopenai/models/get?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true)
26
+
|**Data plane - inference**|[`2025-01-01-preview`](/azure/ai-services/openai/reference-preview#data-plane-inference)|[`2024-10-21`](/azure/ai-services/openai/reference#data-plane-inference)|[Spec files](https://github.com/Azure/azure-rest-api-specs/tree/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference)| The data plane inference API provides the inference capabilities/endpoints for features like completions, chat completions, embeddings, speech/whisper, on your data, Dall-e, assistants, etc. |
Copy file name to clipboardExpand all lines: articles/ai-services/openai/includes/api-versions/latest-inference-preview.md
+9-11Lines changed: 9 additions & 11 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -44,9 +44,9 @@ Creates a completion for the provided prompt, parameters and chosen model.
44
44
| logprobs | integer | Include the log probabilities on the `logprobs` most likely output tokens, as well the chosen tokens. For example, if `logprobs` is 5, the API will return a list of the 5 most likely tokens. The API will always return the `logprob` of the sampled token, so there may be up to `logprobs+1` elements in the response.<br><br>The maximum value for `logprobs` is 5.<br> | No | None |
45
45
| max_tokens | integer | The maximum number of tokens that can be generated in the completion.<br><br>The token count of your prompt plus `max_tokens` can't exceed the model's context length. | No | 16 |
46
46
| n | integer | How many completions to generate for each prompt.<br><br>**Note:** Because this parameter generates many completions, it can quickly consume your token quota. Use carefully and ensure that you have reasonable settings for `max_tokens` and `stop`.<br> | No | 1 |
47
-
| modalities |[ChatCompletionModalities](#chatcompletionmodalities)| Output types that you would like the model to generate for this request.<br>Most models are capable of generating text, which is the default:<br><br>`["text"]`<br><br>The `gpt-4o-audio-preview` model can also be used to [generate audio](/docs/guides/audio). To<br>request that this model generate both text and audio responses, you can<br>use:<br><br>`["text", "audio"]`<br> | No ||
47
+
| modalities |[ChatCompletionModalities](#chatcompletionmodalities)| Output types that you would like the model to generate for this request.<br>Most models are capable of generating text, which is the default:<br><br>`["text"]`<br><br>The `gpt-4o-audio-preview` model can also be used to generate audio. To<br>request that this model generate both text and audio responses, you can<br>use:<br><br>`["text", "audio"]`<br> | No ||
48
48
| prediction |[PredictionContent](#predictioncontent)| Configuration for a Predicted Output, which can greatly improve response times when large parts of the model response are known ahead of time. This is most common when you are regenerating a file with only minor changes to most of the content. | No ||
49
-
| audio | object | Parameters for audio output. Required when audio output is requested with<br>`modalities: ["audio"]`. [Learn more](/docs/guides/audio).<br>| No ||
49
+
| audio | object | Parameters for audio output. Required when audio output is requested with<br>`modalities: ["audio"]`. | No ||
50
50
| presence_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.<br> | No | 0 |
51
51
| seed | integer | If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same `seed` and parameters should return the same result.<br><br>Determinism isn't guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend.<br> | No ||
52
52
| stop | string or array | Up to four sequences where the API will stop generating further tokens. The returned text won't contain the stop sequence.<br> | No ||
@@ -4597,9 +4597,9 @@ Information about the content filtering category (hate, sexual, violence, self_h
4597
4597
| logprobs | integer | Include the log probabilities on the `logprobs` most likely output tokens, as well the chosen tokens. For example, if `logprobs` is 5, the API will return a list of the 5 most likely tokens. The API will always return the `logprob` of the sampled token, so there may be up to `logprobs+1` elements in the response.<br><br>The maximum value for `logprobs` is 5.<br> | No | None |
4598
4598
| max_tokens | integer | The maximum number of tokens that can be generated in the completion.<br><br>The token count of your prompt plus `max_tokens` can't exceed the model's context length. | No | 16 |
4599
4599
| n | integer | How many completions to generate for each prompt.<br><br>**Note:** Because this parameter generates many completions, it can quickly consume your token quota. Use carefully and ensure that you have reasonable settings for `max_tokens` and `stop`.<br> | No | 1 |
4600
-
| modalities |[ChatCompletionModalities](#chatcompletionmodalities)| Output types that you would like the model to generate for this request.<br>Most models are capable of generating text, which is the default:<br><br>`["text"]`<br><br>The `gpt-4o-audio-preview` model can also be used to [generate audio](/docs/guides/audio). To<br>request that this model generate both text and audio responses, you can<br>use:<br><br>`["text", "audio"]`<br> | No ||
4601
-
| prediction |[PredictionContent](#predictioncontent)| Configuration for a [Predicted Output](/docs/guides/predicted-outputs), which can greatly improve response times when large parts of the model response are known ahead of time. This is most common when you are regenerating a file with only minor changes to most of the content. | No ||
4602
-
| audio | object | Parameters for audio output. Required when audio output is requested with<br>`modalities: ["audio"]`. [Learn more](/docs/guides/audio).<br>| No ||
4600
+
| modalities |[ChatCompletionModalities](#chatcompletionmodalities)| Output types that you would like the model to generate for this request.<br>Most models are capable of generating text, which is the default:<br><br>`["text"]`<br><br>The `gpt-4o-audio-preview` model can also be used to generate audio. To<br>request that this model generate both text and audio responses, you can<br>use:<br><br>`["text", "audio"]`<br> | No ||
4601
+
| prediction |[PredictionContent](#predictioncontent)| Configuration for a Predicted Output, which can greatly improve response times when large parts of the model response are known ahead of time. This is most common when you are regenerating a file with only minor changes to most of the content. | No ||
4602
+
| audio | object | Parameters for audio output. Required when audio output is requested with<br>`modalities: ["audio"]`. | No ||
4603
4603
| presence_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.<br> | No | 0 |
4604
4604
| seed | integer | If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same `seed` and parameters should return the same result.<br><br>Determinism isn't guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend.<br> | No ||
4605
4605
| stop | string or array | Up to four sequences where the API will stop generating further tokens. The returned text won't contain the stop sequence.<br> | No ||
@@ -4662,7 +4662,7 @@ Represents a completion response from the API. Note: both the streamed and non-s
4662
4662
| user | string | A unique identifier representing your end-user, which can help to monitor and detect abuse.<br> | No ||
4663
4663
| messages | array | A list of messages comprising the conversation so far. | Yes ||
4664
4664
| data_sources | array | The configuration entries for Azure OpenAI chat extensions that use them.<br> This additional specification is only compatible with Azure OpenAI. | No ||
4665
-
| reasoning_effort | enum |**o1 models only** <br><br> Constrains effort on reasoning for <br>[reasoning models](https://platform.openai.com/docs/guides/reasoning).<br><br>Currently supported values are `low`, `medium`, and `high`. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.<br>Possible values: low, medium, high | No ||
4665
+
| reasoning_effort | enum |**o1 models only** <br><br> Constrains effort on reasoning for <br>reasoning models.<br><br>Currently supported values are `low`, `medium`, and `high`. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.<br>Possible values: low, medium, high | No ||
4666
4666
| logprobs | boolean | Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the `content` of `message`. | No | False |
4667
4667
| top_logprobs | integer | An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. `logprobs` must be set to `true` if this parameter is used. | No ||
4668
4668
| n | integer | How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep `n` as `1` to minimize costs. | No | 1 |
@@ -4856,8 +4856,6 @@ This component can be one of the following:
4856
4856
4857
4857
### chatCompletionRequestMessageContentPartAudio
4858
4858
4859
-
Learn about [audio inputs](/docs/guides/audio).
4860
-
4861
4859
4862
4860
| Name | Type | Description | Required | Default |
@@ -5696,7 +5694,7 @@ A chat completion message generated by the model.
5696
5694
| content | string | The contents of the message. | Yes ||
5697
5695
| tool_calls | array | The tool calls generated by the model, such as function calls. | No ||
5698
5696
| function_call |[chatCompletionFunctionCall](#chatcompletionfunctioncall)| Deprecated and replaced by `tool_calls`. The name and arguments of a function that should be called, as generated by the model. | No ||
5699
-
| audio | object | If the audio output modality is requested, this object contains data<br>about the audio response from the model. [Learn more](/docs/guides/audio).<br>| No ||
5697
+
| audio | object | If the audio output modality is requested, this object contains data<br>about the audio response from the model. | No ||
5700
5698
| context |[azureChatExtensionsMessageContext](#azurechatextensionsmessagecontext)| A representation of the additional context information available when Azure OpenAI chat extensions are involved<br> in the generation of a corresponding chat completions response. This context information is only populated when<br> using an Azure OpenAI request configured to use a matching extension. | No ||
5701
5699
5702
5700
@@ -5799,7 +5797,7 @@ Most models are capable of generating text, which is the default:
5799
5797
5800
5798
`["text"]`
5801
5799
5802
-
The `gpt-4o-audio-preview` model can also be used to [generate audio](/docs/guides/audio). To
5800
+
The `gpt-4o-audio-preview` model can also be used to generate audio. To
5803
5801
request that this model generate both text and audio responses, you can
5804
5802
use:
5805
5803
@@ -5902,7 +5900,7 @@ No properties defined for this component.
5902
5900
| description | string | A description of what the function does, used by the model to choose when and how to call the function. | No ||
5903
5901
| name | string | The name of the function to be called. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64. | Yes ||
5904
5902
| parameters |[FunctionParameters](#functionparameters)| The parameters the functions accepts, described as a JSON Schema object. [See the guide](https://learn.microsoft.com/azure/ai-services/openai/how-to/function-calling) for examples, and the [JSON Schema reference](https://json-schema.org/understanding-json-schema/) for documentation about the format. <br><br>Omitting `parameters` defines a function with an empty parameter list. | No ||
5905
-
| strict | boolean | Whether to enable strict schema adherence when generating the function call. If set to true, the model will follow the exact schema defined in the `parameters` field. Only a subset of JSON Schema is supported when `strict` is `true`. Learn more about Structured Outputs in the [function calling guide](docs/guides/function-calling). | No | False |
5903
+
| strict | boolean | Whether to enable strict schema adherence when generating the function call. If set to true, the model will follow the exact schema defined in the `parameters` field. Only a subset of JSON Schema is supported when `strict` is `true`. | No | False |
Copy file name to clipboardExpand all lines: articles/ai-services/openai/reference-preview.md
+2-2Lines changed: 2 additions & 2 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -5,7 +5,7 @@ description: Learn how to use Azure OpenAI's latest preview REST API. In this ar
5
5
manager: nitinme
6
6
ms.service: azure-ai-openai
7
7
ms.topic: conceptual
8
-
ms.date: 10/16/2024
8
+
ms.date: 01/29/2025
9
9
author: mrbullwinkle
10
10
ms.author: mbullwin
11
11
recommendations: false
@@ -20,7 +20,7 @@ This article provides details on the inference REST API endpoints for Azure Open
20
20
21
21
## Data plane inference
22
22
23
-
The rest of the article covers the latest preview release of the Azure OpenAI data plane inference specification, `2024-10-01-preview`. This article includes documentation for the latest preview capabilities like assistants, threads, and vector stores.
23
+
The rest of the article covers the latest preview release of the Azure OpenAI data plane inference specification, `2025-01-01-preview`. This article includes documentation for the latest preview capabilities like assistants, threads, and vector stores.
24
24
25
25
If you're looking for documentation on the latest GA API release, refer to the [latest GA data plane inference API](./reference.md)
0 commit comments