Skip to content

Commit dd46a04

Browse files
committed
Merge branch 'main' into release-aisvcs-move-rai-docs
2 parents 6212c83 + 155bbb1 commit dd46a04

19 files changed

+462
-387
lines changed

articles/ai-foundry/model-inference/concepts/models.md

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -180,7 +180,6 @@ Microsoft models include various model groups such as MAI models, Phi models, he
180180

181181
| Model | Type | Tier | Capabilities |
182182
| ------ | ---- | --- | ------------ |
183-
| [MAI-DS-R1](https://ai.azure.com/explore/models/MAI-DS-R1/version/1/registry/azureml) | chat-completion <br /> [(with reasoning content)](../how-to/use-chat-reasoning.md) | Global standard | - **Input:** text (163,840 tokens) <br /> - **Output:** (163,840 tokens) <br /> - **Languages:** `en` and `zh` <br /> - **Tool calling:** No <br /> - **Response formats:** Text. |
184183
| [Phi-4-mini-instruct](https://ai.azure.com/explore/models/Phi-4-mini-instruct/version/1/registry/azureml) | chat-completion | Global standard | - **Input:** text (131,072 tokens) <br /> - **Output:** (4,096 tokens) <br /> - **Languages:** `ar`, `zh`, `cs`, `da`, `nl`, `en`, `fi`, `fr`, `de`, `he`, `hu`, `it`, `ja`, `ko`, `no`, `pl`, `pt`, `ru`, `es`, `sv`, `th`, `tr`, and `uk` <br /> - **Tool calling:** No <br /> - **Response formats:** Text |
185184
| [Phi-4-multimodal-instruct](https://ai.azure.com/explore/models/Phi-4-multimodal-instruct/version/1/registry/azureml) | chat-completion | Global standard | - **Input:** text, images, and audio (131,072 tokens) <br /> - **Output:** (4,096 tokens) <br /> - **Languages:** `ar`, `zh`, `cs`, `da`, `nl`, `en`, `fi`, `fr`, `de`, `he`, `hu`, `it`, `ja`, `ko`, `no`, `pl`, `pt`, `ru`, `es`, `sv`, `th`, `tr`, and `uk` <br /> - **Tool calling:** No <br /> - **Response formats:** Text |
186185
| [Phi-4](https://ai.azure.com/explore/models/Phi-4/version/2/registry/azureml) | chat-completion | Global standard | - **Input:** text (16,384 tokens) <br /> - **Output:** (16,384 tokens) <br /> - **Languages:** `en`, `ar`, `bn`, `cs`, `da`, `de`, `el`, `es`, `fa`, `fi`, `fr`, `gu`, `ha`, `he`, `hi`, `hu`, `id`, `it`, `ja`, `jv`, `kn`, `ko`, `ml`, `mr`, `nl`, `no`, `or`, `pa`, `pl`, `ps`, `pt`, `ro`, `ru`, `sv`, `sw`, `ta`, `te`, `th`, `tl`, `tr`, `uk`, `ur`, `vi`, `yo`, and `zh` <br /> - **Tool calling:** No <br /> - **Response formats:** Text |

articles/ai-foundry/model-inference/includes/use-chat-reasoning/about-reasoning.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@ author: santiagxf
99

1010
## Reasoning models
1111

12-
Reasoning models can reach higher levels of performance in domains like math, coding, science, strategy, and logistics. The way these models produces outputs is by explicitly using chain of thought to explore all possible paths before generating an answer. They verify their answers as they produce them which helps them to arrive to better more accurate conclusions. This means that reasoning models may require less context in prompting in order to produce effective results.
12+
Reasoning models can reach higher levels of performance in domains like math, coding, science, strategy, and logistics. The way these models produce outputs is by explicitly using chain of thought to explore all possible paths before generating an answer. They verify their answers as they produce them which helps them to arrive to better more accurate conclusions. This means that reasoning models may require less context in prompting in order to produce effective results.
1313

1414
Such way of scaling model's performance is referred as *inference compute time* as it trades performance against higher latency and cost. It contrasts to other approaches that scale through *training compute time*.
1515

@@ -19,4 +19,4 @@ Reasoning models then produce two types of outputs:
1919
> * Reasoning completions
2020
> * Output completions
2121
22-
Both of these completions count towards content generated from the model and hence, towards the token limits and costs associated with the model. Some models may output the reasoning content, like `DeepSeek-R1`. Some others, like `o1`, only outputs the output piece of the completions.
22+
Both of these completions count towards content generated from the model and hence, towards the token limits and costs associated with the model. Some models may output the reasoning content, like `DeepSeek-R1`. Some others, like `o1`, only outputs the output piece of the completions.

articles/ai-services/openai/concepts/models.md

Lines changed: 7 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,7 @@ Azure OpenAI is powered by a diverse set of models with different capabilities a
2222

2323
| Models | Description |
2424
|--|--|
25+
| [codex-mini](#o-series-models) | Fine-tuned version of o4-mini. |
2526
| [GPT-4.1 series](#gpt-41-series) | Latest model release from Azure OpenAI |
2627
| [model-router](#model-router) | A model that intelligently selects from a set of underlying chat models to respond to a given prompt. |
2728
| [computer-use-preview](#computer-use-preview) | An experimental model trained for use with the Responses API computer use tool. |
@@ -49,7 +50,7 @@ Azure OpenAI is powered by a diverse set of models with different capabilities a
4950
| Model ID | Description | Context Window | Max Output Tokens | Training Data (up to) |
5051
| --- | :--- |:--- |:---|:---: |
5152
| `gpt-4.1` (2025-04-14) | - Text & image input <br> - Text output <br> - Chat completions API <br>- Responses API <br> - Streaming <br> - Function calling <br> Structured outputs (chat completions) | - 1,047,576 <br> - 128,000 (provisioned managed deployments) | 32,768 | May 31, 2024 |
52-
| `gpt-4.1-nano` (2025-04-14) <br><br> **Fastest 4.1 model** | - Text & image input <br> - Text output <br> - Chat completions API <br>- Responses API <br> - Streaming <br> - Function calling <br> Structured outputs (chat completions) | - 1,047,576 <br> - 128,000 (provisioned managed deployments) | 32,768 | May 31, 2024 |
53+
| `gpt-4.1-nano` (2025-04-14) | - Text & image input <br> - Text output <br> - Chat completions API <br>- Responses API <br> - Streaming <br> - Function calling <br> Structured outputs (chat completions) | - 1,047,576 <br> - 128,000 (provisioned managed deployments) | 32,768 | May 31, 2024 |
5354
| `gpt-4.1-mini` (2025-04-14) | - Text & image input <br> - Text output <br> - Chat completions API <br>- Responses API <br> - Streaming <br> - Function calling <br> Structured outputs (chat completions) | - 1,047,576 <br> - 128,000 (provisioned managed deployments) | 32,768 | May 31, 2024 |
5455

5556
## model-router
@@ -121,7 +122,9 @@ The Azure OpenAI o<sup>&#42;</sup> series models are specifically designed to ta
121122

122123
| Model ID | Description | Max Request (tokens) | Training Data (up to) |
123124
| --- | :--- |:--- |:---: |
124-
| `o4-mini` (2025-04-16) | - **NEW** reasoning model, offering [enhanced reasoning abilities](../how-to/reasoning.md). <br><br> - Chat Completions API <br> - [Responses API](../how-to/responses.md) <br>- Structured outputs<br> - Text, image processing <br> - Functions/Tools/Parallel tool calling <br> [Full summary of capabilities](../how-to/reasoning.md) | Input: 200,000 <br> Output: 100,000 | May 31, 2024 |
125+
| `codex-mini` (2025-05-16) | Fine-tuned version of o4-mini. <br> - [Responses API](../how-to/responses.md) <br>- Structured outputs<br> - Text, image processing <br> - Functions/Tools<br> [Full summary of capabilities](../how-to/reasoning.md) | Input: 200,000 <br> Output: 100,000 | May 31, 2024 |
126+
| `o3-pro` (2025-06-10) | - [Responses API](../how-to/responses.md) <br>- Structured outputs<br> - Text, image processing <br> - Functions/Tools<br> [Full summary of capabilities](../how-to/reasoning.md) | Input: 200,000 <br> Output: 100,000 | May 31, 2024 |
127+
| `o4-mini` (2025-04-16) | - **NEW** reasoning model, offering [enhanced reasoning abilities](../how-to/reasoning.md). <br><br> - Chat Completions API <br> - [Responses API](../how-to/responses.md) <br>- Structured outputs<br> - Text, image processing <br> - Functions/Tools<br> [Full summary of capabilities](../how-to/reasoning.md) | Input: 200,000 <br> Output: 100,000 | May 31, 2024 |
125128
| `o3` (2025-04-16) | - **NEW** reasoning model, offering [enhanced reasoning abilities](../how-to/reasoning.md). <br> <br> - Chat Completions API <br> - [Responses API](../how-to/responses.md) <br> - Structured outputs<br> - Text, image processing <br> - Functions/Tools/Parallel tool calling <br> [Full summary of capabilities](../how-to/reasoning.md) | Input: 200,000 <br> Output: 100,000 | May 31, 2024 |
126129
| `o3-mini` (2025-01-31) | - [Enhanced reasoning abilities](../how-to/reasoning.md). <br> - Structured outputs<br> - Text-only processing <br> - Functions/Tools | Input: 200,000 <br> Output: 100,000 | Oct 2023 |
127130
| `o1` (2024-12-17) | - [Enhanced reasoning abilities](../how-to/reasoning.md). <br> - Structured outputs<br> - Text, image processing <br> - Functions/Tools | Input: 200,000 <br> Output: 100,000 | Oct 2023 |
@@ -136,6 +139,8 @@ To learn more about the advanced `o-series` models see, [getting started with re
136139

137140
| Model | Region |
138141
|---|---|
142+
|`codex-mini` | East US2 & Sweden Central (Global Standard) |
143+
|`o3-pro` | East US2 & Sweden Central (Global Standard) |
139144
|`o4-mini`| See the [models table](#model-summary-table-and-region-availability). |
140145
| `o3` | See the [models table](#model-summary-table-and-region-availability). |
141146
|`o3-mini` | See the [models table](#model-summary-table-and-region-availability). |

articles/ai-services/openai/how-to/fine-tuning-direct-preference-optimization.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -51,7 +51,7 @@ Training datasets must be in `jsonl` format:
5151

5252
## Direct preference optimization model support
5353

54-
- `gpt-4o-2024-08-06` supports direct preference optimization in its respective fine-tuning regions. Latest region availability is updated in the [models page](../concepts/models.md#fine-tuning-models)
54+
- `gpt-4o-2024-08-06`,`gpt-4.1-2025-04-14`,`gpt-4.1-mini-2025-04-14` supports direct preference optimization in its respective fine-tuning regions. Latest region availability is updated in the [models page](../concepts/models.md#fine-tuning-models)
5555

5656
Users can use preference fine tuning with base models as well as models that have already been fine-tuned using supervised fine-tuning as long as they are of a supported model/version.
5757

@@ -70,4 +70,4 @@ Users can use preference fine tuning with base models as well as models that hav
7070

7171
- Explore the fine-tuning capabilities in the [Azure OpenAI fine-tuning tutorial](../tutorials/fine-tune.md).
7272
- Review fine-tuning [model regional availability](../concepts/models.md#fine-tuning-models)
73-
- Learn more about [Azure OpenAI quotas](../quotas-limits.md)
73+
- Learn more about [Azure OpenAI quotas](../quotas-limits.md)

articles/ai-services/openai/how-to/function-calling.md

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@ ms.author: mbullwin #delegenz
77
ms.service: azure-ai-openai
88
ms.custom: devx-track-python
99
ms.topic: how-to
10-
ms.date: 04/16/2025
10+
ms.date: 06/17/2025
1111
manager: nitinme
1212
---
1313

@@ -48,6 +48,8 @@ Support for parallel function was first added in API version [`2023-12-01-previe
4848
### Basic function calling with tools
4949

5050
* All the models that support parallel function calling
51+
* `codex-mini` (`2025-05-16`)
52+
* `o3-pro` (`2025-06-10`)
5153
* `o4-mini` (`2025-04-16`)
5254
* `o3` (`2025-04-16`)
5355
* `gpt-4.1-nano` (`2025-04-14`)

articles/ai-services/openai/how-to/reasoning.md

Lines changed: 25 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@ description: Learn how to use Azure OpenAI's advanced o3-mini, o1, & o1-mini rea
55
manager: nitinme
66
ms.service: azure-ai-openai
77
ms.topic: include
8-
ms.date: 04/18/2025
8+
ms.date: 06/17/2025
99
author: mrbullwinkle
1010
ms.author: mbullwin
1111
---
@@ -28,40 +28,43 @@ Azure OpenAI `o-series` models are designed to tackle reasoning and problem-solv
2828

2929
| Model | Region | Limited access |
3030
|---|---|---|
31-
| `o4-mini` | East US2 (Global Standard) <br><br> Sweden Central (Global Standard) | No access request needed to use the core capabilities of this model.<br><br> Request access: [o4-mini reasoning summary feature](https://aka.ms/oai/o3access) |
32-
| `o3` | East US2 (Global Standard) <br><br> Sweden Central (Global Standard) | Request access: [o3 limited access model application](https://aka.ms/oai/o3access) |
31+
| `o3-pro` | East US2 & Sweden Central (Global Standard) | Request access: [o3 limited access model application](https://aka.ms/oai/o3access). If you already have `o3 access` no request is required for `o3-pro`. |
32+
| `codex-mini` | East US2 & Sweden Central (Global Standard) | No access request needed. |
33+
| `o4-mini` | [Model availability](../concepts/models.md#global-standard-model-availability) | No access request needed to use the core capabilities of this model.<br><br> Request access: [o4-mini reasoning summary feature](https://aka.ms/oai/o3access) |
34+
| `o3` | [Model availability](../concepts/models.md#global-standard-model-availability) | Request access: [o3 limited access model application](https://aka.ms/oai/o3access) |
3335
| `o3-mini` | [Model availability](../concepts/models.md#global-standard-model-availability). | Access is no longer restricted for this model. |
3436
|`o1` | [Model availability](../concepts/models.md#global-standard-model-availability). | Access is no longer restricted for this model. |
3537
| `o1-preview` | [Model availability](../concepts/models.md#global-standard-model-availability). |This model is only available for customers who were granted access as part of the original limited access release. We're currently not expanding access to `o1-preview`. |
3638
| `o1-mini` | [Model availability](../concepts/models.md#global-standard-model-availability). | No access request needed for Global Standard deployments.<br><br>Standard (regional) deployments are currently only available to select customers who were previously granted access as part of the `o1-preview` release.|
3739

3840
## API & feature support
3941

40-
| **Feature** | **o4-mini**, **2025-04-16** | **o3**, **2025-04-16** | **o3-mini**, **2025-01-31** |**o1**, **2024-12-17** | **o1-preview**, **2024-09-12** | **o1-mini**, **2024-09-12** |
41-
|:-------------------|:--------------------------:|:-----:|:-------:|:--------------------------:|:-------------------------------:|:---:|
42-
| **API Version** | `2025-04-01-preview` | `2025-04-01-preview` | `2024-12-01-preview` or later <br> `2025-03-01-preview` (Recommended) | `2024-12-01-preview` or later <br> `2025-03-01-preview` (Recommended) | `2024-09-01-preview` or later <br> `2025-03-01-preview` (Recommended) | `2024-09-01-preview` or later <br> `2025-03-01-preview` (Recommended) |
43-
| **[Developer Messages](#developer-messages)** ||||| - | - |
44-
| **[Structured Outputs](./structured-outputs.md)** ||||| - | - |
45-
| **[Context Window](../concepts/models.md#o-series-models)** | Input: 200,000 <br> Output: 100,000 | Input: 200,000 <br> Output: 100,000 | Input: 200,000 <br> Output: 100,000 | Input: 200,000 <br> Output: 100,000 | Input: 128,000 <br> Output: 32,768 | Input: 128,000 <br> Output: 65,536 |
46-
| **[Reasoning effort](#reasoning-effort)** ||||| - | - |
47-
| **[Vision Support](./gpt-with-vision.md)** ||| - || - | - |
48-
| Chat Completions API |||||||
49-
| Responses API ||| - | - | - | - |
50-
| Functions/Tools ||||| - | - |
51-
| Parallel Tool Calls | - | - | - | - | - | - |
52-
| `max_completion_tokens` <sup>1</sup> |||||||
53-
| System Messages <sup>2</sup> ||||| - | - |
54-
| [Reasoning summary](#reasoning-summary) <sup>3</sup> ||| - | - | - | - |
55-
| Streaming <sup>4</sup> |||| - | - | - |
42+
| **Feature** | **codex-mini**, **2025-05-16** | **o3-pro**, **2025-06-10** | **o4-mini**, **2025-04-16** | **o3**, **2025-04-16** | **o3-mini**, **2025-01-31** |**o1**, **2024-12-17** | **o1-preview**, **2024-09-12** | **o1-mini**, **2024-09-12** |
43+
|:-------------------|:--------------------------:|:------:|:--------|:-----:|:-------:|:--------------------------:|:-------------------------------:|:---:|
44+
| **API Version** | `2025-04-01-preview` & [v1 preview](../api-version-lifecycle.md#api-evolution) | `2025-04-01-preview` & [v1 preview](../api-version-lifecycle.md#api-evolution) | `2025-04-01-preview` | `2025-04-01-preview` | `2024-12-01-preview` or later <br> `2025-03-01-preview` (Recommended) | `2024-12-01-preview` or later <br> `2025-03-01-preview` (Recommended) | `2024-09-01-preview` or later <br> `2025-03-01-preview` (Recommended) | `2024-09-01-preview` or later <br> `2025-03-01-preview` (Recommended) |
45+
| **[Developer Messages](#developer-messages)** ||||| || - | - |
46+
| **[Structured Outputs](./structured-outputs.md)** ||||| || - | - |
47+
| **[Context Window](../concepts/models.md#o-series-models)** | Input: 200,000 <br> Output: 100,000 | Input: 200,000 <br> Output: 100,000 | Input: 200,000 <br> Output: 100,000 | Input: 200,000 <br> Output: 100,000 | Input: 200,000 <br> Output: 100,000 | Input: 200,000 <br> Output: 100,000 | Input: 128,000 <br> Output: 32,768 | Input: 128,000 <br> Output: 65,536 |
48+
| **[Reasoning effort](#reasoning-effort)** |||| ||| - | - |
49+
| **[Image input](./gpt-with-vision.md)** || ||| - || - | - |
50+
| Chat Completions API | - | - | ||||||
51+
| Responses API | |||| - | - | - | - |
52+
| Functions/Tools ||||||| - | - |
53+
| Parallel Tool Calls | - | - | - | - | - | - | - | - |
54+
| `max_completion_tokens` <sup>1</sup> | || ||||||
55+
| System Messages <sup>2</sup> |||| ||| - | - |
56+
| [Reasoning summary](#reasoning-summary) <sup>3</sup> | | - | || - | - | - | - |
57+
| Streaming <sup>4</sup> || - || || - | - | - |
5658

5759
<sup>1</sup> Reasoning models will only work with the `max_completion_tokens` parameter. <br><br>
58-
5960
<sup>2</sup> The latest o<sup>&#42;</sup> series model support system messages to make migration easier. When you use a system message with `o4-mini`, `o3`, `o3-mini`, and `o1` it will be treated as a developer message. You should not use both a developer message and a system message in the same API request.
60-
6161
<sup>3</sup> Access to the chain-of-thought reasoning summary is limited access only for `o3` & `o4-mini`.
62-
6362
<sup>4</sup> Streaming for `o3` is limited access only.
6463

64+
> [!NOTE]
65+
> - To avoid timeouts [background mode](./responses.md#background-tasks) is recommended for `o3-pro`.
66+
> - `o3-pro` does not currently support image generation.
67+
6568
### Not Supported
6669

6770
The following are currently unsupported with reasoning models:

articles/ai-services/openai/how-to/structured-outputs.md

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@ services: cognitive-services
66
manager: nitinme
77
ms.service: azure-ai-openai
88
ms.topic: how-to
9-
ms.date: 04/16/2025
9+
ms.date: 06/17/2025
1010
author: mrbullwinkle
1111
ms.author: mbullwin
1212
recommendations: false
@@ -25,6 +25,8 @@ Structured outputs make a model follow a [JSON Schema](https://json-schema.org/o
2525
2626
## Supported models
2727

28+
- `codex-mini` version `2025-05-16`
29+
- `o3-pro` version `2025-06-10`
2830
- `gpt-4.5-preview` version `2025-02-27`
2931
- `o3-mini` version `2025-01-31`
3032
- `o1` version: `2024-12-17`

articles/ai-services/openai/includes/global-batch-limits.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -24,19 +24,25 @@ The table shows the batch quota limit. Quota values for global batch are represe
2424

2525
|Model|Enterprise agreement|Default| Monthly credit card based subscriptions | MSDN subscriptions | Azure for Students, Free Trials |
2626
|---|---|---|---|---|---|
27+
| `gpt-4.1`| 5 B | 200 M | 50 M | 90 K | N/A |
28+
| `gpt-4.1 mini` | 15B | 1B | 50M | 90k | N/A |
29+
| `gpt-4.1-nano` | 15 B | 1 B | 50 M | 90 K | N/A |
2730
| `gpt-4o` | 5 B | 200 M | 50 M | 90 K | N/A|
2831
| `gpt-4o-mini` | 15 B | 1 B | 50 M | 90 K | N/A |
2932
| `gpt-4-turbo` | 300 M | 80 M | 40 M | 90 K | N/A |
3033
| `gpt-4` | 150 M | 30 M | 5 M | 100 K | N/A |
3134
| `gpt-35-turbo` | 10 B | 1 B | 100 M | 2 M | 50 K |
3235
| `o3-mini`| 15 B | 1 B | 50 M | 90 K | N/A |
36+
| `o4-mini` | 15 B | 1 B | 50 M | 90 K | N/A |
3337

3438
B = billion | M = million | K = thousand
3539

3640
### Data zone batch
3741

3842
|Model|Enterprise agreement|Default| Monthly credit card based subscriptions | MSDN subscriptions | Azure for Students, Free Trials |
3943
|---|---|---|---|---|---|
44+
| `gpt-4.1` | 500 M | 30 M | 30 M | 90 K | N/A|
45+
| `gpt-4.1-mini` | 1.5 B | 100 M | 50 M | 90 K | N/A |
4046
| `gpt-4o` | 500 M | 30 M | 30 M | 90 K | N/A|
4147
| `gpt-4o-mini` | 1.5 B | 100 M | 50 M | 90 K | N/A |
4248
| `o3-mini` | 1.5 B | 100 M | 50 M | 90 K | N/A |

0 commit comments

Comments
 (0)