Commit 57a614f

Merge pull request #4229 from MicrosoftDocs/main

4/18/2025 PM Publish

2 parents b1fe2eb + 8f29ffc
8 files changed: +230 −50 lines changed

articles/ai-services/openai/concepts/models.md

Lines changed: 2 additions & 2 deletions

@@ -99,8 +99,8 @@ The Azure OpenAI o<sup>&#42;</sup> series models are specifically designed to ta
| Model ID | Description | Max Request (tokens) | Training Data (up to) |
| --- | :--- |:--- |:---: |
- | `o4-mini` (2025-04-16) | - **NEW** reasoning model, offering [enhanced reasoning abilities](../how-to/reasoning.md). <br><br> - Chat Completions API <br> - [Responses API](../how-to/responses.md) (**Feature coming soon!**) <br>- Structured outputs<br> - Text, image processing <br> - Functions/Tools/Parallel tool calling <br> [Full summary of capabilities](../how-to/reasoning.md) | Input: 200,000 <br> Output: 100,000 | May 31, 2024 |
- | `o3` (2025-04-16) | - **NEW** reasoning model, offering [enhanced reasoning abilities](../how-to/reasoning.md). <br> <br> - Chat Completions API <br> - [Responses API](../how-to/responses.md) (**Feature coming soon!**) <br> - Structured outputs<br> - Text, image processing <br> - Functions/Tools/Parallel tool calling <br> [Full summary of capabilities](../how-to/reasoning.md) | Input: 200,000 <br> Output: 100,000 | May 31, 2024 |
+ | `o4-mini` (2025-04-16) | - **NEW** reasoning model, offering [enhanced reasoning abilities](../how-to/reasoning.md). <br><br> - Chat Completions API <br> - [Responses API](../how-to/responses.md) <br>- Structured outputs<br> - Text, image processing <br> - Functions/Tools/Parallel tool calling <br> [Full summary of capabilities](../how-to/reasoning.md) | Input: 200,000 <br> Output: 100,000 | May 31, 2024 |
+ | `o3` (2025-04-16) | - **NEW** reasoning model, offering [enhanced reasoning abilities](../how-to/reasoning.md). <br> <br> - Chat Completions API <br> - [Responses API](../how-to/responses.md) <br> - Structured outputs<br> - Text, image processing <br> - Functions/Tools/Parallel tool calling <br> [Full summary of capabilities](../how-to/reasoning.md) | Input: 200,000 <br> Output: 100,000 | May 31, 2024 |
| `o3-mini` (2025-01-31) | - [Enhanced reasoning abilities](../how-to/reasoning.md). <br> - Structured outputs<br> - Text-only processing <br> - Functions/Tools | Input: 200,000 <br> Output: 100,000 | Oct 2023 |
| `o1` (2024-12-17) | - [Enhanced reasoning abilities](../how-to/reasoning.md). <br> - Structured outputs<br> - Text, image processing <br> - Functions/Tools | Input: 200,000 <br> Output: 100,000 | Oct 2023 |
| `o1-preview` (2024-09-12) | Older preview version | Input: 128,000 <br> Output: 32,768 | Oct 2023 |

articles/ai-services/openai/how-to/reasoning.md

Lines changed: 139 additions & 8 deletions

@@ -5,7 +5,7 @@ description: Learn how to use Azure OpenAI's advanced o3-mini, o1, & o1-mini rea
manager: nitinme
ms.service: azure-ai-openai
ms.topic: include
- ms.date: 04/16/2025
+ ms.date: 04/18/2025
author: mrbullwinkle
ms.author: mbullwin
---
@@ -39,19 +39,19 @@ Azure OpenAI `o-series` models are designed to tackle reasoning and problem-solv

| **Feature** | **o4-mini**, **2025-04-16** | **o3**, **2025-04-16** | **o3-mini**, **2025-01-31** | **o1**, **2024-12-17** | **o1-preview**, **2024-09-12** | **o1-mini**, **2024-09-12** |
|:-------------------|:--------------------------:|:-----:|:-------:|:--------------------------:|:-------------------------------:|:---:|
- | **API Version** | `2025-03-01-preview` | `2025-03-01-preview` | `2024-12-01-preview` or later <br> `2025-03-01-preview` (Recommended) | `2024-12-01-preview` or later <br> `2025-03-01-preview` (Recommended) | `2024-09-01-preview` or later <br> `2025-03-01-preview` (Recommended) | `2024-09-01-preview` or later <br> `2025-03-01-preview` (Recommended) |
+ | **API Version** | `2025-04-01-preview` | `2025-04-01-preview` | `2024-12-01-preview` or later <br> `2025-03-01-preview` (Recommended) | `2024-12-01-preview` or later <br> `2025-03-01-preview` (Recommended) | `2024-09-01-preview` or later <br> `2025-03-01-preview` (Recommended) | `2024-09-01-preview` or later <br> `2025-03-01-preview` (Recommended) |
| **[Developer Messages](#developer-messages)** | ✅ | ✅ | ✅ | ✅ | - | - |
| **[Structured Outputs](./structured-outputs.md)** | ✅ | ✅ | ✅ | ✅ | - | - |
| **[Context Window](../concepts/models.md#o-series-models)** | Input: 200,000 <br> Output: 100,000 | Input: 200,000 <br> Output: 100,000 | Input: 200,000 <br> Output: 100,000 | Input: 200,000 <br> Output: 100,000 | Input: 128,000 <br> Output: 32,768 | Input: 128,000 <br> Output: 65,536 |
| **[Reasoning effort](#reasoning-effort)** | ✅ | ✅ | ✅ | ✅ | - | - |
| **[Vision Support](./gpt-with-vision.md)** | ✅ | ✅ | - | ✅ | - | - |
| Chat Completions API | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
- | Responses API | (**Feature coming soon!**) | (**Feature coming soon!**) | - | - | - | - |
+ | Responses API | ✅ | ✅ | - | - | - | - |
| Functions/Tools | ✅ | ✅ | ✅ | ✅ | - | - |
| Parallel Tool Calls | ✅ | ✅ | - | - | - | - |
| `max_completion_tokens`<sup>*</sup> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| System Messages<sup>**</sup> | ✅ | ✅ | ✅ | ✅ | - | - |
- | Reasoning summary <sup>***</sup> | (**Feature coming soon!**) | (**Feature coming soon!**) | - | - | - | - |
+ | [Reasoning summary](#reasoning-summary) <sup>***</sup> | ✅ | ✅ | - | - | - | - |
| Streaming | ✅ | ✅ | ✅ | - | - | - |

<sup>*</sup> Reasoning models will only work with the `max_completion_tokens` parameter. <br><br>
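As a quick illustration, here's a minimal sketch of the parameter in use (the deployment name `o4-mini` and the limit value are placeholders for your own):

```python
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2025-03-01-preview"
)

response = client.chat.completions.create(
    model="o4-mini",  # replace with your model deployment name
    messages=[{"role": "user", "content": "Summarize the benefits of static typing."}],
    max_completion_tokens=5000  # o-series models reject max_tokens; use max_completion_tokens
)

print(response.choices[0].message.content)
```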
@@ -91,7 +91,7 @@ token_provider = get_bearer_token_provider(
client = AzureOpenAI(
  azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"),
  azure_ad_token_provider=token_provider,
-   api_version="2024-12-01-preview"
+   api_version="2025-03-01-preview"
)

response = client.chat.completions.create(

@@ -121,7 +121,7 @@ from openai import AzureOpenAI
client = AzureOpenAI(
  azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"),
  api_key=os.getenv("AZURE_OPENAI_API_KEY"),
-   api_version="2024-12-01-preview"
+   api_version="2025-03-01-preview"
)

response = client.chat.completions.create(

@@ -298,7 +298,7 @@ token_provider = get_bearer_token_provider(
client = AzureOpenAI(
  azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"),
  azure_ad_token_provider=token_provider,
-   api_version="2024-12-01-preview"
+   api_version="2025-03-01-preview"
)

response = client.chat.completions.create(

@@ -330,7 +330,7 @@ from openai import AzureOpenAI
client = AzureOpenAI(
  azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"),
  api_key=os.getenv("AZURE_OPENAI_API_KEY"),
-   api_version="2024-12-01-preview"
+   api_version="2025-03-01-preview"
)

response = client.chat.completions.create(
@@ -381,6 +381,137 @@ Console.WriteLine($"{completion.Role}: {completion.Content[0].Text}");

---

## Reasoning summary

When using the latest `o3` and `o4-mini` models with the [Responses API](./responses.md), you can use the reasoning summary parameter to receive summaries of the model's chain-of-thought reasoning. This parameter can be set to `auto`, `concise`, or `detailed`. Access to this feature requires you to [Request Access](https://aka.ms/oai/o3access).

> [!NOTE]
> Even when enabled, reasoning summaries aren't generated for every step/request. This is expected behavior.
# [Python](#tab/py)

You'll need to upgrade your OpenAI client library for access to the latest parameters.

```cmd
pip install openai --upgrade
```
```python
import os  # required for os.getenv; missing from the original snippet
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    azure_ad_token_provider=token_provider,
    api_version="2025-04-01-preview"  # You must use this version or greater to access reasoning summary
)

response = client.responses.create(
    input="Tell me about the curious case of neural text degeneration",
    model="o4-mini",  # replace with model deployment name
    reasoning={
        "effort": "medium",
        "summary": "detailed"  # auto, concise, or detailed (currently only supported with o4-mini and o3)
    }
)

print(response.model_dump_json(indent=2))
```
# [REST](#tab/REST)

```bash
curl -X POST "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/responses?api-version=2025-04-01-preview" \
 -H "Content-Type: application/json" \
 -H "Authorization: Bearer $AZURE_OPENAI_AUTH_TOKEN" \
 -d '{
     "model": "o4-mini",
     "input": "Tell me about the curious case of neural text degeneration",
     "reasoning": {"summary": "detailed"}
    }'
```
---

```output
{
  "id": "resp_68007e26b2cc8190b83361014f3a78c50ae9b88522c3ad24",
  "created_at": 1744862758.0,
  "error": null,
  "incomplete_details": null,
  "instructions": null,
  "metadata": {},
  "model": "o4-mini",
  "object": "response",
  "output": [
    {
      "id": "rs_68007e2773bc8190b5b8089949bfe13a0ae9b88522c3ad24",
      "summary": [
        {
          "text": "**Summarizing neural text degeneration**\n\nThe user's asking about \"The Curious Case of Neural Text Degeneration,\" a paper by Ari Holtzman et al. from 2020. It explains how certain decoding strategies produce repetitive and dull text. In contrast, methods like nucleus sampling yield more coherent and diverse outputs. The authors introduce metrics like surprisal and distinct-n for evaluation and suggest that maximum likelihood decoding often favors generic continuations, leading to loops and repetitive patterns in longer texts. They promote sampling from truncated distributions for improved text quality.",
          "type": "summary_text"
        },
        {
          "text": "**Explaining nucleus sampling**\n\nThe authors propose nucleus sampling, which captures a specified mass of the predictive distribution, improving metrics such as coherence and diversity. They identify a \"sudden drop\" phenomenon in token probabilities, where a few tokens dominate, leading to a long tail. By truncating this at a cumulative probability threshold, they aim to enhance text quality compared to top-k sampling. Their evaluations include human assessments, showing better results in terms of BLEU scores and distinct-n measures. Overall, they highlight how decoding strategies influence quality and recommend adaptive techniques for improved outcomes.",
          "type": "summary_text"
        }
      ],
      "type": "reasoning",
      "status": null
    },
    {
      "id": "msg_68007e35c44881908cb4651b8e9972300ae9b88522c3ad24",
      "content": [
        {
          "annotations": [],
          "text": "Researchers first became aware that neural language models, when used to generate long stretches of text with standard “maximum‐likelihood” decoding (greedy search, beam search, etc.), often produce bland, repetitive or looping output. The 2020 paper “The Curious Case of Neural Text Degeneration” (Holtzman et al.) analyzes this failure mode and proposes a simple fix—nucleus (top‑p) sampling—that dramatically improves output quality.\n\n1. The Problem: Degeneration \n • With greedy or beam search, models tend to pick very high‑probability tokens over and over, leading to loops (“the the the…”) or generic, dull continuations. \n • Even sampling with a fixed top‑k (e.g. always sample from the 40 most likely tokens) can be suboptimal: if the model’s probability mass is skewed, k may be too small (overly repetitive) or too large (introducing incoherence).\n\n2. Why It Happens: Distributional Peakedness \n • At each time step the model’s predicted next‐token distribution often has one or two very high‑probability tokens, then a long tail of low‑probability tokens. \n • Maximum‐likelihood decoding zeroes in on the peak, collapsing diversity. \n • Uniform sampling over a large k allows low‑probability “wild” tokens, harming coherence.\n\n3. The Fix: Nucleus (Top‑p) Sampling \n • Rather than fixing k, dynamically truncate the distribution to the smallest set of tokens whose cumulative probability ≥ p (e.g. p=0.9). \n • Then renormalize and sample from that “nucleus.” \n • This keeps only the “plausible” mass and discards the improbable tail, adapting to each context.\n\n4. Empirical Findings \n • Automatic metrics (distinct‑n, repetition rates) and human evaluations show nucleus sampling yields more diverse, coherent, on‑topic text than greedy/beam or fixed top‑k. \n • It also outperforms simple temperature scaling (raising logits to 1/T) because it adapts to changes in the distribution’s shape.\n\n5. Takeaways for Practitioners \n • Don’t default to beam search for open-ended generation—its high likelihood doesn’t mean high quality. \n • Use nucleus sampling (p between 0.8 and 0.95) for a balance of diversity and coherence. \n • Monitor repetition and distinct‑n scores if you need automatic sanity checks.\n\nIn short, “neural text degeneration” is the tendency of likelihood‐maximizing decoders to produce dull or looping text. By recognizing that the shape of the model’s probability distribution varies wildly from step to step, nucleus sampling provides an elegant, adaptive way to maintain both coherence and diversity in generated text.",
          "type": "output_text"
        }
      ],
      "role": "assistant",
      "status": "completed",
      "type": "message"
    }
  ],
  "parallel_tool_calls": true,
  "temperature": 1.0,
  "tool_choice": "auto",
  "tools": [],
  "top_p": 1.0,
  "max_output_tokens": null,
  "previous_response_id": null,
  "reasoning": {
    "effort": "medium",
    "generate_summary": null,
    "summary": "detailed"
  },
  "status": "completed",
  "text": {
    "format": {
      "type": "text"
    }
  },
  "truncation": "disabled",
  "usage": {
    "input_tokens": 16,
    "output_tokens": 974,
    "output_tokens_details": {
      "reasoning_tokens": 384
    },
    "total_tokens": 990,
    "input_tokens_details": {
      "cached_tokens": 0
    }
  },
  "user": null,
  "store": true
}
```
## Markdown output

By default, the `o3-mini` and `o1` models won't attempt to produce output that includes markdown formatting. A common use case where this behavior is undesirable is when you want the model to output code contained within a markdown code block. When the model generates output without markdown formatting, you lose features like syntax highlighting and copyable code blocks in interactive playground experiences. To override this new default behavior and encourage markdown inclusion in model responses, add the string `Formatting re-enabled` to the beginning of your developer message.
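For example, here's a minimal sketch of such a developer message (the deployment name `o3-mini` is a placeholder for your own):

```python
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2025-03-01-preview"
)

response = client.chat.completions.create(
    model="o3-mini",  # replace with your model deployment name
    messages=[
        # Leading with "Formatting re-enabled" encourages markdown such as fenced code blocks.
        {"role": "developer", "content": "Formatting re-enabled"},
        {"role": "user", "content": "Write a Python function that sorts a list of tuples by the second element."}
    ],
    max_completion_tokens=5000
)

print(response.choices[0].message.content)
```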

articles/ai-services/openai/how-to/responses.md

Lines changed: 35 additions & 0 deletions

@@ -46,6 +46,8 @@ The responses API is currently available in the following regions:
- `gpt-4.1` (Version: `2025-04-14`)
- `gpt-4.1-nano` (Version: `2025-04-14`)
- `gpt-4.1-mini` (Version: `2025-04-14`)
+ - `o3` (Version: `2025-04-16`)
+ - `o4-mini` (Version: `2025-04-16`)

Not every model is available in the regions supported by the responses API. Check the [models page](../concepts/models.md) for model region availability.
@@ -460,6 +462,35 @@ second_response = client.responses.create(

```python
print(second_response.model_dump_json(indent=2))
```

## Streaming

```python
import os  # required for os.getenv; missing from the original snippet
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    azure_ad_token_provider=token_provider,
    api_version="2025-04-01-preview"
)

response = client.responses.create(
    input="This is a test",
    model="o4-mini",  # replace with model deployment name
    stream=True
)

# Print text deltas as they stream in.
for event in response:
    if event.type == 'response.output_text.delta':
        print(event.delta, end='')
```
## Function calling

The responses API supports function calling.
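As a minimal sketch (the `get_weather` tool and its schema are hypothetical, and `gpt-4.1` stands in for your own deployment name), function tools are passed through the `tools` parameter:

```python
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2025-04-01-preview"
)

response = client.responses.create(
    model="gpt-4.1",  # replace with your model deployment name
    input="What's the weather like in Seattle?",
    tools=[
        {
            # Hypothetical tool, shown only to illustrate the shape of a function tool.
            "type": "function",
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "Name of the city"}
                },
                "required": ["city"]
            }
        }
    ]
)

# If the model decides to call the tool, the output contains a function_call item
# with arguments; run your function and send the result back in a follow-up call.
print(response.model_dump_json(indent=2))
```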
@@ -658,6 +689,10 @@ response = client.responses.create(

```python
print(response)
```

## Reasoning models

For examples of how to use reasoning models with the responses API, see the [reasoning models guide](./reasoning.md#reasoning-summary).

## Computer use

In this section, we provide a simple example script that integrates Azure OpenAI's `computer-use-preview` model with [Playwright](https://playwright.dev/) to automate basic browser interactions. Combining the model with Playwright allows the model to see the browser screen, make decisions, and perform actions like clicking, typing, and navigating websites. You should exercise caution when running this example code. It's designed to be run locally, but should only be executed in a test environment. Have a human confirm decisions, and don't give the model access to sensitive data.
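Heavily condensed, the shape of that integration looks like the following sketch (the deployment name, start URL, and task are placeholders; the full example script also executes the returned actions and loops until the task completes):

```python
import os
from openai import AzureOpenAI
from playwright.sync_api import sync_playwright

client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2025-03-01-preview"
)

with sync_playwright() as playwright:
    browser = playwright.chromium.launch(headless=False)
    page = browser.new_page(viewport={"width": 1024, "height": 768})
    page.goto("https://www.bing.com")

    response = client.responses.create(
        model="computer-use-preview",  # replace with your model deployment name
        tools=[{
            "type": "computer_use_preview",
            "display_width": 1024,
            "display_height": 768,
            "environment": "browser"
        }],
        input="Search for the latest AI news.",
        truncation="auto"  # required when using the computer use tool
    )

    # Each computer_call item describes an action (click, type, screenshot, ...)
    # to perform with Playwright; after performing it, send a fresh screenshot
    # back as a computer_call_output item and call responses.create again.
    for item in response.output:
        if item.type == "computer_call":
            print(item.action)

    browser.close()
```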

articles/search/cognitive-search-aml-skill.md

Lines changed: 2 additions & 2 deletions

@@ -10,15 +10,15 @@ ms.custom:
- ignite-2023
- build-2024
ms.topic: reference
- ms.date: 08/05/2024
+ ms.date: 04/18/2025
---

# AML skill in an Azure AI Search enrichment pipeline

> [!IMPORTANT]
> Support for indexer connections to the Azure AI Foundry model catalog is in public preview under [supplemental terms of use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). Preview REST APIs support this skill.

- The **AML** skill allows you to extend AI enrichment with a custom [Azure Machine Learning (AML)](../machine-learning/overview-what-is-azure-machine-learning.md) model or deployed base embedding model on Azure AI Foundry. Once an AML model is [trained and deployed](../machine-learning/concept-azure-machine-learning-architecture.md#workspace), an **AML** skill integrates it into a skillset.
+ The **AML** skill allows you to extend AI enrichment with a custom [Azure Machine Learning (AML)](../machine-learning/overview-what-is-azure-machine-learning.md) model or deployed base embedding model in the Azure AI Foundry model catalog. Once an AML model is [trained and deployed](../machine-learning/concept-azure-machine-learning-architecture.md#workspace), an **AML** skill integrates it into a skillset.

## AML skill usage
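To give a sense of where the skill lands in a skillset, here's a hedged sketch of an **AML** skill definition, shown as a Python dict (the endpoint URI and field names are placeholders, not values from this commit):

```python
# Hypothetical AML skill definition; adjust the URI, inputs, and outputs
# to match your own deployed model's scoring schema.
aml_skill = {
    "@odata.type": "#Microsoft.Skills.Custom.AmlSkill",
    "description": "Calls a custom model deployed to an AML online endpoint",
    "uri": "https://<your-endpoint>.<region>.inference.ml.azure.com/score",
    "context": "/document",
    "inputs": [
        {"name": "text", "source": "/document/content"}
    ],
    "outputs": [
        {"name": "embedding", "targetName": "embedding"}
    ]
}
```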

articles/search/cognitive-search-defining-skillset.md

Lines changed: 2 additions & 2 deletions

@@ -8,7 +8,7 @@ ms.service: azure-ai-search
ms.custom:
- ignite-2023
ms.topic: conceptual
- ms.date: 12/06/2024
+ ms.date: 04/18/2025
---

# Create a skillset in Azure AI Search

@@ -192,7 +192,7 @@ Skills read from and write to an enriched document. Skill inputs specify the ori
| `source`: `/document/some-named-field` | For text-based skills, such as entity recognition or key phrase extraction, the origin should be a field that contains sufficient text to be analyzed, such as a *description* or *summary*. |
| `source`: `/document/normalized_images/*` | For image content, the source is an image that's been normalized during document cracking. |

- If the skill iterates over an array, both context and input source should include `/*` in the correct positions.
+ If the skill iterates over an array, both context and input source should include `/*` in the correct positions. For more information about the complete syntax, see [Skill context and input annotation language](cognitive-search-skill-annotation-language.md).
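For instance, here's a sketch of a skill that iterates over a hypothetical `pages` collection, shown as a Python dict; note that `/*` appears in the same position in both `context` and the input `source`:

```python
# Each element of /document/pages/* gets its own skill invocation, and the
# output is written back under the matching page node.
skill = {
    "@odata.type": "#Microsoft.Skills.Text.V3.EntityRecognitionSkill",
    "context": "/document/pages/*",
    "inputs": [
        {"name": "text", "source": "/document/pages/*"}
    ],
    "outputs": [
        {"name": "organizations", "targetName": "organizations"}
    ]
}
```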
## Define outputs