Commit 0bd2db1

Merge pull request #6440 from mrbullwinkle/mrb_08_06_2025_vnext_006
[Azure OpenAI] [Release Branch] Update 006
2 parents: 35a0409 + 312d879

File tree

1 file changed: +72 −2 lines


articles/ai-foundry/openai/how-to/reasoning.md

@@ -58,14 +58,25 @@ Azure OpenAI reasoning models are designed to tackle reasoning and problem-solvi
 | Parallel Tool Calls<sup>1</sup> ||||
 | `max_completion_tokens` <sup>2</sup> ||||
 | System Messages <sup>3</sup> ||||
-| [Reasoning summary](#reasoning-summary) <sup>4</sup> || | |
+| [Reasoning summary](#reasoning-summary) <sup>4</sup> || - | - |
 | Streaming ||||

 <sup>1</sup> Parallel tool calls aren't supported when `reasoning_effort` is set to `minimal`.<br><br>
 <sup>2</sup> Reasoning models work only with the `max_completion_tokens` parameter.<br><br>
 <sup>3</sup> The latest reasoning models support system messages to make migration easier. Don't use both a developer message and a system message in the same API request.<br><br>
 <sup>4</sup> Access to the chain-of-thought reasoning summary is limited access only for `o3` and `o4-mini`.

+### NEW GPT-5 reasoning features
+
+| Feature | Description |
+|----|----|
+| `reasoning_effort` | `minimal` is now supported with GPT-5 series reasoning models.<br><br>**Options**: `minimal`, `low`, `medium`, `high` |
+| `verbosity` | A new parameter that gives you more granular control over how concise the model's output is.<br><br>**Options**: `low`, `medium`, `high` |
+| `preamble` | GPT-5 series reasoning models can spend extra time *"thinking"* before executing a function/tool call. When this planning occurs, the model can surface its planning steps in the response via a new `preamble` object.<br><br>Generation of preambles isn't guaranteed, though you can encourage the model by using the `instructions` parameter with content like "You MUST plan extensively before each function call. ALWAYS output your plan to the user before calling any function". |
+| **Allowed tools** | You can specify multiple tools under `tool_choice` instead of just one. |
+| **Custom tool type** | Enables raw text (non-JSON) outputs. |
+| [`lark_tool`](#python-lark) | Lets you use some of the capabilities of [Python lark](https://github.com/lark-parser/lark) for more flexible constraining of model responses. |
+
 # [O-Series Reasoning Models](#tab/o-series)

 | **Feature** | **codex-mini**, **2025-05-16** | **o3-pro**, **2025-06-10** | **o4-mini**, **2025-04-16** | **o3**, **2025-04-16** | **o3-mini**, **2025-01-31** | **o1**, **2024-12-17** | **o1-mini**, **2024-09-12** |
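To make the new GPT-5 parameters from the table above concrete, here's a minimal sketch of a Chat Completions request body that sets `reasoning_effort` to `minimal` and `verbosity` to `low`. The payload shape is an illustration only; check your client library and the current API reference before relying on it.

```python
import json

# Sketch of a Chat Completions request body using the new GPT-5 series
# parameters described above. Field layout is illustrative, not authoritative.
payload = {
    "model": "gpt-5-2025-08-07",
    "messages": [
        {"role": "user", "content": "Summarize this report in two sentences."}
    ],
    # New with GPT-5 series reasoning models: the "minimal" effort setting.
    "reasoning_effort": "minimal",
    # New parameter controlling how concise the model's output is.
    "verbosity": "low",
}

body = json.dumps(payload)
print(body)
```

Sending this body to a Chat Completions endpoint is left to your HTTP client or SDK of choice.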
@@ -305,7 +316,7 @@ Console.WriteLine($"{completion.Role}: {completion.Content[0].Text}");
 ## Reasoning effort

 > [!NOTE]
-> Reasoning models have `reasoning_tokens` as part of `completion_tokens_details` in the model response. These are hidden tokens that aren't returned as part of the message response content but are used by the model to help generate a final answer to your request. `2024-12-01-preview` adds an additional new parameter `reasoning_effort` which can be set to `low`, `medium`, or `high` with the latest `o1` model. The higher the effort setting, the longer the model will spend processing the request, which will generally result in a larger number of `reasoning_tokens`.
+> Reasoning models have `reasoning_tokens` as part of `completion_tokens_details` in the model response. These are hidden tokens that aren't returned as part of the message response content but are used by the model to help generate a final answer to your request. `reasoning_effort` can be set to `low`, `medium`, or `high` for all reasoning models except `o1-mini`. GPT-5 reasoning models support a new `reasoning_effort` setting of `minimal`. The higher the effort setting, the longer the model will spend processing the request, which will generally result in a larger number of `reasoning_tokens`.

 ## Developer messages
@@ -548,6 +559,65 @@ curl -X POST "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/responses?ap
 }
 ```

+## Python lark
+
+GPT-5 series reasoning models can call a new `custom_tool` called `lark_tool`. This tool is based on [Python lark](https://github.com/lark-parser/lark) and can be used for more flexible constraining of model output.
+
+### Chat Completions
+
+```json
+{
+  "messages": [
+    {
+      "role": "user",
+      "content": "Which one is larger, 42 or 0?"
+    }
+  ],
+  "tools": [
+    {
+      "type": "custom",
+      "name": "custom_tool",
+      "custom": {
+        "name": "lark_tool",
+        "format": {
+          "type": "grammar",
+          "grammar": {
+            "syntax": "lark",
+            "definition": "start: QUESTION NEWLINE ANSWER\nQUESTION: /[^\\n?]{1,200}\\?/\nNEWLINE: /\\n/\nANSWER: /[^\\n!]{1,200}!/"
+          }
+        }
+      }
+    }
+  ],
+  "tool_choice": "required",
+  "model": "gpt-5-2025-08-07"
+}
+```
+
+### Responses API
+
+```json
+{
+  "model": "gpt-5-2025-08-07",
+  "input": "please calculate the area of a circle with radius equal to the number of 'r's in strawberry",
+  "tools": [
+    {
+      "type": "custom",
+      "name": "lark_tool",
+      "format": {
+        "type": "grammar",
+        "syntax": "lark",
+        "definition": "start: QUESTION NEWLINE ANSWER\nQUESTION: /[^\\n?]{1,200}\\?/\nNEWLINE: /\\n/\nANSWER: /[^\\n!]{1,200}!/"
+      }
+    }
+  ],
+  "tool_choice": "required"
+}
+```
+
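The grammar in these examples constrains output to a "?"-terminated question line followed by a "!"-terminated answer line. As a rough local illustration (not the service's actual validator), the grammar's two regex terminals can be checked with Python's stdlib `re`:

```python
import re

# The Lark grammar's terminals from the examples above, combined into one
# pattern: QUESTION ("?"-terminated line), NEWLINE, ANSWER ("!"-terminated).
pattern = re.compile(r"[^\n?]{1,200}\?\n[^\n!]{1,200}!")

def matches_grammar(text: str) -> bool:
    """Return True if text has the question/answer shape the grammar allows."""
    return pattern.fullmatch(text) is not None

print(matches_grammar("Which one is larger, 42 or 0?\n42 is larger!"))  # True
print(matches_grammar("42 is larger."))                                 # False
```

The full Lark syntax supports rules and composition beyond what a single regex expresses; this sketch only mirrors the specific terminals used above.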
 ## Markdown output

 By default the `o3-mini` and `o1` models won't attempt to produce output that includes markdown formatting. A common use case where this behavior is undesirable is when you want the model to output code contained within a markdown code block. When the model generates output without markdown formatting, you lose features like syntax highlighting and copyable code blocks in interactive playground experiences. To override this new default behavior and encourage markdown inclusion in model responses, add the string `Formatting re-enabled` to the beginning of your developer message.
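As a sketch, prepending that string to a developer message could look like the following; the message structure is illustrative and should be adapted to your client library:

```python
# Encourage markdown output from o-series models by prefixing the developer
# message with "Formatting re-enabled", per the paragraph above.
developer_instructions = "You are a helpful coding assistant."

messages = [
    {
        "role": "developer",
        "content": f"Formatting re-enabled\n\n{developer_instructions}",
    },
    {"role": "user", "content": "Write a Python function that reverses a string."},
]

print(messages[0]["content"].startswith("Formatting re-enabled"))
```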
