`articles/ai-foundry/openai/includes/api-versions/latest-inference-preview.md` (9 additions, 18 deletions)
```diff
@@ -4647,7 +4647,7 @@ Creates a model response.
 | max_output_tokens | integer | An upper bound for the number of tokens that can be generated for a response, including visible output tokens and conversation state.<br> | No ||
 | parallel_tool_calls | boolean | Whether to allow the model to run tool calls in parallel.<br> | No | True |
 | previous_response_id | string | The unique ID of the previous response to the model. Use this to create multi-turn conversations. Learn more about conversation state.<br> | No ||
-| reasoning |[Reasoning](#reasoning)|**o-series models only**<br><br>Configuration options for reasoning models.<br>| No ||
+| reasoning |[Reasoning](#reasoning)|Configuration options for reasoning models. | No ||
 | store | boolean | Whether to store the generated model response for later retrieval via API.<br> | No | True |
 | stream | boolean | If set to true, the model response data will be streamed to the client as it is generated using [server-sent events](https://developer.mozilla.org/docs/Web/API/Server-sent_events/Using_server-sent_events#Event_stream_format).<br>See the Streaming section below for more information.<br> | No | False |
 | text | object | Configuration options for a text response from the model. Can be plain text or structured JSON data. Learn more:<br>- Text inputs and outputs<br>- Structured Outputs | No ||
```
```diff
@@ -8408,7 +8408,7 @@ An x/y coordinate pair, e.g. `{ x: 100, y: 200 }`.
 | max_output_tokens | integer | An upper bound for the number of tokens that can be generated for a response, including visible output tokens and reasoning tokens.<br> | No ||
 | parallel_tool_calls | boolean | Whether to allow the model to run tool calls in parallel.<br> | No | True |
 | previous_response_id | string | The unique ID of the previous response to the model. Use this to create multi-turn conversations. | No ||
-| reasoning |[Reasoning](#reasoning)|**o-series models only**<br><br>Configuration options for reasoning models.<br>| No ||
+| reasoning |[Reasoning](#reasoning)| Configuration options for reasoning models. | No ||
 | store | boolean | Whether to store the generated model response for later retrieval via API.<br> | No | True |
 | stream | boolean | If set to true, the model response data will be streamed to the client as it is generated using [server-sent events](https://developer.mozilla.org/docs/Web/API/Server-sent_events/Using_server-sent_events#Event_stream_format). | No | False |
 | text | object | Configuration options for a text response from the model. Can be plain text or structured JSON data. Learn more:<br>- text inputs and outputs<br>- Structured Outputs<br> | No ||
```
```diff
@@ -8922,16 +8922,13 @@ When a session is created on the server via REST API, the session object also co
 
 ### Reasoning
 
-**o-series models only**
-
-Configuration options for
-reasoning models.
+Configuration options for reasoning models.
 
 
 | Name | Type | Description | Required | Default |
-| effort |[ReasoningEffort](#reasoningeffort)|**o-series models only** <br><br>Constrains effort on reasoning for reasoning models.<br>Currently supported values are `low`, `medium`, and `high`. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.<br> | Yes | medium |
-| summary | enum |**o-series models only** <br><br>A summary of the reasoning performed by the model. This can be useful for debugging and understanding the model's reasoning process.<br>One of `concise` or `detailed`.<br><br>Possible values: `concise`, `detailed`| No ||
+| effort |[ReasoningEffort](#reasoningeffort)| Constrains effort on reasoning for reasoning models.<br>Currently supported values are `low`, `medium`, and `high`. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.<br> | Yes | medium |
+| summary | enum | A summary of the reasoning performed by the model. This can be useful for debugging and understanding the model's reasoning process.<br>One of `concise` or `detailed`.<br><br>Possible values: `concise`, `detailed`| No ||
 
 ### ReasoningItem
```
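The Reasoning object above has a required `effort` field (default `medium`) and an optional `summary` enum. A small sketch of those constraints, assuming a hypothetical local validation helper that is not part of the API:

```python
# Hypothetical validator for the Reasoning object documented above.
# Field names, defaults, and allowed values come from the table; the
# helper itself is illustrative.

ALLOWED_EFFORT = {"low", "medium", "high"}
ALLOWED_SUMMARY = {"concise", "detailed"}

def normalize_reasoning(config: dict) -> dict:
    """Apply the documented default for `effort` and check both enums."""
    effort = config.get("effort", "medium")  # default is medium per the table
    if effort not in ALLOWED_EFFORT:
        raise ValueError(f"effort must be one of {sorted(ALLOWED_EFFORT)}, got {effort!r}")
    result = {"effort": effort}
    summary = config.get("summary")
    if summary is not None:  # `summary` is optional
        if summary not in ALLOWED_SUMMARY:
            raise ValueError(f"summary must be one of {sorted(ALLOWED_SUMMARY)}, got {summary!r}")
        result["summary"] = summary
    return result
```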
```diff
@@ -8974,7 +8971,7 @@ A refusal from the model.
 | output_text | string | SDK-only convenience property that contains the aggregated text output from all `output_text` items in the `output` array, if any are present. <br>Supported in the Python and JavaScript SDKs.<br> | No ||
 | parallel_tool_calls | boolean | Whether to allow the model to run tool calls in parallel.<br> | Yes | True |
 | previous_response_id | string | The unique ID of the previous response to the model. Use this to create multi-turn conversations. | No ||
-| reasoning |[Reasoning](#reasoning)|**o-series models only**<br><br>Configuration options for reasoning models.<br> | No ||
+| reasoning |[Reasoning](#reasoning)| Configuration options for reasoning models.<br> | No ||
 | status | enum | The status of the response generation. One of `completed`, `failed`, `in_progress`, or `incomplete`.<br><br>Possible values: `completed`, `failed`, `in_progress`, `incomplete`| No ||
 | temperature | number | What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.<br>We generally recommend altering this or `top_p` but not both.<br> | Yes | 1 |
 | text | object | Configuration options for a text response from the model. Can be plain text or structured JSON data. Learn more:<br>- text inputs and outputs<br>- Structured Outputs<br> | No ||
```
```diff
@@ -9295,7 +9292,7 @@ Emitted when an output item is marked done.
 | instructions | string | Inserts a system (or developer) message as the first item in the model's context.<br><br>When using along with `previous_response_id`, the instructions from a previous response will not be carried over to the next response. This makes it simple to swap out system (or developer) messages in new responses.<br> | No ||
 | max_output_tokens | integer | An upper bound for the number of tokens that can be generated for a response, including visible output tokens and conversation state.<br> | No ||
 | previous_response_id | string | The unique ID of the previous response to the model. Use this to create multi-turn conversations. | No ||
-| reasoning |[Reasoning](#reasoning)|**o-series models only**<br><br>Configuration options for reasoning models.<br> | No ||
+| reasoning |[Reasoning](#reasoning)| Configuration options for reasoning models.<br> | No ||
 | text | object | Configuration options for a text response from the model. Can be plain text or structured JSON data. Learn more:<br>- text inputs and outputs<br>- Structured Outputs<br> | No ||
 | └─ format |[TextResponseFormatConfiguration](#textresponseformatconfiguration)| An object specifying the format that the model must output.<br><br>Configuring `{ "type": "json_schema" }` enables Structured Outputs, which ensures the model matches your supplied JSON schema. The default format is `{ "type": "text" }` with no additional options.<br><br>**Not recommended for gpt-4o and newer models:**<br><br>Setting to `{ "type": "json_object" }` enables the older JSON mode, which ensures the message the model generates is valid JSON. Using `json_schema` is preferred for models that support it.<br> | No ||
 | tool_choice |[ToolChoiceOptions](#toolchoiceoptions) or [ToolChoiceTypes](#toolchoicetypes) or [ToolChoiceFunction](#toolchoicefunction)| How the model should select which tool (or tools) to use when generating a response. See the `tools` parameter to see how to specify which tools the model can call.<br> | No ||
```
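The `text.format` row above describes how `{ "type": "json_schema" }` enables Structured Outputs. The sketch below shows what such a configuration object might look like; the schema name, its fields, and the exact key layout are invented for illustration and should be checked against the TextResponseFormatConfiguration schema before use.

```python
# Illustrative `text` configuration enabling Structured Outputs via
# `{"type": "json_schema"}`, as described in the table above. The schema
# contents ("diff_summary" and its properties) are hypothetical.

text_config = {
    "format": {
        "type": "json_schema",
        "name": "diff_summary",  # hypothetical schema name
        "schema": {
            "type": "object",
            "properties": {
                "files_changed": {"type": "integer"},
                "summary": {"type": "string"},
            },
            "required": ["files_changed", "summary"],
            "additionalProperties": False,
        },
    }
}

# The older JSON mode, which the table notes is not recommended for
# gpt-4o and newer models:
legacy_text_config = {"format": {"type": "json_object"}}
```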
```diff
@@ -9618,18 +9615,12 @@ A wait action.
 
 ### ReasoningEffort
 
-**o-series models only**
-
-Constrains effort on reasoning for
-reasoning models.
-Currently supported values are `low`, `medium`, and `high`. Reducing
-reasoning effort can result in faster responses and fewer tokens used
-on reasoning in a response.
+Constrains effort on reasoning for reasoning models. Currently supported values are `low`, `medium`, and `high`. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
 
 
 | Property | Value |
 |----------|-------|
-|**Description**|**o-series models only** <br><br>Constrains effort on reasoning for reasoning models.<br>Currently supported values are `low`, `medium`, and `high`. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.<br> |
+|**Description**| Constrains effort on reasoning for reasoning models.<br>Currently supported values are `low`, `medium`, and `high`. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.<br> |
```