|
6 | 6 | "source": [
|
7 | 7 | "## Better performance from reasoning models using the Responses API \n",
|
8 | 8 | "\n",
|
9 |
| - "We've recently released two new state-of-the-art reasoning models, o3 and o4-mini, that excel at combining reasoning capabilities with agentic tool use. What a lot of folks don't know is that you can improve their performance by fully leveraging our (relatively) new Responses API. This cookbook aims to demonstrate how you might be able to get the most out of the two models and dive a little deeper into the details of how reasoning and function calling work for these models behind the scenes. By giving the model access to previous reasoning items, we can ensure it is operating at maximum model intelligence and lowest cost.\n" |
| 9 | + "Overview: By leveraging the Responses API with OpenAI’s latest reasoning models, you can unlock higher intelligence, lower costs, and more efficient token usage in your applications. The API also enables access to reasoning summaries, supports features like hosted-tool use, and is designed to accommodate upcoming enhancements for even greater flexibility and performance.\n", |
| 10 | + "\n", |
| 11 | + "We've recently released two new state-of-the-art reasoning models, o3 and o4-mini, that excel at combining reasoning capabilities with agentic tool use. What many folks don't know is that you can improve their performance by fully leveraging our (relatively) new Responses API. This cookbook shows how to get the most out of these models and explores how reasoning and function calling work behind the scenes. By giving the model access to previous reasoning items, we can ensure it operates at maximum intelligence and lowest cost.\n" |
10 | 12 | ]
|
11 | 13 | },
|
12 | 14 | {
|
13 | 15 | "cell_type": "markdown",
|
14 | 16 | "metadata": {},
|
15 | 17 | "source": [
|
16 |
| - "We've introduced the Responses API during its launch with a separate [cookbook](https://cookbook.openai.com/examples/responses_api/responses_example) along with the [API reference](https://platform.openai.com/docs/api-reference/responses). The short takeaway is that by design the Responses API isn't that different from the Completions API with a few improvements and added features. We've recently rolled out encrypted content for Responses, which we will also get into here, which will make it even more useful for folks who cannot use Responses API in a stateful way!" |
| 18 | + "We introduced the Responses API with a separate [cookbook](https://cookbook.openai.com/examples/responses_api/responses_example) and [API reference](https://platform.openai.com/docs/api-reference/responses). The main takeaway: the Responses API is similar to the Completions API, but with improvements and added features. We've also rolled out encrypted content for Responses, making it even more useful for those who can't use the API in a stateful way!" |
17 | 19 | ]
|
18 | 20 | },
|
19 | 21 | {
|
|
22 | 24 | "source": [
|
23 | 25 | "## How Reasoning Models Work\n",
|
24 | 26 | "\n",
|
25 |
| - "Before we dive into how the Responses API can help us, it is useful to first review how [reasoning models](https://platform.openai.com/docs/guides/reasoning?api-mode=responses) work behind the scenes. Reasoning models like o3 and o4-mini take time to think through a problem before answering. Through this thinking process, the model is able to break a complex problem down and work through it step by step, increasing its performance on these tasks. During the thinking process, the models produce a long internal chain of thought that encodes the reasoning logic for the problem. For safety reasons, the reasoning tokens are only exposed to end users in summarized form rather than in raw form." |
| 27 | + "Before we dive into how the Responses API can help, let's quickly review how [reasoning models](https://platform.openai.com/docs/guides/reasoning?api-mode=responses) work. Models like o3 and o4-mini break problems down step by step, producing an internal chain of thought that encodes their reasoning. For safety, these reasoning tokens are only exposed to users in summarized form." |
26 | 28 | ]
|
27 | 29 | },
|
28 | 30 | {
|
|
31 | 33 | "source": [
|
32 | 34 | "In a multistep conversation, the reasoning tokens are discarded after each turn, while input and output tokens from each step are fed into the next.\n",
|
33 | 35 | "\n",
|
34 |
| - "\n", |
| 36 | + "\n", |
35 | 37 | "Diagram borrowed from our [doc](https://platform.openai.com/docs/guides/reasoning?api-mode=responses#how-reasoning-works)"
|
36 | 38 | ]
|
37 | 39 | },
|
|
44 | 46 | },
|
45 | 47 | {
|
46 | 48 | "cell_type": "code",
|
47 |
| - "execution_count": 2, |
| 49 | + "execution_count": 3, |
48 | 50 | "metadata": {},
|
49 | 51 | "outputs": [],
|
50 | 52 | "source": [
|
|
151 | 153 | "cell_type": "markdown",
|
152 | 154 | "metadata": {},
|
153 | 155 | "source": [
|
154 |
| - "You can see from the JSON dump of the response object that, in addition to the `output_text`, there is a reasoning item that was also produced from this single API call. This represents the reasoning tokens produced by the model. By default, it is exposed as an ID; in this instance, it is `rs_6820f383d7c08191846711c5df8233bc0ac5ba57aafcbac7`. Since the Responses API is stateful as well, the reasoning token is persisted—all you have to do is include these items along with their associated IDs in subsequent messages for subsequent responses to have access to the same reasoning items. If you use `previous_response_id` for multi-turn conversations, the model will also have access to all the reasoning items produced previously.\n", |
| 156 | + "From the JSON dump of the response object, you can see that in addition to the `output_text`, the model also produces a reasoning item. This item represents the model's internal reasoning tokens and is exposed as an ID—here, for example, `rs_6820f383d7c08191846711c5df8233bc0ac5ba57aafcbac7`. Because the Responses API is stateful, these reasoning tokens persist: just include their IDs in subsequent messages to give future responses access to the same reasoning items. If you use `previous_response_id` for multi-turn conversations, the model will automatically have access to all previously produced reasoning items.\n", |
155 | 157 | "\n",
|
156 |
| - "Note, you can see how many reasoning tokens the model has produced from this response. With a total of 10 input tokens, we produced 148 output tokens, of which 128 are reasoning tokens that you don't see in the final assistant message." |
| 158 | + "You can also see how many reasoning tokens the model generated. For example, with 10 input tokens, the response included 148 output tokens—128 of which are reasoning tokens not shown in the final assistant message." |
157 | 159 | ]
|
158 | 160 | },
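The two ways of carrying reasoning items into the next turn can be sketched as follows. This is a shape-only illustration using plain dicts and placeholder messages (the reasoning item ID is the one from the JSON dump above; `resp_123` and all message text are hypothetical), not output from the cells in this notebook:

```python
# Two ways to give the next turn access to prior reasoning items.
# All messages and the "resp_123" id below are hypothetical placeholders.

# Option 1: stateful chaining -- pass the previous response's id and the
# API retrieves the stored reasoning items for you.
next_request_stateful = {
    "model": "o4-mini",
    "previous_response_id": "resp_123",  # id of the prior response
    "input": "Follow-up question here",
}

# Option 2: manual chaining -- echo the prior output items (including the
# reasoning item, referenced by its id) back in the input list.
prior_output = [
    {"type": "reasoning", "id": "rs_6820f383d7c08191846711c5df8233bc0ac5ba57aafcbac7"},
    {"type": "message", "role": "assistant", "content": "Prior answer here"},
]
next_request_manual = {
    "model": "o4-mini",
    "input": [{"role": "user", "content": "Original question"}]
    + prior_output
    + [{"role": "user", "content": "Follow-up question here"}],
}

print(len(next_request_manual["input"]))  # -> 4 (user, reasoning, assistant, user)
```

With the real SDK you would pass these dicts (or the item objects from `response.output`) directly as the `input` argument to `client.responses.create`.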
|
159 | 161 | {
|
160 | 162 | "cell_type": "markdown",
|
161 | 163 | "metadata": {},
|
162 | 164 | "source": [
|
163 |
| - "But wait! From the above diagram, didn’t you say that reasoning from previous turns is discarded? Then why does passing it back in matter for subsequent turns?\n", |
| 165 | + "Wait—didn’t the diagram show that reasoning from previous turns is discarded? So why bother passing it back in later turns?\n", |
164 | 166 | "\n",
|
165 |
| - "If you’ve been paying attention, you probably have that question. That is a great question. For normal multi-turn conversations, the inclusion of reasoning items and tokens is not necessary—the model is trained so that it does not need the reasoning tokens from previous turns to produce the best output. This changes when we consider the possibility of tool use. When we talk about a single turn, the turn may include function calls as well, even though it may involve an additional round trip outside of the API. In this instance, it is necessary to include the reasoning items (either via `previous_response_id` or by explicitly including the reasoning item in `input`). To illustrate this, let’s create a quick function-calling example." |
| 167 | + "Great question! In typical multi-turn conversations, you don’t need to include reasoning items or tokens—the model is trained to produce the best output without them. However, things change when tool use is involved. If a turn includes a function call (which may require an extra round trip outside the API), you do need to include the reasoning items—either via `previous_response_id` or by explicitly adding the reasoning item to `input`. Let’s see how this works with a quick function-calling example." |
166 | 168 | ]
|
167 | 169 | },
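The context-assembly step for a function-calling turn looks roughly like this. The output items are mocked here as plain dicts with placeholder IDs and a hypothetical `get_weather` tool; with the real SDK you would append the item objects from `response.output` directly:

```python
# Sketch of assembling the next turn's input after a function call.
# Everything below is a mocked placeholder, not real API output.
context = [{"role": "user", "content": "What's the weather in Paris right now?"}]

# Items the model returned in this turn: a reasoning item plus a tool call
model_output = [
    {"type": "reasoning", "id": "rs_placeholder"},
    {"type": "function_call", "call_id": "call_1",
     "name": "get_weather", "arguments": '{"city": "Paris"}'},
]

# 1) Add *all* output items back -- the reasoning item must come along
context += model_output

# 2) Add the tool result, matched to the call by call_id
context.append({
    "type": "function_call_output",
    "call_id": "call_1",
    "output": "18 C, partly cloudy",
})

# 'context' is what you would pass as input= on the next responses.create call
print([item.get("type", item.get("role")) for item in context])
```

The key point is step 1: appending the full `response.output` list, rather than just the function call, is what keeps the reasoning item in scope for the next request.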
|
168 | 170 | {
|
|
223 | 225 | "cell_type": "markdown",
|
224 | 226 | "metadata": {},
|
225 | 227 | "source": [
|
226 |
| - "Here we see that after reasoning for a bit, the o4-mini model has decided that it needs additional information, which it can obtain by calling a function. We can go ahead and call the function and pass the output back to the model. The important thing to note here is that, in order for the model to have maximum intelligence, we need to pass the reasoning item back, which one can do simply by adding all of the output back into the context being passed back." |
| 228 | + "After some reasoning, the o4-mini model determines it needs more information and calls a function to get it. We can call the function and return its output to the model. Crucially, to maximize the model’s intelligence, we should include the reasoning item by simply adding all of the output back into the context for the next turn." |
227 | 229 | ]
|
228 | 230 | },
|
229 | 231 | {
|
|
269 | 271 | "cell_type": "markdown",
|
270 | 272 | "metadata": {},
|
271 | 273 | "source": [
|
272 |
| - "It is hard to illustrate the improved model intelligence in this toy example, since the model will probably still do the right thing with or without the reasoning item being included. So we ran some tests ourselves: in a more comprehensive benchmark like SWE-bench, we were able to get about **3% improvement** by including the reasoning items for the same prompt and setup." |
| 274 | + "While this toy example may not clearly show the benefits—since the model will likely perform well with or without the reasoning item—our own tests found otherwise. On a more rigorous benchmark like SWE-bench, including reasoning items led to about a **3% improvement** for the same prompt and setup." |
273 | 275 | ]
|
274 | 276 | },
|
275 | 277 | {
|
|
279 | 281 | "## Caching\n",
|
280 | 282 | "As illustrated above, reasoning models produce both reasoning tokens and completion tokens that are treated differently in the API today. This also has implications for cache utilization and latency. To illustrate the point, we include this helpful sketch.\n",
|
281 | 283 | "\n",
|
282 |
| - "" |
| 284 | + "" |
283 | 285 | ]
|
284 | 286 | },
|
285 | 287 | {
|
286 | 288 | "cell_type": "markdown",
|
287 | 289 | "metadata": {},
|
288 | 290 | "source": [
|
289 |
| - "Note that in turn 2, reasoning items from turn 1 will be ignored and stripped, since the model does not reuse reasoning items from previous turns. This is why it is impossible to get a full cache hit on the fourth API call in the diagram above, as the prompt now excludes the reasoning items. That being said, we can still include them without harm, as the API will automatically strip reasoning items that are irrelevant in the current turn. Keep in mind that caching will only become relevant for prompts that are longer than 1024 tokens in length. In our tests, we were able to get cache utilization to go from 40% to 80% of the input prompt by moving from Completions to the Responses API. With better cache utilization comes better economics, as cached tokens are billed significantly less than uncached ones: for `o4-mini`, cached input tokens are 75% cheaper than uncached input tokens. It will also improve latency as well." |
| 291 | + "In turn 2, reasoning items from turn 1 are ignored and stripped, since the model doesn't reuse reasoning items from previous turns. This makes it impossible to get a full cache hit on the fourth API call in the diagram above, as the prompt now omits those reasoning items. However, including them does no harm—the API will automatically remove any reasoning items that aren't relevant for the current turn. Note that caching only matters for prompts longer than 1024 tokens. In our tests, switching from Completions to the Responses API increased cache utilization from 40% to 80%. Better cache utilization means better economics, since cached tokens are billed much less: for `o4-mini`, cached input tokens are 75% cheaper than uncached ones. Latency also improves." |
290 | 292 | ]
|
291 | 293 | },
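As a quick sketch of how you might measure this yourself: the usage block on a response reports cached input tokens separately. The numbers below are made up, and the field names (`input_tokens_details.cached_tokens`) are shown as plain dicts rather than read from a live response:

```python
# Sketch: computing cache utilization from a response's usage block.
# Figures are illustrative; caching only kicks in for prompts > 1024 tokens.
usage = {
    "input_tokens": 2000,
    "input_tokens_details": {"cached_tokens": 1600},
    "output_tokens": 350,
}

cached = usage["input_tokens_details"]["cached_tokens"]
utilization = cached / usage["input_tokens"]
print(f"cache utilization: {utilization:.0%}")  # -> cache utilization: 80%
```

Tracking this ratio across turns is an easy way to confirm that including reasoning items is actually improving your cache hits.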
|
292 | 294 | {
|
|
295 | 297 | "source": [
|
296 | 298 | "## Encrypted Reasoning Items\n",
|
297 | 299 | "\n",
|
298 |
| - "For organizations that cannot use the Responses API in a stateful way due to compliance and data requirement constraints (e.g., if your organization is under [Zero Data Retention](https://openai.com/enterprise-privacy/)), we've recently rolled out [encrypted reasoning items](https://platform.openai.com/docs/guides/reasoning?api-mode=responses#encrypted-reasoning-items), which allow you to reap all the benefits mentioned above while continuing to use the Responses API in a stateless way.\n", |
| 300 | + "For organizations that can't use the Responses API statefully due to compliance or data retention requirements (such as [Zero Data Retention](https://openai.com/enterprise-privacy/)), we've introduced [encrypted reasoning items](https://platform.openai.com/docs/guides/reasoning?api-mode=responses#encrypted-reasoning-items). This lets you get all the benefits of reasoning items while keeping your workflow stateless.\n", |
299 | 301 | "\n",
|
300 |
| - "To leverage this, all you have to do is include `[\"reasoning.encrypted_content\"]` as part of the `include` field. By doing so, we will pass an encrypted version of the reasoning tokens to you, which you can then pass back just as you would with reasoning items before.\n", |
| 302 | + "To use this, simply add `[\"reasoning.encrypted_content\"]` to the `include` field. You'll receive an encrypted version of the reasoning tokens, which you can pass back to the API just as you would with regular reasoning items.\n", |
301 | 303 | "\n",
|
302 |
| - "If your organization is under Zero Data Retention (ZDR), OpenAI automatically enforces `store=false` settings at the API level. When a user’s request comes into the Responses API, we first check for any `encrypted_content` included in the payload. If present, this content is decrypted in-memory using keys to which only OpenAI has access. This decrypted reasoning content (i.e., chain-of-thought) is never written to disk and is used solely to inform the model’s next response. Once the model generates its output, any new reasoning tokens it produces are immediately encrypted and returned to the client as part of the response payload. At that point, all transient data from the request—including both decrypted inputs and model outputs—is securely discarded. No intermediate state is persisted to disk, ensuring full compliance with ZDR.\n", |
| 304 | + "For Zero Data Retention (ZDR) organizations, OpenAI enforces `store=false` at the API level. When a request arrives, the API checks for any `encrypted_content` in the payload. If present, it's decrypted in-memory using keys only OpenAI can access. This decrypted reasoning (chain-of-thought) is never written to disk and is used only for generating the next response. Any new reasoning tokens are immediately encrypted and returned to you. All transient data—including decrypted inputs and model outputs—is securely discarded after the response, with no intermediate state persisted, ensuring full ZDR compliance.\n", |
303 | 305 | "\n",
|
304 |
| - "Here is a quick modified version of the above code snippet to demonstrate this:" |
| 306 | + "Here’s a quick update to the earlier code snippet to show how this works:" |
305 | 307 | ]
|
306 | 308 | },
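The shape of a stateless request using encrypted reasoning items can be sketched as follows. This is a dict-level illustration with placeholder content (the ciphertext string is a hypothetical stand-in for the opaque blob the API returns), not a live call:

```python
# Sketch of a stateless request with encrypted reasoning items.
# The key pieces are store=False and the include field.
request = {
    "model": "o4-mini",
    "input": [{"role": "user", "content": "Question here"}],
    "store": False,  # nothing is persisted server-side
    "include": ["reasoning.encrypted_content"],
}

# On the way back, each reasoning item carries an encrypted_content blob,
# which you append to the next turn's input just like a plain reasoning item.
returned_item = {
    "type": "reasoning",
    "id": "rs_placeholder",
    "encrypted_content": "opaque-ciphertext-placeholder",
}
request["input"].append(returned_item)
print(request["include"])  # -> ['reasoning.encrypted_content']
```

Because the client round-trips the ciphertext itself, the workflow stays fully stateless on OpenAI's side while the model still benefits from its prior reasoning.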
|
307 | 309 | {
|
|
393 | 395 | "\n",
|
394 | 396 | "Now you should be fully equipped to get the most out of our latest reasoning models!"
|
395 | 397 | ]
|
| 398 | + }, |
| 399 | + { |
| 400 | + "cell_type": "markdown", |
| 401 | + "metadata": {}, |
| 402 | + "source": [ |
| 403 | + "## Reasoning Summaries\n", |
| 404 | + "\n", |
| 405 | + "Another useful feature in the Responses API is that it supports reasoning summaries. While we do not expose the raw chain of thought tokens, users can access their [summaries](https://platform.openai.com/docs/guides/reasoning?api-mode=responses#reasoning-summaries)." |
| 406 | + ] |
| 407 | + }, |
| 408 | + { |
| 409 | + "cell_type": "code", |
| 410 | + "execution_count": 9, |
| 411 | + "metadata": {}, |
| 412 | + "outputs": [ |
| 413 | + { |
| 414 | + "name": "stdout", |
| 415 | + "output_type": "stream", |
| 416 | + "text": [ |
| 417 | + "First reasoning summary text:\n", |
| 418 | + " **Analyzing biological processes**\n", |
| 419 | + "\n", |
| 420 | + "I think the user is looking for a clear explanation of the differences between certain processes. I should create a side-by-side comparison that lists out key elements like the formulas, energy flow, locations, reactants, products, organisms involved, electron carriers, and whether the processes are anabolic or catabolic. This structured approach will help in delivering a comprehensive answer. It’s crucial to cover all aspects to ensure the user understands the distinctions clearly.\n" |
| 421 | + ] |
| 422 | + } |
| 423 | + ], |
| 424 | + "source": [ |
| 425 | + "# Call o3 on a hard question with reasoning summaries enabled\n", |
| 426 | + "\n", |
| 427 | + "response = client.responses.create(\n", |
| 428 | + " model=\"o3\",\n", |
| 429 | + " input=\"What are the main differences between photosynthesis and cellular respiration?\",\n", |
| 430 | + "    reasoning={\"summary\": \"auto\"},\n", |
| 433 | + ")\n", |
| 434 | + "\n", |
| 435 | + "# Extract the first reasoning summary text from the response object\n", |
| 436 | + "first_reasoning_item = response.output[0] # Should be a ResponseReasoningItem\n", |
| 437 | + "first_summary_text = first_reasoning_item.summary[0].text if first_reasoning_item.summary else None\n", |
| 438 | + "print(\"First reasoning summary text:\\n\", first_summary_text)\n", |
| 439 | + "\n" |
| 440 | + ] |
| 441 | + }, |
| 442 | + { |
| 443 | + "cell_type": "markdown", |
| 444 | + "metadata": {}, |
| 445 | + "source": [ |
| 446 | + "Reasoning summary text enables you to design user experiences where users can peek into the model's thought process. For example, in conversations involving multiple function calls, users can see not only which function calls are made, but also the reasoning behind each tool call—without having to wait for the final assistant message. This provides greater transparency and interactivity in your application's UX." |
| 447 | + ] |
| 448 | + }, |
| 449 | + { |
| 450 | + "cell_type": "markdown", |
| 451 | + "metadata": {}, |
| 452 | + "source": [ |
| 453 | + "## Conclusion\n", |
| 454 | + "\n", |
| 455 | + "By leveraging the OpenAI Responses API and the latest reasoning models, you can unlock higher intelligence, improved transparency, and greater efficiency in your applications. Whether you’re utilizing reasoning summaries, encrypted reasoning items for compliance, or optimizing for cost and latency, these tools empower you to build more robust and interactive AI experiences.\n", |
| 456 | + "\n", |
| 457 | + "Happy building!" |
| 458 | + ] |
396 | 459 | }
|
397 | 460 | ],
|
398 | 461 | "metadata": {
|
|