From c1d629fa5889486d14e8269c7da99b8288236f00 Mon Sep 17 00:00:00 2001 From: Matt Habermehl Date: Thu, 7 Aug 2025 17:21:27 -0400 Subject: [PATCH 1/4] housekeeping --- examples/gpt-5/gpt-5_prompting_guide.ipynb | 35 +++++++++++----------- 1 file changed, 18 insertions(+), 17 deletions(-) diff --git a/examples/gpt-5/gpt-5_prompting_guide.ipynb b/examples/gpt-5/gpt-5_prompting_guide.ipynb index 3e09cbf05c..d7dc4f9bb3 100644 --- a/examples/gpt-5/gpt-5_prompting_guide.ipynb +++ b/examples/gpt-5/gpt-5_prompting_guide.ipynb @@ -8,7 +8,7 @@ "\n", "GPT-5, our newest flagship model, represents a substantial leap forward in agentic task performance, coding, raw intelligence, and steerability.\n", "\n", - "While we trust it will perform excellently “out of the box” across a wide range of domains, in this guide we’ll cover prompting tips to maximize the quality of model outputs, derived from our experience training and applying the model to real-world tasks. We discuss concepts like improving agentic task performance, ensuring instruction adherence, making use of newly API features, and optimizing coding for frontend and software engineering tasks - with key insights into AI code editor Cursor’s prompt tuning work with GPT-5.\n", + "While we trust it will perform excellently “out of the box” across a wide range of domains, in this guide we’ll cover prompting tips to maximize the quality of model outputs, derived from our experience training and applying the model to real-world tasks. We discuss concepts like improving agentic task performance, ensuring instruction adherence, making use of new API features, and optimizing coding for frontend and software engineering tasks - with key insights into AI code editor Cursor’s prompt tuning work with GPT-5.\n", "\n", "We’ve seen significant gains from applying these best practices and adopting our canonical tools whenever possible, and we hope that this guide, along with the [prompt optimizer tool](http://platform.openai.com/chat/edit?optimize=true) we’ve built, will serve as a launchpad for your use of GPT-5. But, as always, remember that prompting is not a one-size-fits-all exercise - we encourage you to run experiments and iterate on the foundation offered here to find the best solution for your problem." ] @@ -17,7 +17,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Agentic workflow predictability \n", + "## Agentic workflow predictability\n", "\n", "We trained GPT-5 with developers in mind: we’ve focused on improving tool calling, instruction following, and long-context understanding to serve as the best foundation model for agentic applications. If adopting GPT-5 for agentic and tool calling flows, we recommend upgrading to the [Responses API](https://platform.openai.com/docs/api-reference/responses), where reasoning is persisted between tool calls, leading to more efficient and intelligent outputs..\n", "\n", @@ -25,7 +25,7 @@ "Agentic scaffolds can span a wide spectrum of control—some systems delegate the vast majority of decision-making to the underlying model, while others keep the model on a tight leash with heavy programmatic logical branching. GPT-5 is trained to operate anywhere along this spectrum, from making high-level decisions under ambiguous circumstances to handling focused, well-defined tasks. 
In this section we cover how to best calibrate GPT-5’s agentic eagerness: in other words, its balance between proactivity and awaiting explicit guidance.\n", "\n", "#### Prompting for less eagerness\n", - "GPT-5 is, by default, thorough and comprehensive when trying to gather context in an agentic environment to ensure it will produce a correct answer. To reduce the scope of GPT-5’s agentic behavior—including limiting tangential tool-calling action and minimizing latency to reach a final answer—try the following: \n", + "GPT-5 is, by default, thorough and comprehensive when trying to gather context in an agentic environment to ensure it will produce a correct answer. To reduce the scope of GPT-5’s agentic behavior — including limiting tangential tool-calling and minimizing latency to reach a final answer — try the following:\n", "- Switch to a lower `reasoning_effort`. This reduces exploration depth but improves efficiency and latency. Many workflows can be accomplished with consistent results at medium or even low `reasoning_effort`.\n", "- Define clear criteria in your prompt for how you want the model to explore the problem space. This reduces the model’s need to explore and reason about too many ideas:\n", "\n", @@ -83,12 +83,13 @@ "### Tool Preambles\n", "We recognize that on agentic trajectories monitored by users, intermittent model updates on what it’s doing with its tool calls and why can provide for a much better interactive user experience - the longer the rollout, the bigger the difference these updates make. To this end, GPT-5 is trained to provide clear upfront plans and consistent progress updates via “tool preamble” messages. \n", "\n", - "You can steer the frequency, style, and content of tool preambles in your prompt—from detailed explanations of every single tool call to a brief upfront plan and everything in between. This is an example of a high-quality preamble prompt:\n", + "You can steer the frequency, style, and content of tool preambles in your prompt — from detailed explanations of every single tool call to a brief upfront plan and everything in between. This is an example of a high-quality preamble prompt:\n", "\n", "```\n", "\n", - "- Always begin by rephrasing the user's goal in a friendly, clear, and concise manner, before calling any tools.\n", - "- Then, immediately outline a structured plan detailing each logical step you’ll follow. - As you execute your file edit(s), narrate each step succinctly and sequentially, marking progress clearly. \n", + "- Always begin by rephrasing the user's goal in a friendly, clear, and concise manner before calling any tools.\n", + "- Then, immediately outline a structured plan detailing each logical step you’ll follow.\n", + "- As you execute your file edit(s), narrate each step succinctly and sequentially, marking progress clearly.\n", "- Finish by summarizing completed work distinctly from your upfront plan.\n", "\n", "```\n", @@ -147,7 +148,7 @@ "metadata": {}, "source": [ "## Maximizing coding performance, from planning to execution\n", - "GPT-5 leads all frontier models in coding capabilities: it can work in large codebases to fix bugs, handle large diffs, and implement multi-file refactors or large new features. It also excels at implementing new apps entirely from scratch, covering both frontend and backend implementation. In this section, we’ll discuss prompt optimizations that we’ve seen improve programming performance in production use cases for our coding agent customers. 
\n", + "GPT-5 leads all frontier models in coding capabilities: it can work in large codebases to fix bugs, handle large diffs, and implement multi-file refactors or large new features. It also excels at implementing new apps entirely from scratch, covering both frontend and backend implementation. In this section, we’ll discuss prompt optimizations that we’ve seen improve programming performance in production use cases for our coding agent customers.\n", "\n", "### Frontend app development\n", "GPT-5 is trained to have excellent baseline aesthetic taste alongside its rigorous implementation abilities. We’re confident in its ability to use all types of web development frameworks and packages; however, for new apps, we recommend using the following frameworks and packages to get the most out of the model's frontend capabilities:\n", @@ -170,7 +171,7 @@ "```\n", "\n", "#### Matching codebase design standards\n", - "When implementing incremental changes and refactors in existing apps, model-written code should adhere to existing style and design standards, and “blend in” to the codebase as neatly as possible. Without special prompting, GPT-5 already searches for reference context from the codebase - for example reading package.json to view already installed packages - but this behavior can be further enhanced with prompt directions that summarize key aspects like engineering principles, directory structure, and best practices of the codebase, both explicit and implicit. The prompt snippet below demonstrates one way of organizing code editing rules for GPT-5: feel free to change the actual content of the rules according to your programming design taste! \n", + "When implementing incremental changes and refactors in existing apps, model-written code should adhere to existing style and design standards, and “blend in” to the codebase as neatly as possible. Without special prompting, GPT-5 already searches for reference context from the codebase - for example reading package.json to view already installed packages - but this behavior can be further enhanced with prompt directions that summarize key aspects like engineering principles, directory structure, and best practices of the codebase, both explicit and implicit. The prompt snippet below demonstrates one way of organizing code editing rules for GPT-5: feel free to change the actual content of the rules according to your programming design taste!\n", "\n", "```\n", "\n", @@ -188,7 +189,7 @@ "- UI Components: shadcn/ui\n", "- Icons: Lucide\n", "- State Management: Zustand\n", - "- Directory Structure: \n", + "- Directory Structure:\n", "\\`\\`\\`\n", "/src\n", " /app\n", @@ -205,7 +206,7 @@ "\n", "\n", "- Visual Hierarchy: Limit typography to 4–5 font sizes and weights for consistent hierarchy; use `text-xs` for captions and annotations; avoid `text-xl` unless for hero or major headings.\n", - "- Color Usage: Use 1 neutral base (e.g., `zinc`) and up to 2 accent colors. \n", + "- Color Usage: Use 1 neutral base (e.g., `zinc`) and up to 2 accent colors.\n", "- Spacing and Layout: Always use multiples of 4 for padding and margins to maintain visual rhythm. Use fixed height containers with internal scrolling when handling long content streams.\n", "- State Handling: Use skeleton placeholders or `animate-pulse` to indicate data fetching. Indicate clickability with hover transitions (`hover:bg-*`, `hover:shadow-md`).\n", "- Accessibility: Use semantic HTML and ARIA roles where appropriate. 
Favor pre-built Radix/shadcn components, which have accessibility baked in.\n", @@ -218,7 +219,7 @@ "We’re proud to have had AI code editor Cursor as a trusted alpha tester for GPT-5: below, we show a peek into how Cursor tuned their prompts to get the most out of the model’s capabilities. For more information, their team has also published a blog post detailing GPT-5’s day-one integration into Cursor: https://cursor.com/blog/gpt-5\n", "\n", "#### System prompt and parameter tuning\n", - "Cursor’s system prompt focuses on reliable tool calling, balancing verbosity and autonomous behavior while giving users the ability to configure custom instructions. Cursor’s goal for their system prompt is to allow the Agent to operate relatively autonomously during long horizon tasks, while still faithfully following user-provided instructions. \n", + "Cursor’s system prompt focuses on reliable tool calling, balancing verbosity and autonomous behavior while giving users the ability to configure custom instructions. Cursor’s goal for their system prompt is to allow the Agent to operate relatively autonomously during long horizon tasks, while still faithfully following user-provided instructions.\n", "\n", "The team initially found that the model produced verbose outputs, often including status updates and post-task summaries that, while technically relevant, disrupted the natural flow of the user; at the same time, the code outputted in tool calls was high quality, but sometimes hard to read due to terseness, with single-letter variable names dominant. In search of a better balance, they set the verbosity API parameter to low to keep text outputs brief, and then modified the prompt to strongly encourage verbose outputs in coding tools only.\n", "\n", @@ -267,7 +268,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Optimizing intelligence and instruction-following \n", + "## Optimizing intelligence and instruction-following\n", "\n", "### Steering\n", "As our most steerable model yet, GPT-5 is extraordinarily receptive to prompt instructions surrounding verbosity, tone, and tool calling behavior.\n", @@ -286,7 +287,7 @@ "You are CareFlow Assistant, a virtual admin for a healthcare startup that schedules patients based on priority and symptoms. Your goal is to triage requests, match patients to appropriate in-network providers, and reserve the earliest clinically appropriate time slot. Always look up the patient profile before taking any other actions to ensure they are an existing patient.\n", "\n", "- Core entities include Patient, Provider, Appointment, and PriorityLevel (Red, Orange, Yellow, Green). Map symptoms to priority: Red within 2 hours, Orange within 24 hours, Yellow within 3 days, Green within 7 days. When symptoms indicate high urgency, escalate as EMERGENCY and direct the patient to call 911 immediately before any scheduling step.\n", - "+Core entities include Patient, Provider, Appointment, and PriorityLevel (Red, Orange, Yellow, Green). Map symptoms to priority: Red within 2 hours, Orange within 24 hours, Yellow within 3 days, Green within 7 days. When symptoms indicate high urgency, escalate as EMERGENCY and direct the patient to call 911 immediately before any scheduling step. \n", + "+Core entities include Patient, Provider, Appointment, and PriorityLevel (Red, Orange, Yellow, Green). Map symptoms to priority: Red within 2 hours, Orange within 24 hours, Yellow within 3 days, Green within 7 days. 
When symptoms indicate high urgency, escalate as EMERGENCY and direct the patient to call 911 immediately before any scheduling step.\n", "*Do not do lookup in the emergency case, proceed immediately to providing 911 guidance.*\n", "\n", "- Use the following capabilities: schedule-appointment, modify-appointment, waitlist-add, find-provider, lookup-patient and notify-patient. Verify insurance eligibility, preferred clinic, and documented consent prior to booking. Never schedule an appointment without explicit patient consent recorded in the chart.\n", @@ -308,9 +309,9 @@ "Perhaps unsurprisingly, we recommend prompting patterns that are similar to [GPT-4.1 for best results](https://cookbook.openai.com/examples/gpt4-1_prompting_guide). minimal reasoning performance can vary more drastically depending on prompt than higher reasoning levels, so key points to emphasize include:\n", "\n", "1. Prompting the model to give a brief explanation summarizing its thought process at the start of the final answer, for example via a bullet point list, improves performance on tasks requiring higher intelligence.\n", - "2. Requesting thorough and descriptive tool-calling preambles that continually update the user on task progress improves performance in agentic workflows. \n", + "2. Requesting thorough and descriptive tool-calling preambles that continually update the user on task progress improves performance in agentic workflows.\n", "3. Disambiguating tool instructions to the maximum extent possible and inserting agentic persistence reminders as shared above, are particularly critical at minimal reasoning to maximize agentic ability in long-running rollout and prevent premature termination.\n", - "4. Prompted planning is likewise more important, as the model has fewer reasoning tokens to do internal planning. Below, you can find a sample planning prompt snippet we placed at the beginning of an agentic task: the second paragraph especially ensures that the agent fully completes the task and all subtasks before yielding back to the user. \n", + "4. Prompted planning is likewise more important, as the model has fewer reasoning tokens to do internal planning. Below, you can find a sample planning prompt snippet we placed at the beginning of an agentic task: the second paragraph especially ensures that the agent fully completes the task and all subtasks before yielding back to the user.\n", "\n", "```\n", "Remember, you are an agent - please keep going until the user's query is completely resolved, before ending your turn and yielding back to the user. Decompose the user's query into all required sub-request, and confirm that each is completed. Do not stop after completing only part of the request. Only terminate your turn when you are sure that the problem is solved. You must be prepared to answer multiple queries and only finish the call once the user has confirmed they're done.\n", @@ -337,7 +338,7 @@ "\n", "Here's a prompt: [PROMPT]\n", "\n", - "The desired behavior from this prompt is for the agent to [DO DESIRED BEHAVIOR], but instead it [DOES UNDESIRED BEHAVIOR]. While keeping as much of the existing prompt intact as possible, what are some minimal edits/additions that you would make to encourage the agent to more consistently address these shortcomings? \n", + "The desired behavior from this prompt is for the agent to [DO DESIRED BEHAVIOR], but instead it [DOES UNDESIRED BEHAVIOR]. 
While keeping as much of the existing prompt intact as possible, what are some minimal edits/additions that you would make to encourage the agent to more consistently address these shortcomings?\n", "```" ] }, @@ -363,7 +364,7 @@ "IMPORTANT: not all tests are visible to you in the repository, so even on problems you think are relatively straightforward, you must double and triple check your solutions to ensure they pass any edge cases that are covered in the hidden tests, not just the visible ones.\n", "```\n", "\n", - "Agentic coding tool definitions \n", + "Agentic coding tool definitions\n", "```\n", "## Set 1: 4 functions, no terminal\n", "\n", From a3194db1b80c456d2bde8a73af183b38b8400179 Mon Sep 17 00:00:00 2001 From: Matt Habermehl Date: Thu, 7 Aug 2025 17:53:19 -0400 Subject: [PATCH 2/4] emdash consistency --- examples/gpt-5/gpt-5_prompting_guide.ipynb | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/examples/gpt-5/gpt-5_prompting_guide.ipynb b/examples/gpt-5/gpt-5_prompting_guide.ipynb index d7dc4f9bb3..7831a89d18 100644 --- a/examples/gpt-5/gpt-5_prompting_guide.ipynb +++ b/examples/gpt-5/gpt-5_prompting_guide.ipynb @@ -25,7 +25,7 @@ "Agentic scaffolds can span a wide spectrum of control—some systems delegate the vast majority of decision-making to the underlying model, while others keep the model on a tight leash with heavy programmatic logical branching. GPT-5 is trained to operate anywhere along this spectrum, from making high-level decisions under ambiguous circumstances to handling focused, well-defined tasks. In this section we cover how to best calibrate GPT-5’s agentic eagerness: in other words, its balance between proactivity and awaiting explicit guidance.\n", "\n", "#### Prompting for less eagerness\n", - "GPT-5 is, by default, thorough and comprehensive when trying to gather context in an agentic environment to ensure it will produce a correct answer. To reduce the scope of GPT-5’s agentic behavior — including limiting tangential tool-calling and minimizing latency to reach a final answer — try the following:\n", + "GPT-5 is, by default, thorough and comprehensive when trying to gather context in an agentic environment to ensure it will produce a correct answer. To reduce the scope of GPT-5’s agentic behavior—including limiting tangential tool-calling and minimizing latency to reach a final answer—try the following:\n", "- Switch to a lower `reasoning_effort`. This reduces exploration depth but improves efficiency and latency. Many workflows can be accomplished with consistent results at medium or even low `reasoning_effort`.\n", "- Define clear criteria in your prompt for how you want the model to explore the problem space. 
This reduces the model’s need to explore and reason about too many ideas:\n", "\n", @@ -73,17 +73,17 @@ "\n", "- You are an agent - please keep going until the user's query is completely resolved, before ending your turn and yielding back to the user.\n", "- Only terminate your turn when you are sure that the problem is solved.\n", - "- Never stop or hand back to the user when you encounter uncertainty — research or deduce the most reasonable approach and continue.\n", - "- Do not ask the human to confirm or clarify assumptions, as you can always adjust later — decide what the most reasonable assumption is, proceed with it, and document it for the user's reference after you finish acting\n", + "- Never stop or hand back to the user when you encounter uncertainty—research or deduce the most reasonable approach and continue.\n", + "- Do not ask the human to confirm or clarify assumptions, as you can always adjust later. Decide what the most reasonable assumption is, proceed with it, and document it for the user's reference after you finish acting\n", "\n", "```\n", "\n", "Generally, it can be helpful to clearly state the stop conditions of the agentic tasks, outline safe versus unsafe actions, and define when, if ever, it’s acceptable for the model to hand back to the user. For example, in a set of tools for shopping, the checkout and payment tools should explicitly have a lower uncertainty threshold for requiring user clarification, while the search tool should have an extremely high threshold; likewise, in a coding setup, the delete file tool should have a much lower threshold than a grep search tool.\n", "\n", "### Tool Preambles\n", - "We recognize that on agentic trajectories monitored by users, intermittent model updates on what it’s doing with its tool calls and why can provide for a much better interactive user experience - the longer the rollout, the bigger the difference these updates make. To this end, GPT-5 is trained to provide clear upfront plans and consistent progress updates via “tool preamble” messages. \n", + "We recognize that on agentic trajectories monitored by users, intermittent model updates on what it’s doing with its tool calls and why can provide for a much better interactive user experience-the longer the rollout, the bigger the difference these updates make. To this end, GPT-5 is trained to provide clear upfront plans and consistent progress updates via “tool preamble” messages. \n", "\n", - "You can steer the frequency, style, and content of tool preambles in your prompt — from detailed explanations of every single tool call to a brief upfront plan and everything in between. This is an example of a high-quality preamble prompt:\n", + "You can steer the frequency, style, and content of tool preambles in your prompt—from detailed explanations of every single tool call to a brief upfront plan and everything in between. This is an example of a high-quality preamble prompt:\n", "\n", "```\n", "\n", @@ -544,8 +544,8 @@ "\n", "\n", "You are an agent - please keep going until the user’s query is completely resolved, before ending your turn and yielding back to the user. 
Only terminate your turn when you are sure that the problem is solved.\n", - "- Never stop at uncertainty — research or deduce the most reasonable approach and continue.\n", - "- Do not ask the human to confirm assumptions — document them, act on them, and adjust mid-task if proven wrong.\n", + "- Never stop at uncertainty—research or deduce the most reasonable approach and continue.\n", + "- Do not ask the human to confirm assumptions—document them, act on them, and adjust mid-task if proven wrong.\n", "\n", "\n", "\n", From cb9fa45327beb05d9067f0e2d461c3e18a67e862 Mon Sep 17 00:00:00 2001 From: Matt Habermehl Date: Thu, 7 Aug 2025 17:56:58 -0400 Subject: [PATCH 3/4] further consistency fixes --- examples/gpt-5/gpt-5_prompting_guide.ipynb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/examples/gpt-5/gpt-5_prompting_guide.ipynb b/examples/gpt-5/gpt-5_prompting_guide.ipynb index 7831a89d18..31c07580c3 100644 --- a/examples/gpt-5/gpt-5_prompting_guide.ipynb +++ b/examples/gpt-5/gpt-5_prompting_guide.ipynb @@ -71,7 +71,7 @@ "\n", "```\n", "\n", - "- You are an agent - please keep going until the user's query is completely resolved, before ending your turn and yielding back to the user.\n", + "- You are an agent—please keep going until the user's query is completely resolved, before ending your turn and yielding back to the user.\n", "- Only terminate your turn when you are sure that the problem is solved.\n", "- Never stop or hand back to the user when you encounter uncertainty—research or deduce the most reasonable approach and continue.\n", "- Do not ask the human to confirm or clarify assumptions, as you can always adjust later. Decide what the most reasonable assumption is, proceed with it, and document it for the user's reference after you finish acting\n", @@ -81,7 +81,7 @@ "Generally, it can be helpful to clearly state the stop conditions of the agentic tasks, outline safe versus unsafe actions, and define when, if ever, it’s acceptable for the model to hand back to the user. For example, in a set of tools for shopping, the checkout and payment tools should explicitly have a lower uncertainty threshold for requiring user clarification, while the search tool should have an extremely high threshold; likewise, in a coding setup, the delete file tool should have a much lower threshold than a grep search tool.\n", "\n", "### Tool Preambles\n", - "We recognize that on agentic trajectories monitored by users, intermittent model updates on what it’s doing with its tool calls and why can provide for a much better interactive user experience-the longer the rollout, the bigger the difference these updates make. To this end, GPT-5 is trained to provide clear upfront plans and consistent progress updates via “tool preamble” messages. \n", + "We recognize that on agentic trajectories monitored by users, intermittent model updates on what it’s doing with its tool calls and why can provide for a much better interactive user experience—the longer the rollout, the bigger the difference these updates make. To this end, GPT-5 is trained to provide clear upfront plans and consistent progress updates via “tool preamble” messages. \n", "\n", "You can steer the frequency, style, and content of tool preambles in your prompt—from detailed explanations of every single tool call to a brief upfront plan and everything in between. 
This is an example of a high-quality preamble prompt:\n", "\n", From 67525fe6da66dd566a07f08f0f1689f63f081823 Mon Sep 17 00:00:00 2001 From: Matt Habermehl Date: Thu, 7 Aug 2025 18:07:28 -0400 Subject: [PATCH 4/4] some I missed --- examples/gpt-5/gpt-5_prompting_guide.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/gpt-5/gpt-5_prompting_guide.ipynb b/examples/gpt-5/gpt-5_prompting_guide.ipynb index 31c07580c3..2ef137c4e1 100644 --- a/examples/gpt-5/gpt-5_prompting_guide.ipynb +++ b/examples/gpt-5/gpt-5_prompting_guide.ipynb @@ -171,7 +171,7 @@ "```\n", "\n", "#### Matching codebase design standards\n", - "When implementing incremental changes and refactors in existing apps, model-written code should adhere to existing style and design standards, and “blend in” to the codebase as neatly as possible. Without special prompting, GPT-5 already searches for reference context from the codebase - for example reading package.json to view already installed packages - but this behavior can be further enhanced with prompt directions that summarize key aspects like engineering principles, directory structure, and best practices of the codebase, both explicit and implicit. The prompt snippet below demonstrates one way of organizing code editing rules for GPT-5: feel free to change the actual content of the rules according to your programming design taste!\n", + "When implementing incremental changes and refactors in existing apps, model-written code should adhere to existing style and design standards, and “blend in” to the codebase as neatly as possible. Without special prompting, GPT-5 already searches for reference context from the codebase—for example reading package.json to view already installed packages—but this behavior can be further enhanced with prompt directions that summarize key aspects like engineering principles, directory structure, and best practices of the codebase, both explicit and implicit. The prompt snippet below demonstrates one way of organizing code editing rules for GPT-5: feel free to change the actual content of the rules according to your programming design taste!\n", "\n", "```\n", "\n",
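As a practical companion to the parameters the patched guide keeps referring to (`reasoning_effort`, the `verbosity` text setting, and reasoning persistence via `previous_response_id` on the Responses API), here is a minimal sketch of how they might be exercised with the OpenAI Python SDK. The model name, prompt strings, and the specific parameter values chosen are illustrative assumptions for demonstration, not values taken from the patch itself.

```
# Minimal sketch (assumes the OpenAI Python SDK's Responses API).
# Model name, prompts, and parameter values below are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

# First turn: a "less eager" profile -- lower reasoning effort, terse text output.
first = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "low"},   # "minimal" / "low" / "medium" / "high"
    text={"verbosity": "low"},     # keeps prose brief; prompts can still ask for verbose code
    input="Find and fix the failing unit test in this repository.",
)
print(first.output_text)

# Follow-up turn: pass previous_response_id so reasoning context is persisted
# between turns and tool calls rather than rebuilt from scratch.
follow_up = client.responses.create(
    model="gpt-5",
    previous_response_id=first.id,
    input="Summarize the change you made and any assumptions you relied on.",
)
print(follow_up.output_text)
```

Pairing a lower `reasoning_effort` with low text verbosity mirrors the "less eagerness" profile discussed in the guide, while prompt instructions (as in Cursor's setup) can still request higher verbosity specifically inside coding tools.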