
Commit 33ded2a

Merge pull request #7237 from mrbullwinkle/mrb_09_23_2025_PM_requested_updates
[Azure OpenAI] Guidance updates
2 parents 6143690 + a1299f0 commit 33ded2a

4 files changed: +13 additions, −12 deletions

articles/ai-foundry/openai/concepts/prompt-engineering.md

Lines changed: 3 additions & 5 deletions
@@ -4,7 +4,7 @@ titleSuffix: Azure OpenAI
 description: Learn how to use prompt engineering to optimize your work with Azure OpenAI.
 ms.service: azure-ai-openai
 ms.topic: conceptual
-ms.date: 07/02/2025
+ms.date: 09/23/2025
 ms.custom: references_regions, build-2023, build-2023-dataai
 manager: nitinme
 author: mrbullwinkle
@@ -14,11 +14,9 @@ recommendations: false

 # Prompt engineering techniques

-GPT-3, GPT-3.5, GPT-4, and GPT-4o models from OpenAI are prompt-based. With prompt-based models, the user interacts with the model by entering a text prompt, to which the model responds with a text completion. This completion is the model’s continuation of the input text. These techniques aren't recommended for o-series models.
+These techniques aren't recommended for reasoning models like gpt-5 and o-series models.

-While these models are extremely powerful, their behavior is also very sensitive to the prompt. This makes prompt construction an important skill to develop.
-
-Prompt construction can be difficult. In practice, the prompt acts to configure the model weights to complete the desired task, but it's more of an art than a science, often requiring experience and intuition to craft a successful prompt. The goal of this article is to help get you started with this learning process. It attempts to capture general concepts and patterns that apply to all GPT models. However it's important to understand that each model behaves differently, so the learnings might not apply equally to all models.
+Prompt construction can be difficult. In practice, the prompt assists the model in completing the desired task, but it's more of an art than a science, often requiring experience and intuition to craft a successful prompt. The goal of this article is to help get you started with this learning process. It attempts to capture general concepts and patterns that apply to all GPT models. However, it's important to understand that each model behaves differently, so the learnings might not apply equally to all models.

 ## Basics


articles/ai-foundry/openai/concepts/use-your-data.md

Lines changed: 0 additions & 5 deletions
@@ -509,15 +509,10 @@ You can also change the model's output by defining a system message. For example

 Azure OpenAI On Your Data works by sending instructions to a large language model in the form of prompts to answer user queries using your data. If there is a certain behavior that is critical to the application, you can repeat the behavior in system message to increase its accuracy. For example, to guide the model to only answer from documents, you can add "*Please answer using retrieved documents only, and without using your knowledge. Please generate citations to retrieved documents for every claim in your answer. If the user question cannot be answered using retrieved documents, please explain the reasoning behind why documents are relevant to user queries. In any case, don't answer using your own knowledge."*

-**Prompt Engineering tricks**
-
-There are many tricks in prompt engineering that you can try to improve the output. One example is chain-of-thought prompting where you can add *"Let’s think step by step about information in retrieved documents to answer user queries. Extract relevant knowledge to user queries from documents step by step and form an answer bottom up from the extracted information from relevant documents."*
-
 > [!NOTE]
 > The system message is used to modify how GPT assistant responds to a user question based on retrieved documentation. It doesn't affect the retrieval process. If you'd like to provide instructions for the retrieval process, it is better to include them in the questions.
 > The system message is only guidance. The model might not adhere to every instruction specified because it has been primed with certain behaviors such as objectivity, and avoiding controversial statements. Unexpected behavior might occur if the system message contradicts with these behaviors.

-
 ### Limit responses to your data

 This option encourages the model to respond using your data only, and is selected by default. If you unselect this option, the model might more readily apply its internal knowledge to respond. Determine the correct selection based on your use case and scenario.
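The grounding guidance above (repeating a critical behavior in the system message alongside your data source) can be sketched as a chat request payload. A minimal sketch only: the `data_sources` shape follows the common On Your Data extension format, and the search endpoint and index name are hypothetical placeholders, not values from this commit.

```python
# Sketch: repeat a critical grounding behavior in the system message,
# per the guidance above. Endpoint and index_name are placeholders.
system_message = (
    "Please answer using retrieved documents only, and without using your knowledge. "
    "Please generate citations to retrieved documents for every claim in your answer."
)

payload = {
    "messages": [
        {"role": "system", "content": system_message},
        {"role": "user", "content": "What is the company parental leave policy?"},
    ],
    # Assumed On Your Data shape; check the REST reference for your API version.
    "data_sources": [
        {
            "type": "azure_search",
            "parameters": {
                "endpoint": "https://example-search.search.windows.net",  # placeholder
                "index_name": "example-index",  # placeholder
            },
        }
    ],
}
```

Because the system message is only guidance (see the note above), pairing it with the "limit responses to your data" option gives the strongest grounding.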

articles/ai-foundry/openai/how-to/reasoning.md

Lines changed: 6 additions & 2 deletions
@@ -443,8 +443,9 @@ Console.WriteLine($"[ASSISTANT]: {completion.Content[0].Text}");

 When using the latest reasoning models with the [Responses API](./responses.md) you can use the reasoning summary parameter to receive summaries of the model's chain of thought reasoning.

-> [!NOTE]
-> Even when enabled, reasoning summaries are not guaranteed to be generated for every step/request. This is expected behavior.
+> [!IMPORTANT]
+> Attempting to extract raw reasoning through methods other than the reasoning summary parameter is not supported, may violate the Acceptable Use Policy, and may result in throttling or suspension when detected.

 # [Python](#tab/py)


@@ -570,6 +571,9 @@ curl -X POST "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/responses" \
 }
 ```

+> [!NOTE]
+> Even when enabled, reasoning summaries are not guaranteed to be generated for every step/request. This is expected behavior.
+
 ## Python lark

 GPT-5 series reasoning models have the ability to call a new `custom_tool` called `lark_tool`. This tool is based on [Python lark](https://github.com/lark-parser/lark) and can be used for more flexible constraining of model output.
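The reasoning summary parameter described in this file can be sketched as the JSON body such a `/responses` request would carry. A minimal sketch, assuming a `gpt-5` deployment name, an illustrative `effort` value, and a `reasoning.summary` field as exposed by the Responses API; none of these values come from this commit.

```python
import json

# Sketch: a Responses API request body that opts in to reasoning
# summaries. "summary": "auto" asks the service to return a summary of
# the model's chain of thought when one is available; per the note
# above, a summary is not guaranteed for every step/request.
payload = {
    "model": "gpt-5",  # assumed deployment name
    "input": "Explain why the sky is blue in two sentences.",
    "reasoning": {
        "effort": "medium",   # illustrative value
        "summary": "auto",
    },
}

body = json.dumps(payload, indent=2)
print(body)
```

This body would be sent to the same endpoint as the curl example above; only the summary surfaced in the response is supported for inspecting reasoning.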

articles/ai-foundry/openai/includes/prompt-chat-completion.md

Lines changed: 4 additions & 0 deletions
@@ -98,6 +98,10 @@ One simple way to use an affordance is to stop generation once the affordance ca

 ## Chain of thought prompting

+> [!IMPORTANT]
+> This technique is only applicable to non-reasoning models. Attempting to extract model reasoning through methods other than the reasoning summary parameter is not supported, may violate the Acceptable Use Policy, and may result in throttling or suspension when detected.
+
 This is a variation on the **break the task down** technique. Instead of splitting a task into smaller steps, in this approach, the model response is instructed to proceed step-by-step and present all the steps involved. Doing so reduces the possibility of inaccuracy of outcomes and makes assessing the model response easier.

 | System message |User | Assistant |
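The chain-of-thought technique above (for non-reasoning models only) amounts to instructing the model to show its steps in the visible output. A minimal sketch of such a message list; the wording and the example question are illustrative, not taken from this commit.

```python
# Sketch: chain-of-thought prompting for a non-reasoning chat model.
# The instruction asks the model to present its steps in the response
# itself, not to extract any hidden reasoning.
messages = [
    {
        "role": "system",
        "content": (
            "You are a careful assistant. Think through the problem "
            "step by step and show each step before giving the final answer."
        ),
    },
    {
        "role": "user",
        "content": "A train travels 120 km in 1.5 hours. What is its average speed?",
    },
]
```

Because every step appears in the completion, the response is easier to assess for inaccuracies, as the paragraph above notes.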
