Merge pull request #5952 from PatrickFarley/openai-updates

prmerger-automator[bot] · web-flow · commit 131f350bbad2 · 2025-07-29T16:00:50.000Z
OpenAI updates
diff --git a/articles/ai-foundry/responsible-ai/agents/transparency-note.md b/articles/ai-foundry/responsible-ai/agents/transparency-note.md
@@ -108,6 +108,7 @@ Developers can connect an Agent to external systems, APIs, and services through
 * **Azure Functions** (a tool that enables an Agent to execute serverless code for synchronous, asynchronous, long-running, and event-driven actions)
 * **OpenAPI 3.0 specified tools** (a custom function defined with OpenAPI 3.0 specification to connect an Agent to external OpenAPI-based APIs securely)
 * **Model Context Protocol tools** (a custom service connected via Model Context Protocol through an existing remote MCP server to an Agent). 
+* **Deep Research tool** (a tool that enables multi-step web-based research with the o3-deep-research model and Grounding with Bing Search).
 
 #### Orchestrating multi-agent systems
 
@@ -128,6 +129,7 @@ Azure AI Agent Service is **flexible and use-case agnostic.** This presents mult
 * **Government: Citizen Request Triage and Community Event Coordination:** A city clerk uses an agent to categorize incoming service requests (for example, pothole repairs), assign them to the right departments, and compile simple status updates; officials review and finalize communications to maintain transparency and accuracy.
 * **Education: Assisting with Research and Reference Gathering:** A teacher relies on an agent to gather age-appropriate articles and resources from reputable sources for a planetary science lesson; the teacher verifies the materials for factual accuracy and adjusts them to fit the curriculum, ensuring students receive trustworthy content.
 * **Manufacturing: Inventory Oversight and Task Scheduling:** A factory supervisor deploys an agent to monitor inventory levels, schedule restocking when supplies run low, and optimize shift rosters; management confirms the agent’s suggestions and retains final decision-making authority.
+* **Deep research**: See the deep research section of the [Azure OpenAI transparency note](../openai/transparency-note.md#deep-research-use-cases) for examples of use cases for the deep research tool.
 
 Agent code samples have specific intended uses that are configurable by developers to carefully build upon, implement, and deploy agents. See [list of Agent code samples](/azure/ai-foundry/agents/overview#agent-catalog).
 
diff --git a/articles/ai-foundry/responsible-ai/openai/transparency-note.md b/articles/ai-foundry/responsible-ai/openai/transparency-note.md
@@ -41,6 +41,7 @@ Azure OpenAI provides customers with a fully managed AI service that lets develo
 | o3/o3-pro | ✅ | ✅| | 
 | o3-mini |✅  |  |  |
 | o4-mini/codex-mini<sup>1</sup> | ✅| ✅| |
+| o3-deep-research <br>o4-mini-deep-research| ✅ |  |  |
 | computer-use-preview |✅  | ✅ |  |
 
 <sup>1</sup>`codex-mini` is a fine-tuned version of `o4-mini` specifically for use in Codex CLI. For more information, please see [OpenAI's documentation](https://platform.openai.com/docs/models/codex-mini-latest). 
@@ -75,6 +76,7 @@ Learn more about the training and modeling techniques in OpenAI's [GPT-3](https:
 | Agentic AI systems | Autonomous AI systems that sense and act upon their environment to achieve goals. |
 | Autonomy  | The ability to independently execute actions and exercise control over system behavior with limited or no direct human supervision. |
 | Computer Use tool | A tool that, when used with the Computer Use model, captures mouse and keyboard actions generated by the model and directly translates them into executable commands. This makes it possible for developers to automate computer use tasks. |
+| Deep research | A fine-tuned version of the o-series reasoning models that is designed for deep research tasks. It takes a high-level query and returns a structured, citation-rich report by acting as an agentic model that decomposes the task, performs web searches, and synthesizes the results. |
 
 #### [Vision models](#tab/image)
 
@@ -183,7 +185,7 @@ A:
 
 **Chain-of-thought**: Azure OpenAI's o-series reasoning models have new advanced reasoning capabilities using chain-of-thought (CoT) techniques. CoT techniques generate intermediate reasoning steps before providing a response, enabling the models to address more complex challenges through step-by-step problem solving. o1 demonstrates improvements in benchmarks for reasoning-heavy domains such as research, strategy, science, coding, and math, among others. These models have safety improvements from advanced reasoning capabilities, with the ability to reason through and apply safety rules more effectively. This results in better performance on safety benchmarks that measure behaviors such as generating illicit advice, choosing stereotyped responses, and succumbing to known jailbreaks. 
 
-For greater detail on this family of models’ capabilities, see the [OpenAI o1 System Card](https://cdn.openai.com/o1-system-card-20241205.pdf), [o3-mini System Card](https://openai.com/index/o3-mini-system-card/), and [o3/o4-mini System Card](https://openai.com/index/o3-o4-mini-system-card/).
+For greater detail on this family of models’ capabilities, see the [OpenAI o1 System Card](https://cdn.openai.com/o1-system-card-20241205.pdf), [o3-mini System Card](https://openai.com/index/o3-mini-system-card/), [o3/o4-mini System Card](https://openai.com/index/o3-o4-mini-system-card/), and [Deep Research System Card](https://openai.com/index/deep-research-system-card/).
 
 **Azure OpenAI Evaluation** 
 
@@ -341,6 +343,16 @@ The advanced reasoning capabilities of the o-series reasoning models may be best
 
 For greater detail on intended uses, visit the [OpenAI o1 System Card](https://cdn.openai.com/o1-system-card-20241205.pdf), [o3-mini System Card](https://openai.com/index/o3-mini-system-card/), and [o3/o4-mini System Card](https://openai.com/index/o3-o4-mini-system-card/).
 
+#### Deep research use cases
+
+Deep research models are fine-tuned versions of the o-series reasoning models that are designed to take a high-level query and return a structured, citation-rich report. The models create subqueries and gather information from web searches in several iterations before returning a final response. Use cases could include the following, with adequate human oversight:
+- **Complex research & literature review**: Synthesizing findings across hundreds of papers, identifying gaps or contradictions in research, and proposing novel hypotheses or research directions.
+- **Scientific discovery & hypothesis generation**: Exploring connections between findings across disciplines, generating testable hypotheses or experimental designs, and assisting in the interpretation of raw experimental data.
+- **Advanced technical problem solving**: Debugging complex systems (for example, distributed software or robotics), designing novel algorithms or architectures, and solving advanced math or physics problems.
+- **Augmenting long-term planning**: Helping executives or researchers plan 10-year technology roadmaps, modeling long-range scenarios in AI safety, biosecurity, or climate, and evaluating second- and third-order effects of decisions.
+
+Deep research models are available as a tool in the [Azure AI Agents](/azure/ai-foundry/agents/how-to/tools/deep-research) service. For greater detail on intended uses, see the [OpenAI Deep Research System Card](https://openai.com/index/deep-research-system-card/).
+
 #### Azure OpenAI evaluation use cases
 
 Azure OpenAI evaluation is a text-only feature and can't be used with models that support non-text inputs. Evals can be used in multiple scenarios including but not limited to: 
@@ -373,7 +385,7 @@ The capabilities of Computer Use are best suited for developing agentic AI syste
 
 ### Considerations when choosing a use case
 
-We encourage customers to use the Azure OpenAI GPT-4, GPT-3, Codex, and Computer Use models in their innovative solutions or applications as approved in their [Limited Access registration form](/azure/ai-foundry/responsible-ai/openai/limited-access). However, here are some considerations when choosing a use case:
+We encourage customers to use the Azure OpenAI GPT-4, o-series, GPT-3, Codex, and Computer Use models in their innovative solutions or applications as approved in their [Limited Access registration form](/azure/ai-foundry/responsible-ai/openai/limited-access). However, here are some considerations when choosing a use case:
 
 - **Not suitable for open-ended, unconstrained content generation.**  Scenarios where users can generate content on any topic are more likely to produce offensive or harmful text. The same is true of longer generations.
 - **Not suitable for scenarios where up-to-date, factually accurate information is crucial**  unless you have human reviewers or are using the models to search your own documents and have verified suitability for your scenario. The service doesn't have information about events that occur after its training date, likely has missing knowledge about some topics, and may not always produce factually accurate information.
@@ -386,21 +398,18 @@ We encourage customers to use the Azure OpenAI GPT-4, GPT-3, Codex, and Computer
 - [!INCLUDE [regulatory-considerations](../includes/regulatory-considerations.md)]
 
 When choosing a use case for Computer Use, users should factor in the following considerations in addition to those listed above: 
-
 - Avoid scenarios where actions are irreversible or highly consequential: These include, but are not limited to, sending an email (such as to the wrong recipient), modifying or deleting files that are important to you, making financial transactions or directly interacting with outside services, sharing sensitive information publicly, granting access to critical systems, or executing commands that could alter system functionality or security. 
-
 - Degradation of performance on advanced uses: Computer Use is best suited for use cases around completing tasks with GUIs, such as accessing websites and computer desktops. It may not perform well doing more advanced tasks such as editing code, writing extensive text, and making complex decisions. 
-
 - Ensure adequate human oversight and control. Consider including controls to help users verify, review and/or approve actions in a timely manner, which may include reviewing planned tasks or calls to external data sources, for example, as appropriate for your system. Consider including controls for adequate user remediation of system failures, particularly in high-risk scenarios and use cases. 
-
 - Clearly define actions and associated requirements. Clearly defining which actions are allowed (action boundaries), prohibited, or need explicit authorization may help Computer Use operate as expected and with the appropriate level of human oversight. 
-
-- Clearly define intended operating environments. Clearly define the intended operating environments (domain boundaries) where Computer Use is designed to perform effectively. 
-
+- Clearly define intended operating environments. Clearly define the intended operating environments (domain boundaries) where Computer Use is designed to perform effectively.
 - Ensure appropriate intelligibility in decision making. Providing information to users before, during, and after actions are taken may help them understand why certain actions were taken or why the application is behaving a certain way, where to intervene, and how to troubleshoot issues. 
-
 - For further information, consult the [Fostering appropriate reliance on Generative AI guide](/ai/playbook/technology-guidance/overreliance-on-ai/overreliance-on-ai). 
-- 
+
+When choosing a use case for deep research, users should factor in the following considerations in addition to those listed above: 
+- **Ensure adequate human oversight and control**: Provide mechanisms to help ensure that users review deep research reports and validate cited sources and content.
+- **Check citations for copyrighted content**: The deep research tool conducts web searches when preparing responses, and copyrighted materials may be cited. Check the source citations included in the report, and ensure you use and attribute copyrighted material appropriately.
+
 #### [Vision models](#tab/image)
 
 ### Intended use cases
@@ -549,7 +558,7 @@ To help mitigate the risks associated with advanced fine-tuned models, we have i
 
 - o-series reasoning models are best suited for use cases that involve heavy reasoning and may not perform well on some natural language tasks such as personal or creative writing when compared to earlier AOAI models. 
 - The new reasoning capabilities may increase certain types of risks, requiring refined methods and approaches towards risk management protocols and evaluating and monitoring system behavior. For example, o1's CoT reasoning capabilities have demonstrated improvements in persuasiveness, and simple in-context scheming.  
-- Users may experience that the reasoning family of models takes more time to reason through responses and should account for the additional time and latency in developing applications. 
+- Users may experience that the reasoning family of models takes more time to reason through responses and should account for the additional time and latency in developing applications.
 
 For greater detail on these limitations, see the [OpenAI o1 System Card](https://cdn.openai.com/o1-system-card-20241205.pdf), [o3-mini System Card](https://openai.com/index/o3-mini-system-card/), and [o3/o4-mini System Card](https://openai.com/index/o3-o4-mini-system-card/).