<sup>1</sup>`codex-mini` is a fine-tuned version of `o4-mini` specifically for use in Codex CLI. For more information, please see [OpenAI's documentation](https://platform.openai.com/docs/models/codex-mini-latest).
| Agentic AI systems | Autonomous AI systems that sense and act upon their environment to achieve goals. |
| Autonomy | The ability to independently execute actions and exercise control over system behavior with limited or no direct human supervision. |
| Computer Use tool | A tool that, when used with the Computer Use model, captures the mouse and keyboard actions generated by the model and directly translates them into executable commands. This makes it possible for developers to automate computer use tasks. |
| Deep research | A fine-tuned version of the o-series reasoning models that is designed for deep research tasks. It takes a high-level query and returns a structured, citation-rich report by leveraging an agentic model capable of decomposing the task, performing web searches, and synthesizing results. |
#### [Vision models](#tab/image)
**Chain-of-thought**: Azure OpenAI's o-series reasoning models have new advanced reasoning capabilities using chain-of-thought (CoT) techniques. CoT techniques generate intermediate reasoning steps before providing a response, enabling the models to address more complex challenges through step-by-step problem solving. o1 demonstrates improvements in benchmarks for reasoning-heavy domains such as research, strategy, science, coding, and math, among others. These models also have safety improvements that stem from their advanced reasoning capabilities, with the ability to reason through and apply safety rules more effectively. This results in better performance on safety benchmarks, such as those measuring the risk of generating illicit advice, choosing stereotyped responses, and succumbing to known jailbreaks.
For greater detail on this family of models’ capabilities, see the [OpenAI o1 System Card](https://cdn.openai.com/o1-system-card-20241205.pdf), [o3-mini System Card](https://openai.com/index/o3-mini-system-card/), [o3/o4-mini System Card](https://openai.com/index/o3-o4-mini-system-card/), and [Deep Research System Card](https://openai.com/index/deep-research-system-card/).
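As a rough illustration of how a developer might invoke one of these reasoning models, here is a minimal sketch using the `openai` Python SDK (v1.x) against an Azure OpenAI endpoint. The endpoint, API version, deployment name (`o3-mini`), and `reasoning_effort` value are illustrative assumptions, not required values.

```python
# Minimal sketch: calling an o-series reasoning model on Azure OpenAI.
# The endpoint, API version, deployment name, and reasoning_effort value
# below are illustrative placeholders.
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-12-01-preview",
)

response = client.chat.completions.create(
    model="o3-mini",             # your o-series deployment name
    reasoning_effort="medium",   # low | medium | high
    max_completion_tokens=4000,  # budget covers hidden reasoning tokens plus the visible answer
    messages=[
        {"role": "user", "content": "Plan an experiment to compare two caching strategies."}
    ],
)

print(response.choices[0].message.content)
```

Because the chain-of-thought tokens are generated before the visible answer, the completion-token budget should account for both.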
**Azure OpenAI Evaluation**
For greater detail on intended uses, visit the [OpenAI o1 System Card](https://cdn.openai.com/o1-system-card-20241205.pdf), [o3-mini System Card](https://openai.com/index/o3-mini-system-card/), and [o3/o4-mini System Card](https://openai.com/index/o3-o4-mini-system-card/).
#### Deep research use cases
Deep research models are fine-tuned versions of the o-series reasoning models that are designed to take a high-level query and return a structured, citation-rich report. The models create subqueries and gather information from web searches over several iterations before returning a final response; a conceptual sketch of this loop follows the list below. Use cases could include:
- **Complex research & literature review**: Synthesizing findings across hundreds of papers, identifying gaps or contradictions in research, proposing novel hypotheses or research directions.
- **Scientific discovery & hypothesis generation**: Exploring connections between findings across disciplines, generating testable hypotheses or experimental designs, assisting in the interpretation of raw experimental data.
- **Advanced technical problem solving**: Debugging complex systems (for example, distributed software, robotics), designing novel algorithms or architectures, and solving advanced math or physics problems.
- **Augmenting long-term planning**: Helping executives or researchers plan 10-year technology roadmaps, modeling long-range scenarios in AI safety, biosecurity, or climate, evaluating second- and third-order effects of decisions.
Deep research models are available as a tool in the [Azure AI Agents](/azure/ai-foundry/agents/how-to/tools/deep-research) service. For greater detail on intended uses, see the [OpenAI Deep Research System Card](https://openai.com/index/deep-research-system-card/).
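To make the iterative behavior described above more concrete, here is a purely conceptual sketch of the plan, search, and synthesize loop. All of the helper functions (`decompose_query`, `web_search`, `synthesize_report`) and the data shapes are hypothetical stand-ins for what the model does internally; this is not the Azure AI Agents deep research interface.

```python
# Conceptual sketch of the plan -> search -> synthesize loop described above.
# Every helper here is a hypothetical stand-in for behavior the deep research
# model performs internally; this is NOT the Azure AI Agents API.
from dataclasses import dataclass, field


@dataclass
class ResearchState:
    query: str
    findings: list = field(default_factory=list)  # (url, snippet) pairs


def decompose_query(state: ResearchState) -> list:
    # Hypothetical: the model breaks the high-level query into focused subqueries.
    return [f"{state.query} - aspect {i}" for i in (1, 2, 3)]


def web_search(subquery: str) -> list:
    # Hypothetical: the model issues real web searches here.
    return [(f"https://example.com/{abs(hash(subquery)) % 1000}", "relevant snippet")]


def synthesize_report(state: ResearchState) -> str:
    # Hypothetical: the model writes a structured report that cites each source.
    sources = "\n".join(f"- {url}" for url, _ in state.findings)
    return f"# Report: {state.query}\n\n(synthesized findings)\n\n## Sources\n{sources}"


def deep_research(query: str, iterations: int = 3) -> str:
    state = ResearchState(query=query)
    for _ in range(iterations):  # several search/refine passes
        for subquery in decompose_query(state):
            state.findings.extend(web_search(subquery))
    return synthesize_report(state)


print(deep_research("How do long-context retrieval approaches compare?"))
```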
#### Azure OpenAI evaluation use cases
Azure OpenAI evaluation is a text-only feature and can't be used with models that support non-text inputs. Evals can be used in multiple scenarios including but not limited to:
### Considerations when choosing a use case
We encourage customers to use the Azure OpenAI GPT-4, o-series, GPT-3, Codex, and Computer Use models in their innovative solutions or applications as approved in their [Limited Access registration form](/azure/ai-foundry/responsible-ai/openai/limited-access). However, here are some considerations when choosing a use case:
- **Not suitable for open-ended, unconstrained content generation.** Scenarios where users can generate content on any topic are more likely to produce offensive or harmful text. The same is true of longer generations.
- **Not suitable for scenarios where up-to-date, factually accurate information is crucial** unless you have human reviewers or are using the models to search your own documents and have verified suitability for your scenario. The service doesn't have information about events that occur after its training date, likely has missing knowledge about some topics, and may not always produce factually accurate information.
When choosing a use case for Computer Use, users should factor in the following considerations in addition to those listed above:
- Avoid scenarios where actions are irreversible or highly consequential: These include, but are not limited to, sending an email (such as to the wrong recipient), modifying or deleting files that are important to you, making financial transactions or directly interacting with outside services, sharing sensitive information publicly, granting access to critical systems, or executing commands that could alter system functionality or security.
- Degradation of performance on advanced uses: Computer Use is best suited for use cases around completing tasks with GUIs, such as accessing websites and computer desktops. It may not perform well on more advanced tasks such as editing code, writing extensive text, and making complex decisions.
- Ensure adequate human oversight and control. Consider including controls to help users verify, review, and/or approve actions in a timely manner, which may include reviewing planned tasks or calls to external data sources, as appropriate for your system. Consider including controls for adequate user remediation of system failures, particularly in high-risk scenarios and use cases.
- Clearly define actions and associated requirements. Clearly defining which actions are allowed (action boundaries), prohibited, or require explicit authorization may help Computer Use operate as expected and with the appropriate level of human oversight; a minimal illustrative sketch of such a gate follows this list.
- Clearly define intended operating environments. Clearly define the intended operating environments (domain boundaries) where Computer Use is designed to perform effectively.
- Ensure appropriate intelligibility in decision making. Providing information to users before, during, and after actions are taken may help them understand why certain actions were taken or why the application is behaving a certain way, where to intervene, and how to troubleshoot issues.
- For further information, consult the [Fostering appropriate reliance on Generative AI guide](/ai/playbook/technology-guidance/overreliance-on-ai/overreliance-on-ai).
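As a minimal sketch of the action boundaries and human-approval controls described above, the snippet below gates each model-proposed action against allowlisted, approval-required, and prohibited categories before executing it. The action dictionary format, category names, and executor are illustrative assumptions, not the Computer Use tool's actual interface.

```python
# Illustrative action gate for a Computer Use-style agent: routine GUI actions
# run, consequential actions pause for human approval, and irreversible actions
# are refused. The action format, category names, and executor are hypothetical,
# not the actual Computer Use tool interface.
ALLOWED = {"click", "type_text", "scroll", "screenshot"}
REQUIRES_APPROVAL = {"submit_form", "download_file"}
PROHIBITED = {"send_email", "delete_file", "make_purchase"}


def gate_action(action: dict, approve) -> bool:
    """Decide whether a model-proposed action may run, logging each decision."""
    kind = action.get("type", "")
    if kind in PROHIBITED:
        print(f"refused: '{kind}' is outside the action boundary")
        return False
    if kind in REQUIRES_APPROVAL:
        approved = approve(action)  # e.g., prompt a human reviewer
        print(f"{'approved' if approved else 'rejected'} by reviewer: '{kind}'")
        return approved
    if kind in ALLOWED:
        print(f"allowed: '{kind}'")
        return True
    print(f"refused: unrecognized action '{kind}'")
    return False


def execute(action: dict) -> None:
    # Hypothetical executor; a real agent would translate the action into
    # mouse/keyboard commands here (for example, with a GUI automation library).
    print(f"executing {action}")


# Example: only run actions that pass the gate.
proposed = {"type": "click", "x": 120, "y": 300}
if gate_action(proposed, approve=lambda a: True):
    execute(proposed)
```

Logging each decision also supports the intelligibility guidance above, since users can see why an action ran, was refused, or was sent for review.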
When choosing a use case for deep research, users should factor in the following considerations in addition to those listed above:
- Check citations for copyright: Deep research models conduct web searches when preparing their responses, and it's possible they return information from copyrighted materials. Check the source citations (included automatically) of the information you plan to use, for example by listing them for review as in the sketch below.
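One lightweight way to support that review is to enumerate every linked source before reusing the report's content. The sketch below assumes the report is Markdown text with inline links; it is illustrative only and doesn't depend on any particular API.

```python
# Illustrative helper: list every cited URL in a Markdown research report so a
# human can review the sources (including for copyrighted material) before reuse.
import re

MARKDOWN_LINK = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")


def list_citations(report_markdown: str) -> list:
    """Return (link text, URL) pairs for every inline citation in the report."""
    return MARKDOWN_LINK.findall(report_markdown)


report = "Inference costs are falling ([source](https://example.com/paper))."
for text, url in list_citations(report):
    print(f"{text}: {url}")
```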
#### [Vision models](#tab/image)
### Intended use cases
- o-series reasoning models are best suited for use cases that involve heavy reasoning and, compared to earlier AOAI models, may not perform well on some natural language tasks such as personal or creative writing.
- The new reasoning capabilities may increase certain types of risks, requiring refined methods and approaches to risk management protocols and to evaluating and monitoring system behavior. For example, o1's CoT reasoning capabilities have demonstrated improvements in persuasiveness and simple in-context scheming.
- The reasoning family of models may take more time to reason through responses; users should account for the additional time and latency when developing applications, as in the client-side sketch below.
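A small client-side sketch of accounting for that extra latency, assuming the `openai` Python SDK (v1.x); the endpoint, API version, timeout value, and deployment name are illustrative assumptions.

```python
# Reasoning models can spend noticeable time "thinking" before they answer;
# give the client a generous timeout and measure elapsed time so the
# application can surface progress or fall back gracefully.
# The endpoint, deployment name, and timeout below are placeholders.
import os
import time

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-12-01-preview",
)

start = time.monotonic()
response = client.with_options(timeout=300.0).chat.completions.create(
    model="o3-mini",  # your o-series deployment name
    max_completion_tokens=2000,
    messages=[{"role": "user", "content": "Prove that the sum of two even numbers is even."}],
)
elapsed = time.monotonic() - start

print(f"answered in {elapsed:.1f}s")
print(response.choices[0].message.content)
```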
For greater detail on these limitations, see the [OpenAI o1 System Card](https://cdn.openai.com/o1-system-card-20241205.pdf), [o3-mini System Card](https://openai.com/index/o3-mini-system-card/), and [o3/o4-mini System Card](https://openai.com/index/o3-o4-mini-system-card/).