Commit 131f350

Merge pull request #5952 from PatrickFarley/openai-updates
OpenAI updates
2 parents 9d1160d + d239d9a commit 131f350

2 files changed: +23 -12 lines changed

articles/ai-foundry/responsible-ai/agents/transparency-note.md

Lines changed: 2 additions & 0 deletions
@@ -108,6 +108,7 @@ Developers can connect an Agent to external systems, APIs, and services through
 * **Azure Functions** (a tool that enables an Agent to execute serverless code for synchronous, asynchronous, long-running, and event-driven actions)
 * **OpenAPI 3.0 specified tools** (a custom function defined with OpenAPI 3.0 specification to connect an Agent to external OpenAPI-based APIs securely)
 * **Model Context Protocol tools** (a custom service connected via Model Context Protocol through an existing remote MCP server to an Agent).
+* **Deep Research tool**: (a tool that enables multi-step web-based research with the o3-deep-research model and Grounding with Bing Search.).

 #### Orchestrating multi-agent systems

@@ -128,6 +129,7 @@ Azure AI Agent Service is **flexible and use-case agnostic.** This presents mult
 * **Government: Citizen Request Triage and Community Event Coordination:** A city clerk uses an agent to categorize incoming service requests (for example, pothole repairs), assign them to the right departments, and compile simple status updates; officials review and finalize communications to maintain transparency and accuracy.
 * **Education: Assisting with Research and Reference Gathering:** A teacher relies on an agent to gather age-appropriate articles and resources from reputable sources for a planetary science lesson; the teacher verifies the materials for factual accuracy and adjusts them to fit the curriculum, ensuring students receive trustworthy content.
 * **Manufacturing: Inventory Oversight and Task Scheduling:** A factory supervisor deploys an agent to monitor inventory levels, schedule restocking when supplies run low, and optimize shift rosters; management confirms the agent’s suggestions and retains final decision-making authority.
+* **Deep research**: See the deep research section of the [Azure OpenAI transparency note](../openai/transparency-note.md#deep-research-use-cases) for examples of use cases for the deep research tool.

 Agent code samples have specific intended uses that are configurable by developers to carefully build upon, implement, and deploy agents. See [list of Agent code samples](/azure/ai-foundry/agents/overview#agent-catalog).

articles/ai-foundry/responsible-ai/openai/transparency-note.md

Lines changed: 21 additions & 12 deletions
@@ -41,6 +41,7 @@ Azure OpenAI provides customers with a fully managed AI service that lets develo
 | o3/o3-pro ||| |
 | o3-mini || | |
 | o4-mini/codex-mini<sup>1</sup> ||| |
+| o3-deep-research <br>o4-mini-deep-research|| | |
 | computer-use-preview ||| |

 <sup>1</sup>`codex-mini` is a fine-tuned version of `o4-mini` specifically for use in Codex CLI. For more information, please see [OpenAI's documentation](https://platform.openai.com/docs/models/codex-mini-latest).
@@ -75,6 +76,7 @@ Learn more about the training and modeling techniques in OpenAI's [GPT-3](https:
 | Agentic AI systems | Autonomous AI systems that sense and act upon their environment to achieve goals. |
 | Autonomy | The ability to independently execute actions and exercise control over system behavior with limited or no direct human supervision. |
 | Computer Use tool | A tool that when used with the Computer Use model captures mouse and keyboard actions generated by the mode and directly translates them into executable commands. This makes it possible for developers to automate computer use tasks. |
+| Deep research | A fine-tuned version of the o-series reasoning models that is designed for deep research tasks. It takes a high-level query and returns a structured, citation-rich report by leveraging an agentic model capable of decomposing the task, performing web searches, and synthesizing results. |

 #### [Vision models](#tab/image)

@@ -183,7 +185,7 @@ A:

 **Chain-of-thought** : Azure OpenAI's o-series reasoning models have new advanced reasoning capabilities using chain-of-thought (CoT) techniques. CoT techniques generate intermediate reasoning steps before providing a response, enabling them to address more complex challenges through step-by-step problem solving. o1 demonstrates improvements in benchmarks for reasoning-heavy domains such as research, strategy, science, coding and math, among others. These models have safety improvements from advanced reasoning capabilities, with the ability to reason through and apply safety rules more effectively. This results in better performance alongside safety benchmarks such as generating illicit advice, choosing stereotyped responses, and succumbing to known jailbreaks.

-For greater detail on this family of models’ capabilities, see the [OpenAI o1 System Card](https://cdn.openai.com/o1-system-card-20241205.pdf), [o3-mini System Card](https://openai.com/index/o3-mini-system-card/), and [o3/o4-mini System Card](https://openai.com/index/o3-o4-mini-system-card/).
+For greater detail on this family of models’ capabilities, see the [OpenAI o1 System Card](https://cdn.openai.com/o1-system-card-20241205.pdf), [o3-mini System Card](https://openai.com/index/o3-mini-system-card/), [o3/o4-mini System Card](https://openai.com/index/o3-o4-mini-system-card/), and [Deep Research System Card](https://openai.com/index/deep-research-system-card/).

 **Azure OpenAI Evaluation**

@@ -341,6 +343,16 @@ The advanced reasoning capabilities of the o-series reasoning models may be best

 For greater detail on intended uses, visit the [OpenAI o1 System Card](https://cdn.openai.com/o1-system-card-20241205.pdf), [o3-mini System Card](https://openai.com/index/o3-mini-system-card/), and [o3/o4-mini System Card](https://openai.com/index/o3-o4-mini-system-card/).

+#### Deep research use cases
+
+Deep research models are fine-tuned versions of the o-series reasoning models that are designed to take a high-level query and return a structured, citation-rich report. The models create subqueries and gather information from web searches in several iterations before returning a final response. Use cases could include the following, with adequate human oversight:
+- **Complex research & literature review**: Synthesizing findings across hundreds of papers, identifying gaps or contradictions in research, proposing novel hypotheses or research directions.
+- **Scientific discovery & hypothesis generation**: Exploring connections between findings across disciplines, generating testable hypotheses or experimental designs, assisting in interpretation of raw experimental data.
+- **Advanced technical problem solving**: Debugging complex systems (for example, distributed software, robotics), designing novel algorithms or architectures, and solving advanced math or physics problems.
+- **Augmenting long-term planning**: Helping executives or researchers plan 10-year technology roadmaps, modeling long-range scenarios in AI safety, biosecurity, or climate, evaluating second- and third-order effects of decisions.
+
+Deep research models are available as a tool in the [Azure AI Agents](/azure/ai-foundry/agents/how-to/tools/deep-research) service. For greater detail on intended uses, see the [OpenAI Deep Research System Card](https://openai.com/index/deep-research-system-card/).
+
 #### Azure OpenAI evaluation use cases

 Azure OpenAI evaluation is a text-only feature and can't be used with models that support non-text inputs. Evals can be used in multiple scenarios including but not limited to:
@@ -373,7 +385,7 @@ The capabilities of Computer Use are best suited for developing agentic AI syste

 ### Considerations when choosing a use case

-We encourage customers to use the Azure OpenAI GPT-4, GPT-3, Codex, and Computer Use models in their innovative solutions or applications as approved in their [Limited Access registration form](/azure/ai-foundry/responsible-ai/openai/limited-access). However, here are some considerations when choosing a use case:
+We encourage customers to use the Azure OpenAI GPT-4, o-series, GPT-3, Codex, and Computer Use models in their innovative solutions or applications as approved in their [Limited Access registration form](/azure/ai-foundry/responsible-ai/openai/limited-access). However, here are some considerations when choosing a use case:

 - **Not suitable for open-ended, unconstrained content generation.** Scenarios where users can generate content on any topic are more likely to produce offensive or harmful text. The same is true of longer generations.
 - **Not suitable for scenarios where up-to-date, factually accurate information is crucial** unless you have human reviewers or are using the models to search your own documents and have verified suitability for your scenario. The service doesn't have information about events that occur after its training date, likely has missing knowledge about some topics, and may not always produce factually accurate information.
@@ -386,21 +398,18 @@ We encourage customers to use the Azure OpenAI GPT-4, GPT-3, Codex, and Computer
 - [!INCLUDE [regulatory-considerations](../includes/regulatory-considerations.md)]

 When choosing a use case for Computer Use, users should factor in the following considerations in addition to those listed above:
-
 - Avoid scenarios where actions are irreversible or highly consequential: These include, but are not limited to, the ability to send an email (such as to the wrong recipient), ability to modify or delete files that are important to you, ability to make financial transactions or directly interacting with outside services, sharing sensitive information publicly, granting access to critical systems, or executing commands that could alter system functionality or security.
-
 - Degradation of performance on advanced uses: Computer Use is best suited for use cases around completing tasks with GUIs, such as accessing websites and computer desktops. It may not perform well doing more advanced tasks such as editing code, writing extensive text, and making complex decisions.
-
 - Ensure adequate human oversight and control. Consider including controls to help users verify, review and/or approve actions in a timely manner, which may include reviewing planned tasks or calls to external data sources, for example, as appropriate for your system. Consider including controls for adequate user remediation of system failures, particularly in high-risk scenarios and use cases.
-
 - Clearly define actions and associated requirements. Clearly defining which actions are allowed (action boundaries), prohibited, or need explicit authorization may help Computer Use operate as expected and with the appropriate level of human oversight.
-
-- Clearly define intended operating environments. Clearly define the intended operating environments (domain boundaries) where Computer Use is designed to perform effectively.
-
+- Clearly define intended operating environments. Clearly define the intended operating environments (domain boundaries) where Computer Use is designed to perform effectively.
 - Ensure appropriate intelligibility in decision making. Providing information to users before, during, and after actions are taken may help them understand action justification or why certain actions were taken or the application is behaving a certain way, where to intervene, and how to troubleshoot issues.
-
 - For further information, consult the [Fostering appropriate reliance on Generative AI guide](/ai/playbook/technology-guidance/overreliance-on-ai/overreliance-on-ai).
--
+
+When choosing a use case for deep research, users should factor in the following considerations in addition to those listed above:
+- **Ensure adequate human oversight and control**: Provide mechanisms to help ensure that users review deep research reports and validate cited sources and content.
+- **Check citations for copyrighted content**: The deep research tool conducts web searches when preparing responses, and copyrighted materials may be cited. Check the source citations included in the report, and ensure you use and attribute copyrighted material appropriately.
+
 #### [Vision models](#tab/image)

 ### Intended use cases
@@ -549,7 +558,7 @@ To help mitigate the risks associated with advanced fine-tuned models, we have i

 - o-series reasoning models are best suited for use cases that involve heavy reasoning and may not perform well on some natural language tasks such as personal or creative writing when compared to earlier AOAI models.
 - The new reasoning capabilities may increase certain types of risks, requiring refined methods and approaches towards risk management protocols and evaluating and monitoring system behavior. For example, o1's CoT reasoning capabilities have demonstrated improvements in persuasiveness, and simple in-context scheming.
-- Users may experience that the reasoning family of models takes more time to reason through responses and should account for the additional time and latency in developing applications.
+- Users may experience that the reasoning family of models takes more time to reason through responses and should account for the additional time and latency in developing applications.

 For greater detail on these limitations, see the [OpenAI o1 System Card](https://cdn.openai.com/o1-system-card-20241205.pdf), [o3-mini System Card](https://openai.com/index/o3-mini-system-card/), and [o3/o4-mini System Card](https://openai.com/index/o3-o4-mini-system-card/).
