<sup>1</sup>`codex-mini` is a fine-tuned version of `o4-mini` specifically for use in Codex CLI. For more information, please see [OpenAI's documentation](https://platform.openai.com/docs/models/codex-mini-latest).
| Agentic AI systems | Autonomous AI systems that sense and act upon their environment to achieve goals. |
| Autonomy | The ability to independently execute actions and exercise control over system behavior with limited or no direct human supervision. |
| Computer Use tool | A tool that, when used with the Computer Use model, captures the mouse and keyboard actions generated by the model and directly translates them into executable commands. This makes it possible for developers to automate computer use tasks. |
| Deep research | A fine-tuned version of the o-series reasoning models that is designed for deep research tasks. It takes a high-level query and returns a structured, citation-rich report by leveraging an agentic model capable of decomposing the task, performing web searches, and synthesizing results. |
#### [Vision models](#tab/image)
**Chain-of-thought**: Azure OpenAI's o-series reasoning models have new advanced reasoning capabilities using chain-of-thought (CoT) techniques. CoT techniques generate intermediate reasoning steps before providing a response, enabling the models to address more complex challenges through step-by-step problem solving. o1 demonstrates improvements in benchmarks for reasoning-heavy domains such as research, strategy, science, coding, and math, among others. These models also have safety improvements that stem from their advanced reasoning capabilities, with the ability to reason through and apply safety rules more effectively. This results in better performance on safety benchmarks, such as those measuring the risk of generating illicit advice, choosing stereotyped responses, and succumbing to known jailbreaks.
For greater detail on this family of models’ capabilities, see the [OpenAI o1 System Card](https://cdn.openai.com/o1-system-card-20241205.pdf), [o3-mini System Card](https://openai.com/index/o3-mini-system-card/), [o3/o4-mini System Card](https://openai.com/index/o3-o4-mini-system-card/), and [Deep Research System Card](https://openai.com/index/deep-research-system-card/).
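As a rough illustration of how a developer might invoke one of these reasoning models, here is a minimal sketch using the `openai` Python SDK (v1.x) against an Azure OpenAI endpoint. The endpoint, API version, deployment name (`o3-mini`), and `reasoning_effort` value are illustrative assumptions, not required values.

```python
# Minimal sketch: calling an o-series reasoning model on Azure OpenAI.
# The endpoint, API version, deployment name, and reasoning_effort value
# below are illustrative placeholders.
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-12-01-preview",
)

response = client.chat.completions.create(
    model="o3-mini",             # your o-series deployment name
    reasoning_effort="medium",   # low | medium | high
    max_completion_tokens=4000,  # budget covers hidden reasoning tokens plus the visible answer
    messages=[
        {"role": "user", "content": "Plan an experiment to compare two caching strategies."}
    ],
)

print(response.choices[0].message.content)
```

Because the chain-of-thought tokens are generated before the visible answer, the completion-token budget should account for both.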
**Azure OpenAI Evaluation**
For greater detail on intended uses, visit the [OpenAI o1 System Card](https://cdn.openai.com/o1-system-card-20241205.pdf), [o3-mini System Card](https://openai.com/index/o3-mini-system-card/), and [o3/o4-mini System Card](https://openai.com/index/o3-o4-mini-system-card/).
#### Deep research use cases
Deep research models are fine-tuned versions of the o-series reasoning models that are designed to take a high-level query and return a structured, citation-rich report. The models create subqueries and gather information from web searches over several iterations before returning a final response; a conceptual sketch of this loop follows the list below. Use cases could include:
- **Complex research & literature review**: Synthesizing findings across hundreds of papers, identifying gaps or contradictions in research, proposing novel hypotheses or research directions.
- **Scientific discovery & hypothesis generation**: Exploring connections between findings across disciplines, generating testable hypotheses or experimental designs, assisting in the interpretation of raw experimental data.
- **Advanced technical problem solving**: Debugging complex systems (for example, distributed software, robotics), designing novel algorithms or architectures, and solving advanced math or physics problems.
- **Augmenting long-term planning**: Helping executives or researchers plan 10-year technology roadmaps, modeling long-range scenarios in AI safety, biosecurity, or climate, evaluating second- and third-order effects of decisions.
Deep research models are available as a tool in the [Azure AI Agents](/azure/ai-foundry/agents/how-to/tools/deep-research) service. For greater detail on intended uses, see the [OpenAI Deep Research System Card](https://openai.com/index/deep-research-system-card/).
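To make the iterative behavior described above more concrete, here is a purely conceptual sketch of the plan, search, and synthesize loop. All of the helper functions (`decompose_query`, `web_search`, `synthesize_report`) and the data shapes are hypothetical stand-ins for what the model does internally; this is not the Azure AI Agents deep research interface.

```python
# Conceptual sketch of the plan -> search -> synthesize loop described above.
# Every helper here is a hypothetical stand-in for behavior the deep research
# model performs internally; this is NOT the Azure AI Agents API.
from dataclasses import dataclass, field


@dataclass
class ResearchState:
    query: str
    findings: list = field(default_factory=list)  # (url, snippet) pairs


def decompose_query(state: ResearchState) -> list:
    # Hypothetical: the model breaks the high-level query into focused subqueries.
    return [f"{state.query} - aspect {i}" for i in (1, 2, 3)]


def web_search(subquery: str) -> list:
    # Hypothetical: the model issues real web searches here.
    return [(f"https://example.com/{abs(hash(subquery)) % 1000}", "relevant snippet")]


def synthesize_report(state: ResearchState) -> str:
    # Hypothetical: the model writes a structured report that cites each source.
    sources = "\n".join(f"- {url}" for url, _ in state.findings)
    return f"# Report: {state.query}\n\n(synthesized findings)\n\n## Sources\n{sources}"


def deep_research(query: str, iterations: int = 3) -> str:
    state = ResearchState(query=query)
    for _ in range(iterations):  # several search/refine passes
        for subquery in decompose_query(state):
            state.findings.extend(web_search(subquery))
    return synthesize_report(state)


print(deep_research("How do long-context retrieval approaches compare?"))
```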
#### Azure OpenAI evaluation use cases
Azure OpenAI evaluation is a text-only feature and can't be used with models that support non-text inputs. Evals can be used in multiple scenarios including but not limited to:
### Considerations when choosing a use case
We encourage customers to use the Azure OpenAI GPT-4, o-series, GPT-3, Codex, and Computer Use models in their innovative solutions or applications as approved in their [Limited Access registration form](/azure/ai-foundry/responsible-ai/openai/limited-access). However, here are some considerations when choosing a use case:
- **Not suitable for open-ended, unconstrained content generation.** Scenarios where users can generate content on any topic are more likely to produce offensive or harmful text. The same is true of longer generations.
- **Not suitable for scenarios where up-to-date, factually accurate information is crucial** unless you have human reviewers or are using the models to search your own documents and have verified suitability for your scenario. The service doesn't have information about events that occur after its training date, likely has missing knowledge about some topics, and may not always produce factually accurate information.
When choosing a use case for Computer Use, users should factor in the following considerations in addition to those listed above:
- Avoid scenarios where actions are irreversible or highly consequential: These include, but are not limited to, sending an email (such as to the wrong recipient), modifying or deleting files that are important to you, making financial transactions or directly interacting with outside services, sharing sensitive information publicly, granting access to critical systems, or executing commands that could alter system functionality or security.
- Degradation of performance on advanced uses: Computer Use is best suited for use cases around completing tasks with GUIs, such as accessing websites and computer desktops. It may not perform well on more advanced tasks such as editing code, writing extensive text, and making complex decisions.
- Ensure adequate human oversight and control. Consider including controls to help users verify, review, and/or approve actions in a timely manner, which may include reviewing planned tasks or calls to external data sources, as appropriate for your system. Consider including controls for adequate user remediation of system failures, particularly in high-risk scenarios and use cases.
- Clearly define actions and associated requirements. Clearly defining which actions are allowed (action boundaries), prohibited, or require explicit authorization may help Computer Use operate as expected and with the appropriate level of human oversight; a minimal illustrative sketch of such a gate follows this list.
- Clearly define intended operating environments. Clearly define the intended operating environments (domain boundaries) where Computer Use is designed to perform effectively.
- Ensure appropriate intelligibility in decision making. Providing information to users before, during, and after actions are taken may help them understand why certain actions were taken or why the application is behaving a certain way, where to intervene, and how to troubleshoot issues.
- For further information, consult the [Fostering appropriate reliance on Generative AI guide](/ai/playbook/technology-guidance/overreliance-on-ai/overreliance-on-ai).
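As a minimal sketch of the action boundaries and human-approval controls described above, the snippet below gates each model-proposed action against allowlisted, approval-required, and prohibited categories before executing it. The action dictionary format, category names, and executor are illustrative assumptions, not the Computer Use tool's actual interface.

```python
# Illustrative action gate for a Computer Use-style agent: routine GUI actions
# run, consequential actions pause for human approval, and irreversible actions
# are refused. The action format, category names, and executor are hypothetical,
# not the actual Computer Use tool interface.
ALLOWED = {"click", "type_text", "scroll", "screenshot"}
REQUIRES_APPROVAL = {"submit_form", "download_file"}
PROHIBITED = {"send_email", "delete_file", "make_purchase"}


def gate_action(action: dict, approve) -> bool:
    """Decide whether a model-proposed action may run, logging each decision."""
    kind = action.get("type", "")
    if kind in PROHIBITED:
        print(f"refused: '{kind}' is outside the action boundary")
        return False
    if kind in REQUIRES_APPROVAL:
        approved = approve(action)  # e.g., prompt a human reviewer
        print(f"{'approved' if approved else 'rejected'} by reviewer: '{kind}'")
        return approved
    if kind in ALLOWED:
        print(f"allowed: '{kind}'")
        return True
    print(f"refused: unrecognized action '{kind}'")
    return False


def execute(action: dict) -> None:
    # Hypothetical executor; a real agent would translate the action into
    # mouse/keyboard commands here (for example, with a GUI automation library).
    print(f"executing {action}")


# Example: only run actions that pass the gate.
proposed = {"type": "click", "x": 120, "y": 300}
if gate_action(proposed, approve=lambda a: True):
    execute(proposed)
```

Logging each decision also supports the intelligibility guidance above, since users can see why an action ran, was refused, or was sent for review.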
When choosing a use case for deep research, users should factor in the following considerations in addition to those listed above:
- Check citations for copyright: Deep research models conduct web searches when preparing their responses, and it's possible they return information from copyrighted materials. Check the source citations (included automatically) of the information you plan to use, for example by listing them for review as in the sketch below.
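One lightweight way to support that review is to enumerate every linked source before reusing the report's content. The sketch below assumes the report is Markdown text with inline links; it is illustrative only and doesn't depend on any particular API.

```python
# Illustrative helper: list every cited URL in a Markdown research report so a
# human can review the sources (including for copyrighted material) before reuse.
import re

MARKDOWN_LINK = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")


def list_citations(report_markdown: str) -> list:
    """Return (link text, URL) pairs for every inline citation in the report."""
    return MARKDOWN_LINK.findall(report_markdown)


report = "Inference costs are falling ([source](https://example.com/paper))."
for text, url in list_citations(report):
    print(f"{text}: {url}")
```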
#### [Vision models](#tab/image)
### Intended use cases
- o-series reasoning models are best suited for use cases that involve heavy reasoning and, compared to earlier AOAI models, may not perform well on some natural language tasks such as personal or creative writing.
- The new reasoning capabilities may increase certain types of risks, requiring refined methods and approaches to risk management protocols and to evaluating and monitoring system behavior. For example, o1's CoT reasoning capabilities have demonstrated improvements in persuasiveness and simple in-context scheming.
- The reasoning family of models may take more time to reason through responses; users should account for the additional time and latency when developing applications, as in the client-side sketch below.
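A small client-side sketch of accounting for that extra latency, assuming the `openai` Python SDK (v1.x); the endpoint, API version, timeout value, and deployment name are illustrative assumptions.

```python
# Reasoning models can spend noticeable time "thinking" before they answer;
# give the client a generous timeout and measure elapsed time so the
# application can surface progress or fall back gracefully.
# The endpoint, deployment name, and timeout below are placeholders.
import os
import time

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-12-01-preview",
)

start = time.monotonic()
response = client.with_options(timeout=300.0).chat.completions.create(
    model="o3-mini",  # your o-series deployment name
    max_completion_tokens=2000,
    messages=[{"role": "user", "content": "Prove that the sum of two even numbers is even."}],
)
elapsed = time.monotonic() - start

print(f"answered in {elapsed:.1f}s")
print(response.choices[0].message.content)
```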
For greater detail on these limitations, see the [OpenAI o1 System Card](https://cdn.openai.com/o1-system-card-20241205.pdf), [o3-mini System Card](https://openai.com/index/o3-mini-system-card/), and [o3/o4-mini System Card](https://openai.com/index/o3-o4-mini-system-card/).