Skip to content

Commit c014ad1

Browse files
Minor edits.
1 parent f26bde6 commit c014ad1

File tree

1 file changed

+13
-10
lines changed

1 file changed

+13
-10
lines changed

articles/ai-foundry/concepts/observability.md

Lines changed: 13 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -13,11 +13,11 @@ ms.custom:
1313
- build-2025
1414
---
1515

16-
# Observability in generative AI with Azure AI Foundry
16+
# Observability in generative AI
1717

1818
[!INCLUDE [feature-preview](../includes/feature-preview.md)]
1919

20-
In today's AI-driven world, Generative AI Operations (GenAIOps) is revolutionizing how organizations build and deploy intelligent systems. As companies increasingly use AI to transform decision-making, enhance customer experiences, and fuel innovation, one element stands paramount: robust evaluation frameworks. Evaluation isn't just a checkpoint. It's the foundation of trust in AI applications. Without rigorous assessment, AI systems can produce content that's:
20+
In today's AI-driven world, Generative AI Operations (GenAIOps) is revolutionizing how organizations build and deploy intelligent systems. As companies increasingly use AI to transform decision making, enhance customer experiences, and fuel innovation, one element stands paramount: robust evaluation frameworks. Evaluation isn't just a checkpoint. It's the foundation of trust in AI applications. Without rigorous assessment, AI systems can produce content that's:
2121

2222
- Fabricated or ungrounded in reality
2323
- Irrelevant or incoherent to user needs
@@ -115,7 +115,7 @@ GenAIOps uses the following three stages.
115115

116116
### Base model selection
117117

118-
Before building your application, you need to select the right foundation. This initial evaluation helps you compare different models based on:
118+
Before you build your application, select the right foundation. This initial evaluation helps you compare different models based on:
119119

120120
- Quality and accuracy: How relevant and coherent are the model's responses?
121121
- Task performance: Does the model handle your specific use cases efficiently?
@@ -126,7 +126,7 @@ Before building your application, you need to select the right foundation. This
126126

127127
### Pre-production evaluation
128128

129-
After you select a base model, the next step is to develop an AI application, such as an AI-powered chatbot, a retrieval-augmented generation (RAG) application, an agentic AI application, or any other generative AI tool. When development is complete, *pre-production evaluation* begins. Before you deploy to a production environment, thorough testing is essential to ensure the model is ready for real-world use.
129+
After you select a base model, the next step is to develop an AI application, such as an AI-powered chatbot, a retrieval-augmented generation (RAG) application, an agentic AI application, or any other generative AI tool. When development is complete, *pre-production evaluation* begins. Before you deploy to a production environment, thorough testing is essential to ensure that the model is ready for real-world use.
130130

131131
Pre-production evaluation involves:
132132

@@ -141,27 +141,30 @@ The pre-production stage acts as a final quality check, reducing the risk of dep
141141

142142
Evaluation Tools and Approaches:
143143

144-
- Bring your own data: You can evaluate your AI applications in pre-production using your own evaluation data with supported evaluators, including generation quality, safety, or custom evaluators. View results by using the Azure AI Foundry portal. Use Azure AI Foundry’s evaluation wizard or [Azure AI Evaluation SDK’s](../how-to/develop/evaluate-sdk.md) supported evaluators, including generation quality, safety, or [custom evaluators](./evaluation-evaluators/custom-evaluators.md). [View results by using the Azure AI Foundry portal](../how-to/evaluate-results.md).
145-
- Simulators and AI red teaming agent (preview): If you don’t have evaluation data (test data), [Azure AI Evaluation SDK’s simulators](..//how-to/develop/simulator-interaction-data.md) can help by generating topic-related or adversarial queries. These simulators test the model’s response to situation-appropriate or attack-like queries (edge cases).
144+
- **Bring your own data**: You can evaluate your AI applications in pre-production using your own evaluation data with supported evaluators, including generation quality, safety, or custom evaluators. View results by using the Azure AI Foundry portal.
145+
146+
Use Azure AI Foundry’s evaluation wizard or [Azure AI Evaluation SDK’s](../how-to/develop/evaluate-sdk.md) supported evaluators, including generation quality, safety, or [custom evaluators](./evaluation-evaluators/custom-evaluators.md). [View results by using the Azure AI Foundry portal](../how-to/evaluate-results.md).
147+
148+
- **Simulators and AI red teaming agent (preview)**: If you don’t have evaluation data or test data, [Azure AI Evaluation SDK’s simulators](..//how-to/develop/simulator-interaction-data.md) can help by generating topic-related or adversarial queries. These simulators test the model’s response to situation-appropriate or attack-like queries (edge cases).
146149

147150
- [Adversarial simulators](../how-to/develop/simulator-interaction-data.md#generate-adversarial-simulations-for-safety-evaluation) inject static queries that mimic potential safety risks or security attacks or attempted jailbreaks. The simulators help identify limitations to prepare the model for unexpected conditions.
148151
- [Context-appropriate simulators](../how-to/develop/simulator-interaction-data.md#generate-synthetic-data-and-simulate-non-adversarial-tasks) generate typical, relevant conversations you might expect from users to test quality of responses. With context-appropriate simulators, you can assess metrics such as groundedness, relevance, coherence, and fluency of generated responses.
149152
- [AI red teaming agent (preview)](../how-to/develop/run-scans-ai-red-teaming-agent.md) simulates complex adversarial attacks against your AI system using a broad range of safety and security attacks. It uses Microsoft’s open framework for Python Risk Identification Tool (PyRIT).
150153

151154
Automated scans using the AI red teaming agent enhance pre-production risk assessment by systematically testing AI applications for risks. This process involves simulated attack scenarios to identify weaknesses in model responses before real-world deployment.
152155

153-
By running AI red teaming scans, you can detect and mitigate potential safety issues before deployment. We recommend this tool to be used with human-in-the-loop processes such as conventional AI red teaming probing to help accelerate risk identification and aid in the assessment by a human expert.
156+
By running AI red teaming scans, you can detect and mitigate potential safety issues before deployment. We recommend that you use this tool along with human-in-the-loop processes, such as conventional AI red teaming probing, to help accelerate risk identification and aid in the assessment by a human expert.
154157

155158
Alternatively, you can also use [evaluation functionality](../how-to/evaluate-generative-ai-app.md) in the Azure AI Foundry portal for testing your generative AI applications.
156159

157-
After you achieve satisfactory results, you can deploy the AI application to production.
160+
After you get satisfactory results, you can deploy the AI application to production.
158161

159162
### Post-production monitoring
160163

161164
After deployment, continuous monitoring ensures your AI application maintains quality in real-world conditions.
162165

163-
- Performance tracking: Regular measurement of key metrics.
164-
- Incident response: Swift action when harmful or inappropriate outputs occur.
166+
- **Performance tracking**: Regular measurement of key metrics.
167+
- **Incident response**: Swift action when harmful or inappropriate outputs occur.
165168

166169
Effective monitoring helps maintain user trust and allows for rapid issue resolution.
167170

0 commit comments

Comments
 (0)