articles/ai-foundry/concepts/evaluation-approach-gen-ai.md (4 additions, 4 deletions)
The pre-production stage acts as a final quality check, reducing the risk of deploying…
- Bring your own data: You can evaluate your AI applications in pre-production using your own evaluation data with Azure AI Foundry or the [Azure AI Evaluation SDK's](../how-to/develop/evaluate-sdk.md) supported evaluators, including [generation quality and safety](./evaluation-metrics-built-in.md) or [custom evaluators](../how-to/develop/evaluate-sdk.md#custom-evaluators), and [view results via the Azure AI Foundry portal](../how-to/evaluate-results.md) (see the evaluation sketch later in this section).
- Simulators and AI red teaming agent (preview): If you don't have evaluation data (test data), Azure AI [Evaluation SDK's simulators](../how-to/develop/simulator-interaction-data.md) can help by generating topic-related or adversarial queries. These simulators test the model's response to situation-appropriate or attack-like queries (edge cases), as sketched after this list.
  - [Adversarial simulators](../how-to/develop/simulator-interaction-data.md#generate-adversarial-simulations-for-safety-evaluation) inject static queries that mimic potential safety risks or security attacks, such as attempted jailbreaks, helping identify limitations and preparing the model for unexpected conditions.
  - [Context-appropriate simulators](../how-to/develop/simulator-interaction-data.md#generate-synthetic-data-and-simulate-non-adversarial-tasks) generate the typical, relevant conversations you'd expect from users, to test the quality of responses. With context-appropriate simulators, you can assess metrics such as groundedness, relevance, coherence, and fluency of generated responses.
  - The [AI red teaming agent](../how-to/develop/run-scans-ai-red-teaming-agent.md) (preview) simulates complex adversarial attacks against your AI system, drawing on a broad range of safety and security attacks from Microsoft's open framework, the Python Risk Identification Tool ([PyRIT](https://github.com/Azure/PyRIT)). Automated scans using the AI red teaming agent enhance pre-production risk assessment by systematically testing AI applications for risks. The process runs simulated attack scenarios to identify weaknesses in model responses before real-world deployment, so you can detect and mitigate potential safety issues before you deploy. Use this tool in conjunction with human-in-the-loop processes, such as conventional AI red teaming probing, to accelerate risk identification and aid assessment by a human expert.
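As a rough illustration of the simulator path, the following sketch generates a few adversarial test queries with the `azure-ai-evaluation` Python package's adversarial simulator. The project values and the toy `app_target` callback are placeholders standing in for your application, and the exact module paths and parameters should be checked against the SDK version you use.

```python
# A minimal sketch, assuming the azure-ai-evaluation package's simulator module;
# project values and the toy target below are placeholders, not a real app.
import asyncio

from azure.identity import DefaultAzureCredential
from azure.ai.evaluation.simulator import AdversarialScenario, AdversarialSimulator

azure_ai_project = {
    "subscription_id": "<subscription-id>",
    "resource_group_name": "<resource-group>",
    "project_name": "<project-name>",
}

async def app_target(messages, stream=False, session_state=None, context=None):
    # Stand-in for your application: append a canned assistant reply.
    messages["messages"].append(
        {"role": "assistant", "content": "I can't help with that request."}
    )
    return {
        "messages": messages["messages"],
        "stream": stream,
        "session_state": session_state,
        "context": context,
    }

async def main():
    simulator = AdversarialSimulator(
        azure_ai_project=azure_ai_project, credential=DefaultAzureCredential()
    )
    # Generate a handful of attack-like question/answer turns against the target.
    outputs = await simulator(
        scenario=AdversarialScenario.ADVERSARIAL_QA,
        target=app_target,
        max_simulation_results=5,
    )
    for line in outputs:  # each item is one simulated conversation
        print(line)

asyncio.run(main())
```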
Alternatively, you can use the [Azure AI Foundry portal's evaluation widget](../how-to/evaluate-generative-ai-app.md) to test your generative AI applications.
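To make the bring-your-own-data bullet above concrete, here is a minimal sketch that runs two built-in quality evaluators over a local dataset, assuming the `azure-ai-evaluation` Python package. The file name `data.jsonl` and the judge-model configuration values are placeholders, and the fields your data needs depend on the evaluators you pick.

```python
# A minimal sketch, assuming the azure-ai-evaluation package; file names and
# model configuration values are placeholders.
from azure.ai.evaluation import GroundednessEvaluator, RelevanceEvaluator, evaluate

# Configuration for the LLM that scores responses (the judge model).
model_config = {
    "azure_endpoint": "https://<your-resource>.openai.azure.com",
    "api_key": "<your-api-key>",
    "azure_deployment": "<your-deployment>",
}

# Each line of data.jsonl is assumed to look like:
# {"query": "...", "context": "...", "response": "..."}
result = evaluate(
    data="data.jsonl",
    evaluators={
        "groundedness": GroundednessEvaluator(model_config),
        "relevance": RelevanceEvaluator(model_config),
    },
)
print(result["metrics"])  # aggregate scores across all rows
```

Depending on SDK version and configuration, the result also includes per-row scores that you can inspect or log.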
Once satisfactory results are achieved, the AI application can be deployed to production.
## Post-production monitoring
After deployment, the AI application enters the post-production evaluation phase, also known as online evaluation or monitoring. At this stage, the model is embedded within a real-world product and responds to actual user queries in production. Monitoring ensures that the model continues to behave as expected and adapts to any changes in user behavior or content.
- **Ongoing performance tracking**: Regularly measuring the AI application's responses using key metrics to ensure consistent output quality (see the sketch after this list).
- **Incident response**: Quickly responding to any harmful, unfair, or inappropriate outputs that might arise during real-world use.
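As one possible pattern for ongoing performance tracking (an illustrative approach, not a specific Azure feature), you can sample logged production interactions on a schedule, re-run a built-in evaluator over the sample, and alert when aggregate scores regress. The log format, sampling helper, metric key, and threshold below are all assumptions made for this sketch.

```python
# An illustrative monitoring sketch, assuming the azure-ai-evaluation package;
# the log format, metric key, and threshold are assumptions, not a real feature.
import json
import random

from azure.ai.evaluation import GroundednessEvaluator, evaluate

model_config = {  # placeholder judge-model configuration
    "azure_endpoint": "https://<your-resource>.openai.azure.com",
    "api_key": "<your-api-key>",
    "azure_deployment": "<your-deployment>",
}

def sample_logs(log_path: str, sample_path: str, k: int = 50) -> None:
    """Write a random sample of logged query/context/response rows to JSONL."""
    with open(log_path) as f:
        rows = [json.loads(line) for line in f]
    with open(sample_path, "w") as f:
        for row in random.sample(rows, min(k, len(rows))):
            f.write(json.dumps(row) + "\n")

# Run on a schedule (for example, daily) over fresh production traffic.
sample_logs("production_logs.jsonl", "daily_sample.jsonl")
result = evaluate(
    data="daily_sample.jsonl",
    evaluators={"groundedness": GroundednessEvaluator(model_config)},
)
score = result["metrics"].get("groundedness.groundedness")  # assumed metric key
if score is not None and score < 3.0:  # threshold chosen for illustration
    print("Groundedness regression detected; trigger incident response.")
```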