Commit 68fa117

committed
safety evaluation in a multi-modal scenario (text + images)
1 parent 0098da1 commit 68fa117

4 files changed

+11
-1
lines changed

articles/ai-studio/how-to/evaluate-results.md

Lines changed: 11 additions & 1 deletion
@@ -9,7 +9,7 @@ ms.custom:
 - build-2024
 - ignite-2024
 ms.topic: how-to
-ms.date: 11/19/2024
+ms.date: 12/18/2024
 ms.reviewer: wenxwei
 ms.author: lagayhar
 author: lgayhardt
@@ -92,6 +92,16 @@ For multi-turn conversation scenario, you can select “View evaluation results
 
 :::image type="content" source="../media/evaluations/view-results/metric-per-turn.png" alt-text="Screenshot of evaluation results per turn." lightbox="../media/evaluations/view-results/metric-per-turn.png":::
 
+For a safety evaluation in a multi-modal scenario (text + images), you can review the images from both the input and output in the detailed metrics result table to better understand the evaluation result. Since multi-modal evaluation is currently supported only for conversation scenarios, you can select "View evaluation results per turn" to examine the input and output for each turn.
+
+:::image type="content" source="../media/evaluations/view-results/image-detail-table.png" alt-text="Screenshot of detailed metrics results." lightbox="../media/evaluations/view-results/image-detail-table.png":::
+
+:::image type="content" source="../media/evaluations/view-results/image-per-turn-pop-up.png" alt-text="Screenshot of the image popup from conversation column." lightbox="../media/evaluations/view-results/image-per-turn-pop-up.png":::
+
+Select the image to expand and view it. By default, all images are blurred to protect you from potentially harmful content. To view the image clearly, turn on the "Check Blur Image" toggle.
+
+:::image type="content" source="../media/evaluations/view-results/image-check-blur-image.png" alt-text="Screenshot of blurred image that shows the check blur image toggle." lightbox="../media/evaluations/view-results/image-check-blur-image.png":::
+
 For risk and safety metrics, the evaluation provides a severity score and reasoning for each score. Here are some examples of risk and safety metrics results for the question answering scenario:
 
 :::image type="content" source="../media/evaluations/view-results/risk-safety-metric-example.png" alt-text="Screenshot of risk and safety metrics results for question answering scenario." lightbox="../media/evaluations/view-results/risk-safety-metric-example.png":::
Three other changed files (402 KB, 105 KB, 163 KB): previews not shown.

0 commit comments