articles/ai-studio/how-to/evaluate-generative-ai-app.md: 8 additions & 6 deletions
@@ -33,9 +33,9 @@ An evaluation run allows you to generate metric outputs for each data row in you
 
 ### From the evaluate page
 
-From the collapsible left menu, select **Evaluation** > **+ New evaluation**.
+From the collapsible left menu, select **Evaluation** > **+ Create a new evaluation**.
 
-:::image type="content" source="../media/evaluations/evaluate/evaluation-list-new-evaluation.png" alt-text="Screenshot of the button to create a new evaluation." lightbox="../media/evaluations/evaluate/evaluation-list-new-evaluation.png":::
+:::image type="content" source="../media/evaluations/evaluate/create-new-evaluation.png" alt-text="Screenshot of the button to create a new evaluation." lightbox="../media/evaluations/evaluate/create-new-evaluation.png":::
 
 ### From the model catalog page
 
@@ -45,9 +45,9 @@ From the collapsible left menu, select **Model catalog** > go to specific model
 
 ### From the flow page
 
-From the collapsible left menu, select **Prompt flow** > **Evaluate** > **Built-in evaluation**.
+From the collapsible left menu, select **Prompt flow** > **Evaluate** > **Automated evaluation**.
 
-:::image type="content" source="../media/evaluations/evaluate/new-evaluation-flow-page.png" alt-text="Screenshot of how to select builtin evaluation." lightbox="../media/evaluations/evaluate/new-evaluation-flow-page.png":::
+:::image type="content" source="../media/evaluations/evaluate/automated-evaluation.png" alt-text="Screenshot of how to select builtin evaluation." lightbox="../media/evaluations/evaluate/automated-evaluation.png":::
 
 #### Evaluation target
 
@@ -97,6 +97,8 @@ We support three types of metrics curated by Microsoft to facilitate a comprehen
 - AI quality (NLP): These NLP metrics are mathematical based, and they also evaluate the overall quality of the generated content. They often require ground truth data, but they don't require model deployment as judge.
 - Risk and safety metrics: These metrics focus on identifying potential content risks and ensuring the safety of the generated content.
 
+:::image type="content" source="../media/evaluations/evaluate/select-metric-category.png" alt-text="Screenshot of the Choose what you'd like to evaluate with AI quality and safety selected." lightbox="../media/evaluations/evaluate/select-metric-category.png":::
+
 You can refer to the table for the complete list of metrics we offer support for in each scenario. For more in-depth information on each metric definition and how it's calculated, see [Evaluation and monitoring metrics](../concepts/evaluation-metrics-built-in.md).
 
 | AI quality (AI assisted) | AI quality (NLP) | Risk and safety metrics |
@@ -105,11 +107,11 @@ You can refer to the table for the complete list of metrics we offer support for
 
 When running AI assisted quality evaluation, you must specify a GPT model for the calculation process. Choose an Azure OpenAI connection and a deployment with either GPT-3.5, GPT-4, or the Davinci model for our calculations.
 
-:::image type="content" source="../media/evaluations/evaluate/select-metrics-ai-quality-nlp.png" alt-text="Screenshot of the AI quality (AI assisted) with groundedness, relevance, and coherence metrics selected when creating a new evaluation." lightbox="../media/evaluations/evaluate/select-metrics-ai-quality-nlp.png":::
+:::image type="content" source="../media/evaluations/evaluate/select-metrics-ai-quality-ai-assisted.png" alt-text="Screenshot of the AI quality (AI assisted) with groundedness, relevance, and coherence metrics selected when creating a new evaluation." lightbox="../media/evaluations/evaluate/select-metrics-ai-quality-ai-assisted.png":::
 
 AI Quality (NLP) metrics are mathematically based measurements that assess your application's performance. They often require ground truth data for calculation. ROUGE is a family of metrics. You can select the ROUGE type to calculate the scores. Various types of ROUGE metrics offer ways to evaluate the quality of text generation. ROUGE-N measures the overlap of n-grams between the candidate and reference texts.
 
-:::image type="content" source="../media/evaluations/evaluate/select-metrics-ai-quality-ai-assisted.png" alt-text="Screenshot of the AI quality (NLP) with groundedness, relevance, and coherence metrics selected when creating a new evaluation." lightbox="../media/evaluations/evaluate/select-metrics-ai-quality-ai-assisted.png":::
+:::image type="content" source="../media/evaluations/evaluate/select-metrics-ai-quality-nlp.png" alt-text="Screenshot of the AI quality (NLP) with groundedness, relevance, and coherence metrics selected when creating a new evaluation." lightbox="../media/evaluations/evaluate/select-metrics-ai-quality-nlp.png":::
 
 For risk and safety metrics, you don't need to provide a connection and deployment. The Azure AI Studio safety evaluations back-end service provisions a GPT-4 model that can generate content risk severity scores and reasoning to enable you to evaluate your application for content harms.
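The last hunk above describes ROUGE-N as the overlap of n-grams between the candidate and reference texts. As a minimal sketch of that idea only (a recall-oriented toy implementation with naive whitespace tokenization, not the studio's actual scoring code):

```python
from collections import Counter

def rouge_n(candidate: str, reference: str, n: int = 2) -> float:
    """Recall-oriented ROUGE-N: clipped n-gram overlap / reference n-gram count."""
    def ngrams(text: str) -> Counter:
        tokens = text.split()  # naive whitespace tokenization, for illustration only
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    if not ref:
        return 0.0
    # Counter intersection clips each n-gram's count to its reference frequency.
    overlap = sum((cand & ref).values())
    return overlap / sum(ref.values())

print(rouge_n("the cat sat on the mat", "the cat is on the mat", n=1))  # 5/6 unigrams overlap
```

Real ROUGE implementations also report precision and F1 variants and apply stemming or other normalization; this sketch shows only the recall side that the overlap description refers to.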
articles/ai-studio/how-to/evaluate-results.md: 2 additions & 2 deletions
@@ -58,11 +58,11 @@ We break down the aggregate views with different types of your metrics by AI Qua
 - For AI Quality (NLP) metrics, we show histogram of the metric distribution between 0 and 1. We aggregate by calculating an average across all the scores for each metric.
 :::image type="content" source="../media/evaluations/view-results/ai-quality-nlp-chart.png" alt-text="Screenshot of AI Quality (NLP) dashboard tab." lightbox="../media/evaluations/view-results/ai-quality-nlp-chart.png":::
 - For custom metrics, you can select **Add custom chart**, to create a custom chart with your chosen metrics or to view a metric against selected input parameters.
-:::image type="content" source="../media/evaluations/view-results/custom-chart-pop-up.png" alt-text="Screenshot of create custom chart popup." lightbox="../media/evaluations/view-results/custom-chart-pop-up.png":::
+:::image type="content" source="../media/evaluations/view-results/custom-chart-pop-up.png" alt-text="Screenshot of create custom chart pop-up." lightbox="../media/evaluations/view-results/custom-chart-pop-up.png":::
 
 You can also customize existing charts for built-in metrics by changing the chart type.
 
-:::image type="content" source="../media/evaluations/view-results/custom-chart-pop-up.png" alt-text="Screenshot of changing chart type." lightbox="../media/evaluations/view-results/custom-chart-pop-up.png":::
+:::image type="content" source="../media/evaluations/view-results/change-chart-type.png" alt-text="Screenshot of changing chart type." lightbox="../media/evaluations/view-results/change-chart-type.png":::